An introductory primer for modeling data as networks
Learning Outcomes
At the end of this article, you should be able to:
State the definition of a network
Describe the process of modeling data using networks
Explain the advantages of network modeling and what situations call for network modeling
Sketch a network model to apply to your own data
A network, sometimes interchangeably called a graph, is a mathematical structure used to model pairwise relations between objects. Graph theory is the study of such structures [Wikipedia], and, as it turns out, these structures have very useful properties.
To give some examples of networks:
A collection of people, in which people are the objects and friendship between any pair of people is the relationship.
A collection of companies and news events, in which companies and news events are the objects, and whether a news event affects a company's stock price is the pairwise relation between two objects.
A collection of proteins, in which proteins are the objects, and the interaction between any two proteins is a pairwise relationship between two objects.
You get the idea. In a network, there are entities (objects), and there are relationships between entities. Networks are often visualized with circles and lines; in this visual format, the circles are often called nodes and the lines are called edges. (Note: a visualization does not always show every relationship. Every edge represents a relationship, but not every relationship is necessarily drawn as an edge.)
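On a computer, the same circles-and-lines picture can be stored as two collections: a list of nodes and a list of edges. The sketch below is a minimal illustration in Python, assuming the friendship example from earlier with made-up names; `neighbors` is a hypothetical helper written for this example, not part of any library.

```python
# Hypothetical friendship network: nodes are people,
# and each edge records that two people are friends.
nodes = ["Alice", "Bob", "Carol", "Dave"]
edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Carol")]


def neighbors(node, edges):
    """Return the set of nodes that share an (undirected) edge with `node`."""
    result = set()
    for a, b in edges:
        if a == node:
            result.add(b)
        elif b == node:
            result.add(a)
    return result


friends_of_alice = neighbors("Alice", edges)  # == {"Bob", "Carol"}
friends_of_dave = neighbors("Dave", edges)    # == set(): Dave has no edges
```

Dave appears as a node but has no edges, which shows that an entity can be part of a network without having any relationships yet.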
The nice thing about graph theory is that it treats relationships as first-class citizens, meaning they're tangible things you can see, draw, touch, save, edit, etc. Contrast this with lists of objects, where the connections amongst objects are only implied.
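To make the contrast concrete, here is a minimal Python sketch with made-up data. In the plain list, the coworker connection between Alice and Carol exists only implicitly, through matching `company` values; once it is written out as an edge record, it can be edited, extended, and saved like any other data. The field names (`source`, `target`, `type`, `since`) are illustrative assumptions.

```python
import json
from itertools import combinations

# A plain list of objects: the coworker connection between Alice and Carol
# is only implied by their matching "company" values.
people = [
    {"name": "Alice", "company": "Acme"},
    {"name": "Bob", "company": "Globex"},
    {"name": "Carol", "company": "Acme"},
]

# Making the connection explicit: each relationship becomes its own record.
edges = [
    {"source": a["name"], "target": b["name"], "type": "coworker"}
    for a, b in combinations(people, 2)
    if a["company"] == b["company"]
]

# Relationships are now data in their own right: edit one, add another, save them.
edges[0]["since"] = 2021  # hypothetical attribute on the Alice-Carol edge
edges.append({"source": "Alice", "target": "Bob", "type": "friend"})
saved = json.dumps(edges)
```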
When we are interested in modeling our data using networks, the first thing to ask is whether networks are the right fit for your data. The answer comes down to this: if the relationships in the data are important, you'll want to model your data as a network. For example, if your use case is just to calculate the average height in a group of people, you don't need to model your data as a network.
Sometimes it can be hard to figure out what the relationships in your data are. So let's do an exercise: grab a piece of paper and a pen, and work through the following questions:
First, think about the entities in your data. What are the objects in your data? For example, if you are an educational designer, entities of interest are often courses, learning outcomes, topics, assessments, etc. Draw these entities as circles, spaced apart.
Next, think about the relationships in your data. What are the connections that exist between any two entities? For example, does a learning outcome have a prerequisite of another learning outcome? For any two entities that have a relationship, draw a line between their two circles.
You should have a bunch of circles and lines on your paper. Congrats, that's a network model of your data!
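Once a sketch like this is on a computer, the relationships can do real work. As a minimal illustration, assuming a few made-up learning outcomes, the Python sketch below records prerequisite relationships and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to compute a teaching order that respects every prerequisite. A prerequisite has a direction, so these are directed edges, whereas the lines on paper had none.

```python
from graphlib import TopologicalSorter

# Hypothetical learning outcomes (entities) and prerequisite relationships (edges).
# Each key maps an outcome to the set of outcomes that must come before it.
prerequisites = {
    "Solve linear equations": {"Use variables"},
    "Graph linear functions": {"Solve linear equations", "Read coordinate planes"},
    "Use variables": set(),
    "Read coordinate planes": set(),
}

# Because the prerequisites are explicit edges, we can compute an order
# in which every outcome appears after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

More than one valid order may exist, so the exact output can differ. `static_order()` raises `graphlib.CycleError` if the prerequisites form a cycle, which is itself a useful check on the data.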
Obviously, the exercise we just did isn't scalable. Sketching a few entities and a few more relationships on paper is fine, but at a larger scale we need to store the data in a way that lets us reuse it and edit it easily. This is where the computer comes in.
In our experience, a spreadsheet is the best bet for non-developers. It's editable, most people already know how to edit one, and it's easy to convert into a more developer-friendly format such as CSV. To help you get started, Rhumbl provides spreadsheet templates and examples for entering your own data. You can then import these spreadsheets and get a visualization of your own data.
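As a rough sketch of that conversion, suppose a spreadsheet has one row per relationship and is exported as CSV. The Python code below, which uses only the standard library, reads such rows and builds an adjacency list (each node mapped to the relationships leaving it). The column names `source`, `target`, and `relationship`, and the helpers `load_edges` and `build_adjacency`, are hypothetical and are not necessarily the layout of Rhumbl's templates.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet exported as CSV: one row per relationship.
csv_text = """source,target,relationship
Course A,Outcome 1,teaches
Course A,Outcome 2,teaches
Outcome 1,Outcome 2,prerequisite of
"""


def load_edges(text):
    """Parse CSV rows into (source, target, relationship) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"], row["target"], row["relationship"]) for row in reader]


def build_adjacency(edges):
    """Map each node to a list of (neighbor, relationship) pairs leaving it."""
    adjacency = defaultdict(list)
    for source, target, relationship in edges:
        adjacency[source].append((target, relationship))
        adjacency.setdefault(target, [])  # make sure target-only nodes appear too
    return dict(adjacency)


network = build_adjacency(load_edges(csv_text))
```

For a real export, you would pass in the file's contents instead, for example `open("edges.csv", newline="").read()` with a hypothetical file name. Keeping one row per relationship mirrors the paper exercise: each row is one line between two circles.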
Or, continue on to our next article on tools and tips for graph visualization.