Last updated: Feb 29, 2020
In the last article, we learned about modeling data as a graph and how network modeling is particularly useful for mapping relationships. In this article, we'll explore what to do once you have modeled your data in a graph form.
Often, the most useful next step after you have put your data in network form is to visualize it. Why? Visualization helps you spot errors, gives you a mental picture of your data, and helps you make sense of any ensuing graph metrics. Visualization also helps your audience better connect with your data.
Before we discuss the details of making graph visualizations, let's look at the tools available for making them. Most tools require you to know how to code, but we'll give a run-down of three popular options:
GraphViz: open-source, download required, coding required
GraphViz is one of the most established open-source graph visualization tools. It generates graph diagrams from descriptions written in DOT, a plain-text graph description language, and outputs them in PNG or vector formats. The catch is that some setup and coding is required, and you need to massage your data into the DOT format. One of GraphViz's distinguishing strengths is its mathematical sophistication in graph layout: its authors have published dozens of papers on efficient ways of implementing edge routers and drawing graphs. Because everything is controlled through code, GraphViz is also highly customizable. All in all, GraphViz has been around for a long time and is a solid choice for those who have the time to introduce a coding component to their research and process.
Pros: Free, mathematically sophisticated layout algorithms
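To give a feel for what "massaging your data into the DOT format" involves, here is a minimal sketch in Python. It assumes your relationships are already available as (source, relationship, target) triples. The function name `to_dot` and the sample data are hypothetical, chosen only for illustration, and the script only builds DOT text. It does not call GraphViz itself.

```python
def to_dot(edges, graph_name="G"):
    """Render (source, relationship, target) triples as DOT text.

    Illustrative helper, not part of GraphViz. graph_name is assumed
    to be a plain alphanumeric identifier.
    """
    def quote(text):
        # Inside a DOT double-quoted string, only the double quote is escaped.
        return '"' + str(text).replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for source, relationship, target in edges:
        lines.append(
            f"    {quote(source)} -> {quote(target)} "
            f"[label={quote(relationship)}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data: each triple reads "source <relationship> target".
edges = [
    ("Plato", "is a protege of", "Socrates"),
    ("Aristotle", "is a protege of", "Plato"),
]

dot_text = to_dot(edges)
print(dot_text)
# digraph G {
#     "Plato" -> "Socrates" [label="is a protege of"];
#     "Aristotle" -> "Plato" [label="is a protege of"];
# }
```

If you save the printed text to a file such as graph.dot and have GraphViz installed, running `dot -Tpng graph.dot -o graph.png` should render it as a PNG.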
Gephi: open-source, download required
Like GraphViz, Gephi is an open-source graph visualization tool that you download and install on your computer. Unlike GraphViz, Gephi requires no coding. The minimum amount of memory it needs to run quickly depends on the size of your data, and it also has a Java dependency. Gephi is a good option if you are a researcher with hundreds of thousands of data points and don't need to easily share and collaborate with others on graph visualizations.
Pros: Open-source, highly customizable
Rhumbl: free, 100% online, no coding
Since this article is by Rhumbl, we'll take the chance to explain how Rhumbl differs from the other tools. First, Rhumbl is 100% online, which means you don't need to worry about downloading or installing software. Second, the data spreadsheets that you import into Rhumbl are in an intuitive, easy-to-understand format, so you and your collaborators can easily edit and reason about the data. Rhumbl also has the concept of groups, a visually intuitive way to group nodes. Rhumbl is a good choice for those who need to make pretty and interactive graph visualizations without needing to code or write scripts to massage their data.
Pros: Free, user-first design, easy-to-import data format
For a more comprehensive list of tools for graph visualization, check out this Medium article.
Now that we've discussed tools for making graph visualizations, let's delve into the design of a graph visualization. Here are some tips from the cartographers here at Rhumbl:
The more fine-grained your entities are, the richer and more informative your graph visualization will be.
Be specific in the wording of your relationships, e.g. "is related to" is vague; "is influenced by" is better; "is a protege of" is even better.
High-saturation, jarring colors should be used sparingly. Opt instead for lower-saturation colors. Many color adjustment tools are available online, e.g. Paletton.
Make node radii big enough that users can easily hover over nodes. Tiny nodes lead to frustrating user experiences.
Try coloring nodes by their node degrees. This leads to immediate visual recognition of highly-connected nodes.
Also try sizing nodes by their node degrees. Like coloring, this leads to immediate visual recognition of highly-connected nodes. Whichever you choose, keep the coloring and sizing scheme consistent across the whole visualization.
Edges between nodes should be colored a muted color — strong, high-saturation colors make the network look "busy".
Edge thickness can be used to indicate the strength of a relationship between two nodes. However, it is often helpful to size edge thicknesses on a logarithmic scale, since a few very strong relationships can otherwise produce overly thick edges that look very strange.
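The sizing tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: edges are (source, target, weight) triples, degree counts every edge touching a node, and the pixel ranges (6 to 24 for radii, 1 to 6 for edge widths) are arbitrary placeholder values, not Rhumbl defaults. All function names are hypothetical.

```python
import math
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node."""
    degrees = Counter()
    for source, target, _weight in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def node_radius(degree, max_degree, min_radius=6.0, max_radius=24.0):
    """Scale radius linearly with degree; min_radius keeps nodes hoverable."""
    if max_degree <= 0:
        return min_radius
    return min_radius + (max_radius - min_radius) * degree / max_degree


def edge_thickness(weight, max_weight, min_width=1.0, max_width=6.0):
    """Scale edge width with the logarithm of a non-negative weight."""
    if max_weight <= 0:
        return min_width
    # log1p(x) = log(1 + x), so a weight of 0 maps cleanly to min_width.
    scale = math.log1p(weight) / math.log1p(max_weight)
    return min_width + (max_width - min_width) * scale


# Hypothetical sample data with one very strong relationship.
edges = [("a", "b", 1), ("a", "c", 10), ("a", "d", 1000), ("b", "c", 3)]

degrees = node_degrees(edges)
max_degree = max(degrees.values())
max_weight = max(weight for _, _, weight in edges)

for node, degree in sorted(degrees.items()):
    print(node, degree, node_radius(degree, max_degree))

for source, target, weight in edges:
    print(source, target, round(edge_thickness(weight, max_weight), 2))
```

With the logarithmic mapping, the weight-1000 edge gets the maximum width while the weight-10 edge is still clearly visible. On a linear scale the weight-10 edge would be almost indistinguishable from the thinnest one.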
In the next article, we will delve into graph metrics and the kinds of analysis you can do with your graph visualization.