This is the first post of a series on network visualisation.

Thanks to easier access to network analysis tools and a growing interest across many disciplines in studying the relations that structure datasets, networks have become ubiquitous: in science, in newspapers, on tech book covers, all over the Web, and as illustrations of anything big data-related (hand in hand with word clouds). Unfortunately, the recourse to networks has reached a point where, at a conference, I heard a speaker say:

“Since this is mandatory, here is a network visualisation of these data. Sorry if you cannot see anything in this big hairball.”

You would expect in a conference that everything presented has a purpose. Sadly, it seems that there is underlying pressure in scientific communities to create such horrors.

A network is easy to create, easy to draw, easy to export, and usually nobody asks questions, because networks are often difficult to grasp. This could be different.

Question the relevance

Frankly, who isn’t bored by network visualisations in talks or peer-reviewed journals where you cannot “read” anything? By slides that do not generate questions but bring the discussion to a close?

To put it more clearly: when a network has more than, say, thirty nodes, it is often difficult to find answers to legitimate questions like:

What is the structure under that layer of edges darkening everything? Am I allowed to draw any conclusion from that figure? (Spoiler)

For any given node, which nodes is it connected to?

Which layout algorithm was used to position the nodes? Are the nodes in the middle of the figure also the most central nodes of the network?

What/where are the labels? (Since most of the time they are missing, unreadable or overlapping.)

Curved edges, really?

What do node colours represent? Which community detection algorithm was selected and what is the modularity score?

Answers to these questions must accompany the figure, either orally (in a talk), in the legend, or simply by being integrated into the design. And in published work, the network should be readable on its own.

Nevertheless, sometimes it is simply impossible to achieve this: because of space, because of printing quality limitations, because of the size of the network… Those are hints that perhaps we should not include that motley saturated network “visualisation” in our paper.

Why networks?

The aim of network modelling is to render, and make it possible to study, the structural features of a group of objects (actors, words, cells, places, etc.). This means putting the emphasis on the relations between the entities of that group. If the network is large, such an analysis clearly does not require any visualisation, only network metrics, perhaps simulations, etc. For the subsequent paper, this means displaying tables and diagrams of micro-structures instead.
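To make "only network metrics" a little more concrete, here is a minimal sketch in plain Python, assuming the network is available as a list of undirected node pairs. The function name `basic_metrics` and the sample data are hypothetical, and nodes that appear in no edge are not counted.

```python
from collections import Counter

def basic_metrics(edges):
    """Node count, edge count, density and degrees of an undirected simple graph.

    Assumption: `edges` is an iterable of (u, v) pairs. Isolated nodes are
    unknown to this function, since they appear in no pair.
    """
    # Treat (u, v) and (v, u) as the same edge; drop self-loops and duplicates.
    unique = {frozenset((u, v)) for u, v in edges if u != v}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    n, m = len(degree), len(unique)
    # Density: share of the n * (n - 1) / 2 possible edges that are present.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": m, "density": density, "degree": dict(degree)}

# Hypothetical toy data: a triangle a-b-c with a pendant node d.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "c")]
metrics = basic_metrics(sample_edges)
print(metrics["nodes"], metrics["edges"], round(metrics["density"], 3))
```

A table of such values (with the degree distribution summarised) often says more about a large network than a figure of it would.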

Here are a few reasons that may tempt one to include an irrelevant network visualisation:

When wanting to suggest the size or density of a network.

When unsure about metrics, unwilling to use any.

When building the network is the result and not a step.

Because there was a way to interpret the data set as relational…

Since a burst at the very end of the 20th century, there has been an ever-growing passion for networks, which is a great thing for methodological and interdisciplinary reasons. Software and hardware have become more reliable, and the entry cost keeps getting lower. Networks have become commonplace, and that is welcome. However, all this has come at a cost, and I believe that the scientific community needs to keep its standards high by questioning the relevance of network visualisations, as they may lower the level of debate rather than improve it.

Good practice

A few suggestions to resolve the previous criticisms:

Visualising large networks may provide insight, but should remain an intermediary result. Include them only if they add to the understanding.

If the network is too dense and not too large, increase the minimal link distance, diminish the node size, and move the node labels away.

If the network is too dense and/or too large, compute a subnetwork based on an edge weight threshold, or contract the network to densely connected components and mention the method you used.

Explain which layout algorithm is used; wherever possible, this helps to interpret the visualisation. Use a layout algorithm that keeps the variance of the link distance distribution low while efficiently minimising the number of edge crossings.

Draw edges straight. Draw arcs curved only if there are reciprocal arcs.

Have legends explaining the size of nodes and the width of edges.

Have you considered interactive networks?
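The edge weight threshold suggested above can be sketched in a few lines of plain Python. This is only an illustration, assuming the network is stored as a list of `(source, target, weight)` triples. The function name `threshold_subnetwork` and the sample weights are hypothetical, and the right threshold depends entirely on your data.

```python
def threshold_subnetwork(weighted_edges, min_weight):
    """Keep the edges whose weight is at least `min_weight`.

    Returns the sorted list of nodes still touched by an edge, and the
    list of kept edges. Nodes left without any edge are dropped.
    """
    kept = [(u, v, w) for u, v, w in weighted_edges if w >= min_weight]
    nodes = sorted({node for u, v, _ in kept for node in (u, v)})
    return nodes, kept

# Hypothetical toy data: (source, target, weight).
sample = [("a", "b", 5), ("b", "c", 1), ("a", "c", 3), ("c", "d", 0.5)]
nodes, edges = threshold_subnetwork(sample, min_weight=3)
print(nodes)  # ['a', 'b', 'c']
print(edges)  # [('a', 'b', 5), ('a', 'c', 3)]
```

Whatever threshold you choose, state it next to the figure, since it determines which structure the reader gets to see.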

I hope this post will prove helpful. I’ll be pleased to hear comments and suggestions.

In the next episode…

… I will discuss interactivity and layout algorithms. Moreover, I will provide a tutorial to create networks like the tiny one below (please, click on it!). Interactive networks allow the reader to bypass obstacles and thus to solve many problems.





Update (Nov. 11, 2015). I’ve been reminded of this post introducing hive plots (thanks Ioannis). Hive plots are an attempt to visualise large networks with suitable node attributes. In particular, the authors’ criticisms of the visualisation of large networks are absolutely relevant (and come with creepy examples). While in many cases I do not believe that a visualisation is necessary, interactive versions of hive plots are promising. See also this discussion. By the way, this is also the case with interactive network visualisations.
