Saturday, April 6, 2013

Northwest Nirvana Nursery?

Ben Shneiderman is one of the grand viziers of visualization (does this make him a visviz?). He and Cody Dunne proposed a set of guidelines they called NetViz Nirvana in a research report (C. Dunne and B. Shneiderman, “Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts,” University of Maryland, HCIL Tech Report HCIL-2009-13, May 2009). You can see Prof. Shneiderman talk about this at the Graph Drawing 2012 conference in an online video.

So what is NetViz Nirvana? They suggest that people creating a network visualization should "aspire to these four principles", as listed in the report:
  • Every node is visible
  • For every node you can count its degree
  • For every edge you can follow it from source to destination
  • Clusters and outliers are identifiable
So how does BioFabric stack up against these four principles?  Let's have a look!

Every Node is Visible
BioFabric passes this test. Every node is a line that is assigned to a unique row, so by definition two nodes cannot obscure each other. Furthermore, with the default layout at least, you are also guaranteed to be able to read the node name label over at the far left end of the node line.  Take a look:
Valjean, Gavroche, and Marius

You could perhaps argue that nodes are being hidden behind the edges, but in fact, the majority of the node line will always be visible thanks to the fixed spacing between the edges. Finally, the square glyphs clearly highlighting the end of each edge incident on a node serve as a guarantee that the node cannot be considered invisible. So what's the Nirvana score so far? 1 of 1. Off to a good start!

For Every Node You Can Count its Degree
With BioFabric's normal presentation mode, you could argue that counting a node's degree can get difficult, since the edges can be distributed anywhere along the node line. But you are still guaranteed to be able to count it, because each incident edge has a unique, unambiguous, unobscured glyph located somewhere on the node line.

But if that answer is not good enough, we can turn to BioFabric's shadow link mode, which I described in a couple of previous postings here and here. When you use shadow links, you can see all the edges incident on a node in a single contiguous stretch of a node line. Quick, what is Joly's degree in the network below?

What is Joly's degree?
Count 'em: 12. What's more, the nodes above in the stretch around Joly were arranged left-to-right by degree, so I can immediately say that Bahorel, to Joly's left, has degree >= 12 (it's actually 12), and Combeferre, on the right, has degree <= 12 (it's actually 11).
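
To make the counting argument concrete, here is a minimal Python sketch. It is not BioFabric code: the toy edge list and the function name `count_glyphs` are invented for illustration, and it assumes a network with no self-loops. It treats each edge as contributing exactly one glyph to each of its two endpoint rows, so a node's degree is simply the number of glyphs on its line.

```python
from collections import Counter

# Hypothetical toy network (not the Les Miserables data):
# each edge is a (source, target) pair of node names.
edges = [
    ("A", "B"),
    ("A", "C"),
    ("B", "C"),
    ("A", "D"),
]


def count_glyphs(edges):
    """Count end-of-edge glyphs per node row.

    Each edge places one glyph on the row of each endpoint,
    so the glyph count of a row equals that node's degree.
    """
    glyphs = Counter()
    for source, target in edges:
        glyphs[source] += 1
        glyphs[target] += 1
    return glyphs


degrees = count_glyphs(edges)

# Order nodes by decreasing degree (ties broken by name), similar in
# spirit to the left-to-right degree ordering described above.
by_degree = sorted(degrees, key=lambda name: (-degrees[name], name))
```

With an ordering like `by_degree`, reading one node's degree immediately bounds the degree of its neighbors in the ordering, which is the trick used for Bahorel and Combeferre above.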

In a traditional node-link diagram, with nodes as points, you can easily imagine a situation where so many edges are converging on a high-degree node that you end up looking at a solid blob of ink surrounding the node. But with BioFabric, incident edges are guaranteed their own little bit of elbow room.  So BioFabric's current Nirvana score is now 2 out of 2. It's looking promising!

For Every Edge You Can Follow it From Source to Destination
I think this principle is clearly the killer for traditional node-link diagrams once the network starts to get large. Edge intersections and edge "tunneling" (edges passing under a node) can make it very hard to follow an edge in a crowded network visualization. BioFabric does not suffer from those problems. Edges can never intersect, again by definition, and the intersections of edges with nodes are so formalized, uniform, and regular as to be totally unambiguous. The distinctive glyph marks the source and destination of an edge, and following an edge is as simple as looking straight up or down the page to find the associated glyph. Nirvana score? 3 of 3!
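
The "by definition" claim can be sketched in a few lines of Python. This is only an illustrative model, not BioFabric's implementation: the names `layout_edges` and `edges_cross` and the toy data are assumptions. Each node gets a unique row, each edge gets a unique column, and an edge is the vertical segment between its two endpoint rows. Two vertical segments can only touch if they share a column, which never happens here.

```python
from itertools import combinations


def layout_edges(node_order, edges):
    """Assign each node a unique row and each edge a unique column.

    Returns a list of (column, top_row, bottom_row) vertical segments.
    """
    row = {name: index for index, name in enumerate(node_order)}
    segments = []
    for column, (a, b) in enumerate(edges):
        top, bottom = sorted((row[a], row[b]))
        segments.append((column, top, bottom))
    return segments


def edges_cross(seg1, seg2):
    """Vertical segments overlap only if they share a column and a row range."""
    same_column = seg1[0] == seg2[0]
    rows_overlap = not (seg1[2] < seg2[1] or seg2[2] < seg1[1])
    return same_column and rows_overlap


# Hypothetical toy network.
node_order = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "D"), ("A", "D")]

segments = layout_edges(node_order, edges)
any_crossing = any(edges_cross(s, t) for s, t in combinations(segments, 2))
```

Because every edge sits alone in its column, `any_crossing` stays false no matter how the rows are ordered.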

Clusters and Outliers are Identifiable 
Network layout in BioFabric is simply a linear ordering of the node rows and another linear ordering of the edge columns. If we are going to be able to identify clusters, we will need to build a layout that groups the nodes and edges of a cluster together into contiguous sets. I keep showing the Les Miserables network over and over, but I think it does a good job of showing how BioFabric can make it easy to pick out clusters in a network:

BioFabric Version of Knuth's Les Miserables Network (Clustered)

As for outliers, assigning outlier nodes to the very bottom rows can make those stand out as well. The nodes that are assigned to the bottommost rows when the default layout is applied to the same network illustrate this principle:

BioFabric Version of Knuth's Les Miserables Network (Default Layout)

If we wanted to have the nodes only attached to Valjean stand out in a similar fashion, we could have created a custom layout that moved those node rows to the bottom as well.
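
As a rough sketch of what such a custom ordering could look like, the Python below moves every node whose only neighbor is a chosen hub to the end of the row order. The function name `move_satellites_to_bottom` and the toy data are hypothetical. This only computes a row ordering; it does not produce BioFabric's attribute files, whose format is not covered here.

```python
def move_satellites_to_bottom(node_order, edges, hub):
    """Return a new row order with hub-only neighbors moved to the bottom.

    A "satellite" is a node whose only neighbor is `hub`.
    The relative order of all other nodes is preserved.
    """
    neighbors = {name: set() for name in node_order}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    satellites = [n for n in node_order if neighbors[n] == {hub}]
    rest = [n for n in node_order if neighbors[n] != {hub}]
    return rest + satellites


# Hypothetical toy network: "b" and "d" are attached only to "hub".
node_order = ["hub", "a", "b", "c", "d"]
edges = [("hub", "a"), ("hub", "b"), ("a", "c"), ("hub", "c"), ("hub", "d")]

new_order = move_satellites_to_bottom(node_order, edges, "hub")
```

In this toy case the satellites `b` and `d` end up in the last two rows, where they would stand out in the same way the bottommost nodes do in the default layout.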

As I have stressed in previous posts, a clustered layout algorithm is not yet built into BioFabric, but such a layout can be specified by importing node row and edge column assignments as attribute files.

Personally, I feel that BioFabric provides a great way to visualize clusters, and is better than other existing approaches. So I feel completely justified in awarding the last remaining Nirvana point. Final NetViz Nirvana score for BioFabric: 4 out of 4. 100%!

Northwest Nirvana Nursery?
So can the U.S. Pacific Northwest be called a Nirvana Nursery? It is certainly famous for being the birthplace of the grunge band Nirvana, which emerged from Aberdeen, Washington, in the late 1980s. Has Seattle-born BioFabric perhaps achieved some small measure of [NetViz] Nirvana in its own right?

2 comments:

  1. Nice visualization but already invented... sorry:
    Check http://www.aviz.fr/Research/Geneaquilts
    https://sites.google.com/site/dglabprojects/Quilts

    1. Thanks for your comment. I know about both Quilts and Geneaquilts, and I disagree that BioFabric is "already invented"; it sounds like a full blog posting on the differences is in order. Short version: I consider a Quilt to be a compressed version of an adjacency matrix. Crucially, edges are still represented as points. Geneaquilts (still described as a matrix technique) are more similar to BioFabric, but the semantics of the columns in a Geneaquilt are extremely specific, in that they represent nuclear families, and the location of a column with respect to the node label has specific semantics as well (family child versus parent). BioFabric, which approaches the problem from the direction of the node-link diagram, just uses the vertical dimension to draw completely general links. So while BioFabric shares some visual similarities with the quilt techniques, I disagree that you could say BioFabric is "already invented".

      Thanks for reading!
