Thursday, February 21, 2013

BioFabric Phrenology

The previous post of the small network from Les Miserables hopefully helped to get people more comfortable with thinking about nodes as horizontal lines. Keep exploring it until the concept starts to become second nature! But today's example is more typical of the types of networks BioFabric has to deal with. It is an example from the Stanford Large Network Dataset Collection: Wikipedia voting patterns. This network is available here, and is a directed graph weighing in at 7115 nodes and 103689 edges. To quote the page, "Nodes in the network represent wikipedia users and a directed edge from node i to node j represents that user i voted on user j."  Here it is, in the typical BioFabric high-aspect ratio presentation:


BioFabric Network Visualization
Click on image to enlarge

Click on the image above to get a somewhat larger version (1024 x 82 pixels).  Like the Les Miserables example, it is posted in the BioFabric Gallery, where you can get a 4 MB 10300 x 826 pixel PNG (which is actually still pretty low resolution) , as well as a 3.1 MB ZIP archive of the .bif file.

When I first looked at this network, one interesting feature quickly popped out at me.  Looking about 75% of the way across on the top edge (going left to right), you can see a bump. This close-up detail of that part of the network shows it clearly:

BioFabric Network Visualization: BioFabric Phrenology!
Click on image to enlarge

This marks the end of what I'll call the "first neighbor bump".  These distinctive bumps along the top edge of the network (there are more to the right of the one shown above) are a feature of the default layout algorithm used to draw this network; this algorithm is described in the BioFabric paper. Briefly, this layout assigns node rows using a breadth-first search of the network, starting at the node with the highest degree, and visiting neighbors in order from highest degree to lowest. Because we finish laying out the last lowest-degree neighbors of the starting node #1 just before we jump to start laying out the remaining unvisited high-degree neighbors of node #2, we typically get the bump shown above. So the prominent "edge wedge" just to the right of the discontinuity contains the previously untraversed edges incident on the first node that is two hops away from the start. (Keep in mind that edges incident on this first two-hop node that connect to nodes one hop from the start were already traversed and laid out somewhere on the left.) So the nodes laid out above this node line are all only one hop away from the starting node; the nodes laid out on or below this node line are at least two hops from the start.  This also means that the edges to the left of the discontinuity are originating either from the starting node, or one of the first neighbors.

So this is what popped out at me: the starting node and its first neighbors represent about 10% of the nodes in the network (which I can eyeball by comparing the height of this first bump to the total height of the network).  Yet the edges incident on these nodes account for about 75% of the total edges in the network (based on the placement of the end of this "first neighbor bump")!  An important fact about this large network with over 100000 edges comes almost for free.

I find these bumps help me to navigate and think about the network, and the fact that everything is laid out on a regular grid means that these features can help you to quickly estimate percentages as well.  So start to get tuned in to interpreting the bumps on the top of the network: "BioFabric phrenology"!

No comments:

Post a Comment