Wednesday, May 29, 2013

Banish the Bipartite Blues!

I'm going to take a detour yet again from my bank network series to talk about an unrelated topic, which is the problem of using the default layout algorithm on a bipartite graph. To begin, let's say we want to load the bipartite graph that is described by this .sif file into BioFabric:

Click to enlarge
As a reminder, a bipartite graph is one where each and every node belongs to one, and only one, of two sets of nodes (e.g. A and B). Furthermore, every edge in the graph goes between a node in set A and a node in set B. In the above example, every node in A is connected to every node in B, and vice versa.
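
For anyone who cannot see the picture, a file of this kind can be generated with a few lines of Python. This is only a sketch: the node names (A1 to A3, B1 to B3), the set sizes, and the relation label `pp` are assumptions made for illustration, not the contents of the actual file pictured. It relies only on the basic SIF convention of one `source relation target` triple per line.

```python
def complete_bipartite_sif(n_a, n_b, relation="pp"):
    """Return SIF lines that link every A node to every B node."""
    a_nodes = [f"A{i}" for i in range(1, n_a + 1)]
    b_nodes = [f"B{i}" for i in range(1, n_b + 1)]
    # One tab-separated "source relation target" line per edge.
    return [f"{a}\t{relation}\t{b}" for a in a_nodes for b in b_nodes]


sif_lines = complete_bipartite_sif(3, 3)
print("\n".join(sif_lines))
```

Writing `sif_lines` to a plain text file with a `.sif` extension should give something that can be imported in the same way.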

So if you just import the above .sif file into BioFabric, you get the following result:

Click on picture to enlarge
And if you turn shadow links and zone shading on, you get a clearer picture of what is going on:

Click on picture to enlarge

These views are probably not the way you want your bipartite network to look. This is a case where the default layout algorithm (remember: breadth-first search from the highest degree node, traversing neighbors in order of decreasing degree) does a substandard job. In this particular case, since all the nodes in the network have the same degree, the algorithm will choose the nodes in alphabetical order, i.e. A1 comes first. Then, since A1 is not connected to any of the other A nodes, the B nodes all get laid out next. Finally, when B1 gets its chance, the remaining A nodes get assigned to the last rows. So the set of A nodes does not get assigned to a contiguous run of node rows.
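
The ordering just described can be reproduced with a short Python sketch. To be clear, this is not BioFabric's actual code, just my reading of the rule stated above (breadth-first from the highest degree node, neighbors by decreasing degree, ties broken alphabetically). The function name and the 3-by-3 example graph are made up for illustration.

```python
from collections import deque


def default_layout_order(edges):
    """Assign node rows by breadth-first search, highest degree first."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    def rank(node):
        # Higher degree sorts first; equal degrees fall back to name order.
        return (-len(adjacency[node]), node)

    order, placed = [], set()
    for start in sorted(adjacency, key=rank):
        if start in placed:
            continue
        placed.add(start)
        order.append(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in sorted(adjacency[node], key=rank):
                if neighbor not in placed:
                    placed.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
    return order


# Assumed example: every A node linked to every B node.
edges = [(f"A{i}", f"B{j}") for i in range(1, 4) for j in range(1, 4)]
layout = default_layout_order(edges)
print(layout)
```

In this small example, A1 lands in row 0, the B nodes take the next rows, and the remaining A nodes end up at the bottom, which is exactly the split that makes the default picture hard to read.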

The preferable arrangement will have all the A nodes show up before the B nodes. This is pretty easy to accomplish, using the Layout->Layout Using Node Attributes... feature from the main menu. First, you need to create a node attribute file, which looks like this:


Click on picture to enlarge


Note carefully that the count of rows starts at 0. Furthermore, all nodes need to be present, and there can be no omitted or duplicated row numbers. If a node that is not in the graph is specified, you will also get an error. Finally, the "Node Row" first line and the " = " tokens are required. My apologies, but errors are not called out with the offending line identified (a necessary future enhancement).
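
Since the picture may not be visible, here is a hedged Python sketch that writes out lines in the format described above and checks them against the listed rules. The exact layout (`name = row` under a `Node Row` header) is inferred from the description in this post, and the function names and node names are hypothetical. The checker is my own helper, not something BioFabric provides.

```python
def node_attribute_lines(ordered_nodes):
    """Build attribute-file lines; row numbers start at 0."""
    lines = ["Node Row"]
    lines += [f"{name} = {row}" for row, name in enumerate(ordered_nodes)]
    return lines


def check_node_attribute_lines(lines, graph_nodes):
    """Return a list of problems; an empty list means the lines look valid."""
    if not lines or lines[0].strip() != "Node Row":
        return ['first line must be "Node Row"']
    problems = []
    rows = {}  # node name -> row number
    for number, line in enumerate(lines[1:], start=2):
        name, sep, row_text = line.rpartition(" = ")
        if not sep or not row_text.strip().isdigit():
            problems.append(f'line {number}: expected "name = row"')
        elif name not in graph_nodes:
            problems.append(f"line {number}: {name} is not in the graph")
        elif name in rows:
            problems.append(f"line {number}: {name} is listed twice")
        else:
            rows[name] = int(row_text)
    missing = sorted(set(graph_nodes) - set(rows))
    if missing:
        problems.append(f"nodes without a row: {missing}")
    if sorted(rows.values()) != list(range(len(rows))):
        problems.append("row numbers must run 0, 1, 2, ... with no gaps or repeats")
    return problems


# Assumed node names: all A nodes first, then all B nodes.
nodes = ["A1", "A2", "A3", "B1", "B2", "B3"]
attribute_lines = node_attribute_lines(nodes)
print("\n".join(attribute_lines))
print(check_node_attribute_lines(attribute_lines, nodes))
```

Unlike the current BioFabric error handling, the checker reports the offending line number, which can save some hunting in a long file.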

After creating that file in a plain text editor (i.e. not Word), just load it in using the aforementioned Layout->Layout Using Node Attributes... command, and the result will look like this, with a nice point symmetry and contiguous row assignments for both node sets:

Click on picture to enlarge

So if you find yourself working with a bipartite network, keep this technique in mind to get the best network arrangement! 


Wednesday, May 22, 2013

Spurred boldly on, and dashed through thick and thin...

As the title quote by John Dryden (1682) implies, this short post is going to be about the "thick and thin" of the BioFabric World Bank Major Contract Awards network, as the following two subgraphs from the network will show. First is the set of contracts associated with all the countries in Africa:


African countries (click on image to enlarge)

Second is the set of contracts associated with all the countries in Central and South America:


Central and South American countries (click on image to enlarge)


(Again, I apologize for the substandard quality of the images; I think they are really too light. I need to work on optimizing the image export function for small scales when I get the time to handle this.)

When I first started looking over the BioFabric version of the contracts network, something jumped out at me pretty quickly: the striking visible differences in thickness of what I have been calling the "umbilical links" going from each country down to the bottom of the network. To recap from a previous post, the node lines at the bottom of the graph represent the "global players", i.e. supplier companies who have contracts with more than one country. Looking at these two different sets of countries, it certainly appears that a typical African country has a much larger fraction of contracts with these global players than does a typical Central/South American country. With the latter, these contracts seem to form only a tiny fraction of the total.

Is this a real effect? I don't know, and I'm not planning to dive in here and show that this observation is, or is not, statistically significant. But I find it notable that after spending just a few minutes simply browsing across a global view of a network with 44,213 nodes and 66,021 edges, I could easily glean an interesting observation about the data that could warrant further investigation. I think this is another case where the visual impact of BioFabric's edge wedges can reveal important insights into the network structure.
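
For anyone who does want to follow up, a first step might be to compute, for each country, the fraction of its contracts that go to global players. The Python sketch below shows one way to do that. The function name and the toy contract list are invented for illustration and have nothing to do with the real World Bank numbers.

```python
from collections import defaultdict


def global_player_fraction(contracts):
    """contracts: (borrower_country, supplier) pairs, one pair per contract."""
    contracts = list(contracts)
    borrowers_of = defaultdict(set)
    for country, supplier in contracts:
        borrowers_of[supplier].add(country)

    totals = defaultdict(int)
    with_global = defaultdict(int)
    for country, supplier in contracts:
        totals[country] += 1
        # A "global player" has contracts with more than one country.
        if len(borrowers_of[supplier]) > 1:
            with_global[country] += 1
    return {country: with_global[country] / totals[country] for country in totals}


# Toy data only: g1 is the single supplier shared by both countries.
toy_contracts = [
    ("X", "s1"), ("X", "g1"),
    ("Y", "g1"), ("Y", "s2"), ("Y", "s3"), ("Y", "s4"),
]
fractions = global_player_fraction(toy_contracts)
print(fractions)
```

Comparing the distribution of these fractions between two groups of countries would then be a job for a proper statistical test, which is beyond what I am attempting here.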

Dryden's quote, by the way, was actually part of a satirical jab aimed at the "heroically mad" work of a marginally talented rival poet, so maybe I am ill-advised to use it here. But it seemed appropriate, since I think I'm being driven a little mad by this series of posts on the World Bank network even as I'm spurred on to finish it. Thankfully, I think there is only one more to go!

Monday, May 20, 2013

Learn, Compare, Collect the Facts!

If we are going to apply Ivan Pavlov's above maxim while using BioFabric, we need to be able to compare network subgraphs. So I'm going to take another quick detour and provide a brief HOWTO on BioFabric's Compare Multiple Nodes feature that I have been making use of to prepare my next World Bank network post.


A few weeks back, I introduced how to create a submodel, so you can focus in on interesting pieces of your network. But the step-by-step process I previously described can be time-consuming and error-prone if you have a lot of nodes you need to select. So I added the Compare Multiple Nodes feature to BioFabric to make this common task easier.

Continuing to work with the World Bank Major Contract Award network, say I wish to create a submodel that contains the contracts associated with fourteen selected countries. The easiest way to do this is to first create a plain text file (using e.g. Notepad or TextEdit) that is simply a list of the countries we want to compare:

Click on picture to enlarge

Having started BioFabric and loaded the full network file, you just choose Tools->Compare Multiple Nodes... from the main menu:

Click on picture to enlarge

This brings up a dialog that allows you to enter a list of nodes you wish to compare:

Click on picture to enlarge

If you have only a few, you can use the Add New Entry... button to create some empty rows in the table and type the names in. But since we have already created a list in a text file, just click on the Load Names From File button and choose that file from the file chooser dialog that pops up. The dialog table will then be stocked from the file entries:

Click on picture to enlarge

Clicking OK then handles all the steps for creating the submodel: selecting the nodes, selecting the first neighbors and associated links, and launching the submodel view:
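
The steps that the dialog automates can be sketched in a few lines of Python. This is an illustration of the idea rather than BioFabric's implementation: the function name and toy data are hypothetical, and I am assuming that "associated links" means the links touching at least one selected node.

```python
def compare_nodes_submodel(edges, selected_names):
    """Return (nodes, edges) for the selected nodes plus their first neighbors."""
    selected = set(selected_names)
    # Keep every link that touches at least one selected node.
    sub_edges = [(a, b) for a, b in edges if a in selected or b in selected]
    sub_nodes = set(selected)
    for a, b in sub_edges:
        sub_nodes.update((a, b))
    return sub_nodes, sub_edges


# The name list is one node name per line, as in the plain text file above.
names_text = "Niger\nMali\n"
names = [line.strip() for line in names_text.splitlines() if line.strip()]

# Toy network: (borrower, supplier) links with made-up supplier names.
toy_edges = [("Niger", "s1"), ("Niger", "g1"), ("Mali", "g1"), ("Chad", "s2")]
sub_nodes, sub_edges = compare_nodes_submodel(toy_edges, names)
print(sorted(sub_nodes))
print(sub_edges)
```

Here Chad and its supplier s2 drop out, while g1 is kept once as a shared neighbor of both selected countries.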

Click on picture to enlarge

In my next post, I will show an interesting feature of the network that we can see by comparing a couple of subgraphs that were built using this feature.


Saturday, May 11, 2013

Pitching Wedges


I don't play golf. But there is a long tradition of bankers playing golf that is memorialized in the famous 3-6-3 rule: bankers pay 3% interest on deposits, lend at 6% interest, and are out on the golf course by 3 PM.

So while this banking-themed post about the World Bank Major Contract Award Network may seem to be named after a type of golf club that is descended from the niblick, I'm actually going to just keep doing my job of selling everybody on the advantages of BioFabric "edge wedges".

This week, I'll focus in on a single edge wedge for one country, Niger, which I picked because it gives a manageably-sized high-resolution image, and has examples of each type of node I want to talk about. The small-scale version is below, and there is a high-resolution version available in the BioFabric gallery:

Niger Submodel Edge Wedge: BioFabric Network Visualization
Click on picture to enlarge

As I outlined in a previous post, most of the contracts in the network are between a borrower country and suppliers who are exclusive to that country. So I created a custom layout that tucks those exclusive supplier nodes into the node rows right under the appropriate borrower country row. And of those exclusive suppliers, most are usually from the same country, but there are also some from other countries. Finally, for each country, there are some suppliers who also have contracts with other borrower countries. Those nodes are not tucked in under a particular borrower country, but are at the very bottom of the network.

So, the nodes for each country are organized as follows:

  1. Suppliers from that country who only have contracts with that country, ordered by number of contracts.
  2. Suppliers from other countries who only have contracts with that country, ordered by number of contracts.

Those are the exclusive suppliers, whose node lines do not extend beyond the wedge for the country. At the bottom of the network, we have:

  • Suppliers from any country who have contracts with more than one country, ordered by total number of all contracts across all countries.
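
The ordering rules listed above can be expressed as a small Python sketch. This is not the code I used to build the layout. The function names, supplier names, and toy contracts are invented, and I am assuming descending contract counts with alphabetical tie-breaking, as in the Niger example.

```python
from collections import Counter, defaultdict


def ordered(names, count_of):
    """Sort by descending contract count, then alphabetically."""
    return sorted(names, key=lambda s: (-count_of[s], s))


def supplier_rows(contracts, home_country):
    """contracts: (borrower, supplier) pairs; home_country: supplier -> country."""
    per_pair = Counter(contracts)
    total = Counter(supplier for _, supplier in contracts)
    borrowers = defaultdict(set)
    for borrower, supplier in contracts:
        borrowers[supplier].add(borrower)

    blocks = {}
    for country in sorted({borrower for borrower, _ in contracts}):
        exclusive = [s for s, bs in borrowers.items() if bs == {country}]
        counts = {s: per_pair[(country, s)] for s in exclusive}
        local = [s for s in exclusive if home_country[s] == country]
        foreign = [s for s in exclusive if home_country[s] != country]
        # Exclusive local suppliers first, then exclusive foreign ones.
        blocks[country] = ordered(local, counts) + ordered(foreign, counts)

    # Suppliers with more than one borrower go to the bottom of the network.
    shared = [s for s, bs in borrowers.items() if len(bs) > 1]
    return blocks, ordered(shared, total)


toy_contracts = [
    ("Niger", "n_b"), ("Niger", "n_a"), ("Niger", "n_a"), ("Niger", "f_x"),
    ("Niger", "unicef"), ("Mali", "unicef"), ("Mali", "m_a"),
]
toy_home = {"n_a": "Niger", "n_b": "Niger", "f_x": "France",
            "unicef": "USA", "m_a": "Mali"}
blocks, bottom = supplier_rows(toy_contracts, toy_home)
print(blocks)
print(bottom)
```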

A close look at the Niger example above shows this arrangement in detail. Start by noticing how the regular gridding of the edges allows you to use the slope changes of the edge wedge to get distinct, and meaningful, visual clues as to what types of suppliers you are looking at. For example, the 45-degree slope regions are where you will find all the suppliers with only one contract. The parts of the wedge where the slope is shallower are where you will find the contracts for multi-contract vendors. 
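
The link between slope and contract count comes down to simple arithmetic, sketched below in Python. I am assuming that every link takes one column, every node takes one row, and rows and columns have the same spacing. Under those assumptions, a run of suppliers with k contracts each drops one row for every k columns, so the wedge edge makes an angle of atan(1/k) with the horizontal. The function name is hypothetical.

```python
import math


def wedge_angle_degrees(contracts_per_supplier):
    """Angle of the wedge edge below horizontal, on a square grid."""
    # One row down for every `contracts_per_supplier` columns across.
    return math.degrees(math.atan(1 / contracts_per_supplier))


for k in (1, 2, 5):
    print(k, round(wedge_angle_degrees(k), 1))
```

So single-contract suppliers give the 45-degree run, and the angle flattens as the number of contracts per supplier goes up.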

So, for Niger, we have the following (looking at the high-resolution version from the Gallery may make this easier to follow):

There are two Niger-based suppliers with five contracts, and these appear in the very upper left, where the edge wedge slope is shallowest. Moving right and down, the wedge slope gets steeper, and we soon encounter a run of 22 suppliers with two contracts, followed by the prominent 45-degree section of Niger-based suppliers with only one contract. Note that within suppliers with the same number of contracts, the nodes are ordered alphabetically. (The sharp-eyed viewer will also note here that I had problems getting the correct encoding of the non-ASCII character set; this bug needs to be addressed.) The next prominent feature is the start of another shallow-sloped region of the wedge representing the foreign suppliers having multiple contracts with Niger (and only Niger), followed by the 45-degree slope region of foreign suppliers to Niger with one and only one contract in the network.

That marks the end of the suppliers in the network who only have contracts with Niger. The rest of the Niger contracts are with suppliers who have contracts with more than one country. Note that the node ordering here is based upon the global number of contracts for each supplier, and so the edge wedge slope is not smooth. We can see that the biggest "global player" with a Niger contract is UNICEF (not really surprising), with three contracts, but there are other suppliers lower down with more Niger contracts, though fewer global contracts overall. Note that in the subgraph view shown above, these nodes are tucked right in with the others, though they are distinct due to their long node lines. In the main view, they are at the bottom, with the long edges from each borrower forming sort of an umbilical cord for each. My last blog post about this network talked about how subgraph views are minimized using this sort of compression.

BioFabric Network Visualization: World Bank Major Contract Awards
Click on image to enlarge
With this new understanding of how each borrower edge wedge is organized, you can download the BioFabric .bif file, uncompress it, load it up into BioFabric, and start exploring World Bank Major Contract Awards. Note how BioFabric's ability to explicitly order nodes and edges permits the data to be presented in a completely organized fashion, despite having 44,213 nodes and 66,021 edges. Also, note that this is actually a multigraph, i.e. there are multiple edges between nodes, and while adjacency matrices can be highly ordered, they have issues with showing multigraphs.

There should be one more post forthcoming to wrap up this discussion, but I'm also busy thinking about fun new D3-based demos. Stay tuned to see what's next.

Sunday, May 5, 2013

Super-Quick-Bio-Fab-ric!

My blogging has taken a hit recently as I have been working hard on a demo for explaining how BioFabric represents a network. So my apologies for not yet finishing my ongoing tour of the World Bank Contracts network, and I promise I will wrap that up shortly with a couple more posts.

But first, I want to present my Super-Quick BioFabric Demo, which got finished early this morning after several really late nights. That was because this was my first foray into using Mike Bostock's D3.js JavaScript library for building graphical web applications. The demo uses my trusty standby, Donald Knuth's character co-occurrence network from Les Miserables. It starts by showing the network as a node-link diagram with a force-directed layout:

SuperQuick BioFabric Starting Point
Click on picture to enlarge

and finishes it up with the BioFabric version:

SuperQuick BioFabric Finishing Point
Click on picture to enlarge

It shows how to transform the first into the second, step by step, in a continuous animation (a D3.js speciality!). Please go take a look, and share the URL with friends, family, colleagues, and acquaintances:


That's it for today. Time to go get some sleep!