Monday, June 24, 2013

"This only is the wedge-craft I have used."


OK, the title for this posting is not exactly what Shakespeare said (Othello, Act 1, Scene 3). But it's becoming clear to me as I continue to use BioFabric that putting lots of thought into creating informative edge wedge shapes (wedge-craft?) can pay off. It is well-known that many aspects of form (e.g. length, width, size, and curvature) are detected via preattentive processing, and this implies that BioFabric's edge wedges can take advantage of this perceptual "fast track" to enhance network visualizations.  

I expect this to be my final post on my World Bank network (phew!), and I want to use it to show how edge wedges allow the viewer to quickly extract information from a BioFabric network. If you've been following this blog for a couple of months, the World Bank network should be familiar to you. But if not, I'll provide a quick summary of the posts. The network was introduced on April 11th, and explained further on April 13th. It was used as an example to describe how to build submodels on April 21st, and also how to use the Compare Multiple Nodes feature on May 20th.

But the most important background for today's post is the May 11th Pitching Wedges post, where I talked about how the edge wedge for each borrower country is organized. Reading that post is a prerequisite for understanding this one! Finally, the post on May 22nd highlighted a network feature (thick vs. thin umbilicals to global player suppliers) that is closely related to today's topic, and is good background as well.

This post will show how the judicious organization of edge wedges can make it easy to quickly compare how each country contracts out to suppliers using its World Bank loans. I'm going to do that by showing four different countries, which are shown below together in a single view. From left to right, the countries are Nicaragua, Madagascar, Peru, and Ethiopia. Let's look at each one in turn, though in a different order.

Comparing all four countries at once

OK, last chance to go back and review the Pitching Wedges post, or the following is not going to make any sense, since I'm not going to redefine terms like "global players", nor review how the wedges are organized. Also, the images below (though you can click on them to make them larger) are low resolution, so loading up the network directly in BioFabric using the downloadable .bif file is the best way to explore these countries in detail.

First, look at Peru, which is shown in detail below. From the shape of the edge wedge, i.e. the slope changes along the bottom of the edge wedge, we can instantly see that maybe 90% (just eyeballing it here) of suppliers are in Peru, and maybe two thirds of suppliers have single contracts. There are a fair number of suppliers with two to three contracts, but there are very few super-suppliers with many contracts. A close look shows that there are only 11 suppliers with more than three contracts, with one supplier having seven, and the biggest supplier having 11. Finally, we see the tiny number of contracts with the global-player suppliers. Note: if you look closely at the Peru data inside BioFabric, you will see that half of those few global-player suppliers in this case are actually based in Peru, so it is important to keep in mind that "global player" is not synonymous with a contract out of a country, though that is almost always the case.

Peru: Almost all in-country, few super-suppliers

Quickly contrast this with Nicaragua, shown below. A much larger fraction are multi-contract suppliers; maybe half, with about 40% single-contract suppliers. Furthermore, the very sharp point on the left instantly reveals some super-suppliers with many contracts, which is in stark contrast to Peru's profile. Close inspection reveals that there are nine suppliers with 14 or more contracts, and the biggest supplier has 37 contracts. Finally, just like Peru, and in fact just like most Latin American countries (see the May 22nd post), Nicaragua has very few contracts with the global player suppliers.

Nicaragua: Almost all in-country, some big super-suppliers
Next, let's look at Madagascar. It looks like about half the contracts go out of the country, and a big chunk (maybe 15 percent?) go to the global player suppliers. Most of the out-of-country contracts are to multi-contract suppliers. And, like Nicaragua, the sharply pointed left side indicates a few big super-suppliers.

Madagascar: About half to out-of-country suppliers
Finally, consider Ethiopia. Maybe 80% of all the contracts go to suppliers out of the country, and the large multi-contract suppliers are out of country as well. There are no in-country super-suppliers, which we can glean from the blunt angle on the left of Ethiopia's wedge. Finally, like other African countries (again, see the May 22nd post), Ethiopia has many contracts with the global player suppliers.

Ethiopia: 80% out of country, many global player suppliers
One caveat on this World Bank network example I need to acknowledge is that I have just been talking about the number of contracts, and not in any way accounting for the dollar amounts of each contract (you can see dollar amounts in the tag on each contract edge when you view it in BioFabric). One can argue that dollar amount is the relevant metric to be using here, and I won't disagree. But I've been trying to keep this example simple, and it is possible to address dollar amounts in another fashion that I will cover in a future blog post.

Finally, this particular network has the unusual feature that most nodes (the suppliers) are each uniquely connected to just one borrower country, which allows those supplier nodes to be ordered independently, and precisely, on a country-by-country basis. Thus, this network  lends itself to the compact, highly organized edge wedges that I have been showing. Most network topologies are not nearly as cooperative, but it is still possible to organize meaningful edge wedges. One powerful tool for doing that easily is link groups, which are mentioned in the BioFabric paper. I will cover those in a future blog post, as well.

I hope this example has shown that a careful and well-thought-out approach to organizing BioFabric edge wedges allows the viewer to rapidly extract and compare network features. With that, it's time to finally move past the World Bank network and onto other data sets I have in the pipeline for future blog posts, but keep this network in mind as you go forth and practice entrancing wedge-craft.

Saturday, June 22, 2013

BioFabric Boffin BoF!

My blog postings have been mighty thin (as in non-existent) so far this month, as I've been traveling for work. I'm now writing my next post, and it should show up soon (and I've got about a half-dozen in the pipeline right now).

This post is just a heads-up that I'm organizing a BioFabric Birds of a Feather (BoF) session at the upcoming 21st Annual International Conference on Intelligent Systems for Molecular Biology/12th European Conference on Computational Biology, i.e. ISMB/ECCB 2013. The conference will be in Berlin, Germany from July 21-23, 2013. The BoF sessions are scheduled for Monday July 22, 5:40 PM - 6:40 PM, with the rooms still to be announced.  Hope to see you there!

Wednesday, May 29, 2013

Banish the Bipartite Blues!

I'm going to take a detour yet again from my bank network series to talk about an unrelated topic, which is the problem of using the default layout algorithm on a bipartite graph. To begin, let's say we want to load the bipartite graph that is described by this .sif file into BioFabric:

Click to enlarge
As a reminder, a bipartite graph is one where each and every node belongs to one, and only one, of two sets of nodes (e.g. A and B). Furthermore, every edge in the graph goes between a node in set A and a node in set B. In the above example, every node in A is connected to every node in B, and vice versa.
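
Since the .sif file itself appears here only as a picture, the following Python sketch generates a comparable one. The set sizes (three nodes per side) and the relation label `pp` are assumptions made for illustration; the only property taken from the description above is that every A node is linked to every B node.

```python
from itertools import product

def complete_bipartite_sif(n_a, n_b, relation="pp"):
    """Return tab-delimited SIF lines (source, relation, target)
    linking every A node to every B node."""
    a_nodes = [f"A{i}" for i in range(1, n_a + 1)]
    b_nodes = [f"B{i}" for i in range(1, n_b + 1)]
    return [f"{a}\t{relation}\t{b}" for a, b in product(a_nodes, b_nodes)]

# Assumed sizes: 3 A nodes and 3 B nodes, giving 9 links.
sif_lines = complete_bipartite_sif(3, 3)
print("\n".join(sif_lines))
```

Saving the printed lines to a plain text file with a .sif extension gives a file that can be imported in the usual way.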

So if you just import the above .sif file into BioFabric, you get the following result:

Click on picture to enlarge
And if you turn shadow links and zone shading on, you get a clearer picture of what is going on:

Click on picture to enlarge

These views are probably not the way you want your bipartite network to look. This is a case where the default layout algorithm (remember: breadth-first search from the highest degree node, traversing neighbors in order of decreasing degree) does a substandard job. In this particular case, since all the nodes in the network have the same degree, the algorithm will choose the nodes in alphabetical order, i.e. A1 comes first. Then, since A1 is not connected at all to any of the other A nodes, the B nodes all get laid out next. Finally, when B1 gets its chance, the remaining A nodes will get assigned to the last rows. So the set of A nodes do not get assigned to a contiguous run of node rows.
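
The interleaving described above can be reproduced with a small Python sketch. This is not BioFabric's code; it only follows the rule as stated (breadth-first search from the highest-degree node, neighbors in order of decreasing degree, ties broken alphabetically), and the 3-by-3 graph size is an assumption.

```python
from collections import deque

def default_node_order(edges):
    """Breadth-first order starting at the highest-degree node; neighbors are
    visited by decreasing degree, with alphabetical order breaking ties."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def rank(node):
        return (-len(neighbors[node]), node)

    order, seen = [], set()
    # Restart from the best unvisited node so disconnected pieces are covered.
    for start in sorted(neighbors, key=rank):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(neighbors[node], key=rank):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return order

# Assumed example: every A node linked to every B node, three per side.
edges = [(f"A{i}", f"B{j}") for i in (1, 2, 3) for j in (1, 2, 3)]
order = default_node_order(edges)
print(order)  # ['A1', 'B1', 'B2', 'B3', 'A2', 'A3']
```

A1 lands in the first row, the B nodes follow, and A2 and A3 end up at the bottom, so the A set is split across non-contiguous rows.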

The preferable arrangement will have all the A nodes show up before the B nodes. This is pretty easy to accomplish, using the Layout->Layout Using Node Attributes... feature from the main menu. First, you need to create a node attribute file, which looks like this:


Click on picture to enlarge


Note carefully that the count of rows starts at 0. Furthermore, all nodes need to be present, and there can be no omitted or duplicated row numbers. If a node that is not in the graph is specified, you will also get an error. Finally, the "Node Row" first line and the " = " tokens are required. My apologies, but errors are not called out with the offending line identified (a necessary future enhancement).
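
As a rough aid, the rules above can be checked before loading the file. The Python sketch below assumes each line after the "Node Row" header has the form `name = row`; the function names are hypothetical, and the checks mirror only the rules listed here, not BioFabric's own parser.

```python
def node_attribute_lines(ordered_nodes):
    """Build the file contents: the "Node Row" header, then one
    "name = row" line per node, with rows counted from 0."""
    return ["Node Row"] + [f"{name} = {row}" for row, name in enumerate(ordered_nodes)]

def check_node_attributes(lines, graph_nodes):
    """Return a list of problems; an empty list means the rules are met."""
    problems = []
    if not lines or lines[0].strip() != "Node Row":
        problems.append('first line must be "Node Row"')
    names, rows = [], []
    for line_no, line in enumerate(lines[1:], start=2):
        name, sep, row_text = line.rpartition(" = ")
        if not sep or not row_text.strip().isdigit():
            problems.append(f'line {line_no}: expected "name = row"')
            continue
        if name not in graph_nodes:
            problems.append(f"line {line_no}: {name} is not in the graph")
        names.append(name)
        rows.append(int(row_text))
    if set(graph_nodes) - set(names):
        problems.append("some graph nodes are missing")
    if sorted(rows) != list(range(len(graph_nodes))):
        problems.append("row numbers must run 0..n-1 with no gaps or duplicates")
    return problems

# Assumed node names: all A nodes first, then all B nodes.
graph_nodes = ["A1", "A2", "A3", "B1", "B2", "B3"]
lines = node_attribute_lines(graph_nodes)
print("\n".join(lines))
print(check_node_attributes(lines, graph_nodes))  # []
```

Unlike the current import, this sketch reports the offending line number, which can save some hunting in a long file.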

After creating that file in a plain text editor (i.e. not Word), just load it in using the aforementioned Layout->Layout Using Node Attributes... command, and the result will look like this, with a nice point symmetry and contiguous row assignments for both node sets:

Click on picture to enlarge

So if you find yourself working with a bipartite network, keep this technique in mind to get the best network arrangement! 


Wednesday, May 22, 2013

Spurred boldly on, and dashed through thick and thin...

As the title quote by John Dryden (1682) implies, this short post is going to be about the "thick and thin" of the BioFabric World Bank Major Contract Awards network, as the following two subgraphs from the network will show. First is the set of contracts associated with all the countries in Africa:


African countries (click on image to enlarge)

Second is the set of contracts associated with all the countries in Central and South America:


Central and South American countries (click on image to enlarge)


(Again, I apologize for the substandard quality of the images; I think they are really too light. I need to work on optimizing the image export function for small scales when I get the time to handle this.)

When I first started looking over the BioFabric version of the contracts network, something jumped out at me pretty quickly, and that was the striking visible difference between the thicknesses of what I have been calling the "umbilical links" going from each country down to the bottom of the network. To recap from a previous post, the node lines at the bottom of the graph represent the "global players", i.e. supplier companies who have contracts with more than one country. Looking at these two different sets of countries, it certainly appears that a typical African country has a much larger fraction of contracts with these global players than does a typical Central/South American country. With the latter, these contracts seem to form only a tiny fraction of the total.
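
The quantity being eyeballed here, the share of each country's contracts that goes to global players, is easy to state precisely. The Python sketch below uses made-up placeholder countries and suppliers, not the World Bank data, and treats a global player as any supplier with contracts in more than one country, per the definition above.

```python
from collections import defaultdict

def global_player_share(contracts):
    """contracts: (borrower_country, supplier) pairs, one per contract.
    Returns {country: fraction of its contracts held by global players}."""
    countries_per_supplier = defaultdict(set)
    for country, supplier in contracts:
        countries_per_supplier[supplier].add(country)
    global_players = {s for s, cs in countries_per_supplier.items() if len(cs) > 1}

    total = defaultdict(int)
    to_global = defaultdict(int)
    for country, supplier in contracts:
        total[country] += 1
        if supplier in global_players:
            to_global[country] += 1
    return {country: to_global[country] / total[country] for country in total}

# Placeholder data: G1 supplies both countries, so it is a global player.
contracts = [
    ("Country X", "S1"), ("Country X", "S2"), ("Country X", "G1"), ("Country X", "G1"),
    ("Country Y", "T1"), ("Country Y", "T2"), ("Country Y", "T3"), ("Country Y", "G1"),
]
shares = global_player_share(contracts)
print(shares)  # {'Country X': 0.5, 'Country Y': 0.25}
```

A thick umbilical corresponds to a large share and a thin one to a small share.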

Is this a real effect? I don't know, and I'm not planning to dive in here and show that this observation is, or is not, statistically significant. But I find it notable that after spending just a few minutes simply browsing across a global view of a network with 44,213 nodes and 66,021 edges, I could easily glean an interesting observation about the data that could warrant further investigation. I think this is another case where the visual impact of BioFabric's edge wedges can reveal important insights into the network structure.

Dryden's quote, by the way, was actually part of a satirical jab aimed at the "heroically mad" work of a marginally talented rival poet, so maybe I am ill-advised to use it here. But it seemed appropriate, since I think I'm being driven a little mad by this series of posts on the World Bank network even as I'm spurred on to finish it. Thankfully, I think there is only one more to go!

Monday, May 20, 2013

Learn, Compare, Collect the Facts!

If we are going to apply Ivan Pavlov's above maxim while using BioFabric, we need to be able to compare network subgraphs. So I'm going to take another quick detour and provide a brief HOWTO on BioFabric's Compare Multiple Nodes feature that I have been making use of to prepare my next World Bank network post.


A few weeks back, I introduced how to create a submodel, so you can focus in on interesting pieces of your network. But the step-by-step process I previously described can be time-consuming and error-prone if you have a lot of nodes you need to select. So I added the Compare Multiple Nodes feature to BioFabric to make this common task easier.

Continuing to work with the World Bank Major Contract Award network, say I wish to create a submodel that contains the contracts associated with fourteen selected countries. The easiest way to do this is to first create a plain text file (using e.g. Notepad or TextEdit) that is simply a list of the countries we want to compare:

Click on picture to enlarge

Having started BioFabric and loaded the full network file, you just choose Tools->Compare Multiple Nodes... from the main menu:

Click on picture to enlarge

Which brings up a dialog that allows you to enter a list of nodes you wish to compare:

Click on picture to enlarge

If you just have a few, you can just use the Add New Entry... button to create a few empty rows in the table, and type the names in. But since we have already created a list in a text file, just click on the Load Names From File button and choose that file from the file chooser dialog that pops up. The dialog table will then be stocked from the file entries:

Click on picture to enlarge

Clicking OK then handles all the steps for creating the submodel: selecting the nodes, selecting the first neighbors and associated links, and launching the submodel view:
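
For readers who think in code, the selection step can be sketched in Python as below. This is an illustration rather than BioFabric's implementation: it keeps every link that touches a named node, and it is an assumption that links running between two neighbors are left out. The node names are placeholders.

```python
def read_names(text):
    """Parse a plain-text list of node names, one per line, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def submodel(links, selected):
    """Keep each link with at least one selected endpoint. The surviving
    nodes are the selected nodes plus their first neighbors."""
    selected = set(selected)
    kept = [(a, b) for a, b in links if a in selected or b in selected]
    nodes = {node for link in kept for node in link}
    return nodes, kept

# Placeholder network: three borrowers, three exclusive suppliers, one shared.
links = [
    ("Niger", "S1"), ("Niger", "G1"),
    ("Peru", "S2"), ("Peru", "G1"),
    ("Chad", "S3"),
]
names = read_names("Niger\nPeru\n")
nodes, kept = submodel(links, names)
print(sorted(nodes))  # ['G1', 'Niger', 'Peru', 'S1', 'S2']
```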

Click on picture to enlarge

In my next post, I will show an interesting feature of the network that we can see by comparing a couple of subgraphs that were built using this feature.


Saturday, May 11, 2013

Pitching Wedges


I don't play golf. But there is a long tradition of bankers playing golf that is memorialized in the famous 3-6-3 rule: bankers pay 3% interest on deposits, lend at 6% interest, and are out on the golf course by 3 PM.

So while this banking-themed post about the World Bank Major Contract Award Network may seem to be named after a type of golf club that is descended from the niblick, I'm actually going to just keep doing my job of selling everybody on the advantages of BioFabric "edge wedges".

This week, I'll focus in on a single edge wedge for one country, Niger, which I picked because it gives a manageably-sized high-resolution image, and has examples of each type of node I want to talk about. The small-scale version is below, and there is a high-resolution version available in the BioFabric gallery:

Niger Submodel Edge Wedge: BioFabric Network Visualization

As I outlined in a previous post, most of the contracts in the network are between a borrower country and suppliers who are exclusive to that country. So I created a custom layout that tucks those exclusive supplier nodes into the node rows right under the appropriate borrower country row. And of those exclusive suppliers, most are usually from the same country, but there are also some from other countries. Finally, for each country, there are some suppliers who also have contracts with other borrower countries. Those nodes are not tucked in under a particular borrower country, but are at the very bottom of the network.

So, the nodes for each country are organized as follows:

  1. Suppliers from that country who only have contracts with that country, ordered by number of contracts.
  2. Suppliers from other countries who only have contracts with that country, ordered by number of contracts.

Those are the exclusive suppliers, whose node lines do not extend beyond the wedge for the country. At the bottom of the network, we have:

  • Suppliers from any country who have contracts with more than one country, ordered by total number of all contracts across all countries.
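
The three groups above can be sketched as a sort in Python. The supplier names and counts below are invented for illustration, and the tie-breaking (most contracts first, alphabetical within equal counts) is an assumption about the layout rather than a description of the code that built it.

```python
from collections import Counter

def order_suppliers(contracts, supplier_home, borrower):
    """contracts: (borrower_country, supplier) pairs, one per contract.
    supplier_home: {supplier: home country}.
    Returns (in_country_exclusive, foreign_exclusive, global_players)
    for the suppliers that hold contracts with `borrower`."""
    total = Counter(supplier for _, supplier in contracts)
    borrowers_of = {}
    for country, supplier in contracts:
        borrowers_of.setdefault(supplier, set()).add(country)

    def by_count(supplier):
        # Most contracts first; alphabetical among equal counts.
        return (-total[supplier], supplier)

    mine = {s for c, s in contracts if c == borrower}
    exclusive = [s for s in mine if len(borrowers_of[s]) == 1]
    in_country = sorted((s for s in exclusive if supplier_home[s] == borrower), key=by_count)
    foreign = sorted((s for s in exclusive if supplier_home[s] != borrower), key=by_count)
    global_players = sorted((s for s in mine if len(borrowers_of[s]) > 1), key=by_count)
    return in_country, foreign, global_players

# Invented example data.
contracts = (
    [("Niger", "Alpha SARL")] * 2 + [("Niger", "Beta SA")]
    + [("Niger", "Gamma Co")] * 3 + [("Niger", "Delta Ltd")]
    + [("Niger", "GlobalOne")] + [("Mali", "GlobalOne")] * 2
    + [("Niger", "GlobalTwo"), ("Mali", "GlobalTwo")]
)
supplier_home = {
    "Alpha SARL": "Niger", "Beta SA": "Niger", "Gamma Co": "China",
    "Delta Ltd": "France", "GlobalOne": "USA", "GlobalTwo": "France",
}
groups = order_suppliers(contracts, supplier_home, "Niger")
print(groups)
```

For exclusive suppliers the total contract count equals the count with the borrower, so one sort key serves all three groups.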

A close look at the Niger example above shows this arrangement in detail. Start by noticing how the regular gridding of the edges allows you to use the slope changes of the edge wedge to get distinct, and meaningful, visual clues as to what types of suppliers you are looking at. For example, the 45-degree slope regions are where you will find all the suppliers with only one contract. The parts of the wedge where the slope is shallower are where you will find the contracts for multi-contract vendors. 
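
The link between slope and contract count is simple arithmetic. Assuming each link gets its own column, each supplier its own row, and the grid spacing is the same in both directions, a supplier with k contracts spans k columns before the wedge drops one row, so the local angle is atan(1/k). A small Python sketch:

```python
import math

def wedge_profile(contract_counts):
    """Given per-supplier contract counts in row order, return the
    (column, row) position of each link endpoint on the supplier side."""
    points = []
    column = 0
    for row, count in enumerate(contract_counts, start=1):
        for _ in range(count):
            points.append((column, row))
            column += 1
    return points

def slope_angle_degrees(contracts_per_supplier):
    """Angle of the wedge edge where every supplier has this many contracts."""
    return math.degrees(math.atan2(1, contracts_per_supplier))

profile = wedge_profile([2, 1, 1])
print(profile)                            # [(0, 1), (1, 1), (2, 2), (3, 3)]
print(round(slope_angle_degrees(1), 1))   # 45.0
print(round(slope_angle_degrees(2), 1))   # 26.6
print(round(slope_angle_degrees(5), 1))   # 11.3
```

One contract per supplier gives the 45-degree run, and higher counts flatten the edge.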

So, for Niger, we have the following (looking at the high-resolution version from the Gallery may make this easier to follow):

There are two Niger-based suppliers with five contracts, and these appear in the very upper left, where the edge wedge slope is shallowest. Moving right and down, the wedge slope gets steeper, and we soon encounter a run of 22 suppliers with two contracts, followed by the prominent 45-degree section of Niger-based suppliers with only one contract. Note that within suppliers with the same number of contracts, the nodes are ordered alphabetically. (The sharp-eyed viewer will also note here that I had problems getting the correct encoding of the non-ASCII character set; this bug needs to be addressed.) The next prominent feature is the start of another shallow-sloped region of the wedge representing the foreign suppliers having multiple contracts with Niger (and only Niger), followed by the 45-degree slope region of foreign suppliers to Niger with one and only one contract in the network.

That marks the end of the suppliers in the network who only have contracts with Niger. The rest of the Niger contracts are with suppliers who have contracts with more than one country. Note that the node ordering here is based upon the global number of contracts for each supplier, and so the edge wedge slope is not smooth. We can see that the biggest "global player" with a Niger contract is UNICEF (not really surprising), with three contracts, but there are other suppliers lower down with more Niger contracts, though fewer global contracts overall. Note that in the subgraph view shown above, these nodes are tucked right in with the others, though they are distinct due to their long node lines. In the main view, they are at the bottom, with the long edges from each borrower forming sort of an umbilical cord for each. My last blog post about this network talked about how subgraph views are minimized using this sort of compression.

BioFabric Network Visualization: World Bank Major Contract Awards
With this new understanding of how each borrower edge wedge is organized, you can download the BioFabric .bif file, uncompress it, load it up into BioFabric, and start exploring World Bank Major Contract Awards. Note how BioFabric's ability to explicitly order nodes and edges permits the data to be presented in a completely organized fashion, despite having 44,213 nodes and 66,021 edges. Also, note that this is actually a multigraph, i.e. there are multiple edges between nodes, and while adjacency matrices can be highly ordered, they have issues with showing multigraphs.

There should be one more post forthcoming to wrap up this discussion, but I'm also busy thinking about fun new D3-based demos. Stay tuned to see what's next.

Sunday, May 5, 2013

Super-Quick-Bio-Fab-ric!

My blogging has taken a hit recently as I have been working hard on a demo for explaining how BioFabric represents a network. So my apologies for not yet finishing my ongoing tour of the World Bank Contracts network, and I promise I will wrap that up shortly with a couple more posts.

But first, I want to present my Super-Quick BioFabric Demo, which got finished early this morning after several really late nights. The late nights were because this was my first foray into using Mike Bostock's D3.js JavaScript library for building graphical web applications. The demo uses my trusty standby, Donald Knuth's character co-occurrence network from Les Misérables. It starts by showing the network as a node-link diagram with a force-directed layout:

SuperQuick BioFabric Starting Point

and finishes it up with the BioFabric version:

SuperQuick BioFabric Finishing Point

It shows how to transform the first into the second, step by step, in a continuous animation  (a D3.js speciality!). Please go take a look, and share the URL with friends, family, colleagues, and acquaintances:


That's it for today. Time to go get some sleep!