When Charles Darwin wrote that line in The Origin of Species (1859), he was talking about a landscape feature:
"It is interesting to contemplate an entangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent on each other in so complex a manner, have all been produced by laws acting around us."
Unfortunately, that is as close to biology as we are going to get in this next series of blog postings, since the BioFabric network I am about to cover is actually about another bank: the World Bank, which was established in 1944 at the Bretton Woods Conference and whose "...official goal is the reduction of poverty."
Why am I working with a network from the World Bank? Well, the Guardian, Google, and the Open Knowledge Foundation announced a competition in February to visualize "...an open dataset from any government open data website". But I ran across this link in late March, and the contest was ending on April 2nd. So after a brief and admittedly desultory search through their list of open data websites, trying to find a compelling network data set, I decided to bag it.
But then, almost immediately after, I ran across the "World Bank Global Development Sprint" being hosted at visualizing.org. Apparently also triggered by the February push to raise the profile of open government data, this effort is working to build a collaborative web-based network visualization of a data set of World Bank Major Contract Awards.
Go take a look at the site. What they have been creating is well-crafted, visually stunning, and fun to watch, but it uses the traditional node-link diagram approach with nodes-as-points. I did not feel that I was getting deep insights into the data set, and I wondered what I could do with the data in BioFabric to find interesting patterns. So my interest was piqued, and this gave me the kick I needed. I went to the World Bank site, downloaded the data, and started playing with it.
As mentioned above, this is a network visualization of the World Bank Major Contract Awards for fiscal years 2007-2013. The network consists of 44,213 nodes and 66,021 edges. Each edge corresponds to a row in the data table on the World Bank site, though three apparent duplicate rows have been dropped. Each contract row thus creates an edge of the form (Borrower Country) --> (Supplier Country:Supplier); network nodes are either countries or suppliers. In short, we get to see which countries are borrowing from the World Bank, and who is getting the contracts from each country. Additionally, since each supplier node is also tagged with the supplier country, we get to see which countries the contracts are going to. I'll go into a lot more detail in subsequent blog posts.
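For readers who want to try the same row-to-edge mapping on their own download, here is a minimal Python sketch of the idea. It is not the code I used to build the network. The column names ("Borrower Country", "Supplier Country", "Supplier", "Contract ID") and the sample rows are made up for illustration, and treating a duplicate as a row that matches another row in every column is an assumption too.

```python
import csv
import io


def build_edges(rows):
    """Map each contract row to one (borrower, "country:supplier") edge.

    Rows identical in every column are treated as duplicates and skipped
    (an assumption about what counts as a duplicate row).
    """
    seen = set()
    edges = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # drop the repeated row
        seen.add(key)
        borrower = row["Borrower Country"].strip()
        # Supplier nodes are tagged with the supplier's country.
        supplier_node = f'{row["Supplier Country"].strip()}:{row["Supplier"].strip()}'
        edges.append((borrower, supplier_node))
    return edges


def node_set(edges):
    """Collect every node (country or tagged supplier) that appears in an edge."""
    nodes = set()
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)
    return nodes


# Hypothetical sample data; the real table has different columns and many more rows.
SAMPLE = """Borrower Country,Supplier Country,Supplier,Contract ID
Niger,France,ACME SA,1
Niger,Niger,Local Builders,2
Niger,France,ACME SA,1
Ghana,France,ACME SA,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
edges = build_edges(rows)
nodes = node_set(edges)
print(len(nodes), "nodes,", len(edges), "edges")
```

Note that two different contracts between the same borrower and supplier stay as two separate edges, since each surviving row is its own edge. Also, the country node "Niger" and the supplier node "Niger:Local Builders" are distinct nodes.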
To get started, here is the full network view, followed by a screen shot of BioFabric looking at the contract wedge for Niger. I'll be the first to admit that it looks pretty uninspiring compared to the snazzy version over at visualizing.org, but I think it offers an exciting way to deeply and systematically probe the data set:
[Image: the full network view in BioFabric]
[Image: BioFabric screen shot of the contract wedge for Niger]
Some higher-resolution network snapshots, the BioFabric .bif file, and links to the data sources are now up on the BioFabric Gallery. As with other networks, the best way to look at this one is to load the .bif file into BioFabric and go exploring.
That's it for this introductory post. In my next posts, I'll talk much more about the details of this network. So get ready to go and "contemplate an entangled bank"!