Combing the Hairball

Sunday, April 27, 2014

I'd Like to Use my Lifeline!

As I mentioned in the BioFabric paper, one type of existing visualization where people are used to thinking of "nodes as lines" is the Unified Modeling Language (UML) sequence diagram. There, lifelines are parallel vertical lines that represent objects that are passing messages between themselves in some time sequence. The messages are represented by horizontal lines drawn between the two interacting vertical lifelines. If you rotate the diagram to make the lifelines horizontal, you now have a visualization that would look similar to BioFabric.

But the key difference is that the lifelines are representing objects as they progress through the dimension of time. Of course, representing an object passing through time as a line is a familiar idea, perhaps even second nature, for most people. Particularly if the object is a car or a train!

So let's use that insight to provide another way of gaining some intuition about a BioFabric network. Remember that the default layout just uses a breadth-first search of the network, starting at the node with the highest degree (number of incident links); neighbors are visited in the order of their degree as well, highest to lowest.
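To make that default ordering concrete, here is a minimal Python sketch of a degree-ordered breadth-first search. It is an illustration only, not BioFabric's actual code: the function name `default_node_order`, the alphabetical tie-breaking, and the restart rule for disconnected pieces are all assumptions of this sketch.

```python
from collections import deque


def default_node_order(edges):
    """Return nodes in breadth-first order, starting at the highest-degree node."""
    # Build an undirected adjacency map; sets ignore duplicate links.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def by_degree(node):
        # Highest degree first; the node name breaks ties (an assumption of this sketch).
        return (-len(neighbors[node]), node)

    order, seen = [], set()
    # Restart from the highest-degree unvisited node so disconnected pieces are covered.
    for start in sorted(neighbors, key=by_degree):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Visit unvisited neighbors from highest degree to lowest.
            for nxt in sorted(neighbors[node] - seen, key=by_degree):
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hypothetical toy network: "hub" and "z" both have degree 3.
edges = [("x", "hub"), ("y", "hub"), ("z", "hub"), ("z", "w"), ("z", "v")]
print(default_node_order(edges))
```

The position of each node in the returned list would be its row in the drawing, with the first node in the top row.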

So think about that search as it proceeds through time, maybe calling out each new link at one-second intervals, so that every second you draw a new link as a "message" between the lifelines of the two nodes. We start drawing the timeline/lifeline for a node when it sends or receives its first message, and stop drawing it when it receives or sends its last message. Thus, a BioFabric network drawing is just a record of this message-passing procedure as it proceeds through time, and we are drawing this step-by-step, with time proceeding left to right. If it helps more, think of the "nodes as points" walking from left to right, one step a second, as they pass these messages:

That's a lot of messages between those people marching left to right!
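The message-passing picture can also be sketched in a few lines of Python. Each link gets one "second" on the clock, and each node's lifeline runs from its first message to its last. The rule used here for putting the links in order (by the row of the upper node, then the row of the lower node) is an assumption made for this sketch, and the names `order_links` and `lifelines` are hypothetical.

```python
def order_links(node_order, edges):
    """Assign links to time steps: sort by upper endpoint row, then lower endpoint row."""
    row = {node: i for i, node in enumerate(node_order)}

    def key(edge):
        a, b = row[edge[0]], row[edge[1]]
        return (min(a, b), max(a, b))

    return sorted(edges, key=key)


def lifelines(ordered_links):
    """Map each node to (first, last): the time steps of its first and last message."""
    span = {}
    for tick, (a, b) in enumerate(ordered_links):
        for node in (a, b):
            first = span[node][0] if node in span else tick
            span[node] = (first, tick)
    return span


# Hypothetical five-node network with a node order already chosen.
node_order = ["A", "B", "C", "D", "E"]
edges = [("D", "E"), ("B", "C"), ("A", "D"), ("A", "B"), ("A", "C")]
links = order_links(node_order, edges)
for node, (start, stop) in lifelines(links).items():
    print(node, "is drawn from second", start, "to second", stop)
```

Drawing a horizontal segment for each node from its first second to its last, and a vertical segment for each link at its own second, gives the "nodes as lines" picture.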

So if you are having trouble wrapping your head around the BioFabric idea of "nodes as lines", you can always tell Regis that you'd like to use your lifeline!

Friday, April 25, 2014

National Hairball Awareness Day!

I missed it last year, but this year I'm on it! Today is National Hairball Awareness Day. So if you have a pet cat, follow the link and get up to speed on the feline variety. But if you're doing networks, you can do your part to increase your awareness on how to end the hairball menace by following this link instead! (nodes == lines) -> !hairballs

Saturday, March 8, 2014

Poster Posting

OK, the longest dry spell yet here on the blog. It turns out that right after my last post in December I started working in earnest on BioFabric Version 1.1, as well as starting to explore how to make BioFabric into a Cytoscape 3 app. Both efforts are still ongoing, and I will be getting back to blog posting as well.

I'm posting this from Heidelberg, Germany the day after the VIZBI 2014 conference wrapped up, and I figured I would provide some links to the BioTapestry and BioFabric posters that I presented.

2014 BioTapestry Poster (in collaboration with Suzanne Paquette and Kalle Leinonen): BioTapestry: Organized and Scalable Visualization of Gene Regulatory Networks

http://vizbi.org/Posters/2014/C12

2014 BioFabric "Art and Biology" Poster: Escherichia coli K-12: A Gene Regulatory Network

http://vizbi.org/Posters/2014/Y06

While I'm at it, I'll provide the links to my 2013 posters as well.

2013 BioTapestry/BioFabric Poster: From Orthogonal Directed Hyperedges to "Nodes as Lines": BioTapestry and BioFabric

http://vizbi.org/Posters/2013/A02

2013 BioFabric "Art and Biology" Poster: BioFabric Displays the Human Interactome Network

http://vizbi.org/Posters/2013/Y04

Finally, here is the conference poster for VIZBI 2014. Sharp-eyed BioFabric fans might recognize the source of the art on the left side:

http://vizbi.org/Posters/2014/Y01

I had a good time at the conference, and also really enjoyed the chance to give a Flash talk using my Super-Quick BioFabric D3 demo at the Heidelberg Unseminar in Bioinformatics that was held in conjunction with VIZBI 2014. Remember:

Knoten als Linien bedeutet keine Haarballen!

Or something like that (I'm trusting Google Translate here). Till my next post, keep on combing!

Monday, December 9, 2013

Also Sprach Zarathustra

OK, before we get started with this post, you first have to view this video clip to set the appropriate mood:

And with that memorable introduction, I present the BioFabric version of Brendan Griffen's Graph of Influential Thinkers, with 7239 nodes and 14560 edges:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers

Click on picture to enlarge

And what's that got to do with the opening credits of 2001: A Space Odyssey? Well, that memorable piece of music, Einleitung, oder Sonnenaufgang (Introduction, or Sunrise), is the famous opening section of Richard Strauss's tone poem Also Sprach Zarathustra. And who was the author of the book Also sprach Zarathustra: Ein Buch für Alle und Keinen that inspired Strauss? Friedrich Nietzsche, who happens to hold the premier, top-left, row #1 position in the BioFabric version of the network:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Nietzsche

Click on picture to enlarge

The graph was built using the "influenced" and "influenced by" links that appear in the sidebar of many Wikipedia articles about historical and current figures. Go and visit Dr. Griffen's blog post to learn about the creation of his network, and to see his beautiful Gephi-based renderings!

I'll be spending the next couple of blog posts discussing this network, which will give me a chance to discuss BioFabric's "similar connectivity" algorithm, since it was used to lay out the network instead of the default method. But to get started in this post, I've just included some screen shots of BioFabric showing some of the same thinkers as were depicted in the original blog post. First some artists, with Pablo Picasso as the most visible node:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Artists

Click on picture to enlarge

Some authors, where Stephen King and H.P. Lovecraft are prominent:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Authors

Click on picture to enlarge

The comedians include George Carlin and Richard Pryor:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Comedians

Click on picture to enlarge

More philosophers, who are placed a little further over than Nietzsche:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Philosophers

Click on picture to enlarge

And some more writers, with Beat poets and other Beat Generation writers showing prominently on the left:

BioFabric Network Visualization of Brendan Griffen's Graph of Thinkers: Beat Writers

Click on picture to enlarge

Of course, the best way to explore the network is to view it in BioFabric. Head on over to the BioFabric Gallery to pick up the .bif file (in a compressed gzip archive file) and have fun! Thanks to Brendan Griffen for providing the data, and keep an eye out for my next blog posts on the network.

Thursday, December 5, 2013

I Was Lost, But Now I'm Found...

In my last posting, I showed a variety of node ordering schemes that could be applied to the combined glucose/oleate network. Of course, the only way to actually do these different layouts is to create a .noa node attribute file that specifies a node ordering and then install it using the Layout->Layout Using Node Attributes... feature. To make that whole process more understandable, I've posted the code for the little standalone Java program I used to create the files up at my BioFabric Github repository at:

https://github.com/wjrl/BioFabric/blob/master/src/org/systemsbiology/biofabric/layouts/GluOleNodeOrder.java

It's a quick-and-dirty implementation that is totally hardwired to this specific example, but taking a look at that code can give you an idea of how to extend it to your particular situation.

Friday, November 29, 2013

I View Yeast to the Breadth and Height BioFabric Can Reach

It's time to pick up where I left off last month with the yeast glucose versus oleate network. In that first post, I introduced the network, and then I showed how the target node rows can be logically arranged in an order so that targets with the same combination of inputs are grouped together.

I wrapped up that introductory post after showing the two different experimental conditions as two separate networks. But by using the link grouping feature, we can create a single combined network that allows us to directly compare the two conditions side-by-side, and that's the topic of this posting.

First, you should go and review how to set up link groups; I covered that in my Caltech dorm post, and so I won't cover that ground again in detail. In this case, I simply created a single .sif file by combining the results from the two different conditions. For the glucose condition links, I added a "-g" suffix to the link tag, while I added an "-o" suffix to the oleate condition links. I then used these two tags to create the link groups. By putting the glucose tag first in the list of groups, I ensured that the edge wedges for the glucose condition would always show up to the left of the oleate condition edge wedge.
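As a rough illustration of the file-combining step, the following Python sketch tags the link relation in each line with a condition suffix and concatenates the results. It assumes simple three-field SIF lines (source, relation, target); the relation name `binds` and the target names are made up for the example, and `tag_links` is a hypothetical helper, not part of BioFabric.

```python
def tag_links(sif_lines, suffix):
    """Append a condition suffix to the relation (link tag) of each SIF line."""
    tagged = []
    for line in sif_lines:
        line = line.strip()
        if not line:
            continue
        # Prefer tab-separated fields; fall back to whitespace for simple files.
        parts = line.split("\t") if "\t" in line else line.split()
        if len(parts) != 3:
            raise ValueError(f"expected 'source relation target', got: {line!r}")
        source, relation, target = parts
        tagged.append("\t".join((source, relation + suffix, target)))
    return tagged


# Hypothetical link lists for the two conditions.
glucose = ["Oaf1p\tbinds\tGENE1", "Adr1p\tbinds\tGENE2"]
oleate = ["Oaf1p\tbinds\tGENE1", "Pip2p\tbinds\tGENE2"]

combined = tag_links(glucose, "-g") + tag_links(oleate, "-o")
print("\n".join(combined))
```

The two resulting relation names (`binds-g` and `binds-o` in this made-up case) are the tags that would then be entered as link groups, glucose first.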

I also created a node attribute file that I used to order the nodes, just like I did in my two original networks. Since there are four transcription factors, each of which is either an input or not, there are 2^4 - 1 = 15 possible non-zero combinations of the inputs. If you recollect from my first post, I showed how we could consider these 15 different input combinations simply as binary numbers, and then order the targets by just sorting those numbers. This put the target nodes with all four inputs in the topmost rows, and the nodes with only an Adr1p (A) input in the lowest rows. I did exactly the same thing in this case, though in this case I have (2^4 x 2^4) - 1 = 255 possible non-zero combinations across the four inputs for the two different experimental conditions. Here's the result:

BioFabric network visualization of combined glucose oleate network

Click on picture to enlarge
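The binary-number ordering can be sketched for the single-condition case as follows. This is a minimal Python illustration, not the program used to make the figures; the target names are hypothetical, and the bit order (O most significant, A least significant) is chosen to match the description of A-only targets landing in the lowest rows.

```python
FACTORS = ["O", "Y", "P", "A"]  # most significant bit first


def input_code(inputs):
    """Encode a set of transcription factor inputs as a 4-bit number."""
    code = 0
    for factor in FACTORS:
        code = (code << 1) | int(factor in inputs)
    return code


def order_targets(target_inputs):
    """Sort targets so the largest input code gets the topmost row."""
    return sorted(target_inputs, key=lambda t: (-input_code(target_inputs[t]), t))


# Hypothetical targets and their inputs.
target_inputs = {
    "GENE1": {"A"},
    "GENE2": {"O", "Y", "P", "A"},
    "GENE3": {"O", "A"},
    "GENE4": {"P"},
}

print(2 ** len(FACTORS) - 1, "possible non-zero input combinations")
for row, target in enumerate(order_targets(target_inputs)):
    code = input_code(target_inputs[target])
    print(row, target, format(code, "04b"))
```

For the combined network, the same idea applies with eight bits instead of four, one per factor and condition.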

Let's get oriented here. The leftmost edge wedge contains the links for the targets of Oaf1p (O) under the glucose condition; the next wedge to its immediate right contains links for Oaf1p targets under the oleate condition. Most, but not all, of the O-glucose targets are O-oleate targets too, and there are a whole bunch more new O-oleate targets as well. The pattern of glucose wedge followed by the oleate wedge for each of the four transcription factors is the direct result of our using link grouping to organize the two different experimental conditions. Thus, the same pattern of glucose followed by oleate links repeats across the remaining three source nodes Oaf3p (Y), Pip2p (P), and Adr1p (A).

The crucial point here is to note how we can now directly compare the networks for the two separate conditions. For example, as I just alluded to above, looking at the O targets for glucose versus oleate, we see that maybe 20% of the O targets under glucose are not O targets in the oleate condition. For any combination of inputs and conditions, we can quickly scan the network to find such patterns.
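The same comparison can be made numerically with a couple of set operations. The sketch below is only an illustration: the link tuples, the `binds-g` and `binds-o` tags, and the target names are invented, and were picked so that the answer happens to come out at 20%.

```python
def targets_of(links, source, relation):
    """Collect the targets of one source node for one tagged relation."""
    return {t for s, r, t in links if s == source and r == relation}


# Hypothetical combined link list as (source, relation, target) tuples.
links = [("Oaf1p", "binds-g", f"G{i}") for i in range(1, 6)]
links += [("Oaf1p", "binds-o", f"G{i}") for i in (1, 2, 3, 4, 6, 7)]

glucose_targets = targets_of(links, "Oaf1p", "binds-g")
oleate_targets = targets_of(links, "Oaf1p", "binds-o")

lost = glucose_targets - oleate_targets    # O targets in glucose only
gained = oleate_targets - glucose_targets  # new O targets in oleate
fraction_lost = len(lost) / len(glucose_targets)

print(sorted(lost), sorted(gained), fraction_lost)
```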

But is the arrangement of node rows that I chose above really the best one for doing these comparisons? I don't think so, and we have complete freedom to arrange the node rows in whatever way works best. The node row ordering I used above simply matched the one I introduced in my last posting. You can see that pattern, starting with the leftmost edge wedge (O-glucose), which has two bands of rows: the band of nodes with edges from O, sitting above the band of nodes without edges from O. That second band of no-edge nodes might require a little bit of imagination to spot, since it's just the empty space below the wedge, but there it is if you think about it a bit! For shorthand, I'll call this banding arrangement (1,0). Then, the next wedge to the right (O-oleate) has four bands (1,0,1,0), the third (Y-glucose) has eight bands (1,0,1,0,1,0,1,0), and so forth. With this scheme, the rightmost (A-oleate) wedge is the most fragmented, with 256 possible bands (1,0,1,...,0), though there are fewer than that because not all possible combinations are present. Another way to view this arrangement is like a car odometer: the rightmost column is always changing, while the leftmost column almost never changes.

So let's try different node row orders. First, compare the following arrangement with the first. Here, we make the four glucose wedges the most coherent, with the fewest bands, and the oleate wedges are more fragmented:


Click on picture to enlarge

Compare this to the glucose-only network I presented in my introductory post, which I have reproduced here:

BioFabric network visualization of glucose network

Click on picture to enlarge

See how the original pattern of the edge wedges is retained? The glucose-only version reappears with this arrangement; it's just interspersed with the oleate edge wedges.

So I like to view the above arrangement as being "glucose condition centric". You can think of it as perhaps the best organization to use if you want to view and think about the changes between the two conditions where the first, glucose condition, serves as the starting point, or baseline.

But perhaps you want to view the two conditions the other way around, where the oleate condition edge wedges are the most coherent:

Click on picture to enlarge

Comparing that version to the oleate-only network, which I am again showing here, you can see how the original oleate edge wedges are the ones to retain their shapes in the combined version shown above:

BioFabric network visualization of oleate network

Click on image to enlarge

So again, this version of the combined network is perhaps the best organization to use if you want to understand the changes across the conditions with the oleate condition serving as the baseline.

It's important to remember that the above visual changes are being made on exactly the same network file, just with different node attribute files used to lay out the network with different node row orders. Furthermore, those differences are created simply by changing the sorting order used, specifying which edge wedges vary the fastest versus the slowest.
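Here is a small Python sketch of how changing the sort order alone produces the three arrangements. It is not the code used for the figures: the wedge tags such as `Og` (O under glucose) and `Oo` (O under oleate), the node names, and the function `row_order` are all assumptions for illustration. The only thing that differs between the layouts is which wedges are treated as the most significant bits.

```python
# Wedge tags listed from most significant (slowest-varying) to least significant.
INTERLEAVED = ["Og", "Oo", "Yg", "Yo", "Pg", "Po", "Ag", "Ao"]
GLUCOSE_FIRST = ["Og", "Yg", "Pg", "Ag", "Oo", "Yo", "Po", "Ao"]
OLEATE_FIRST = ["Oo", "Yo", "Po", "Ao", "Og", "Yg", "Pg", "Ag"]


def row_order(node_inputs, significance):
    """Order nodes top to bottom by the binary code of their inputs."""

    def code(node):
        value = 0
        for wedge in significance:
            value = (value << 1) | int(wedge in node_inputs[node])
        return value

    return sorted(node_inputs, key=lambda n: (-code(n), n))


# Hypothetical targets with their inputs under each condition.
node_inputs = {
    "T1": {"Ag", "Oo"},
    "T2": {"Og", "Ao"},
    "T3": {"Ag"},
    "T4": {"Og", "Oo"},
    "T5": {"Yg"},
}

for name, significance in [("interleaved", INTERLEAVED),
                           ("glucose baseline", GLUCOSE_FIRST),
                           ("oleate baseline", OLEATE_FIRST)]:
    print(name, row_order(node_inputs, significance))
```

Each of the three resulting lists would be written out as its own node attribute file, while the network file itself stays the same.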

To wrap things up, let's look at an example of how we can use the glucose baseline version to visualize the network changes going from glucose to oleate. Consider the set of nodes that only have inputs from Adr1p (A) in glucose; the thick circle in the following figure highlights that group of nodes. In the oleate condition, most of these nodes now become targets of O, P and/or Y as well, in various combinations. The other four thin circles highlight where to look to see these changes:

Click on image to enlarge

So, for example, about half of these glucose A-only targets become O targets as well in oleate; look at the leftmost red circle to see this. And though it is challenging with the limited resolution of these images, we can also spot two targets that go from A-only in glucose to having all four inputs in oleate. Given the node ordering, they are the two uppermost nodes in the band. You can pick them out at the very top of the P-target set (the third red highlight circle from the left).
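The same question can be asked of the underlying data directly. The following Python sketch picks out the nodes that are A-only under glucose and then checks which inputs they have under oleate. The data, the wedge tags (`Ag` for A under glucose, `Oo` for O under oleate, and so on), and the function names are hypothetical.

```python
FACTORS = ["O", "Y", "P", "A"]


def glucose_a_only(node_inputs):
    """Nodes whose only glucose-condition input is A."""
    return {
        node
        for node, tags in node_inputs.items()
        if {t for t in tags if t.endswith("g")} == {"Ag"}
    }


def oleate_targets_of(node_inputs, nodes, factor):
    """Subset of the given nodes that are targets of the factor under oleate."""
    return {n for n in nodes if factor + "o" in node_inputs[n]}


# Hypothetical targets and their inputs across both conditions.
node_inputs = {
    "T1": {"Ag"},
    "T2": {"Ag", "Oo", "Ao"},
    "T3": {"Ag", "Oo", "Yo", "Po", "Ao"},
    "T4": {"Og", "Ag"},
}

baseline = glucose_a_only(node_inputs)
gained_o = oleate_targets_of(node_inputs, baseline, "O")
all_four = {n for n in baseline
            if all(f + "o" in node_inputs[n] for f in FACTORS)}

print(sorted(baseline), sorted(gained_o), sorted(all_four))
```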

So if you have two (or more) networks you want to compare, combine them all into one while using unique link suffix tags to tell them apart. Then use the link group feature to represent each network as separate edge wedges. Finally, change the node row ordering as needed, using the node attribute layout feature, to visualize your data from different perspectives.

Sunday, November 24, 2013

Everybody Wants to be a Node!

My apologies for another long stretch of no postings this fall! First, I was helping to teach the Gene Regulatory Networks in Development course at the MBL in Woods Hole, MA during a good portion of October. Then I got very busy taking a class through Coursera for the last one and a half months, and that ate up my evenings. So the blog fell behind. But I'll now be back at it again, and anticipate that my next post will be the second installment following up on my last post, covering the use of link groups to visualize the differences in a network under different experimental conditions.

But before that, I have an example of a BioFabric network in action. Last month was Leroy Hood's 75th birthday celebration, held at the Institute for Systems Biology. As part of the celebration, we assembled some visualizations of Lee's "influence network". One of these networks was based on information from a questionnaire that was sent to Lee's colleagues, and was depicted as a 10 foot long BioFabric network posted on the wall. The pictures here, courtesy of ISB Senior Research Scientist Gustavo Glusman, were taken during the set-up for the party:

Photo by Gustavo Glusman

The network had 330 nodes and 1400 edges; there were nodes for people, places, and research interests. Since the node lines for the people Lee knew were organized in chronological order from when they met him, the viewer could easily spot Lee's professional development, his evolving research interests, and his Caltech to UW to ISB path over the last 40+ years. What was interesting is that people would walk up to the giant poster, find their own node line, and trace their finger along their node to see their associations:

Photo by Gustavo Glusman

Which is exactly what I had hoped they would do, and that's why I think that BioFabric not only enables, but actively invites exploration of very large networks. You can start by seeing the whole structure at once, and subsequently drilling down to the smallest detail does not require you to prune away anything before you can clearly see any relationship you want. Just trace across a node to see how it fits into the whole picture. Let your fingers do the walking! (Does anybody under the age of 30 even know what that means anymore?)