Monday, July 8, 2013

Big Data, Big Documents: The 100-Foot-Wide PDF


RBioFabric Version 0.3


I've just committed RBioFabric Version 0.3 on GitHub. You can go take a look at it: 


You can install it directly from GitHub using the following command sequence and then start working with it:
# You need 'devtools':
install.packages("devtools")
# load it:
library(devtools)
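# install 'RBioFabric' from GitHub (recent devtools versions take 'user/repo';
# older versions used install_github('RBioFabric', username='wjrl')):
install_github('wjrl/RBioFabric')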
And here's a screen shot of RBioFabric in action inside RStudio:

[Screenshot: RBioFabric running in RStudio]
RBioFabric in action!


This new version adds a couple of necessary features to the very bare-bones first version:
  • You can specify a node order via an ordered list of node names, or a supplied reordering function.
  • You can display shadow links.

There's still lots to do, but RBioFabric actually provides some neat features that are not available in the Java version:
  • You can easily read in a variety of graph formats, since RBioFabric operates on the graphs provided by the igraph package. 
  • You can have BioFabric do a default layout that starts at a user-specified node, instead of the highest-degree node. This is shown in the example documentation for the defaultNodeOrder function. 
  • You can create PDF files of your BioFabric network.

RBioFabric and PDF


For the moment, RBioFabric is the only way to create BioFabric PDF output, because the current Java BioFabric version can only export directly to PNG. While it is possible to print a network to a PDF target with Java BioFabric, I have found that the results for large networks are unacceptable, apparently due to precision issues. For example, the one time I tried it, the endpoint glyphs did not coincide with their corresponding link ends! So if the shortcomings of the current RBioFabric (for example, you cannot mix directed and undirected link types, and there is no explicit edge ordering) are not an issue for you, you can use it to get a PDF of your network.

But there are some caveats to be aware of when producing PDF output. First, some PDF viewers are better than others. Specifically, PDF viewers (or PNG viewers, for that matter) that cannot antialias line art are a terrible choice for viewing BioFabric networks. The closely spaced parallel lines of a BioFabric plot MUST be antialiased to get acceptable results when you are zoomed out to view the whole network. It's also useful for the viewer to have a decent maximum zoom level and a "Hand Tool" so you can navigate by dragging the cursor over the image. I've tested a few viewers, and here is what I found. Note that all my computers are pretty old, so newer versions of these tools may do a better job:
  • Evince on Linux (Document Viewer 2.30.3 tested): Antialiasing is always on, and the visuals are good. But there are a few problems. First, very tiny text below some size threshold explodes to a huge size. Second, you cannot zoom above 400 percent, which is simply insufficient to explore your network. Finally, there is no hand tool to navigate by mouse dragging, which is essential. As a side note, for PostScript output, Evince does not antialias the image, giving very poor results.
  • Preview on Mac (Version 4.2 tested): Be sure that Anti-alias text and line art is checked on the PDF tab in Preferences, which gives adequate visuals (I feel they are way too dark at the full-network level). It has a very good maximum zoom level, and the Move cursor provides convenient mouse-drag navigation. You can also Select a rectangle and then zoom to it using command-* (i.e. command-shift-8).
  • Adobe Reader on Mac (Version 9.5.5 tested): Be sure that Smooth line art is checked, and (VERY IMPORTANT) Enhance thin lines is NOT checked, on the Page Display Preferences. The Hand tool is available via Tools->Select & Zoom->Hand Tool, and the Marquee Zoom from the same Select & Zoom menu allows you to quickly zoom to a selected rectangle. The maximum 6400 percent zoom level is very good for exploration.

Sizing the PDF Document


It is important to make sure that your PDF document is large enough! If you don't set your PDF document height and width to large enough values, the small text labels will not appear. My experiments show that both Adobe Reader and Mac Preview will no longer display the smallest node labels when the document gets smaller than about 0.0145 inches per link, which is about 69 links per inch. To get labels that are correctly proportioned, it appears to be best to stay above 0.0175 inches per link, i.e. 57 links per inch. So, for the yeastHighQuality.sif network displayed on the www.BioFabric.org home page, which contains 6,888 links, you need to make a PDF file about 100 inches wide (i.e. 8 feet, 4 inches) to be just able to view it, and 120 inches wide (10 feet) to really do a decent job.
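The sizing arithmetic can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name pdf_width_inches, the output file name, and the 24-inch height are made up for the example, and plot.new() is a stand-in for whatever call actually draws the network. The only real API used is the standard pdf() graphics device, whose width and height are given in inches.

```r
# Rule-of-thumb spacings taken from the viewer experiments described above.
MIN_INCHES_PER_LINK  <- 0.0145  # smallest labels are just barely displayed
GOOD_INCHES_PER_LINK <- 0.0175  # labels are correctly proportioned

# Hypothetical helper (not part of RBioFabric): width in inches for n_links.
pdf_width_inches <- function(n_links, inches_per_link = GOOD_INCHES_PER_LINK) {
  n_links * inches_per_link
}

n_links    <- 6888  # link count of yeastHighQuality.sif
min_width  <- pdf_width_inches(n_links, MIN_INCHES_PER_LINK)   # roughly 100 inches
good_width <- pdf_width_inches(n_links, GOOD_INCHES_PER_LINK)  # roughly 120 inches

# Open a PDF device at the computed width. The height is an arbitrary
# illustrative value, and the file goes to a temporary directory.
out_file <- file.path(tempdir(), "network.pdf")
pdf(out_file, width = good_width, height = 24)
plot.new()              # placeholder: the network plot would be drawn here
invisible(dev.off())    # close the device so the file is written
```

The two widths come out slightly off the round numbers quoted in the text (a little under 100 and a little over 120 inches), which is why those figures are best read as approximate targets.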

You read that last dimension correctly. To be able to explore a 6,888 link BioFabric network right down to the smallest detail, you need to make your PDF 10 feet wide! The implication is that a network with 69,000 links would need to be 100 feet wide, and that starts to limit what the PDF viewer tools can handle. I tested this with the network from the Cytoscape HumanInteractomeMay.sif file, which has 61,263 links. Using the resolution guidelines I gave above, I made the PDF document 1200 inches wide by 240 inches high; that's 100 feet wide by 20 feet tall. Adobe Reader just would not load it; it seems to hit a limit at documents that are 200 inches square. The Mac Preview tool was able to load it, and I could zoom all the way in to view the labels. Unfortunately, the full-network view looks really bad.
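As a rough check on those numbers, here is a small base R sketch. The function names are hypothetical, and the 200-inch ceiling is simply the limit observed with Adobe Reader 9.5.5 in the test above; it is also consistent with the 14,400-unit maximum page size that the PDF Reference lists among Acrobat's implementation limits, though other viewers may behave differently.

```r
INCHES_PER_LINK   <- 0.0175  # spacing for correctly proportioned labels
VIEWER_LIMIT_INCH <- 200     # per-side limit observed in Adobe Reader 9.5.5

# Hypothetical helper: document width in feet for a given number of links.
width_feet <- function(n_links, inches_per_link = INCHES_PER_LINK) {
  n_links * inches_per_link / 12
}

# Hypothetical helper: TRUE if both sides are within the viewer's limit.
fits_in_viewer <- function(width_in, height_in, limit_in = VIEWER_LIMIT_INCH) {
  width_in <= limit_in && height_in <= limit_in
}

feet_69k <- width_feet(69000)   # about 100 feet
feet_61k <- width_feet(61263)   # about 89 feet for HumanInteractomeMay.sif

# The document actually produced was 1200 x 240 inches.
actual_spacing <- 1200 / 61263            # just under 0.02 inches per link
too_big        <- !fits_in_viewer(1200, 240)  # TRUE: exceeds 200 inches
```

So the 1200-inch document leaves a little headroom over the 0.0175 guideline, but it is six times wider than the limit Adobe Reader appears to enforce.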

The bottom line is that if you need to make a poster-sized (e.g. 48-inch-wide) image of a network with maybe 2,750 links or fewer, PDF can (in theory) provide a scalable, completely readable image. I say in theory, since the plotter driver rendering the PDF may have its own issues with tiny text and hairlines. If you have more links than that, a PNG file of 300 to 600 dpi will provide a decent image, though it will not have per-link resolution. I've been successful with the PNG route for the posters I have made so far.
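The poster numbers can be worked through the same way. This is a minimal base R sketch with hypothetical helper names; it assumes the 0.0175 inches-per-link guideline from the sizing discussion and treats each link as one column across the image width, which is a simplification.

```r
GOOD_INCHES_PER_LINK <- 0.0175  # spacing for correctly proportioned labels

# Hypothetical helper: how many links fit across a document of this width.
max_links_for_width <- function(width_in, inches_per_link = GOOD_INCHES_PER_LINK) {
  floor(width_in / inches_per_link)
}

# Hypothetical helper: horizontal pixels available per link in a PNG.
png_pixels_per_link <- function(n_links, width_in, dpi) {
  (width_in * dpi) / n_links
}

poster_width_in <- 48
poster_capacity <- max_links_for_width(poster_width_in)  # a bit under 2,750 links

# For a network with 61,263 links on the same 48-inch poster:
px_300 <- png_pixels_per_link(61263, poster_width_in, 300)  # well under 1 pixel
px_600 <- png_pixels_per_link(61263, poster_width_in, 600)  # still under 1 pixel
```

With less than one pixel per link even at 600 dpi, individual links in a network that size cannot be resolved in the PNG, which is what "not per-link resolution" means in practice.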

So RBioFabric now provides a route to creating huge PDF documents for your network. But by far the best way to interactively explore a large network is still to use the BioFabric Java application, since the tool is designed to view large networks using the built-in interactive search, magnifier, touring, mouseover, submodeling, and view-tiling features.

Be sure to keep watching this blog for more announcements of future RBioFabric improvements! 

1 comment:

  1. Note: On Linux systems, if you try to install devtools and get error messages about RCurl, you will need to make sure that libcurl-devel is installed on your system. If it is not, install it, e.g.: sudo yum install libcurl-devel