Very sparse, 4k-edge graph: problems with layouts

Mar 10, 2010 at 10:16 PM

I am using v1.0.1.113 to lay out a 4000-node, 3700-edge graph with a very low edge density and 300 connected components, most of which look like trees (I know this because I have displayed it with another tool). I expected the Harel-Koren layout to do a pretty good job: coarsened versions of such a sparse, spindly graph should be very easy to lay out. However, this is not the case: most of the graph remains a hairball. Two suggestions:

  • Harel-Koren seems to be unaware of components during layout, and I have verified that the hairball is produced by overlaps of multiple independently laid-out components. If this is the case, a possible fix would be to use a synthetic node connected to all components during the layout (to prevent overlaps), and then to remove this synthetic node and its edges once the layout is complete.
  • Is it possible to extend the layout time allotted to Harel-Koren or other layouts per click on the 'lay out again' button? In particular, HK does not seem to be incremental, so "layout progress" is not carried over between successive clicks.
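
The synthetic-node idea in the first suggestion can be sketched as follows. This is only an illustration in Python, not NodeXL code: the function names and the `HUB` identifier are made up for this example, and the hub is linked to one arbitrary node per component.

```python
from collections import defaultdict

HUB = "__hub__"  # hypothetical identifier for the synthetic node


def connected_components(nodes, edges):
    """Return a list of components (each a list of nodes) using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return list(groups.values())


def add_hub(nodes, edges):
    """Return (nodes, edges) with a hub linked to one node per component."""
    comps = connected_components(nodes, edges)
    hub_edges = [(HUB, comp[0]) for comp in comps]
    return list(nodes) + [HUB], list(edges) + hub_edges


def remove_hub(positions):
    """Drop the hub from a {node: (x, y)} layout result."""
    return {n: p for n, p in positions.items() if n != HUB}
```

The augmented graph from `add_hub` would be handed to the layout, and `remove_hub` applied to the resulting positions. Whether a single hub spreads 300 components well enough is an open question; it only guarantees that the layout sees one connected graph.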

Running Fruchterman-Reingold instead of Harel-Koren does not completely detangle the graph, even after several clicks on the 'lay out again' button. Some questions/comments:

  • Is it possible to tune the layout parameters? It seems that edge attraction is lower than it should be for such a sparse graph.
  • When attempting to use FR layout after HK, the hairball in HK "explodes" outwards in a ring of clumped-together edges and nodes, apparently due to an enormous initial repulsion. If this is the case, the "expanding ring" can be avoided by setting a maximum on the displacement that any given node can undergo per layout tick (placing an upper limit on "node speed").
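
The "node speed" limit in the last point can be sketched as one force-directed tick with a capped displacement. This is a minimal Python illustration, not NodeXL's implementation: `fr_step`, `k`, and `max_step` are names assumed for this example, and the force formulas are the textbook Fruchterman-Reingold ones (repulsion k²/d, attraction d²/k). The published algorithm limits displacement with a cooling "temperature" for a similar purpose; whether NodeXL does so is not stated here.

```python
import math


def fr_step(pos, edges, k=1.0, max_step=0.5):
    """One force-directed tick; returns new {node: (x, y)} positions.

    No node moves farther than max_step in a single tick.
    """
    disp = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Repulsion between every pair of nodes, magnitude k^2 / d.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9  # exactly coincident nodes get no push
            f = k * k / d
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f

    # Attraction along each edge, magnitude d^2 / k.
    for u, v in edges:
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = d * d / k
        disp[u][0] -= dx / d * f
        disp[u][1] -= dy / d * f
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f

    # Cap the length of each displacement vector at max_step.
    new_pos = {}
    for n, (dx, dy) in disp.items():
        length = math.hypot(dx, dy)
        scale = min(length, max_step) / length if length > 0 else 0.0
        new_pos[n] = (pos[n][0] + dx * scale, pos[n][1] + dy * scale)
    return new_pos
```

With the cap in place, two nearly overlapping nodes separate by at most `2 * max_step` per tick instead of being thrown to the edge of the drawing area.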

I have posted a version of this dataset at http://hornbake-313.umd.edu/grants-subset-nodexl.xlsx ; feel free to download it and try it out (it contains a 6% subset of the 2008-2010 grants from NSF).

Finally, I would like to acknowledge the great work that has gone into this tool. Yes, it is not perfect - but it is already very impressive.

Graph Type: Undirected

Vertices: 3,993

Unique Edges: 3,658
Edges With Duplicates: 71
Total Edges: 3,729

Self-Loops: 0

Connected Components: 307
Single-Vertex Connected Components: 0
Maximum Vertices in a Connected Component: 205
Maximum Edges in a Connected Component: 206

Maximum Geodesic Distance (Diameter): Not Available
Average Geodesic Distance: Not Available

Graph Density: 0

NodeXL Version: 1.0.1.113

Mar 11, 2010 at 1:54 AM
Edited Mar 11, 2010 at 1:57 AM

Our team is discussing various improvements to the graph layouts in NodeXL, including adding new algorithms.

You are correct about Harel-Koren not being incremental.  I think that's by design, although I'm not the person who implemented that particular algorithm.

For Fruchterman-Reingold, you can tweak the strength of the repulsive force between vertices, as well as the iterations per click of the Lay Out Again button.  In version 1.0.1.113 and later, go to NodeXL, Graph, Layout, Layout Options.  In earlier versions, go to Options in the graph pane and click Layout in the Options dialog box.

Have you tried using one of the geometric layouts (Circle, Grid, etc.) or the Random layout as a starting point for Fruchterman-Reingold?  I've found that helps untangle some graphs, but it's obviously very dependent on your particular data.
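
As a rough illustration of seeding a force-directed layout from a geometric one, the sketch below places vertices evenly on a circle and returns positions that could be passed in as the starting state. It is plain Python, not NodeXL code, and `circle_layout` and `radius` are names assumed for this example.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    nodes = list(nodes)
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }
```

Starting from evenly spaced positions avoids the very small inter-vertex distances, and hence the very large initial repulsion, that a collapsed hairball produces.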

Thanks for your supportive comments.  NodeXL is an ongoing project and we strive for continuous improvement.

-- Tony

Mar 11, 2010 at 7:06 PM

Hi Tony,

Adjusting the layout options worked beautifully, thanks for the help. To make this setting easier for novices like me to find, I should point out a small inconsistency between the ribbon's layout menu (which does not contain the "Layout options..." item) and the network-view layout menu (which does). In general, anyone wanting to use FR layout on a multi-thousand-node network will need to adjust this setting.

Thanks again!

Mar 11, 2010 at 8:13 PM

I'm glad to hear that helped.

On the "Layout Options..." menu item issue, I just checked on two computers whether the menu item appears at the end of the list of available layout algorithms when I open the NodeXL, Graph, Layout drop-down in the ribbon.  It does show up, right after "None," as expected.  Are you certain that it's not in the list on your computer?

-- Tony