API usage

Feb 23, 2011 at 12:47 AM


I'm new to NodeXL and I'm looking for a tutorial on using the API, for example loading data and running a clustering algorithm.

The Excel add-in is great, but it isn't practical for massive data. In my case I have billions of Pearson correlations between gene expression profiles. What I want to do is load a simple tab-separated file, calculate the clusters, and write each cluster to a new output file.
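A minimal sketch of the load-and-write steps described above, in plain Python rather than the NodeXL API. It assumes a hypothetical three-column layout per line (gene_a, gene_b, r) and leaves the clustering step out: `membership`, a vertex-to-cluster-id mapping, stands in for whatever the clustering algorithm returns. The function names `read_edges` and `write_clusters` are illustrative only.

```python
import csv
import os
from collections import defaultdict


def read_edges(lines, min_abs_r=0.0):
    """Parse tab-separated 'gene_a<TAB>gene_b<TAB>r' lines (assumed layout).

    Rows with |r| below min_abs_r are dropped, which is one way to keep
    only highly correlated pairs before clustering.
    """
    edges = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        a, b, r = row[0], row[1], float(row[2])
        if abs(r) >= min_abs_r:
            edges.append((a, b, r))
    return edges


def write_clusters(membership, out_dir):
    """Write one file per cluster; membership maps vertex -> cluster id."""
    groups = defaultdict(list)
    for vertex, cluster_id in membership.items():
        groups[cluster_id].append(vertex)
    paths = []
    for cluster_id, vertices in sorted(groups.items()):
        path = os.path.join(out_dir, f"cluster_{cluster_id}.txt")
        with open(path, "w") as f:
            f.write("\n".join(sorted(vertices)) + "\n")
        paths.append(path)
    return paths
```

`read_edges` accepts any iterable of lines, so an open file object can be passed in directly; the threshold lets you cut the input down before building any graph.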

It would be nice to have a simple tutorial to get started.




Feb 23, 2011 at 2:01 AM
Edited Feb 23, 2011 at 2:03 AM


I wouldn't recommend trying to use NodeXL to process billions of rows, because you will be unhappy with its performance.  It will take forever and will consume huge amounts of memory.  NodeXL is optimized for ease of use, both at the Excel and API level, when working with graphs that have a few thousand vertices and a few thousand edges.  You can push it with tens of thousands of vertices and edges if you are the patient type, but billions is simply out of NodeXL's league.

Instead, I would suggest taking a look at SNAP, the Stanford Network Analysis Platform developed by Jure Leskovec and his team.  SNAP is optimized for use with very large graphs and might be just what you're looking for.

-- Tony

Feb 23, 2011 at 3:43 AM

I thought the ClausetNewmanMoore algorithm uses SNAP. So would you expect it to take forever, or a couple of hours or days? I only have to process the data once.





Feb 23, 2011 at 5:43 AM

That's correct.  But first you would have to populate a NodeXL Graph object with billions of vertices and edges, and your machine will likely run out of managed memory long before that finishes.  And even if that worked somehow, the NodeXL Graph object would then have to pass through a C#-to-C++ translation layer (NodeXL is C#, SNAP is C++), which would require even more memory, gigabytes of disk space, and I don't know how much time.
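A back-of-envelope estimate shows why memory runs out first. The per-edge cost below is an assumption for illustration (an edge object plus its share of vertex and collection overhead), not a measured NodeXL figure; substitute your own number.

```python
def estimate_gib(n_edges, bytes_per_edge):
    """Rough memory footprint in GiB for n_edges at an assumed per-edge cost."""
    return n_edges * bytes_per_edge / 2**30


# ASSUMPTION: ~100 bytes per edge; the real overhead depends on the object model.
ASSUMED_BYTES_PER_EDGE = 100

for n_edges in (1_000_000, 1_000_000_000):
    gib = estimate_gib(n_edges, ASSUMED_BYTES_PER_EDGE)
    print(f"{n_edges:>13,} edges -> {gib:8.2f} GiB")
```

Under that assumption, a million edges need roughly 0.09 GiB while a billion edges need roughly 93 GiB, before counting the extra copy made when the graph passes through the C#-to-C++ translation layer.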

You can try it, but really, it's just not practical.  Working directly with SNAP would make much more sense for your application.  You need a Ferrari; don’t settle for a Honda.

-- Tony

Feb 23, 2011 at 6:27 AM

OK. I just tried a filtered subset of highly correlated genes (1,600,000 edges) and applied the WakitaTsurumi clustering. It was pretty fast, and it's good enough for now. The clustering returns communities and the vertices within each community, right? Can I also get the subgraph corresponding to a community somehow?



PS: Is there a reason to prefer ClausetNewmanMoore over WakitaTsurumi?

Feb 23, 2011 at 4:48 PM
Edited Feb 23, 2011 at 4:52 PM


To extract a subgraph from a graph (is that what you're asking for?), use Microsoft.NodeXL.Algorithms.SubgraphCalculator.GetSubgraphAsNewGraph().  You'll find it in the NodeXLApi.chm help file.
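One common reading of "the subgraph for a community" is the induced subgraph: the community's vertices plus every edge whose two endpoints both lie in the community. The sketch below shows that idea on a plain edge list; it is not NodeXL's SubgraphCalculator, whose exact behavior and signature are documented in NodeXLApi.chm, and `induced_subgraph` is a hypothetical name.

```python
def induced_subgraph(edges, community):
    """Return (vertices, edges) of the subgraph induced by `community`.

    `edges` is an iterable of (u, v, weight) tuples; `community` is any
    collection of vertex names (converted to a set for O(1) lookups).
    Edges with only one endpoint in the community are dropped.
    """
    members = set(community)
    sub_edges = [(u, v, w) for (u, v, w) in edges if u in members and v in members]
    return members, sub_edges
```

Applied once per community, this yields one subgraph per cluster, which can then be written out or analyzed separately.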

You can read about the various differences between the clustering algorithms in the papers that introduced them:

* http://arxiv.org/PS_cache/cs/pdf/0702/0702048v1.pdf

* http://www.santafe.edu/media/workingpapers/01-12-077.pdf

* http://www.ece.unm.edu/ifis/papers/community-moore.pdf

-- Tony

Feb 23, 2011 at 5:12 PM

Many thanks, Tony.

It's exactly what I was looking for. So what the papers are saying is that WakitaTsurumi is basically a more scalable version of ClausetNewmanMoore, which is what I want in my scenario.
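As the papers describe, both algorithms build communities by greedily merging pairs of communities to increase the modularity Q; Wakita and Tsurumi add heuristics that favor merging communities of similar size, which is where their speedup on large networks comes from. A small sketch of Q for a given partition of an undirected, unweighted graph, using Q = sum over communities c of (L_c/m - (d_c/(2m))^2), where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c's vertices. This computes the score only; it does not perform the merging itself.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of an undirected, unweighted graph under a partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping vertex -> community id.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: summed degree of community c's vertices
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split into the two triangles.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
print(round(modularity(edges, split), 3))  # 0.357
```

Putting every vertex in one community gives Q = 0, so a positive score means the partition captures more internal edges than a random graph with the same degrees would.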

One more question. I'm using the SimpleGraphAdapter to load my data. Is there a way to tell the adapter to create an undirected graph, or do I have to implement my own adapter?


Feb 23, 2011 at 6:11 PM


Here is a comment I found in the SimpleGraphAdapter code:

        // For now, support only directed graphs.  This may get modified in the
        // future to support undirected graphs as well, at which time some
        // mechanism must be added to tell this class the directedness of
        // the graph that is about to be created.  Possible solution: Add a
        // SimpleGraphAdapter.Directedness property.

So that is a "to-do" item that hasn't been done yet.

You could do one of the following:

1. Implement your own adapter from scratch.

2. Copy the SimpleGraphAdapter code, rename the class, and adapt it to your needs.  I believe the only changes required are to change the GraphDirectedness argument passed to the internal "new Graph()" call, and to pass false as a third argument to the oEdges.Add() call.  If you go this route, the source code is available as a download on the Downloads tab on CodePlex.  You don't need to rebuild the source code; just steal the SimpleGraphAdapter.cs file.

3. Use the SimpleGraphAdapter as is, then copy the vertices and edges to a new undirected graph and discard the first graph.  That's what I would do, assuming I had enough memory.
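Conceptually, the copy step in option 3 amounts to folding each directed edge into an unordered pair and dropping reciprocal duplicates. A sketch on a plain edge list, not on NodeXL Graph objects; `to_undirected` is a hypothetical helper, and keeping the first weight seen is an arbitrary choice made here (Pearson correlation is symmetric, so for this data both directions should carry the same value anyway).

```python
def to_undirected(directed_edges):
    """Collapse directed (source, target, weight) edges into undirected ones.

    (a, b) and (b, a) become one edge keyed by the unordered pair; the
    first weight encountered is kept. Output order follows first appearance.
    """
    seen = {}
    for u, v, w in directed_edges:
        key = frozenset((u, v))
        if key not in seen:
            seen[key] = (u, v, w)
    return list(seen.values())
```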

-- Tony

Feb 24, 2011 at 3:20 AM

Tony, many thanks! Everything is working. Unfortunately, CalculateStronglyConnectedComponents ends with a stack overflow on large networks, but I already have my own implementation for finding independent subgraphs in my domain models, so it's not a big deal. NodeXL is working very well even with larger networks. Good work!
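One common cause of a stack overflow on large graphs is a recursive depth-first search, whose recursion depth can grow with the size of a component. For the undirected case, independent (connected) subgraphs can be found without any recursion using a union-find structure. This is a generic illustration, not the poster's implementation or NodeXL's; strongly connected components in a directed graph need a different algorithm, such as an iterative version of Tarjan's or Kosaraju's.

```python
def connected_components(vertices, edges):
    """Group vertices into connected components using union-find (no recursion).

    `vertices` must contain every endpoint that appears in `edges`.
    """
    parent = {v: v for v in vertices}

    def find(x):
        # Walk up to the root iteratively, then compress the path behind us.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components

    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())
```

Because `find` uses loops instead of recursion, even a single chain of hundreds of thousands of vertices is handled without growing the call stack.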


Feb 24, 2011 at 5:07 PM

I'm glad to hear that you have found it useful.  Thanks for letting me know of your success.

-- Tony