Trying to find clusters and changing the scheme to Weighted Graph

Jul 19, 2010 at 7:51 PM

Hey, I have a GraphML file that contains vertex1, vertex2, and edge weight for each edge, along with instructions to draw an undirected graph. Now I am trying to find clusters and then redraw the graph with the scheme set to Weighted Graph. Please tell me the steps to do so. I tried using the ClusterCalculator class explained in http://nodexl.codeplex.com/Thread/View.aspx?ThreadId=217889, but how do I redraw the graph with this new information, with the drawing scheme set to Weighted Graph? I am writing a console application and saving the graph to disk.

Thanks in advance for the help and pointers.

Jul 20, 2010 at 4:22 AM

You don't want to use the NodeXL Excel Template's Weighted Graph Scheme, because in addition to setting the edge widths, that Scheme also sets the color and shape of each vertex. I believe you want to assign the color and shape of each vertex based on its membership in a cluster, and that conflicts with the Weighted Graph Scheme.

Instead, just set the width of each edge to a value proportional to its edge weight, using linear interpolation between the minimum and maximum edge weight values in your data set.  Populating your graph then looks something like this, in pseudocode:

foreach (cluster calculated by the ClusterCalculator)
{
    foreach (vertex in the cluster)
    {
        set the shape and color of the vertex to some unique shape/color pair;
    }
}

foreach (edge in the graph)
{
    set the edge width to a value proportional to its edge weight;
}
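The two loops above can be sketched in plain C#. This is a minimal sketch, not NodeXL code: `StyleSketch`, `EdgeWidth`, `StyleForCluster`, and the `SketchShape` enum are hypothetical names, and the minimum and maximum widths are whatever range looks good in your images. `EdgeWidth` does the linear interpolation between the smallest and largest edge weight in your data; `StyleForCluster` hands each cluster a distinct (shape, color) pair until the combinations run out.

```csharp
using System;

// Hypothetical shape enum for this sketch; NodeXL has its own vertex shape type.
public enum SketchShape { Circle, Square, Diamond, Triangle }

public static class StyleSketch
{
    // Linear interpolation: maps a weight in [minWeight, maxWeight]
    // onto a width in [minWidth, maxWidth].
    public static double EdgeWidth(double weight, double minWeight, double maxWeight,
                                   double minWidth, double maxWidth)
    {
        if (maxWeight <= minWeight)
        {
            // Every edge has the same weight; avoid dividing by zero.
            return minWidth;
        }
        double t = (weight - minWeight) / (maxWeight - minWeight);
        return minWidth + t * (maxWidth - minWidth);
    }

    // Assigns cluster number clusterIndex (0, 1, 2, ...) a (shape, color) pair.
    // Pairs are unique for the first shapes * colors clusters, then repeat.
    public static (SketchShape Shape, string Color) StyleForCluster(int clusterIndex, string[] colors)
    {
        if (colors.Length == 0)
            throw new ArgumentException("At least one color is required.", nameof(colors));

        var shapes = (SketchShape[])Enum.GetValues(typeof(SketchShape));
        int i = clusterIndex % (shapes.Length * colors.Length);
        return (shapes[i % shapes.Length], colors[i / shapes.Length]);
    }
}
```

For example, with weights from 0 to 10 and widths from 1 to 5, a weight of 5 maps to a width of 3. You would then copy these values onto the NodeXL vertices and edges through the per-vertex and per-edge metadata keys your NodeXL version provides; check its documentation for the exact key names.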

-- Tony

Jul 20, 2010 at 4:25 PM

Thank you so much, Tony. I will try what you explained.

My next questions are:

1. The GraphML file I have contains duplicate edges as well as self-loop edges, e.g. (vertex1 = XXX, vertex2 = XXX, edge-weight = 0). Do I have to remove the duplicates and self-loops myself, or is there a way in NodeXL that does it for me?

2. I tried plotting the graph using:

    GraphMLGraphAdapter adapter = new GraphMLGraphAdapter();
    IGraph graph = adapter.LoadGraphFromFile(@"XYZL.graphml");

    // Other layouts tried in place of SpiralLayout: FruchtermanReingoldLayout,
    // HarelKorenFastMultiscaleLayout, CircleLayout, SugiyamaLayout, RandomLayout.
    ILayout oLayout = new SpiralLayout();

But when I choose FruchtermanReingoldLayout, HarelKorenFastMultiscaleLayout, or SugiyamaLayout for the graph layout, it hangs and eventually throws a System.OutOfMemoryException. Things work fine with the rest of the layouts, and I get an image.

As I said, my GraphML file does contain duplicates and self-loops, and so far I am passing the file in as-is. Is this what I am doing wrong, or is there some other important thing I am missing?

Thanks again, Tony, for the pointers. I really appreciate your time.



Jul 20, 2010 at 5:47 PM

NodeXL doesn't require that duplicate edges or self loops be removed from the graph.  Whether you do that is entirely up to the requirements of your application.
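If your application does call for removing them, it is straightforward to do on the raw entries before you populate the graph. A minimal sketch, assuming each entry is (vertex1, vertex2, weight) as in your file and that duplicates should be merged by summing their weights (keeping the maximum or the first weight would be equally valid choices). `EdgeEntry` and `EdgeCleanup` are hypothetical names, not NodeXL types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record for one (vertex1, vertex2, weight) entry from the file.
public record EdgeEntry(string Vertex1, string Vertex2, double Weight);

public static class EdgeCleanup
{
    // Drops self-loops and merges duplicate undirected edges, summing their weights.
    // The output keeps the order in which each distinct edge first appeared.
    public static List<EdgeEntry> RemoveSelfLoopsAndMergeDuplicates(IEnumerable<EdgeEntry> entries)
    {
        var weights = new Dictionary<(string, string), double>();
        var firstSeenOrder = new List<(string, string)>();

        foreach (EdgeEntry e in entries)
        {
            if (e.Vertex1 == e.Vertex2)
                continue; // self-loop, e.g. (XXX, XXX, 0)

            // Undirected graph: (A, B) and (B, A) are the same edge, so
            // normalize the key to put the ordinally smaller name first.
            (string, string) key = string.CompareOrdinal(e.Vertex1, e.Vertex2) < 0
                ? (e.Vertex1, e.Vertex2)
                : (e.Vertex2, e.Vertex1);

            if (weights.TryGetValue(key, out double sum))
            {
                weights[key] = sum + e.Weight;
            }
            else
            {
                weights[key] = e.Weight;
                firstSeenOrder.Add(key);
            }
        }

        var result = new List<EdgeEntry>();
        foreach ((string v1, string v2) in firstSeenOrder)
            result.Add(new EdgeEntry(v1, v2, weights[(v1, v2)]));
        return result;
    }
}
```

Entries such as (A, B, 1) and (B, A, 2) collapse into a single (A, B, 3) edge, and (XXX, XXX, 0) is dropped.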

How large is your graph: roughly how many vertices and how many edges? The three layouts you mentioned are the most memory-intensive, and if you have a large graph and limited memory, you may indeed lack the memory that NodeXL requires to perform the requested layout.

-- Tony

Jul 20, 2010 at 7:29 PM

Hi Tony,

Right now I have approximately 15,000 entries in my GraphML file. Each entry is in the format (vertex1, vertex2, edge-weight), so there are roughly 15,000 such entries. I have 3 GB of RAM.

If you think that this is not enough, could you suggest the approximate RAM requirement for a file this large, or even larger?

Another problem I ran into: for smaller graphs, say ~500 to ~900 nodes, I get a good graph and the bitmap image turns out neat, but for larger inputs, say ~7,000 entries, it gets messed up and looks like a black hole in the center.

Is there any way I can get a neat, clear image with such large files?

Thanks a lot 

Jul 21, 2010 at 12:31 AM

You're probably hitting .NET's limits for 32-bit processes. Adding more memory to your 3 GB 32-bit computer isn't going to help; it can likely take only 4 GB anyway. (The extra GB would probably be wasted in any case, for reasons involving memory-mapped I/O that I won't go into here.)
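If you want to confirm which limit you are up against, .NET 4.0 and later expose `Environment.Is64BitOperatingSystem` and `Environment.Is64BitProcess`. A minimal sketch (the `BitnessCheck` class is just an illustrative wrapper): note that a 32-bit process stays under 32-bit address-space limits even on a 64-bit OS, so the process value is the one that matters, and the project's platform target setting decides which kind of process you get.

```csharp
using System;

public static class BitnessCheck
{
    // Reports whether the OS and the current process are 64-bit.
    // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit one.
    public static string Describe()
    {
        return "64-bit OS: " + Environment.Is64BitOperatingSystem
             + ", 64-bit process: " + Environment.Is64BitProcess
             + ", pointer size: " + IntPtr.Size + " bytes";
    }
}
```

Calling `Console.WriteLine(BitnessCheck.Describe());` at the start of your console application shows which case applies.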

With data sets as large as yours, you can either move to a 64-bit computer, where the memory limits are far higher, or try to reduce your graph to a more manageable size. As you've discovered, making visual sense of such dense graphs is a real challenge, not just in NodeXL but in the graphing world in general, and reducing the graph size cuts memory requirements, speeds up layouts (if that is an issue for you), and makes the graph more legible.

Of course, graph reduction may or may not be practical in your case; it all depends on the nature of your data. Are there tons of duplicate edges? If so, can you merge them before populating the NodeXL graph? NodeXL doesn't currently draw duplicate edges in a very useful way anyway (it draws them on top of each other), so maybe this will work in your case. Or can you filter the vertices and edges using some application-dependent criteria before populating the graph? Or remove unconnected vertices, or vertices with small numbers of edges?
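Filtering by number of edges can also be done on the raw (vertex1, vertex2, weight) entries before the NodeXL graph is populated. A minimal sketch, assuming the entries are already in memory as tuples and self-loops have been removed; `DegreeFilter`, `KeepEdgesBetweenWellConnected`, and the `minDegree` threshold are hypothetical names, and the right threshold depends entirely on your data. Unconnected vertices need no separate step here: a vertex that appears in no kept entry is simply never added to the graph.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DegreeFilter
{
    // Keeps only the entries whose two endpoints each appear in at least
    // minDegree entries of the input. This is a single pass: dropping entries
    // can lower other vertices' counts, so call it again for a stable result.
    public static List<(string V1, string V2, double Weight)> KeepEdgesBetweenWellConnected(
        IReadOnlyList<(string V1, string V2, double Weight)> edges, int minDegree)
    {
        // Count how many entries touch each vertex.
        var degree = new Dictionary<string, int>();
        foreach (var e in edges)
        {
            degree.TryGetValue(e.V1, out int d1);
            degree[e.V1] = d1 + 1;
            degree.TryGetValue(e.V2, out int d2);
            degree[e.V2] = d2 + 1;
        }

        return edges
            .Where(e => degree[e.V1] >= minDegree && degree[e.V2] >= minDegree)
            .ToList();
    }
}
```

With entries A-B, A-C, A-D, B-C, and E-F and a threshold of 2, only A-B, A-C, and B-C survive, because D, E, and F each appear in a single entry.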

I should mention that NodeXL isn't really optimized for large graphs, although we hope to scale better in future releases. Right now, I usually recommend it for graphs with a few thousand vertices and a few thousand edges. It can handle larger graphs (much larger graphs if you have enough memory and patience), but not very efficiently.

By the way, of the three layouts you mentioned you were having trouble with, Fruchterman-Reingold is the least memory-intensive.

Jul 21, 2010 at 2:34 PM

Thanks, Tony. You've always been a great help. I really appreciate your time and effort.