NodeXL to support really large datasets (100000+ relationships)

Jun 6, 2012 at 10:17 AM

Hello Everyone,

 

I have been playing around with NodeXL for some time and I must say I am mighty pleased. It is very well documented and provides a good platform to build custom applications on top of the base engine. I do however have one issue: I feel this may not be a very good engine for handling really large data sets. I have had some success with a file of up to about 20000 rows, but beyond that the performance keeps deteriorating exponentially.

 

So my question is, how can we make NodeXL support a larger data set? I agree that this may not be an immediate requirement, but sooner or later we will have to do it. I have tried a machine with an i7 processor and 6 GB of RAM and still saw little benefit, so I am assuming it's the architecture we need to focus on. Can somebody tell me where exactly we need to focus to make this "stronger"? I am willing to do the development and contribute back, but I do not have much bandwidth to do the initial R&D. Any comments/suggestions/help please?

 

Abhi

Coordinator
Jun 6, 2012 at 10:28 AM

Hello!

Thank you for the interest in NodeXL!

Our project has focused on making micro and meso scale networks easy to manage for non-programmers.  We recognize that mega and giga scale networks are of interest to many people but we chose to focus on ease of use first.

64-bit Excel and Windows on a machine with 16 or 24 GB of RAM will allow NodeXL to manage larger data sets.  But even then, Excel is limited to about 1 million rows (1,048,576) per worksheet and NodeXL is not likely to perform well against more than 100K or so rows.

Your message suggests that you are a software developer so you may want to avoid NodeXL and work with the library we use under the hood for all our metrics calculations: SNAP (http://snap.stanford.edu) (the "Stanford Network Analysis Platform") from Prof. Jure Leskovec at Stanford.  SNAP is highly performant and does not have the limits and overhead of Excel.  SNAP does not, however, perform visualization of the network.

Regards,

Marc

Jun 6, 2012 at 11:05 AM
Thanks Marc.

I was actually prepared for that reply :). Yes, I am a developer, and SNAP may not be useful to me because it has no visualization. I have had a look at libraries like Cytoscape, but I like the whole look and feel of NodeXL.

On a separate note, I have been trying to create a Windows application using the NodeXL control. Using this I am easily able to make graphs like the following:

http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=650

What I really want is to create a graph like the one you have created -

http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=662

I see that the above graph was created by you, so who better to ask? The main difference between the two graphs is the look and feel of the nodes and edges: mainly pictures for nodes, and coloured, CURVED edges. Is this done using the properties in the provided XML, or by using EdgeDrawer or VertexDrawer? I was playing around with EdgeDrawer but could not find any property to make my edges curved. I see that there are a lot of properties in the XML (http://www.nodexlgraphgallery.org/Pages/WorkbookOptions.ashx?graphID=662), but how do I set these properties in the NodeXL control for a Windows application?

Apologies for asking a question apart from the topic at hand, but I would greatly appreciate it if you could help.

Regards,
Abhi

Read the full discussion online.

To add a post to this discussion, reply to this email (NodeXL@discussions.codeplex.com)

To start a new discussion for this project, email NodeXL@discussions.codeplex.com

You are receiving this email because you subscribed to this discussion on CodePlex. You can unsubscribe on CodePlex.com.

Please note: Images and attachments will be removed from emails. Any posts to this discussion will also be available online at CodePlex.com


Jun 6, 2012 at 5:24 PM
Edited Jun 6, 2012 at 5:24 PM

Hello, Abhi:

As Marc points out, there are known scalability bottlenecks in NodeXL.  The Excel platform is a big one and the WPF-based display layer is another, but there are numerous other chokepoints scattered throughout the product.  As you know, there are always design tradeoffs to be made, and we've chosen to emphasize ease of use for Excel Template users; ease of use for application developers, who get a simple .NET API and a modern WPF control; and last but not least, speed of implementation, because we have just one full-time developer, me.  SNAP, by contrast, is massively scalable and blazingly fast, but you have to be familiar with C++ templates to use it, and you don't get visualization, at least not right now.

I'll get back to you on your other questions.

-- Tony

Jun 6, 2012 at 5:34 PM
Edited Jun 6, 2012 at 5:44 PM

Abhi:

To display an image for a vertex in the NodeXLControl, you need to first add the image to the vertex's metadata:

    IVertex myVertex = some vertex;
    System.Windows.Media.ImageSource myImage = some image;
    myVertex.SetValue(ReservedMetadataKeys.PerVertexImage, myImage);

Then you need to either change the shape of the individual vertex, like so:

    myVertex.SetValue(ReservedMetadataKeys.PerVertexShape, VertexShape.Image);

...or change the shape of all vertices, like so:

    nodeXLControl.GraphDrawer.VertexDrawer.Shape = VertexShape.Image;

These things must be done before you call NodeXLControl.DrawGraph().
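
Putting the steps above together, here is a minimal sketch rather than a definitive implementation. It assumes the Smrf.NodeXL.Core and Smrf.NodeXL.Visualization.Wpf namespaces (inferred from the DLL names), references to the NodeXL class libraries, and a NodeXLControl that is already hosted in a WPF window. The class name, method name, and image path parameter are hypothetical; check each call against the API documentation shipped with your version.

```csharp
using System;
using System.Windows.Media.Imaging;
using Smrf.NodeXL.Core;               // assumed namespace for IVertex, ReservedMetadataKeys
using Smrf.NodeXL.Visualization.Wpf;  // assumed namespace for NodeXLControl, VertexShape

static class VertexImageExample
{
    // nodeXLControl: a control already placed in a WPF window.
    // imagePath: hypothetical absolute path to an image file on disk.
    public static void AddImageVertex(NodeXLControl nodeXLControl, string imagePath)
    {
        // Add a new vertex to the control's graph.
        IVertex myVertex = nodeXLControl.Graph.Vertices.Add();

        // BitmapImage derives from ImageSource, so it can be stored as the vertex image.
        BitmapImage myImage = new BitmapImage(new Uri(imagePath, UriKind.Absolute));
        myVertex.SetValue(ReservedMetadataKeys.PerVertexImage, myImage);

        // Tell the drawer to render this one vertex as its image.
        myVertex.SetValue(ReservedMetadataKeys.PerVertexShape, VertexShape.Image);

        // Metadata must be set before drawing; true asks for a layout pass first.
        nodeXLControl.DrawGraph(true);
    }
}
```

Setting the shape per vertex, as here, leaves other vertices with the default shape; the VertexDrawer.Shape assignment shown above changes the default for all of them instead.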

-- Tony

Jun 6, 2012 at 5:38 PM

Abhi:

To draw curved edges, do this:

    nodeXLControl.GraphDrawer.EdgeDrawer.CurveStyle = EdgeCurveStyle.Bezier;

...or this:

    nodeXLControl.GraphDrawer.EdgeDrawer.CurveStyle = EdgeCurveStyle.CurveThroughIntermediatePoints;

You can find out more about these options in the NodeXLApi.chm file, which is distributed with the NodeXL class libraries and the source code.

-- Tony


Jun 6, 2012 at 5:54 PM
Hi Tony,

Many thanks for your comments. I was able to work my way through changing the vertex metadata. My question was more about the edges: how do I get something like this?


http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=662

How can I get these curved/rounded edges as in the graph above using the NodeXL control?

Also are all the properties in the XML mapped to NodeXL control?

Regards,
Abhi


Jun 6, 2012 at 6:21 PM

Abhi:

Please see my comments regarding curved edges, which I posted separately.

What do you mean by "are all the properties in the XML mapped to NodeXL control"?  What XML are you referring to?

-- Tony

Jun 6, 2012 at 6:44 PM
Hi Tony,

Thanks, I just saw that email. I was referring to this XML (http://www.nodexlgraphgallery.org/Pages/WorkbookOptions.ashx?graphID=662). It's available under "Download the NodeXL Options Used to Create the Graph" on the sample graphs page.

Thanks again!!

Regards,
Abhi

Jun 7, 2012 at 12:42 AM
Edited Jun 7, 2012 at 12:42 AM

Abhi:

That XML file is a NodeXL options file, which stores all the options for a workbook created from the NodeXL Excel Template application.  Some of those options map directly to NodeXLControl properties; for example, the file contains options for the default vertex shape and edge style we've been talking about.  Other options in the file are specific to the Excel Template application and do not map to the NodeXLControl; settings for visible column groups and dynamic filters, for example.

So the answer to your question is that some, but not all, of the options stored in the options file map to the NodeXLControl.

-- Tony

Jun 11, 2012 at 8:40 AM
Hi Tony,

I checked for the option EdgeDrawer.CurveStyle, but I cannot see any property with the name CurveStyle among the available EdgeDrawer properties. I can only see the following properties:

Color

DrawArrowOnDirectedEdge

FilteredAlpha

GraphScale

MaximumLabelLength

RelativeArrowSize

SelectedColor

SelectedWidth

Width


I would really appreciate it if you could help me here. Maybe there is something very trivial that I am missing.

Thank You.

abhi


Jun 11, 2012 at 4:51 PM
Edited Jun 11, 2012 at 4:53 PM

Abhi:

Are you using an older version of NodeXL that didn't have the EdgeDrawer.CurveStyle property?  You can tell which version you have by right-clicking the Smrf.NodeXL.Visualization.Wpf.dll file in Windows Explorer, selecting Properties from the right-click menu, going to the Details tab in the Properties dialog box, and looking at "Product version."  The latest version as of today is 1.0.1.210.

The latest version definitely has an EdgeDrawer.CurveStyle property.
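
The same check can be scripted instead of done through Windows Explorer. The sketch below uses the standard .NET FileVersionInfo class to read the product version of the DLL. The class name and the default path are hypothetical, and the comparison is only against the 1.0.1.210 version mentioned above, since this thread does not say which release introduced CurveStyle.

```csharp
using System;
using System.Diagnostics;

static class NodeXLVersionCheck
{
    static void Main(string[] args)
    {
        // Hypothetical default; pass the real path to the DLL as the first argument.
        string dllPath = args.Length > 0
            ? args[0]
            : "Smrf.NodeXL.Visualization.Wpf.dll";

        // Reads the same "Product version" value shown on the Details tab.
        // Throws FileNotFoundException if the file does not exist.
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(dllPath);
        Console.WriteLine("Product version: " + info.ProductVersion);

        // 1.0.1.210 is the latest version mentioned in this thread.
        Version latestInThread = new Version(1, 0, 1, 210);
        Version installed;
        if (Version.TryParse(info.ProductVersion, out installed)
            && installed < latestInThread)
        {
            Console.WriteLine("This build is older than " + latestInThread + ".");
        }
    }
}
```

If the reported version is older, updating the NodeXL class libraries is the first thing to try before looking for the property again.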

-- Tony