Problem importing large network

Apr 10, 2012 at 9:45 PM

Hello,

I'm trying to import a large network that is split across several GraphML files.

My network consists of two and a half million edges and 1,600 vertices.

I'm merging the different GraphML files as explained in this thread: http://nodexl.codeplex.com/discussions/247666

For the first 500 thousand edges everything went fine. Afterwards, when I tried to import the next 50 thousand, I got an exception at line 1574 ("return ( (Object[,])range.get_Value(Missing.Value) );") of the file "Common/Excel/ExcelUtil.cs", and it says: "Invalid Variant was detected".

Could anybody help me, please?

Apr 11, 2012 at 4:48 PM

Are you using the NodeXL Excel Template?  If so, I don't think that's going to work.  Excel allows only about a million rows in a worksheet, so your two-and-a-half million rows aren't going to fit.

I should mention that in general, NodeXL is meant for use with a few thousand vertices and edges.  You can get away with tens of thousands if you can tolerate slow performance, but when you get into the millions, you are out of NodeXL territory.

If you are a programmer, please check out Jure Leskovec's SNAP project at Stanford (http://snap.stanford.edu/).  This is a graphing library that scales very well to hundreds of millions of vertices and edges.

-- Tony

Apr 11, 2012 at 11:55 PM

Hello,

Thank you for your reply. 

Yes, I'm using the NodeXL Excel Template.

I know that Excel allows only about a million edges per worksheet. In fact, I would like to build an application that lets the user choose which part of the network to visualize, so that only that part of the network is imported.
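The selection step for such an application could be sketched roughly as follows. This is only an illustration in Python, not NodeXL code: the function name `select_edges`, the `max_edges` parameter, and the sample data are all hypothetical. It keeps only the edges whose two endpoints are in the chosen set of vertices, and refuses a selection that would not fit in one worksheet.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# Worksheet row limit in Excel 2007 and later.
EXCEL_MAX_ROWS = 1_048_576


def select_edges(graphml_text, wanted_vertices, max_edges=EXCEL_MAX_ROWS):
    """Return (source, target) pairs whose endpoints are both wanted.

    Hypothetical helper: raises ValueError if the selection is larger
    than max_edges, since it could not fit in a single worksheet.
    """
    root = ET.fromstring(graphml_text)
    selected = []
    for edge in root.iter(f"{{{GRAPHML_NS}}}edge"):
        source, target = edge.get("source"), edge.get("target")
        if source in wanted_vertices and target in wanted_vertices:
            selected.append((source, target))
            if len(selected) > max_edges:
                raise ValueError("selection exceeds the worksheet row limit")
    return selected


# Tiny made-up graph, used only to exercise the function.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="a"/><node id="b"/><node id="c"/>
    <edge source="a" target="b"/>
    <edge source="b" target="c"/>
    <edge source="a" target="c"/>
  </graph>
</graphml>"""

print(select_edges(SAMPLE, {"a", "b"}))
```

With 2.5 million edges it would probably be better to stream each file (for example with `ET.iterparse`) than to parse it in one piece, but the filtering logic stays the same.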

But I don't see why I got that exception, since I had imported only 500 thousand edges, which is far from 1 million. I checked the total memory allocated by the "Excel.exe" process, and it peaked at only 500 MB (compared to my 4 GB of RAM).

Thank you in advance for your help.

Apr 12, 2012 at 12:15 AM

And thanks for recommending SNAP. I have already checked it out.

I sent an email to Jure Leskovec to ask whether some kind of C#/SNAP wrapper exists.

Apr 12, 2012 at 5:42 AM
Edited Apr 12, 2012 at 5:44 AM

You are using the program way beyond its intended limits.  I've never tested it beyond 30,000 edges, where it is painfully slow, and frankly I'm surprised it made it to half a million in your case.

I can point out a couple of things you might look at, though.  The failure is occurring during a workbook read operation.  I assume that you have rebuilt the source code and can run it in the debugger.  (The fact that you quoted a line number tells me that.  I don't distribute the PDB files that contain the line numbers because they are so large.)  When the failure occurs, you can look at the stack trace and see exactly what is being read.  Is it a huge range?  One possibility is that Excel limits the size of the range you can read via the .NET object model wrapper, and that you have reached that limit.  For example, Excel limits the length of a string that can be written to a cell to about 8,200 characters.  This is inexplicable, undocumented, seemingly arbitrary (8,200???) and very frustrating, but it's a fact.  Perhaps there is also a limit for reading a range, and it's around 500,000 rows.
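If a range-size limit is the suspect, one way to test the idea is to read the worksheet in smaller blocks of rows instead of one huge range, and see whether the exception goes away. The sketch below is Python used purely for illustration (NodeXL itself is C#), and it only computes the block addresses; the actual read through the Excel object model is left out. The function name `chunk_addresses` and the `rows_per_chunk` parameter are hypothetical.

```python
def chunk_addresses(first_row, last_row, first_col, last_col, rows_per_chunk):
    """Yield A1-style addresses covering first_row..last_row in row blocks.

    Each address spans columns first_col..last_col and at most
    rows_per_chunk rows; the last block may be shorter.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    for start in range(first_row, last_row + 1, rows_per_chunk):
        end = min(start + rows_per_chunk - 1, last_row)
        yield f"{first_col}{start}:{last_col}{end}"


# Example: ten data rows (2..11), columns A..C, four rows per read.
for address in chunk_addresses(2, 11, "A", "C", 4):
    print(address)
```

If reading in blocks succeeds where the single large read fails, that would point toward a size limit rather than bad data.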

Another possibility is that there is troublesome data in a particular cell that is giving Excel fits.  The "invalid variant was detected" message might imply that.  Can you change your GraphML data and see if the problem still occurs?  If you get beyond half a million with different data, then it might be a problem with your particular data.  Otherwise, it might just be a too-much-data problem.
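To tell the two cases apart without guessing, the import could be bisected to find the smallest number of edges at which it fails. The following is a minimal sketch in Python, for illustration only: `import_succeeds` stands for whatever trial import you run (a hypothetical callable that takes an edge count and returns True or False), and the search assumes that once a prefix of the data fails, every longer prefix fails too.

```python
def smallest_failing_prefix(edge_count, import_succeeds):
    """Return the smallest n for which importing the first n edges fails.

    Returns None if importing all edge_count edges succeeds.
    Assumes failures are monotonic and that importing 0 edges succeeds.
    """
    if import_succeeds(edge_count):
        return None
    low, high = 0, edge_count  # low edges succeed, high edges fail
    while high - low > 1:
        mid = (low + high) // 2
        if import_succeeds(mid):
            low = mid
        else:
            high = mid
    return high


# Stand-in for a real trial import: pretend it breaks past 500,000 edges.
def fake_import(n):
    return n <= 500_000


print(smallest_failing_prefix(550_000, fake_import))
```

For 550,000 edges this takes roughly 20 trial imports. If the failing count stays the same after the edges are reordered or replaced, that suggests a size limit; if it moves with one particular edge, that suggests a problem with the data in that row.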

In any case, be prepared to spend a lot of time tracking this down.  I can't begin to tell you how many hours I've spent chasing weird Excel behavior.

-- Tony