Out of Memory exception errors when importing large Twitter networks

Apr 22, 2014 at 9:28 PM

Trying to import a reasonably sized Twitter network - circa 10K users to 1.5 degrees. It keeps failing with 'Out of Memory' and I'm wondering what the limits are here. Is NodeXL trying to cache the whole import as it progresses - or is there a log file where I can retrieve the nodes and edges discovered up to the point of the crash?


Apr 23, 2014 at 4:10 AM
Edited Apr 23, 2014 at 5:51 AM
Hello, Mike:

To answer your last question first, there is no log from which you can recover the partial network. When you get "out of memory," the network is gone. And yes, NodeXL is building the entire network in memory as it progresses. There are certainly alternative techniques, but they are more difficult to implement and with our limited development resources, we've generally opted for quick, simple implementation over difficult mega-scalability.
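To illustrate what one such alternative technique could look like, here is a minimal Python sketch that appends each batch of discovered edges to a file on disk, so that whatever was written before a crash can still be read back. This is not how NodeXL works; `append_edges`, `read_edges`, and the CSV layout are hypothetical names invented for the example.

```python
import csv
import os

def append_edges(path, edges):
    """Append (source, target) pairs to a CSV file and force them to disk.

    Hypothetical helper: anything written here survives a later crash,
    unlike a network that is held only in memory.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(edges)
        f.flush()
        os.fsync(f.fileno())

def read_edges(path):
    """Read back every edge written so far, in the order it was written."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(row) for row in csv.reader(f)]
```

The trade-off is the one described above: the importer has to do extra work on every batch, in exchange for not losing everything when memory runs out.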

10,000 users is actually a large network for NodeXL, but there are a few things you can try to make more memory available and avoid the problem, starting from easier to more difficult:

1) Close all programs, including Excel, then run only NodeXL again.

2) Restart your computer, then run only NodeXL again.

3) If you are using VMWare or Parallels on a Mac, tell VMWare or Parallels to assign more memory to the Windows virtual computer.

4) Add more memory to the computer.

Note that the standard 32-bit version of Excel that most people have can use only 2GB of memory, so NodeXL can still run out of memory even if your computer has 16GB. And that leads to the final option:

5) Use 64-bit Excel, which will use all the memory you can throw at it.

-- Tony

Apr 23, 2014 at 11:10 PM
Thx Tony - good thought on 32bit. I'm using Excel 2007, which is only 32bit, so I'll do that upgrade first and report back if that solves the problem.

Wondering whether, for large networks, there are any plans to allow NodeXL to gather data in stages, e.g. first 5K nodes, second 5K nodes, third 5K nodes, etc.? Could this be a future enhancement?


Apr 23, 2014 at 11:20 PM
Edited Apr 23, 2014 at 11:21 PM
Hi, Mike:

We've talked about addressing the overall scaling/performance issues in NodeXL, and making the Twitter importers less resource-intensive would certainly be a part of that. That task isn't scheduled yet, but I'm taking the liberty of putting your vote in the "yes, we need this" column.

Apr 23, 2014 at 11:24 PM
Oh wait, I misread your post. But another work item we're discussing is allowing smaller networks to be fetched and then combined. That's a tricky problem, but it might address the issue you're running into.
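As a rough sketch of the "fetch smaller networks, then combine" idea, assuming each smaller network is just a list of (source, target) screen-name pairs, the merge step might look like the Python below. It is an illustration only, not NodeXL code; `merge_networks` and `vertices` are hypothetical names, and it ignores everything that makes the real problem tricky.

```python
def merge_networks(*edge_lists):
    """Combine several directed edge lists, dropping duplicate edges.

    Twitter screen names are case-insensitive, so edges are compared
    in lowercase. The first spelling seen for each edge is kept.
    """
    seen = set()
    merged = []
    for edges in edge_lists:
        for source, target in edges:
            key = (source.lower(), target.lower())
            if key not in seen:
                seen.add(key)
                merged.append((source, target))
    return merged

def vertices(edges):
    """Return the sorted, lowercased set of users that appear in any edge."""
    return sorted({name.lower() for edge in edges for name in edge})
```

For example, merging `[("alice", "bob"), ("bob", "carol")]` with `[("Bob", "carol"), ("carol", "dave")]` keeps three edges, because the second batch repeats the bob-to-carol edge under a different capitalization.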

-- Tony

Apr 24, 2014 at 2:58 PM
Hmm - has anybody tried the NodeXL 2014 template with 64bit Excel? The install runs OK and says it has installed, but the NodeXL tab does not appear in the ribbon and NodeXL does not appear in the Start-Programs list. Any ideas where to start looking?

Apr 24, 2014 at 5:16 PM
Edited Apr 24, 2014 at 5:25 PM

I've moved your latest question to a new discussion at http://nodexl.codeplex.com/discussions/543222 .

-- Tony