Decimation problem

Apr 30, 2009 at 4:48 AM
I'm exploring ways to reduce the size of a large graph to make charting more tractable. Since I have edges with various weights, I deleted those with weights 1 and 2, leaving the heavier ones.  I then need a way to delete the edges that are no longer connected.  I tried recalculating the graph metrics to find vertices with zero degree, but reproducibly get the error message below.

Thanks in advance for any suggestions, and in general for a wonderful app.
Pierre

---------------------------
Microsoft NodeXL
---------------------------
An unexpected problem occurred.  If it occurs again, please copy the details to the clipboard by typing Ctrl-C, then post the details to http://www.codeplex.com/NodeXL/Thread/List.aspx.

Details:

[IndexOutOfRangeException]: Index was outside the bounds of the array.

   at Microsoft.Research.CommunityTechnologies.AppLib.ExcelUtil.TryGetNonEmptyStringFromCell(Object[,] cellValues, Int32 rowOneBased, Int32 columnOneBased, String& nonEmptyString)
   at Microsoft.NodeXL.ExcelTemplate.VertexWorksheetReader.AddVertexSubrangeToGraph(Range oVertexSubrange, VertexTableColumnIndexes oVertexTableColumnIndexes, KeyValuePair`2[] aoCustomMenuItemPairIndexes, ReadWorkbookContext oReadWorkbookContext, IGraph oGraph)
   at Microsoft.NodeXL.ExcelTemplate.VertexWorksheetReader.AddVertexTableToGraph(ListObject oVertexTable, ReadWorkbookContext oReadWorkbookContext, IGraph oGraph)
   at Microsoft.NodeXL.ExcelTemplate.VertexWorksheetReader.ReadWorksheet(Workbook workbook, ReadWorkbookContext readWorkbookContext, IGraph graph)
   at Microsoft.NodeXL.ExcelTemplate.WorkbookReader.ReadWorkbookInternal(Workbook workbook, ReadWorkbookContext readWorkbookContext)
   at Microsoft.NodeXL.ExcelTemplate.WorkbookReader.ReadWorkbook(Workbook workbook, ReadWorkbookContext readWorkbookContext)
   at Microsoft.NodeXL.ExcelTemplate.GraphMetricCalculationManager.ReadWorkbook(Workbook oWorkbook)
   at Microsoft.NodeXL.ExcelTemplate.GraphMetricCalculationManager.CalculateGraphMetricsAsync(Workbook workbook, IGraphMetricCalculator2[] graphMetricCalculators, GraphMetricUserSettings graphMetricUserSettings)
   at Microsoft.NodeXL.ExcelTemplate.GraphMetricCalculationManager.CalculateGraphMetricsAsync(Workbook workbook, GraphMetricUserSettings graphMetricUserSettings)
   at Microsoft.NodeXL.ExcelTemplate.CalculateGraphMetricsDialog.OnLoad(EventArgs e)
---------------------------

Apr 30, 2009 at 5:14 PM
Hello, Pierre:

There are two problems here.  The first is the original one you're trying to solve, and the second is the bug that arose when you tried to solve it.

First things first.  I'm not sure I understand what you need to do.  NodeXL never displays edges that are not connected to two vertices, so I assume that's not what you mean by "edges that are no longer connected."  And by default, it doesn't display vertices that don't have edges.  (Are you setting the Visibility column on the Vertices worksheet to Show?  That changes things.)  Could you please clarify what you need to do?

On the bug front, the "reproducible" part is good news, but I haven't yet been able to reproduce it.  I'm probably not deleting and calculating things in the right order to get the bug to occur.  Would you be able to send me the workbook in which the bug arises?  I'll send you my email separately.  If not, is there a simple sequence of steps I can follow?  "Add three edges, delete the second edge, compute graph metrics," for example.

Once I reproduce the bug, I should be able to fix it quickly.

Thanks,
    Tony

Apr 30, 2009 at 6:43 PM
Thanks Tony

Re the problem I'm trying to solve: my typo didn't help... I meant to say "delete the _vertices_ that are no longer connected", not _edges_.

Re the bug: I upgraded to build 81, and it doesn't occur.  So it may have been something in the earlier build I was using (79).

For the record, let me try explaining again what I was doing.  There may well be a simpler way...

I have a relatively large graph: around 180k edges. My machine can't really process the chart at that scale. So in order to reduce the size, I removed edges with weights 1 and 2.  That orphaned some vertices.  To find and remove them, I recalculated the degree metric.  This flagged a number of vertices with zero (blank) degree, which I could then filter and remove.
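
For readers who want to try the same decimation outside the workbook, the steps above can be sketched in Python. This is only an illustration under assumptions: the edge list is a plain list of (vertex, vertex, weight) tuples, the names `decimate`, `degrees` and `drop_orphans` are hypothetical, and none of it is NodeXL code or its API.

```python
from collections import Counter


def decimate(edges, min_weight):
    """Keep only the edges whose weight is at least min_weight."""
    return [(u, v, w) for (u, v, w) in edges if w >= min_weight]


def degrees(vertices, edges):
    """Count how many edge endpoints touch each vertex."""
    counts = Counter()
    for u, v, _weight in edges:
        counts[u] += 1
        counts[v] += 1
    # Vertices that appear in no edge get a degree of 0.
    return {vertex: counts[vertex] for vertex in vertices}


def drop_orphans(vertices, edges):
    """Remove vertices whose degree is zero after decimation."""
    degree = degrees(vertices, edges)
    return [vertex for vertex in vertices if degree[vertex] > 0]


# Made-up sample data: weights 1 and 2 are the "light" edges to delete.
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B", 5), ("B", "C", 1), ("C", "D", 2), ("A", "E", 3)]

kept_edges = decimate(edges, min_weight=3)
kept_vertices = drop_orphans(vertices, kept_edges)
```

With this sample data, deleting the weight 1 and 2 edges leaves C and D with zero degree, so they are the vertices that get filtered out.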

I did this on one graph without trouble. I ran into problems on the second graph I tried this with.  It may have been that I sorted the edges by weight in the first go-round before deleting, whereas with the second graph I just filtered on weight 1 and 2, then deleted on the filtered list. (It took almost forever to process doing it this way, too.)  I got the error message reported above when I tried to recalculate the graph metrics on the second graph.

Bottom line - this wasn't as reproducible as I thought, which I guess is good news in the end ;-)

thanks for all your support
Pierre

Apr 30, 2009 at 7:41 PM
Edited Apr 30, 2009 at 7:41 PM
Pierre:

Well, that would explain why I couldn't reproduce the bug: apparently, I've already fixed it.  That's my favorite kind of bug.

By default, NodeXL doesn't display orphan vertices, and so by default, you shouldn't have to remove orphan vertices after removing edges.  Let's say you add three edges to the Edges worksheet:

A,B
B,C
C,D

If you read the workbook, you'll see four vertices and three edges.  If you then delete the C,D row and read the workbook again, you'll see three vertices and two edges.  The orphan D will still be listed in the Vertices worksheet, but because it is an orphan, it will not be displayed.

Did you set the Visibility column in the Vertices worksheet to "Show"?  That's the only way that orphan vertices can be displayed.  The default Visibility is "Show if in an Edge."
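
The display rule described above can be modeled with a short Python sketch. It is a toy model, not NodeXL's implementation: the function name `displayed_vertices` and the tuple layout are assumptions, and only the two Visibility values mentioned in this thread are handled.

```python
DEFAULT_VISIBILITY = "Show if in an Edge"


def displayed_vertices(vertex_rows, edge_rows):
    """Return the vertex names that would be drawn.

    vertex_rows: list of (name, visibility) pairs (hypothetical layout).
    edge_rows:   list of (vertex1, vertex2) pairs.
    """
    in_an_edge = {name for edge in edge_rows for name in edge}
    shown = []
    for name, visibility in vertex_rows:
        if visibility == "Show":
            # Explicit "Show" displays the vertex even if it is an orphan.
            shown.append(name)
        elif visibility == DEFAULT_VISIBILITY and name in in_an_edge:
            shown.append(name)
    return shown


# The A,B / B,C / C,D example, with every vertex left at the default.
vertex_rows = [(name, DEFAULT_VISIBILITY) for name in "ABCD"]
before = displayed_vertices(vertex_rows, [("A", "B"), ("B", "C"), ("C", "D")])

# Delete the C,D row: D stays in the vertex list but is no longer shown.
after = displayed_vertices(vertex_rows, [("A", "B"), ("B", "C")])

# Setting D's Visibility to "Show" brings the orphan back.
forced_rows = vertex_rows[:3] + [("D", "Show")]
forced = displayed_vertices(forced_rows, [("A", "B"), ("B", "C")])
```

In this model, the orphan D disappears from the drawing once its only edge is gone, and reappears only when its Visibility is set to "Show".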

-- Tony