Problem with duplicate vertices

Jul 11, 2013 at 11:55 AM
Hi.
First of all, great work! I've been using the Excel template for a while and I love it. Now, however, I'm trying to work the functionality into a Windows Forms application (VB.NET) and have encountered some problems.

I am feeding the results of a database query into Graph.Vertices and Graph.Edges row by row: for each row in my DataTable I create two vertices and one edge, labeling them with the appropriate column values. The resulting graph is a set of binary relations, with each node (people, in this case) occurring several times instead of occurring once with several edges. Since the ID property of vertices is a ReadOnly integer, I can't work out how to make NodeXL understand that two vertices with different IDs (but identical labels) are in fact the same one. Even if I try NOT to add the vertex when it has already been added, I need some sort of value to reference it by in order to perform that test. Hence:

Is there a way to give vertices an identity so that two vertices will be recognized as the same?

If not, is there a way to merge vertices with identical labels?

Or is it possible to feed an array to Graph.Vertices and Graph.Edges rather than doing it row by row, and to let NodeXL work out the appropriate vertices and edges? Or should I perhaps go about the whole thing in a different manner?

I have provided some of the code I'm trying to use below. Any suggestions are appreciated.
Dim NodeXlControl1 As New Smrf.NodeXL.Visualization.Wpf.NodeXLControl
Mainform.ElementHost1.Child = NodeXlControl1

Using cnn As OleDbConnection = CConnection.Connection
Dim Selectstring As String = "SELECT Entity1, Entity2, LinkType FROM MyTable"
Dim da As New OleDbDataAdapter(Selectstring, cnn)
Dim ds As New DataSet
da.Fill(ds, "MyTable")

Dim oVertices As Smrf.NodeXL.Core.VertexCollection = NodeXlControl1.Graph.Vertices
Dim oEdges As Smrf.NodeXL.Core.EdgeCollection = NodeXlControl1.Graph.Edges

For i As Integer = 0 To ds.Tables(0).Rows.Count - 1
Dim Entity1 As String = ds.Tables(0).Rows(i).Item("Entity1")
Dim Entity2 As String = ds.Tables(0).Rows(i).Item("Entity2")
Dim EdgeType As String = ds.Tables(0).Rows(i).Item("LinkType")

Dim oVerticeA As Smrf.NodeXL.Core.IVertex = oVertices.Add
Dim oVerticeB As Smrf.NodeXL.Core.IVertex = oVertices.Add

oVerticeA.SetValue(ReservedMetadataKeys.PerVertexLabel, Entity1)
oVerticeB.SetValue(ReservedMetadataKeys.PerVertexLabel, Entity2)

Dim oEdge1 As Smrf.NodeXL.Core.Edge = oEdges.Add(oVerticeA, oVerticeB, True)
oEdge1.SetValue(ReservedMetadataKeys.PerEdgeLabel, EdgeType)

Next
End Using
NodeXlControl1.DrawGraph(True)
Jul 11, 2013 at 5:10 PM
Never mind, I solved it.

For anyone with the same issue:
Dim Dictionary As New Dictionary(Of String, IVertex)

If oVertices.Contains(ENID1) Then
    For Each kvp As KeyValuePair(Of String, IVertex) In Dictionary
        If kvp.Key = ENID1 Then
            oVerticeA = kvp.Value
        End If
    Next
Else
    oVerticeA = oVertices.Add
    oVerticeA.SetValue(ReservedMetadataKeys.PerVertexLabel, Entity1)
    oVerticeA.Name = ENID1
    Dictionary.Add(ENID1, oVerticeA)
End If

If oVertices.Contains(ENID2) Then
    For Each kvp As KeyValuePair(Of String, IVertex) In Dictionary
        If kvp.Key = ENID2 Then
            oVerticeB = kvp.Value
        End If
    Next
Else
    oVerticeB = oVertices.Add
    oVerticeB.SetValue(ReservedMetadataKeys.PerVertexLabel, Entity2)
    oVerticeB.Name = ENID2
    Dictionary.Add(ENID2, oVerticeB)
End If
Jul 11, 2013 at 5:29 PM
Edited Jul 11, 2013 at 6:14 PM
Two vertices are two vertices. Their identities cannot change (hence the read-only nature of their ID properties) and they cannot be merged.

You populate a NodeXL graph by adding unique vertices to the Vertices collection, and then connecting those vertices with edges. When you add an edge, you specify two vertices that are already in the Vertices collection. If you add another edge that reuses one of those vertices, you do not add the vertex again; instead, you specify the existing vertex again.

The way to accomplish this is as follows, in pseudocode:
for (each edge that needs to be added to the graph)
{
    for (each of the edge's two vertex names)
    {
        if (the vertex has already been added to the Vertices collection)
        {
            retrieve the existing Vertex object;
        }
        else
        {
            add a new Vertex object to the Vertices collection;
        }
    }

    create an edge that uses the two Vertex objects;
}
Now the problem becomes this: How do you determine if the vertex has already been added to the Vertices collection? One way is to use Graph.Vertices.Find() to look for an existing vertex by name. For small graphs, that will work fine. For larger graphs the Find() method is too slow, because it performs a brute-force linear search through the entire collection every time you call it. In that case, you can use a Dictionary<String, IVertex> object to keep track of your vertices as you create them. The Dictionary's key is a vertex name, and the value is the corresponding vertex. The Dictionary allows you to quickly determine if a vertex already exists. When you finish populating the graph, you discard the Dictionary.

The code now looks something like this:
for (each edge that needs to be added to the graph)
{
    for (each of the edge's two vertex names)
    {
        if (the vertex is in the Dictionary)
        {
            retrieve the existing Vertex object from the Dictionary;
        }
        else
        {
            add a new Vertex object to the Vertices collection;
            add the Vertex object to the Dictionary;
        }
    }

    create an edge that uses the two Vertex objects;
}
If you haven't used a .NET Dictionary before, it's documented here:

http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

The relevant methods are TryGetValue() and Add().

-- Tony
Jul 11, 2013 at 5:40 PM
Our posts crossed paths.

Your approach is somewhat similar to what I suggested, but you are not using the Dictionary correctly. By iterating through it you are defeating its purpose, which is to provide quick lookups. Use Dictionary.TryGetValue() instead. And if you use a Dictionary, you do not also need to use Graph.Vertices.Contains().

Also, I hope you are not implementing the same code twice, once for each vertex. Create one GetOrAddVertex() function instead, and call it twice.
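
As an illustration of the GetOrAddVertex() idea, here is a minimal VB.NET sketch rather than a definitive implementation. It assumes the NodeXL class libraries are referenced, that ReservedMetadataKeys lives in the Smrf.NodeXL.Core namespace, and that the table has the Entity1, Entity2 and LinkType columns used earlier in this thread. The module name, the AddRows() routine and the parameter names are hypothetical. For simplicity it uses the entity string as both the vertex name and its label.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports Smrf.NodeXL.Core

Module GraphLoader

    ' Hypothetical helper: returns the vertex already registered under
    ' sName, or creates one, labels it, and registers it in the lookup.
    Function GetOrAddVertex(ByVal sName As String, _
        ByVal oVertices As IVertexCollection, _
        ByVal oLookup As Dictionary(Of String, IVertex)) As IVertex

        Dim oVertex As IVertex = Nothing

        ' TryGetValue is a single keyed lookup; no iteration is needed.
        If Not oLookup.TryGetValue(sName, oVertex) Then
            oVertex = oVertices.Add()
            oVertex.Name = sName
            oVertex.SetValue(ReservedMetadataKeys.PerVertexLabel, sName)
            oLookup.Add(sName, oVertex)
        End If

        Return oVertex
    End Function

    ' Hypothetical driver: adds one directed edge per row, reusing
    ' vertices that have already been created for the same name.
    Sub AddRows(ByVal oTable As DataTable, _
        ByVal oVertices As IVertexCollection, _
        ByVal oEdges As IEdgeCollection)

        ' Only needed while populating; discarded when the Sub returns.
        Dim oLookup As New Dictionary(Of String, IVertex)

        For Each oRow As DataRow In oTable.Rows
            Dim oVertexA As IVertex = _
                GetOrAddVertex(CStr(oRow("Entity1")), oVertices, oLookup)
            Dim oVertexB As IVertex = _
                GetOrAddVertex(CStr(oRow("Entity2")), oVertices, oLookup)

            Dim oEdge As IEdge = oEdges.Add(oVertexA, oVertexB, True)
            oEdge.SetValue(ReservedMetadataKeys.PerEdgeLabel, _
                CStr(oRow("LinkType")))
        Next
    End Sub

End Module
```

With the code from the first post, this would be called as AddRows(ds.Tables(0), NodeXlControl1.Graph.Vertices, NodeXlControl1.Graph.Edges) before DrawGraph(True). Because the Dictionary is the only lookup, Graph.Vertices.Contains() is not called at all.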

-- Tony
Jul 12, 2013 at 8:42 AM
Hi Tony, thanks for your reply.

The posted code is not what I will use; I will rewrite it for efficiency once I move it to its proper module. But thanks for your advice on using the Dictionary. I am not familiar with it, and I will follow your suggestions.