Auto-calculate frequency/size

Apr 14, 2010 at 3:38 PM

I've reviewed a couple previous posts about frequency / size but I haven't quite seen a clear answer about if NodeXL can auto-calculate the node size based upon the data entered. I'm going to use the core example of social media networks. Take the Kite Network data on page 23 of the NodeXL tutorial. Assume that the data there came from, let's say, 3 social networks. For example, Andre is on Facebook, Twitter, and Ning. On those different networks he has various connections - staying true to the Kite data, none of his connections would exist on multiple networks. Would there be a way to use NodeXL's graph to also show each person's frequency of membership of the various networks referenced? In other words, of the 3 networks used how many does Andre participate in? I would presume the size of his node could indicate that. It would be valuable, at least to me, as another piece of data. Now I understand that if I had that data independently I could enter it in the "size" column. But imagine I don't. All I know is which network I pulled the data from. So I knew Vertex1, Vertex2, and Source (Source being a term I'm making up here). I'd love NodeXL to automatically calculate the size of each node based on that data alone - because the manual work in excel for me to figure it out then enter it with each corresponding line item is overwhelming. Another example would be if the senate data you had was Vote, Senator1, Senator2 where Vote = unique name of the vote, Senator1 and Senator2 = a pair of like voting senators (you can imagine Senator1 and Senator 2 being a list generated much like the social network list would be).

I really hope that makes sense. I'd be happy to be contacted directly if that would help. Many, many thanks. I think the program shows great promise (for helping with my data needs).

Apr 15, 2010 at 12:07 AM
Edited Apr 15, 2010 at 5:06 PM

In NodeXL, you can easily set the size of the graph's vertices based on the values in a numeric column on the Vertices worksheet.  You can use Autofill Columns to do that, and you can find it at NodeXL, Visual Properties in the Excel ribbon.

The problem in your case is that the information of interest does not exist in a numeric column on the Vertices worksheet.  It's on the Edges worksheet in a Source column that contains text, and that information must be aggregated on a per-vertex basis.  There is probably some clever way to do this with Excel formulas, but I don't know offhand how to do it that way.  Instead, I would add a VBA (Visual Basic for Applications) routine to do it.  It's pretty easy to loop through the worksheets in VBA and count unique Source values in the Edges worksheet for each person in the Vertices worksheet.

Do you want the VBA routine that will do this?

-- Tony

Apr 15, 2010 at 6:25 PM

Tony - I certainly appreciate your reply. The answer, as I understand it, is that NodeXL doesn't do it but that a VBA routine could likely be built to do so. That's good to know. I'm not yet actively using NodeXL, more so making inquiry into if it's useful. I currently run a homegrown program that demonstrates both co-occurrence and frequency. I run it on research samples in which participants are choosing items from a large set and then grouping them (so the co-occurrence is telling me what they picked with what). I'm really not very sophisticated in my understanding of your program and the methods/terms it's built upon but so far I'm thinking it would work. One big question is always how to prepare the data. Usually we use an Access query to create the matched pairs of data, the only challenge I see is that we usually have duplicated (Item A - Item B, Item B - Item A). I'd have to find a way to overcome that in NodeXL. But overall, I think NodeXL offers a lot more sophisticated analysis potential. So thanks for now. No need to send anything at this time.


Apr 15, 2010 at 7:12 PM
Edited Apr 15, 2010 at 7:12 PM


That's correct: NodeXL won't do the aggregation for you, but the nice thing about running in Excel is that VBA is always available for more advanced tasks.  And that task is pretty simple in your case.

Regarding the duplicate edges, NodeXL will merge them into a single edge on request.  In an undirected graph (NodeXL, Graph, Type), if you have the edges (Item A - Item B) and (Item B - Item A) and you select NodeXL, Data, Prepare Data, Merge Duplicate Edges, then NodeXL will combine those edges into a single (Item A - Item B) edge.  (It will also add an Edge Weight column and set the Edge Weight to 2 in this case, but you can ignore the Edge Weight if you don't need it.)

Also, if you populate a standard Excel workbook using Access, you can import the results into a NodeXL workbook using NodeXL, Data, Import, From Open Edge Workbook.

-- Tony