Any suggestions for making a better visualization?

Sep 26, 2013 at 11:24 PM
Hi all,

I am making an authorship-pattern graph with around 2,000 nodes. Some papers (depicted by edges) are single-authored, but the majority are co-authored. Can anyone suggest which visualization method would work best?
Which graph algorithm should I use?
With 2,000 nodes, I just want to extract and highlight the main features that may become apparent.
Later, I would like to highlight some important subgraphs, such as:
  1. single-author graphs
  2. multi-author graphs
  3. top-author graphs (based on various criteria that classify an author as a top author), etc.

I have been struggling to decide which method to use, so any guidance would be highly appreciated. I also wanted to attach an image to this message (the one I have been able to make so far), but I don't know how to upload attachments here.

Please advise!
Sep 27, 2013 at 1:07 AM
Edited Sep 27, 2013 at 1:08 AM

Thank you for the interest in NodeXL.

The discussion board will not allow for images, but you can link to an image hosted elsewhere on the web or in the "Issue Tracker" section of the NodeXL Codeplex site.

There are many controls for the appearance of a NodeXL visualization, and it can be difficult to know all of the features. A shortcut is to apply another user's set of choices to your own network: the NodeXL Options file is like a recipe that can be applied to your data. You can find these recipe files linked from the bottom of the pages on the NodeXL Graph Gallery (see:

For example, this is the recipe I use for mapping Twitter networks:
This may not be a perfect fit for your data, but it could be a good place to start.

You can download this file and then import it into NodeXL via the NodeXL>Options>Import menu. If you want these settings to become the default for all new NodeXL workbooks on your machine, select the "Use current for new" button in the NodeXL options menu. Once the file is imported, select the NodeXL>Graph>Automate>Run command to apply the settings to your data.

Feel free to write back with any further questions: we will try to get you to where you want to go!


-- Marc
Sep 27, 2013 at 1:10 AM
You can also share your data set by uploading it to the NodeXL Graph Gallery. Use NodeXL>Data>Export>To NodeXL Graph Gallery to do this.

You may want to create an account on the Graph Gallery (which allows you to delete any data you have uploaded).

-- Marc
Sep 28, 2013 at 9:01 PM
Edited Sep 29, 2013 at 12:41 AM
Dear Marc

Thank you very much for your quick response and for sharing the useful links. I have been exploring the options you sent to get some results, but I am having trouble obtaining exactly what I want:

Is it possible to do the following splits within a single graph?
  1. Based on the self-loop count from the Vertices worksheet: if this count is positive, the author has some single-authored papers; if it is zero, all of the author's papers are multi-authored. This is the first split I want to achieve in the graph.
  2. In both of these subgraphs (single- and multi-authored), I then want to highlight the top authors by degree. (Where the self-loop count is positive, 95% of the nodes have degree 1, but a few nodes have a degree greater than 1.)
  3. Lastly, in both subgraphs I want to highlight the top nodes by edge weight. For single-authored papers the edge weight reflects the publication count; for multi-authored papers a high edge weight indicates strong collaboration.
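For reference, the three splits can be sketched outside NodeXL in a few lines of Python. This is only a rough illustration under stated assumptions, not NodeXL's own calculation: the edge list is hypothetical, a self-loop stands for single-authored papers (its weight being the publication count), "degree" here means the number of distinct co-authors with self-loops excluded, and the self-loop count is accumulated over all of an author's edges.

```python
from collections import defaultdict

def summarize(edges, n=2):
    """Split authors by self-loop count, then rank each group.

    `edges` is a hypothetical list of (author_a, author_b, weight) tuples.
    A self-loop (author_a == author_b) stands for single-authored papers,
    with weight = that author's single-authored publication count.
    """
    self_loops = defaultdict(int)   # number of self-loop edges per author
    self_weight = defaultdict(int)  # single-authored publication count
    neighbors = defaultdict(set)    # distinct co-authors (self-loops excluded)
    co_weight = defaultdict(int)    # total co-authorship edge weight

    # Accumulate over ALL of an author's edges, not just one matching row.
    for a, b, w in edges:
        if a == b:
            self_loops[a] += 1
            self_weight[a] += w
        else:
            neighbors[a].add(b)
            neighbors[b].add(a)
            co_weight[a] += w
            co_weight[b] += w

    # Split 1: authors with at least one self-loop vs. authors with none.
    authors = set(neighbors) | set(self_loops)
    single = [a for a in authors if self_loops[a] > 0]
    multi = [a for a in authors if self_loops[a] == 0]

    def degree(a):
        return len(neighbors[a])

    def top(group, key):
        # Highest value first; ties broken alphabetically for stable output.
        return sorted(group, key=lambda a: (-key(a), a))[:n]

    # Splits 2 and 3: rank each group by degree and by its own edge weight.
    return {
        "single": {"by_degree": top(single, degree),
                   "by_weight": top(single, lambda a: self_weight[a])},
        "multi": {"by_degree": top(multi, degree),
                  "by_weight": top(multi, lambda a: co_weight[a])},
    }

edges = [("Ali", "Ali", 3), ("Ali", "Bea", 1), ("Bea", "Cho", 4),
         ("Cho", "Dan", 2), ("Eve", "Eve", 1)]
print(summarize(edges))
```

Note that an author with both a self-loop and co-authors lands in the "single" group, which follows the "count is positive" rule above.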
I have uploaded a partial data set for your reference. Please let me know when I should delete it :)

Also, is it possible to do temporal analysis? My complete data set spans 50 years. I assume that adding extra columns to the Vertices worksheet and then using the Group option could help with this?
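One way to think about temporal analysis is to slice a dated edge list into time windows and look at each slice separately. The sketch below assumes, hypothetically, that each edge (paper) carries a publication year; the function names are illustrative and this is not NodeXL code.

```python
def sliding_windows(first, last, width, step):
    """Yield inclusive (start, end) year windows from `first` until `last` is covered."""
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive")
    start = first
    while True:
        end = start + width - 1
        yield start, min(end, last)   # clip the final window to `last`
        if end >= last:
            break
        start += step                 # step < width gives overlapping windows

def edges_in_window(edges, start, end):
    """Keep hypothetical (author_a, author_b, year) edges with start <= year <= end."""
    return [e for e in edges if start <= e[2] <= end]

papers = [("Ali", "Bea", 1965), ("Bea", "Cho", 1978), ("Cho", "Dan", 2004)]
for lo, hi in sliding_windows(1963, 2012, width=10, step=10):
    print(lo, hi, len(edges_in_window(papers, lo, hi)))
```

Dating edges rather than vertices fits authorship data, since a paper has one date while an author is active across many years.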
Any advice in this regard will be highly appreciated.

Please let me know.
Sep 28, 2013 at 9:17 PM
Edited Sep 29, 2013 at 12:41 AM
One more thing: in the Twitter settings file you shared as a starting point, how were the Autofill Columns options filled in for the Vertices worksheet? 'Followers' appears in several rows. How is the 'Followers' parameter defined? Is it a unidirectional relationship? Please correct me if I am wrong. For my authorship analysis, the relations need to be bidirectional. Furthermore, in the following graph you have uploaded ( and the information you have extracted seems relevant to what I am trying to achieve. You have extracted some top features of every group. For this purpose, which sheet (Edges or Vertices) did you use to add the extra relationship columns?
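On the directed versus bidirectional point: a co-authorship tie is symmetric, so (A, B) and (B, A) describe the same relationship. A minimal Python sketch, assuming a hypothetical list of author pairs with one entry per shared paper, merges both orderings into one undirected edge whose weight is the number of shared papers.

```python
from collections import Counter

def undirected_weights(pairs):
    """Merge (a, b) and (b, a) into one undirected edge and count shared papers."""
    weights = Counter()
    for a, b in pairs:
        key = (a, b) if a <= b else (b, a)  # canonical order for an unordered pair
        weights[key] += 1
    return dict(weights)

pairs = [("Ali", "Bea"), ("Bea", "Ali"), ("Ali", "Cho")]
print(undirected_weights(pairs))
```

A directed "follows" edge would instead be kept as-is, since A following B says nothing about B following A.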

Thanks again!
Sep 29, 2013 at 5:30 PM
Edited Sep 29, 2013 at 5:34 PM
  1. A set of vertices can be grouped by any attribute. If you have network metrics like in-degree, you can use the NodeXL>Analysis>Groups>Group by Vertex Attribute feature to place vertices with different values in different groups. You could add a column to the Edges worksheet (to insert it, select the column labeled "Add your own columns here", right-click, and choose Insert), call it "Self loop?", and fill it with the formula:
=IF([@[Vertex 1]]=[@[Vertex 2]],1,0)

This value would then need to be moved to the Vertices worksheet using a VLOOKUP formula (note: the Edges worksheet MUST be sorted Ascending by Vertex 1).

I inserted a column in the Vertices worksheet (again by selecting "Add your own columns here") and called it "Self Loop". I used the following formula to look up the self-loop status in the Edges worksheet:

=VLOOKUP([@Vertex],Edges[[Vertex 1]:[Self loop?]],14)

This value can then be used in the Group by Vertex Attribute feature.

  2. You can distinguish nodes by size based on any vertex attribute value. Select NodeXL>Visual Properties>Vertices>Vertex Size and select a value.
  3. NodeXL can use any edge value to control edge width. See NodeXL>Visual Properties>Autofill Columns>Edges>Edge Width to select the data column that will drive the width of each edge. Note that the Options for Edge Width in the Autofill Columns dialog allow fine control over how data is mapped to edge width. You can also adjust each edge's width manually by editing the values in the "Width" column in the Visual Properties section of the Edges worksheet. You may need to reveal these columns first (they are hidden by default): select NodeXL>Show/Hide>Workbook Columns>Visual Properties.
  4. If you have a date stamp on each edge, you can filter data by date in NodeXL by selecting the "Dynamic Filters" feature from the graph pane (the window in which the graph is displayed, which is labeled "Document Actions"). Each edge and vertex data element is presented in the filter dialog. You can set a lower and an upper limit, and all edges or vertices that do not meet the filter will be removed from the graph. Moving the filter window across time allows you to animate the network.
  5. "Followers" is defined by Twitter. A "follows" relationship is directed and is not required to be a mutual tie.
  6. Content analysis is possible in NodeXL. Select the NodeXL>Analysis>Graph Metrics>Words and Word Pairs feature, and set the options to point to any column of text related to the edges or vertices.
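To give a feel for what a words and word pairs count over a text column involves, here is a rough Python sketch. It assumes simple lowercase tokenization and counts only adjacent pairs within each text; it is an illustration, and NodeXL's own options and tokenization may differ.

```python
import re
from collections import Counter

# Words, allowing internal hyphens or apostrophes (e.g. "co-authorship").
TOKEN = re.compile(r"\w+(?:[-']\w+)*")

def word_and_pair_counts(texts):
    """Count words and adjacent word pairs across a column of text."""
    words, pairs = Counter(), Counter()
    for text in texts:
        tokens = TOKEN.findall(text.lower())
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))  # pairs never span two texts
    return words, pairs

titles = ["Social network analysis", "Network analysis of co-authorship"]
words, pairs = word_and_pair_counts(titles)
print(words.most_common(3))
print(pairs.most_common(1))
```

Pointing this at a column of paper titles or keywords would give a crude picture of the vocabulary in each group.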

-- Marc
Sep 29, 2013 at 6:02 PM
Edited Sep 29, 2013 at 6:03 PM
I created an updated version of your graph:

It is not exactly what you asked for (so far, only the authors who have nothing but self-loops are separated out), but it is an improvement, I think.

Have a look: you can reuse the "recipe" (options) file for this graph yourself if you like.

-- Marc
Oct 3, 2013 at 10:14 PM
Edited Oct 3, 2013 at 10:25 PM
Hi Marc,

Thanks for the guidance and prompt response. I am working on adding other features and will bug you (hopefully) with more questions. Thanks a million!
Oct 4, 2013 at 5:04 AM

I may need you to ask this question again, as I am not sure I understand the data input format or the desired outcome.

What I think you are trying to do is to create a bi-modal network that links Authors to the Categories in which their papers are classified.

If your data is in the format:

AuthorName, Topic_Category_1, Topic_Category_2, Topic_Category_3, Topic_Category_N

Is your goal to create the edges:

AuthorName, Topic_Category_1
AuthorName, Topic_Category_2
AuthorName, Topic_Category_3
AuthorName, Topic_Category_N
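If the data does arrive in that wide layout, the conversion to one author/category pair per row is mechanical. A minimal Python sketch, assuming a hypothetical comma-separated file in the format above; the two output columns could then go into the Vertex 1 and Vertex 2 columns of the Edges worksheet.

```python
import csv
import io

def wide_to_edges(rows):
    """Turn [author, category1, category2, ...] rows into (author, category) edges."""
    edges = []
    for row in rows:
        if not row:          # csv.reader yields [] for blank lines
            continue
        author, *categories = row
        for category in categories:
            category = category.strip()
            if category:     # skip empty trailing cells
                edges.append((author.strip(), category))
    return edges

# Hypothetical input in the "AuthorName, Topic_Category_1, ..." layout.
data = "Author1,Topic_Category_1,Topic_Category_2\nAuthor2,Topic_Category_2,\n"
for author, category in wide_to_edges(csv.reader(io.StringIO(data))):
    print(author, category, sep=",")
```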

-- Marc

Oct 8, 2013 at 11:40 PM
Thanks Marc for the guidance.

I had been struggling with this question by taking an alternate route, i.e., by collecting all the keywords of an author, e.g.:

Author1 in Paper1 has Subject_Category1, Subject_Category2
Author1 in Paper5 has Subject_Category1, Subject_Category4
Author1 in Paper7 has Subject_Category2, Subject_Category7

and I wanted to combine them as:

Author1: Subject_Category1 (count=2), Subject_Category2 (count=2), Subject_Category4 (count=1), Subject_Category7 (count=1)

The above could help me identify which author is most active in which field / sub-field.
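That per-author tally is a simple counting job once each paper's categories are listed. A minimal Python sketch, assuming hypothetical (author, paper, categories) records like the Author1 example above:

```python
from collections import Counter, defaultdict

def category_counts(records):
    """Map each author to a Counter of subject categories over all of their papers.

    `records` is a hypothetical list of (author, paper, [categories]) tuples.
    """
    counts = defaultdict(Counter)
    for author, _paper, categories in records:
        counts[author].update(categories)
    return counts

records = [
    ("Author1", "Paper1", ["Subject_Category1", "Subject_Category2"]),
    ("Author1", "Paper5", ["Subject_Category1", "Subject_Category4"]),
    ("Author1", "Paper7", ["Subject_Category2", "Subject_Category7"]),
]
for author, counter in category_counts(records).items():
    summary = ", ".join(f"{cat} (count={n})" for cat, n in counter.most_common())
    print(f"{author}: {summary}")
```

For the three Author1 papers, Subject_Category1 and Subject_Category2 each get a count of 2 and the other two categories a count of 1. Sorting each author's counter with `most_common()` puts the field they are most active in first.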

The scheme you proposed is easy and doable, and it could help identify the most active fields (e.g., those with the most authors).

I still need to work more :)

If I want to cite NodeXL, what should I use as a reference?
Oct 8, 2013 at 11:42 PM