Plotting Nodes in Sequential Order (from Adjacency Matrix Data Set)

Jul 5, 2013 at 10:22 PM
I have a set of data representing a series of academic papers and the connections between them (i.e., when one paper cites another). The data is currently in the form of an adjacency matrix arranged in chronological order: the most recent papers are at the top of column A and at the left end of row 1. I have successfully imported the adjacency matrix into NodeXL.
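
For readers who want to see what such a matrix encodes, here is a minimal sketch that turns it into a citing/cited edge list (the same shape as the two vertex columns on NodeXL's Edges worksheet). It assumes a nonzero cell in row i, column j means "paper i cites paper j"; the post does not say which direction is used, and the paper labels are hypothetical. Because a paper normally cites only older papers, a newest-first matrix should have its nonzero cells above the diagonal, which makes a cheap sanity check.

```python
def matrix_to_edges(labels, matrix):
    """Turn a square 0/1 adjacency matrix into (citing, cited) label pairs.

    Assumption: a nonzero cell matrix[i][j] means the paper in row i cites the
    paper in column j, and rows and columns share the same newest-first order.
    """
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be n x n, matching the labels")
    return [
        (labels[i], labels[j])
        for i in range(n)
        for j in range(n)
        if matrix[i][j]
    ]


def cites_only_older(matrix):
    """Sanity check for a newest-first matrix: every nonzero cell should lie
    above the diagonal (column index greater than row index)."""
    return all(
        not cell or j > i
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
    )


if __name__ == "__main__":
    # Hypothetical three-paper example, newest first.
    labels = ["Paper C (2012)", "Paper B (2005)", "Paper A (1998)"]
    matrix = [
        [0, 1, 1],  # C cites B and A
        [0, 0, 1],  # B cites A
        [0, 0, 0],  # A cites nothing in this set
    ]
    for citing, cited in matrix_to_edges(labels, matrix):
        print(f"{citing}\t{cited}")  # tab-separated, like two spreadsheet columns
    print("cites only older papers:", cites_only_older(matrix))
```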

However, the output I am hoping to create is a node/edge graph in which the nodes are ordered sequentially, that is, from top to bottom with the most recent papers at the top and the oldest papers at the bottom. Is there any way to do this in NodeXL?
Coordinator
Jul 6, 2013 at 3:12 AM
Hello!

Thanks for the interest in NodeXL.

Every vertex in NodeXL has a "Layout Order" attribute. It governs the sequence in which the vertex shapes are drawn; in effect, it controls the vertex's "z" position (which shapes appear on top of others), not its location in the graph pane.

Every vertex also has an "X" and "Y" value. These attributes govern the location of the vertex in the graph pane. Usually, these values are set by a layout algorithm which positions each vertex to optimize the overall visibility of the network.

You may want to try a grid layout or a tree (Sugiyama) layout to get the results you want. Alternatively, let an algorithm lay out the network, then change the "Y" value of each vertex using the "Autofill Columns" feature.

Regards,

Marc
Jul 6, 2013 at 7:55 PM
Thanks for the response, Marc!

Gotcha, this helped a great deal, though I'm still not there. I added two columns to the table on the Vertices worksheet: Date and Number of Citations. I successfully used the Autofill Columns feature to map the Number of Citations column to the size of the vertices. However, I was unsuccessful with regard to the Date column.

What I want is for the vertices to be plotted on an x,y grid in which the y-axis represents the date of publication, so the most recent papers have the highest y-values. Ideally, I'd like the x-value of each vertex (is that the singular of vertices?) to be determined by an algorithm that produces the most visually pleasing layout possible while keeping the y-values fixed. I did successfully autofill the (hidden) y-value column. The first issue, however, is that it prompted me to also autofill the (hidden) x-value column. That is not a major problem in itself, because for now I don't care much about optimizing the x-values; my graph is small enough that I don't think it will be visually cluttered.

The bigger problem is that, even with the Vertex Y options set to what I think is the widest spread (low value maps to 0; high value maps to 9,999), the placement of the vertices does not change in the graph pane, even after I click Refresh Graph. Could this be because the source column is formatted as a date?
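
On the date-formatting question: Excel stores a date-formatted cell as a serial day number, so the formatting by itself should not stop a numeric mapping. Assuming Autofill maps the source column linearly onto the chosen range (as the 0 to 9,999 settings suggest), the effect can be sketched as below. The epoch constant reflects Excel's default 1900 date system and is only exact for dates after February 1900; the function names are hypothetical.

```python
from datetime import date

# Assumption: Excel's default (1900) date system. For dates after
# 28 Feb 1900, the serial number equals the days since 30 Dec 1899.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d):
    """Return the number Excel stores behind a date-formatted cell."""
    return (d - EXCEL_EPOCH).days


def linear_map(values, low=0.0, high=9999.0):
    """Map the smallest source value to `low`, the largest to `high`,
    and everything else proportionally in between."""
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        return [low for _ in values]  # hypothetical choice for a constant column
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


if __name__ == "__main__":
    serials = [excel_serial(date(2013, 1, 1)), excel_serial(date(2014, 1, 1))]
    print(serials, linear_map(serials))
```

With only two dates, one lands at the low end of the range and the other at the high end, regardless of how far apart the dates are.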

Any thoughts would be very much appreciated. So far, I am amazed and delighted by this tool; I have been looking for a while for a tool like this, and this is by far the best I've found.

Best,

Ben
Jul 7, 2013 at 6:37 PM
Edited Jul 7, 2013 at 6:38 PM
Ben:

Offhand, I can't think of anything that might prevent the graph's vertices from being placed at their autofilled locations. However, the placement should occur when you click the Autofill button in the Autofill Columns dialog box, not when you click Refresh Graph. Clicking Refresh Graph should, in fact, have no effect on the vertex locations.

Can you try a simple experiment? Please do the following:
  1. Create a new NodeXL workbook. (If you already have a NodeXL workbook open, you can create a new one by clicking the Office button and then the New NodeXL Workbook menu item on the Office button menu.)
  2. On the Edges worksheet, create a single edge by entering two arbitrary vertex names into cells A3 and B3.
  3. In the graph pane, click the Show Graph button.
  4. Select the Vertices worksheet using the tabs at the lower-left corner of the workbook.
  5. On the Vertices worksheet, change the text in cell AC2 from "Add Your Own Columns Here" to "Date".
  6. In cell AC3, enter the date "1/1/2013".
  7. In cell AC4, enter the date "1/1/2014".
  8. In the ribbon, select NodeXL, Visual Properties, Autofill Columns.
  9. In the Autofill Columns dialog box, select the Vertices tab.
  10. On the Vertices tab, select "Date" for the "Vertex X" and "Vertex Y" columns, then click the Autofill button.
The two vertices should immediately move to the upper-right and lower-left corners of the graph pane.

When you autofill the X and Y columns, NodeXL automatically sets NodeXL, Graph, Layout to "None", which prevents the vertices from moving again when you click Refresh Graph. That is by design.

If this experiment gives you results that differ from what I describe, please tell me what actually happens on your computer.

-- Tony
Jul 7, 2013 at 8:19 PM
Hi Tony,

Thanks so much for your help. That problem is solved, and I'm sorry to say the issue was due more to my boneheadedness than to anything in NodeXL. Instead of adding my data columns to the right of column AC ("Add Your Own Columns Here") on the Vertices worksheet, I had put the first one, the date, in column AC itself. Later that day I read the help section on the Autofill feature and realized my mistake, and this morning I created a new NodeXL workbook and put the data in columns starting with AD. Now it works great.

I do have a follow-up question for you. The vertices are now plotted on the Y-axis according to their date. The trouble is that, first, the date range is quite wide (it stretches back to the 19th century), and second, the vertices are heavily concentrated in the recent part of that range (i.e., the last 50 years). I tried logarithmic mapping, but that actually made the problem worse (I think this may be due to the way Excel represents dates numerically). In any case, is there any way to maintain the chronological sequence (without worrying about plotting the actual dates linearly) while still relying on established algorithms (e.g., Fruchterman-Reingold) to optimize the visualization?
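
Why a logarithmic mapping makes things worse can be checked with a little arithmetic. Excel dates count days forward from 1900, so the most recent papers have the largest serial numbers, and a logarithm compresses large numbers most. The sketch below measures what fraction of the axis the last 50 years receive under each mapping; the 1901 to 2013 range and 1963 cutoff are hypothetical, and the epoch assumes Excel's default 1900 date system. (In that system, dates before 1900 are not stored as serial numbers at all, so 19th-century entries may be sitting in the sheet as text.)

```python
import math
from datetime import date

# Assumption: Excel's default 1900 date system (exact for dates after Feb 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d):
    """Days since the Excel epoch, i.e. the number behind a date cell."""
    return (d - EXCEL_EPOCH).days


def recent_share(oldest, cutoff, newest, transform):
    """Fraction of the Y axis given to dates between `cutoff` and `newest`
    when serial numbers pass through `transform` before a linear rescale."""
    a, c, b = (transform(excel_serial(d)) for d in (oldest, cutoff, newest))
    return (b - c) / (b - a)


if __name__ == "__main__":
    # Hypothetical range: papers from 1901 to 2013, cutoff 50 years back.
    oldest, cutoff, newest = date(1901, 1, 1), date(1963, 1, 1), date(2013, 1, 1)
    linear = recent_share(oldest, cutoff, newest, lambda s: s)
    logarithmic = recent_share(oldest, cutoff, newest, math.log)
    print(f"last 50 years get {linear:.0%} of the axis linearly, "
          f"{logarithmic:.0%} with a log mapping")
```

A mapping that stretches the recent end would need to grow faster for large values (or be applied to "years before the newest paper" rather than to the raw serial), which is the opposite of what a plain logarithm does.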

Best,

Ben
Jul 7, 2013 at 9:35 PM
One basic solution to the high concentration of vertices in a relatively small date range is to use the date's rank rather than the date itself as the Y value (I just used Excel's RANK() function to do this). This is an imperfect solution, though, because what you really want is for the dense date range to be expanded disproportionately and the sparse date range to be compressed disproportionately. Still, this mechanical solution definitely helps.
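
The RANK() approach can be reproduced outside Excel, which also makes it easy to experiment with a middle ground between "pure rank" and "pure date". Excel's RANK(number, ref) with the order argument omitted ranks in descending order (the largest value gets 1) and gives tied values the same rank, skipping the following ranks. The `blended_y` function is a hypothetical compromise, not something from NodeXL or Excel: `alpha` = 1 gives evenly spaced ranks, `alpha` = 0 gives the linear date, and values in between keep some of the real time gaps while still spreading out the crowded recent years.

```python
def excel_rank_desc(values):
    """Mimic Excel RANK(number, ref) with order omitted: largest value -> 1,
    ties share a rank and the following rank is skipped (1, 1, 3, ...)."""
    return [1 + sum(1 for w in values if w > v) for v in values]


def rank_y(values):
    """Flip the ranks so the most recent (largest) value gets the largest Y."""
    n = len(values)
    return [n + 1 - r for r in excel_rank_desc(values)]


def blended_y(values, alpha=0.5):
    """Hypothetical mix of normalized rank (weight alpha) and normalized
    linear date (weight 1 - alpha); both parts run from 0 (oldest) to 1."""
    lo, hi = min(values), max(values)
    ys = rank_y(values)
    ylo, yhi = min(ys), max(ys)
    out = []
    for v, y in zip(values, ys):
        lin = (v - lo) / (hi - lo) if hi > lo else 0.5
        rk = (y - ylo) / (yhi - ylo) if yhi > ylo else 0.5
        out.append(alpha * rk + (1 - alpha) * lin)
    return out


if __name__ == "__main__":
    years = [2013, 1950, 2013, 1890]  # hypothetical publication years
    print(excel_rank_desc(years), rank_y(years), blended_y(years, 0.5))
```

Whatever `alpha` is chosen, the vertical order of the papers is the same as their date order; only the spacing changes.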

So the key outstanding question is: is there any way to run a force-directed algorithm on a data set subject to the constraint that the nodes must be ordered in a particular sequence along the y-axis?
Jul 8, 2013 at 5:03 PM
Edited Jul 8, 2013 at 6:43 PM
Ben:

The force-directed layout algorithms do not have a notion of vertex order. The Fruchterman-Reingold algorithm, for example, attempts to satisfy these criteria:
  1. Distribute the vertices evenly in the frame.
  2. Minimize edge crossings.
  3. Make edge lengths uniform.
  4. Reflect inherent symmetry.
  5. Conform to the frame.
(That's from "Graph Drawing by Force-directed Placement," Fruchterman and Reingold's original paper describing their technique.) There is no mention of order, and in fact I don't know what "order" could mean in this context.
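
For reference, the paper pursues these criteria with two distance-based forces plus a "temperature" t that caps how far a vertex may move in one iteration and is gradually lowered. Here k is the ideal distance between vertices, C an experimentally chosen constant, d the distance between two vertices, p_v a vertex position and D_v its summed displacement. Every term depends only on distances, not on any ordering of the vertices:

```latex
k = C\sqrt{\frac{\mathrm{area}}{|V|}}, \qquad
f_a(d) = \frac{d^2}{k} \;\; \text{(attraction along edges)}, \qquad
f_r(d) = -\frac{k^2}{d} \;\; \text{(repulsion between all pairs)}

p_v \leftarrow p_v + \frac{D_v}{\lVert D_v \rVert}\,\min\bigl(\lVert D_v \rVert,\; t\bigr)
```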

I'm wondering whether a different type of graph might be more appropriate for your needs, perhaps a scatterplot that plots data points in Cartesian coordinates and lets you connect selected points to each other. Getting NodeXL to do that would be difficult. You would have to devise and implement a new layout algorithm, or use elaborate Excel formulas or VBA code to set all the X and Y column values.
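
As one illustration of what "a new layout algorithm" could look like, here is a hypothetical sketch (not a NodeXL feature, and not Fruchterman and Reingold's own method) that keeps each vertex's Y fixed, for example at its date rank, and applies Fruchterman-Reingold-style forces only along X. The vertical order is therefore preserved exactly, while edges still pull connected papers toward each other horizontally and all pairs push apart. The constant C is assumed to be 1, the cooling schedule is a simple geometric one, and the cost is O(n²) per iteration, so it suits small graphs. The resulting X values could be placed in one of your own columns and autofilled into Vertex X; per the note above about Autofill setting the layout to "None", NodeXL should then leave them alone.

```python
import math
import random


def x_only_force_layout(y, edges, width=1.0, iterations=200, cooling=0.95, seed=0):
    """Hypothetical FR-style layout that moves vertices only horizontally.

    y     -- list of fixed Y positions, one per vertex (e.g. date ranks)
    edges -- list of (i, j) vertex-index pairs
    Returns a list of X positions in [0, width]; Y is never changed.
    """
    n = len(y)
    if n == 0:
        return []
    rng = random.Random(seed)
    x = [rng.uniform(0.0, width) for _ in range(n)]
    height = (max(y) - min(y)) or 1.0
    k = math.sqrt(width * height / n)  # FR ideal distance, assuming C = 1
    t = width / 10.0                   # initial temperature (max step size)

    for _ in range(iterations):
        disp = [0.0] * n

        # Repulsion between every pair, f_r = k^2 / d; keep only the X part.
        for i in range(n):
            for j in range(i + 1, n):
                dx = x[i] - x[j]
                if dx == 0.0:
                    dx = 1e-6  # nudge vertices sharing an X value apart
                d = math.hypot(dx, y[i] - y[j])
                push = (dx / d) * (k * k / d)
                disp[i] += push
                disp[j] -= push

        # Attraction along edges, f_a = d^2 / k; keep only the X part.
        for i, j in edges:
            dx = x[i] - x[j]
            d = math.hypot(dx, y[i] - y[j])
            if d == 0.0:
                continue
            pull = (dx / d) * (d * d / k)
            disp[i] -= pull
            disp[j] += pull

        # Cap each move at the temperature, stay inside the frame, then cool.
        for i in range(n):
            step = max(-t, min(t, disp[i]))
            x[i] = min(width, max(0.0, x[i] + step))
        t *= cooling

    return x


if __name__ == "__main__":
    # Hypothetical four-paper graph, newest at the top (largest Y).
    y = [4, 3, 2, 1]
    edges = [(0, 1), (0, 2), (1, 3)]
    print(x_only_force_layout(y, edges))
```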

-- Tony
Jul 9, 2013 at 7:24 PM
Hi Tony,

Thanks for your input on this one. Yes, it seems that the Fruchterman-Reingold algorithm may not be the right fit. But apart from the inability to constrain by order, the properties of this and other force-directed algorithms seem perfect, especially their ability to optimize the visual representation of a large number of vertices. For instance, the criteria you list above, especially #2 and #3, are very desirable. A scatterplot that allows points to be plotted and connected could work, but my concern is that optimizing the placement is exactly the problem these force-directed algorithms seem to have solved.

I wonder if it would be useful to show you what I have so far (I'm attaching my NodeXL file). If you happen to take a look, note that the vertices are arranged in chronological order from top to bottom. Even in this presentation, which is not visually optimized, I think the representation is very valuable. In fact, I think that figuring out how to optimize a layout visually subject to an ordering constraint has tremendous commercial potential and usefulness in a wide range of areas. Anyway, I would love to hear any thoughts you may have on this, and thanks for your help so far.

Best,

Ben