Using NodeXL for citation analysis

Mar 9, 2010 at 3:00 PM

I'm completely new to NodeXL and have just been going through some of your documentation, tutorials, etc. to see if this would be a good program for analyzing citations (who cites whom for a set of articles). But before I go further, I wanted to see if anyone has thoughts on whether this is the right tool for this kind of work. I will have roughly 120 articles, and I'm going to create a database that tracks the citations for each of them (anywhere from 3 to 100 citations per article, so probably around 6,000 data points). We're interested in seeing who cites whom, which articles are the most influential, and maybe going deeper by creating codes for type of article (e.g., commentary, top-tier journal article, think tank research piece) and/or the general type of argument set forth in the article.

Does it make sense to use NodeXL for this work, or would some other program be better (if you know of one)? Also, any pointers on getting started would be welcome if NodeXL is the right tool for this. For example, should each citation in each document get a separate row? We'll be using EndNote or RefWorks to track/download a lot of our citations, but we'll probably have to enter some manually. I'd appreciate any thoughts on this.

Mar 9, 2010 at 4:12 PM

Yes, I think NodeXL would be appropriate for the work you are doing.  There are many network graph analysis programs out there, but NodeXL's claim to fame is its ease of use: Everything is done within Excel 2007, so if you are comfortable with Excel, you are well on your way to being able to use NodeXL.

At its simplest, all you have to do to display a network graph within NodeXL is enter an edge list into the Edges worksheet.  Each edge is specified with the names of the two vertices the edge connects.  In your case, the vertices are articles and the edges are citations, so the Edges worksheet would look like this:

Vertex 1    Vertex 2
----------  ----------
Article1    Article2
Article3    Article4
Article3    Article1

Article1 has cited Article2, while Article3 has cited both Article4 and Article1.
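
If the citations are tracked per article, they can be flattened into this one-row-per-citation form before they go into the Edges worksheet. Here is a minimal Python sketch of that step. The `citations` dictionary and the `to_edge_list` helper are hypothetical names made up for illustration; they are not part of NodeXL.

```python
# Hypothetical input: each citing article mapped to the articles it cites.
citations = {
    "Article1": ["Article2"],
    "Article3": ["Article4", "Article1"],
}


def to_edge_list(citations):
    """Return one (citing, cited) pair per citation, in input order."""
    return [
        (citing, cited)
        for citing, cited_list in citations.items()
        for cited in cited_list
    ]


edges = to_edge_list(citations)
for vertex1, vertex2 in edges:
    print(vertex1, vertex2)
```

Each pair corresponds to one row of the edge list, with the citing article as Vertex 1 and the cited article as Vertex 2.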

You can enter an edge list manually, of course, but one of the advantages of working within Excel is that Excel has all sorts of data import facilities you can take advantage of.  If your edge list is in a CSV file, or a tab-delimited file, or an XML file, or in a SQL database, or on a Web page, or in any number of other formats, you can easily get it into an Excel workbook.  And once it's in an Excel workbook, you can import it into the NodeXL workbook using our Import From Open Edge Workbook feature.
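
For the CSV route, a short script can produce a file that Excel will open. The sketch below uses only Python's standard `csv` module. The `edges_to_csv` helper, the sample data, and the header row are assumptions for illustration; check whether your import step expects column headers before relying on them.

```python
import csv
import io

# Hypothetical sample data: (citing article, cited article) pairs.
edges = [
    ("Article1", "Article2"),
    ("Article3", "Article4"),
    ("Article3", "Article1"),
]


def edges_to_csv(edges, header=("Vertex 1", "Vertex 2")):
    """Return the edge list as CSV text, one citation per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)  # assumed header row; drop it if not wanted
    writer.writerows(edges)
    return buffer.getvalue()


csv_text = edges_to_csv(edges)
print(csv_text)
```

To save the result, write `csv_text` to a file opened with `newline=""` so the line endings are not translated a second time.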

The appearance of the edges and vertices can be customized by filling in a variety of "Visual Property" columns: color, width, style, shape, opacity, and so on.  So if you code articles by type, you can show one type as red circles and another type as blue diamonds, for example.
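
If the type codes are kept in a separate list, the color and shape for each article can be looked up rather than typed by hand. The following Python sketch shows one way to do that. The type names, the color and shape values, the fallback style, and the `vertex_rows` helper are all hypothetical; the exact values the visual property columns accept should be checked in the workbook itself.

```python
# Hypothetical lookup table: article type -> (color, shape).
STYLE_BY_TYPE = {
    "commentary": ("Red", "Circle"),
    "journal article": ("Blue", "Diamond"),
}
DEFAULT_STYLE = ("Gray", "Circle")  # assumed fallback for uncoded types

# Hypothetical coding of articles by type.
article_types = {
    "Article1": "commentary",
    "Article2": "journal article",
    "Article3": "think tank piece",
}


def vertex_rows(article_types):
    """Return (article, color, shape) rows, one per article."""
    rows = []
    for article, kind in article_types.items():
        color, shape = STYLE_BY_TYPE.get(kind, DEFAULT_STYLE)
        rows.append((article, color, shape))
    return rows


for row in vertex_rows(article_types):
    print(*row)
```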

NodeXL can calculate a set of "graph metrics" for you to aid in analysis of your citation network.  These include in-degree, out-degree, various centrality measures, clustering coefficient, and so on.
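
To make two of those metrics concrete for a citation network: an article's in-degree is the number of times it is cited, and its out-degree is the number of articles it cites. The Python sketch below counts both from an edge list. It only illustrates the definitions and is not how NodeXL computes them; `degree_counts` and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: (citing article, cited article) pairs.
edges = [
    ("Article1", "Article2"),
    ("Article3", "Article4"),
    ("Article3", "Article1"),
]


def degree_counts(edges):
    """Return (in_degree, out_degree) counters keyed by article name."""
    out_degree = Counter(citing for citing, _ in edges)
    in_degree = Counter(cited for _, cited in edges)
    return in_degree, out_degree


in_degree, out_degree = degree_counts(edges)
articles = sorted({name for edge in edges for name in edge})
for article in articles:
    print(article, "cited by", in_degree[article], "cites", out_degree[article])
```

In-degree is a rough first proxy for how influential an article is within the set; the centrality measures give a fuller picture.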

-- Tony


Mar 9, 2010 at 4:34 PM

Thanks a lot for this very comprehensive answer. This will help me get started.