NodeXL Peeps, Could Use Your Help

Oct 4, 2010 at 2:34 PM

Good afternoon, all :)  I have been using NodeXL to map Twitter conversations, and I would like to analyze all of the conversation around #HRTechConf from 9/28/10 through today (or through 10/2/10, which would cover the conference plus one day before and one day after).

My challenge is that NodeXL is not importing all of the data.  For example, Summarizr accurately shows me (@jletourneau) at a total of 150 tweets mentioning #HRTechConf, yet NodeXL imports only 16 in-degree tweets (mentions and RTs) and 1 out-degree tweet.  (The accurate Summarizr data can be found at: http://bit.ly/HRTechConfSummary.)  I exported the .TAR file from TwapperKeeper, where I created an archive on the morning of 9/29 (a little late, but no big deal), but importing it into Excel has been extremely problematic.

Does anyone have advice on how I can map all of the data, not just the small sample that NodeXL is importing?  Without the full data, the map would be useless, because it wouldn't necessarily convey the "truth" of the conversation around #HRTechConf.  Thoughts?  Thanks, everyone!  (Josh Letourneau - 404-418-8152 jl(at)knightbishop.com)

Oct 4, 2010 at 5:19 PM

Josh:

NodeXL's Import from Twitter Search Network feature does not count the number of times an individual mentioned the search term, so I would not expect its results to be the same as those obtained from Summarizr.

This is what NodeXL does in version 1.0.1.137.  If this doesn't answer the first part of your question, or if you have specific questions about importing from the .TAR file, let me know.

1. Ask Twitter for tweets that satisfy the search criteria.  Stop when 1) N tweets with unique authors are received, where N is the "Limit to N people" setting; 2) 1,500 tweets are received (that is the most tweets that Twitter will send); or 3) Twitter has found all recent tweets that satisfy the search criteria, whichever comes first.

2. Add a vertex for each unique author.

3. If "Add an edge for each Follows relationship" is checked, ask Twitter for a list of up to N people whom each unique author follows.  Add an edge for each follows relationship between the unique authors.

4. If "Add an edge for each Replies-to relationship in tweets" is checked, add an edge for each "replies-to" relationship found in the tweets obtained in step 1.

5. Ditto for "Mentions relationships in tweets."
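The steps above can be sketched roughly in Python. This is only an illustration of the described logic, not NodeXL's actual code: the `Tweet` class, its `reply_to` field, and the function names are hypothetical, the "Follows" step is left out because it needs live Twitter calls, and dropping edges that point at people outside the set of unique authors is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

MAX_TWEETS = 1500  # the most tweets Twitter will send, per the steps above


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record used only for this sketch."""
    author: str
    text: str
    reply_to: Optional[str] = None  # screen name this tweet replies to, if any


def collect_tweets(stream: Iterable[Tweet], limit_people: int) -> List[Tweet]:
    """Step 1: read tweets until N unique authors, 1,500 tweets, or no more tweets."""
    tweets: List[Tweet] = []
    authors: Set[str] = set()
    for tweet in stream:
        tweets.append(tweet)
        authors.add(tweet.author.lower())
        if len(authors) >= limit_people or len(tweets) >= MAX_TWEETS:
            break
    return tweets


def build_graph(
    tweets: List[Tweet], add_replies: bool = True, add_mentions: bool = True
) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Steps 2, 4 and 5: one vertex per unique author, edges taken from the tweets."""
    vertices = {t.author.lower() for t in tweets}
    edges: List[Tuple[str, str, str]] = []
    for t in tweets:
        source = t.author.lower()
        # Assumption: keep only edges whose target is also a unique author.
        if add_replies and t.reply_to and t.reply_to.lower() in vertices:
            edges.append((source, t.reply_to.lower(), "replies-to"))
        if add_mentions:
            for name in re.findall(r"@(\w+)", t.text):
                if name.lower() in vertices:
                    edges.append((source, name.lower(), "mentions"))
    return vertices, edges
```

Note that an author with many matching tweets still becomes a single vertex, and nothing in the graph records how many tweets that author sent, which is why the result will not line up with a per-person tweet count.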

-- Tony

Oct 4, 2010 at 5:52 PM
Edited Oct 4, 2010 at 5:57 PM

Thanks for the quick reply, Tony.  I'm currently "rate limited" by Twitter, and my request to have the limit lifted has been rejected 4 consecutive times.  I do use the request recommended here, but they still reject it (the last time, they said rate-limiting was "for programmers, not researchers.")  Regarding your notes, let me ask:

1. I limit the import to 500 people to get around the rate-limiting constraint that Twitter will not lift.  Would you or anyone else perhaps be able to import the network with a limit over 500 (as there are about total 'Twitterers') and email it to me?

2. Where can I specify "Add a vertex for each unique author"?  The import dialog ("From Twitter Search Network") only offers options to add an edge for each 'Follow', 'RT', and @mention.  Am I missing a piece of the dialog, since the only options listed are edge-specific, not vertex-specific?  Or am I missing something else?

3 - 5. I imagine cleaning up the import will help to resolve these, so I'll stay tuned.

Thanks for your help - really trying to understand what is going on here.  Your help is greatly appreciated! :)

Oct 5, 2010 at 12:27 AM

Josh:

The "add a vertex for each unique author" is not an option -- it's what NodeXL always does.  It asks Twitter for tweets that satisfy the search criteria, then filters out duplicate authors.  That means that Twitter may provide 1,500 tweets, but if 501 of them were tweeted by the same author, then there are at most 1,000 unique authors within the 1,500 tweets (one for the 501 tweets, plus up to 999 others) and you will get at most 1,000 vertices in the graph.  Put another way: Each vertex represents a unique author whose latest tweet satisfies the search criteria.
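To make the arithmetic concrete, here is a small Python sketch of the duplicate-author filtering. The function name and the sample screen names are made up for illustration, and the sketch assumes that every tweet outside the 501 comes from a different author.

```python
from typing import Iterable, List


def unique_authors(tweet_authors: Iterable[str]) -> List[str]:
    """Return each author once, in order of first appearance (case-insensitive)."""
    seen = set()
    ordered = []
    for author in tweet_authors:
        key = author.lower()
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


# 1,500 tweets: 501 from one prolific author, 999 from distinct authors.
authors = ["prolific_user"] * 501 + [f"user{i}" for i in range(999)]

print(len(authors))                  # 1500 tweets received
print(len(unique_authors(authors)))  # 1000 vertices in the graph
```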

Regarding rate limiting, NodeXL now pauses when Twitter's rate limit is hit, then resumes after about an hour.  This feature was introduced in version 1.0.1.130.  That means that you will eventually get the network you asked for, but it may take a while.
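The pause-and-resume behavior can be pictured with the following Python sketch. It is not NodeXL's code: `RateLimitError`, `fetch_page`, and the one-hour `pause_seconds` default are hypothetical stand-ins for however the rate limit is actually detected and waited out.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by fetch_page when the rate limit is hit."""


def fetch_with_pause(
    fetch_page: Callable[[int], T],
    pages: Iterable[int],
    pause_seconds: float = 3600,  # "about an hour", per the post
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Fetch every page, pausing and retrying whenever the rate limit is hit."""
    results: List[T] = []
    for page in pages:
        while True:
            try:
                results.append(fetch_page(page))
                break  # this page succeeded, move on to the next one
            except RateLimitError:
                sleep(pause_seconds)  # wait, then retry the same page
    return results
```

The `sleep` parameter is there only so the waiting can be swapped out when trying the sketch; the point is that no request is dropped, so the full network arrives eventually, just later.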

Also, as of NodeXL version 1.0.1.127, Twitter grants a higher rate limit even if you are not whitelisted, provided that you have authorized NodeXL to use your Twitter account to import Twitter networks.  (If you don't see the "I have a Twitter account" options in the Twitter dialog boxes, please download and install the latest NodeXL version.)  This may or may not get you the entire network without pausing, but it will increase your chances.

-- Tony

Oct 5, 2010 at 12:42 AM

Wow, Tony - outstanding help.  I completely follow what you mean and realized some of this when I pulled in the raw data.  Between you and me, the .tar file from Summarizr is a beast, though.  I'm glad NodeXL now waits an hour - this will help all of us, for sure.  The new book, "Analyzing Social Media Networks with NodeXL," is superb in helping us learn to navigate the tool as well.  Thanks (big-time) for your thoughts and help, and I hope I can return the favor in the near future.  Seriously, just drop me a line - I owe you one!