NodeXL and Twitter Sampling

Nov 28, 2010 at 11:30 PM

I recently imported data from Twitter using NodeXL without limitations on the number of people/vertices, based on a hashtag representing an association's conference event.  I noticed immediately that the imported data did not include several tweets that I had made with @ "mentions" of several others in the network even though "replies to" and "mentions" were included in my search specifications.

Moreover, the numbers of vertices and edges returned were each under 100.

Any thoughts?  Also, are there any resources that describe Twitter's search function and how Tweets are selected and returned?

Nov 29, 2010 at 3:38 PM

Please see "NodeXL Peeps, Could Use Your Help" for an explanation of how NodeXL assembles a search network using information provided by Twitter.

-- Tony

Nov 29, 2010 at 5:57 PM

Thank you very much Tony,

This is more or less what I had guessed.  But let me restate it another way to clarify. Tweeter (X), who has numerous tweets satisfying the search term, is selected by Twitter, but only the most recent tweet is returned.  Because of this, even though Tweeter (X) included a direct mention of Tweeter (Y) in one of his tweets, this relationship is not shown in NodeXL, since that tweet was dropped during the de-duping process. Is this correct?

Also, is it true that the same tweet will be counted twice if it is a "replies to" tweet and both the "direct mention" and "replies to" search criteria are selected?  Looking over my data, this seems to be the case.

Nov 29, 2010 at 7:01 PM

"Is this correct?":  Yes, that is correct.

"Is it true...?":  If by "counted twice" you mean that two edges are included in the graph, then yes, it is true.

-- Tony
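
For anyone reading along, here is a minimal Python sketch of the two behaviours confirmed above. It is not NodeXL's actual code. The Tweet record, latest_per_author and build_edges are hypothetical names made up for illustration, and the sketch only assumes what this thread describes: one (most recent) tweet is kept per person, and a reply that also mentions the same person produces two edges.

```python
from typing import NamedTuple, Optional, Tuple


class Tweet(NamedTuple):
    # Hypothetical record; field names are illustrative, not Twitter's or NodeXL's.
    author: str
    timestamp: int
    mentions: Tuple[str, ...] = ()
    reply_to: Optional[str] = None


def latest_per_author(tweets):
    """Keep only the most recent tweet for each author (the de-duping step)."""
    latest = {}
    for t in tweets:
        if t.author not in latest or t.timestamp > latest[t.author].timestamp:
            latest[t.author] = t
    return list(latest.values())


def build_edges(tweets):
    """Emit one edge per relationship type, so a reply that also mentions
    the same person yields two edges between the same pair."""
    edges = []
    for t in tweets:
        if t.reply_to is not None:
            edges.append((t.author, t.reply_to, "replies to"))
        for m in t.mentions:
            edges.append((t.author, m, "mentions"))
    return edges


tweets = [
    Tweet("X", 1, mentions=("Y",)),                # older tweet: X mentions Y
    Tweet("X", 2),                                 # newer tweet: no mentions
    Tweet("Z", 3, mentions=("X",), reply_to="X"),  # reply that also mentions X
]

edges = build_edges(latest_per_author(tweets))
for edge in edges:
    print(edge)
```

In this toy data the X-to-Y mention never reaches the graph because only X's newer tweet survives, while Z's single reply contributes both a "replies to" edge and a "mentions" edge to X.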

Nov 30, 2010 at 10:02 PM

Thank you very much Tony.  You've been very helpful.