How does NodeXL import data from Twitter Search Network?

Aug 9, 2011 at 8:06 PM

Hello,

I am trying to import data from the Twitter Search Network into NodeXL. However, because of rate limiting I am only able to download a maximum of 1,000 entries. I want to know whether these entries are a random sample (1,000 entries out of many more) or whether NodeXL selects them in a specific way. I also want to know whether NodeXL takes location and time into account, that is, whether it downloads a list of the most recent searches in a specific location.

Some very popular topics might have more than a million searches, for example. So how does NodeXL pick only 1,000 people out of all those who performed a search on the same topic? This is my key question. Also, let's say that I keep running a search on one particular topic for about a week. Will my network graph have the same nodes and edges (same people), or will it evolve with time (different people every time I run a search)?

I also assume that the network graph produced by NodeXL has people connected by edges only on the basis of a particular search term contained in their tweets. Is that right? Correct me if I am wrong. Does it mean that two strangers are connected to each other solely on the basis of their similar searches, or is there a real connection (friends, coworkers, etc.) between the two?

 

I would really appreciate it if you could answer my questions.

Thanks.

Aug 10, 2011 at 4:14 AM
Edited Aug 10, 2011 at 4:15 AM

Poojanam:

The Twitter Search Network does not provide information about people who have searched for a term on Twitter.  Instead, it provides information about people who have included the search term in their tweets.

Here are some points that I hope will answer your other questions.  If I’ve missed anything, let me know.

* Each vertex that you get from running the Twitter Search Network represents a person who has tweeted the search term.

* Each edge represents either 1) a follows relationship between those people; or 2) a replies-to relationship in those people’s tweets; or 3) a mentions relationship in those people’s tweets; or 4) the tweets themselves, if they are not replies-to or mentions.  In case 4 the edges are self-loops (they connect the vertex to itself), which signify that the person tweeted the search term without mentioning or replying to any of the other people in the network.

* To make that concrete, let’s say John and Mary tweeted your search term, and you’ve checked all the options corresponding to cases 1 through 4.  There will be one vertex for John and one for Mary.  If John follows Mary, there will be an edge between John and Mary (case 1).  If John’s tweet was a reply to Mary, there will be another such edge (case 2).  If John’s tweet mentioned Mary, there will be another such edge (case 3).  And if John’s tweet neither replied-to nor mentioned Mary, there will be a self-loop from John to John (case 4).

* John may have tweeted several times and mentioned the search term each time.  In that case, there would be one or more edges for each tweet.

* Twitter rate limiting does not limit the size of the network you can get; it just makes getting the network take longer.  If you hit the Twitter limit, NodeXL will pause for about an hour and then continue getting the rest of the network when it wakes up.  So if you ask for a big network you’ll eventually get it.

* If you are seeing only 1,000 vertices, it’s probably because you have “Limit to” checked.  Uncheck it and you will get up to 1,500 vertices, which is the maximum that Twitter will provide.  (Twitter actually provides up to 1,500 tweets, which could result in fewer than 1,500 vertices if someone tweeted the search term in multiple tweets.)

* The returned tweets are not random.  They are the most recent tweets that contain the search term.  They are not filtered by geographic location.

* Because people are constantly tweeting, today's network may look very different tomorrow.
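The edge rules in the John and Mary example above can be sketched in a few lines of Python. This is only an illustration of the four cases as described in this post, not NodeXL's actual code: the `Tweet` class, the `build_edges` function, and the relationship labels other than "Tweet" are hypothetical names chosen for the sketch, and it assumes that a tweet's mention list does not repeat the person being replied to.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Tweet:
    """Hypothetical record for one tweet that contains the search term."""
    author: str
    text: str
    reply_to: Optional[str] = None          # user this tweet replies to, if any
    mentions: List[str] = field(default_factory=list)


def build_edges(tweets: List[Tweet],
                follows: Set[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (vertex 1, vertex 2, relationship) tuples for cases 1 through 4."""
    # Only people who tweeted the search term are vertices in the network.
    authors = {t.author for t in tweets}
    edges = []

    # Case 1: follows relationships between people in the network.
    for follower, followed in sorted(follows):
        if follower in authors and followed in authors:
            edges.append((follower, followed, "Followed"))   # label is an assumption

    for t in tweets:
        linked = False
        # Case 2: the tweet replies to someone in the network.
        if t.reply_to in authors:
            edges.append((t.author, t.reply_to, "Replies to"))  # label is an assumption
            linked = True
        # Case 3: the tweet mentions someone in the network.
        for name in t.mentions:
            if name in authors:
                edges.append((t.author, name, "Mentions"))      # label is an assumption
                linked = True
        # Case 4: neither a reply nor a mention, so the tweet becomes a self-loop.
        if not linked:
            edges.append((t.author, t.author, "Tweet"))
    return edges


if __name__ == "__main__":
    tweets = [
        Tweet("john", "nodexl is neat", mentions=["mary"]),
        Tweet("john", "more about nodexl"),
        Tweet("mary", "trying nodexl"),
    ]
    for edge in build_edges(tweets, follows={("john", "mary")}):
        print(edge)
```

Because each tweet is handled separately, a person who tweets the search term several times contributes one or more edges per tweet, as noted above.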

I am considering adding this as a topic in the NodeXL help system (NodeXL, Help, Help in the Ribbon).  The way it all works is not very obvious.

-- Tony

 

Aug 10, 2011 at 11:55 PM

Hi Tony. Thanks a lot for replying. I now have a clear idea of the Twitter Search Network. Your example has helped me understand how NodeXL works.

--Poojan

May 23, 2012 at 9:43 AM

"In case 4 the edges are self-loops (they connect the vertex to itself), which signify that the person tweeted the search term without mentioning or replying to any of the other people in the network."

I've noticed deviations from this rule. In a dataset I gathered at the beginning of December, some tweets have a Vertex 2 that is not the same as Vertex 1, although there were no RTs, replies, or mentions in the tweet itself. How is that possible?

May 23, 2012 at 2:40 PM

What is the value in the "Relationship" column on the Edges worksheet for those tweets?

-- Tony

May 23, 2012 at 2:45 PM

Tweet.

May 23, 2012 at 2:52 PM

And Vertex 2 is not the same as Vertex 1?  I don't think that's possible.  Did you edit the worksheet in some way?

-- Tony

May 23, 2012 at 3:00 PM
tcap479 wrote:

And Vertex 2 is not the same as Vertex 1?  I don't think that's possible.  Did you edit the worksheet in some way?

-- Tony

No, I didn't edit it. Here is the screenshot: https://www.dropbox.com/s/wqnhdbwcuzvdfyv/NodeXL_twitter.PNG

May 24, 2012 at 1:20 AM

The worksheet has indeed been edited.  At a minimum, you have added three columns and duplicated the Tweet column.  I'm wondering if you might have inadvertently edited something else to give the unexpected results you're reporting.

If you do find that while using a recent version of NodeXL to get a Twitter user network, you get a Tweet row where Vertex 1 is not the same as Vertex 2, please let me know.  I do not know how that can happen.
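One way to look for such rows is to scan the edge list for "Tweet" relationships that are not self-loops. The sketch below assumes the Edges worksheet has been saved as CSV with the column headers "Vertex 1", "Vertex 2", and "Relationship"; the function name `find_suspect_tweet_rows` and the sample data are hypothetical.

```python
import csv
import io


def find_suspect_tweet_rows(rows):
    """Return (row number, vertex 1, vertex 2) for Tweet rows that are not self-loops."""
    suspects = []
    # Start counting at 2 because row 1 of the sheet holds the column headers.
    for row_number, row in enumerate(rows, start=2):
        is_tweet = row["Relationship"].strip() == "Tweet"
        if is_tweet and row["Vertex 1"].strip() != row["Vertex 2"].strip():
            suspects.append((row_number, row["Vertex 1"], row["Vertex 2"]))
    return suspects


# Made-up sample data standing in for an exported Edges worksheet.
sample = """Vertex 1,Vertex 2,Relationship
john,john,Tweet
john,mary,Mentions
mary,john,Tweet
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(sample))
    print(find_suspect_tweet_rows(rows))
```

An empty result means every "Tweet" row connects a vertex to itself, which is the expected behavior.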

-- Tony