Import mechanism of Twitter

Feb 19, 2014 at 1:56 PM
Dear Forum,

My colleagues and I are planning to gather Twitter data from a couple of conferences, so we tried to find out how the search of the Twitter API and its NodeXL output work: Are all tweets with a certain hashtag shown, or only the most recent ones? We haven't found any information on this central question, and it would be great if you could help us!

Best regards
Clemens
Feb 19, 2014 at 5:54 PM
Edited Feb 19, 2014 at 5:54 PM
Hello, Clemens:

When you enter a search term (which doesn't have to be a hashtag) into the Import from Twitter Search Network dialog box, NodeXL asks Twitter for the 100 most recent tweets that contained that term. You can increase the limit, but you won't necessarily get as many tweets as you ask for. For popular topics, Twitter may only give a selection of tweets to NodeXL, and it won't provide tweets older than about a week. Here is how Twitter explains it:

"Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface."

"The Search API is not complete index of all Tweets, but instead an index of recent Tweets. At the moment that index includes between 6-9 days of Tweets."

(The "Search API" is what NodeXL uses to get the tweets from Twitter.)
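To make the 100-tweet limit concrete, here is a minimal sketch of what such a search request might look like. It assumes the v1.1 Search API endpoint that Twitter documented at the time; the function name `build_search_url` and the hashtag are hypothetical, and the request NodeXL actually sends may differ. Sending the request would also need authentication, which is left out here.

```python
from urllib.parse import urlencode

# Assumption: the v1.1 Search API endpoint; NodeXL's real request may differ.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, count=100):
    """Return a search URL asking for up to `count` tweets matching `term`."""
    # The v1.1 Search API accepted at most 100 tweets per request,
    # so larger values are clamped here.
    params = {"q": term, "count": min(count, 100)}
    return SEARCH_URL + "?" + urlencode(params)


# Hypothetical conference hashtag, used only for illustration.
url = build_search_url("#conference2014")
print(url)
```

Even a well-formed request like this only returns what Twitter chooses to make available, for the reasons quoted above.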

So that's bad news for anyone wanting to get a complete set of tweets, but at least now you know what the limitations are.

-- Tony
Feb 20, 2014 at 10:05 AM
Thanks for your help, Tony! This is very helpful. So, if I assume that the Search API is at least a complete index of all recent public tweets, a complete data set could be collected by importing the data several times on each day of an event and removing all duplicates. That's roundabout, but manageable.

Best regards
Clemens
Feb 20, 2014 at 11:35 AM
Sorry, we just discussed this in our team and I would like to make my question more specific: If we choose to limit the data import (for a popular search term) to, let's say, 100 tweets or links, how does the NodeXL search work? Is the import reduced to the 100 most recent tweets or links? Or is it possible to import the most 'relevant' data (e.g. most central, most favorited, or with the highest in-degree)? Is there any way to tell NodeXL what kind of data to import?

Best regards
Clemens
Feb 20, 2014 at 4:50 PM
Edited Feb 21, 2014 at 7:23 PM
Hello, Clemens:

You read the exact opposite of what I wrote: the Search API is NOT a complete index of all recent public tweets. I'll quote Twitter again:

"Please note that Twitter's search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface."

That means that NodeXL may be unable to get all the tweets that mentioned your search term, no matter how often you ask it to. With a popular topic, Twitter might give NodeXL only 100 of the 100,000 tweets that mentioned the search term in the last hour, and so that's all NodeXL can give to you. With less popular topics, you might actually get all the recent tweets, but there is no way to know how many you might be missing.
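If you do run several imports anyway, merging them and dropping repeated tweets is simple. The sketch below assumes each imported tweet is available as a dictionary with a unique `id` field; that layout and the name `merge_imports` are hypothetical and not NodeXL's worksheet format. The result is only the union of what Twitter returned, so it still says nothing about how many tweets are missing.

```python
def merge_imports(*imports):
    """Combine several imports into one dict keyed by tweet ID, dropping duplicates."""
    merged = {}
    for batch in imports:
        for tweet in batch:
            # Keep the first copy seen of each tweet ID.
            merged.setdefault(tweet["id"], tweet)
    return merged


# Hypothetical data from two imports made on the same day.
morning = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
evening = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]

combined = merge_imports(morning, evening)
print(len(combined))  # 3
```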

The bottom line: You have to think of NodeXL as providing a selection of recent tweets, the selection being determined by Twitter. If you need all such tweets, you have to go to other companies that contract with Twitter to resell tweet data.

-- Tony
Feb 20, 2014 at 4:58 PM
On your question about telling NodeXL what kind of data to import, you cannot say anything like "give me the most central." You can, however, narrow your search using Twitter "search operators," which are documented at https://twitter.com/search-home# under the "operators" link. For example, you can enter this as a search term in NodeXL to get a selection of recent tweets that mentioned "NodeXL" but not "graph":

nodexl -graph
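If you build such search terms for many conferences, a small helper can assemble them. This is only a sketch: `build_query` is a hypothetical name, and it covers just the exclusion operator shown above, where a leading minus sign drops tweets containing that word.

```python
def build_query(include, exclude=()):
    """Join required terms and minus-prefixed excluded terms into one search term."""
    parts = list(include) + ["-" + term for term in exclude]
    return " ".join(parts)


print(build_query(["nodexl"], exclude=["graph"]))  # nodexl -graph
```

The resulting string is what you would paste into the NodeXL search box.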

-- Tony