Clarification on Twitter volume/date limits

Feb 13 at 12:46 PM
Hi NodeXL Team,

Can you clarify for me please exactly how the Twitter search function works in relation to the volume of tweets it will retrieve and the timescale these can be drawn from? I always thought it was 'up to 18,000 tweets or the last 7 days, whichever comes first'. However, I have just run a search for a hashtag that has returned just over 10,000 tweets but some of these date back to April 2016. Does the 7 day limit no longer apply?

Thanks,
Wil
Coordinator
Mar 6 at 4:25 AM
Edited Mar 6 at 4:26 AM
Hello!

Thank you for the interest in NodeXL!

Twitter's public free API has many limits. Data is available only for 7-8 days. Queries cannot return more than 18,000 tweets.

NodeXL now performs a "second pass" over the initial set of collected tweets, looking for any tweets that mention or reply to a tweet that is not in the initial data collection. These "thread heads" are then collected in a second set of queries to Twitter, so that "threads" are complete. As a result, the data set may consist predominantly of tweets from the past 7-8 days but may also contain at least a few older tweets.
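To make the idea concrete, here is a minimal sketch of how the missing "thread heads" could be identified. It is not NodeXL's actual code: the field names `id` and `in_reply_to_id` are hypothetical, and it covers replies only, not mentions.

```python
def missing_thread_heads(tweets):
    """Return the IDs of replied-to tweets that are absent from the collection.

    `tweets` is a list of dicts. The keys "id" and "in_reply_to_id" are
    hypothetical names chosen for this illustration.
    """
    collected = {t["id"] for t in tweets}
    referenced = {
        t["in_reply_to_id"]
        for t in tweets
        if t.get("in_reply_to_id") is not None
    }
    # Anything referenced but not collected would need a second query.
    return sorted(referenced - collected)


sample = [
    {"id": 1, "in_reply_to_id": None},
    {"id": 2, "in_reply_to_id": 1},   # parent is already in the set
    {"id": 3, "in_reply_to_id": 99},  # parent is missing, possibly older
]
heads_to_fetch = missing_thread_heads(sample)
```

Here `heads_to_fetch` contains only tweet 99. Because such a tweet is requested by ID rather than through search, it can be older than the 7-8 day window, which is how a few old tweets end up in the results.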

NodeXL Pro does not enable the collection of data beyond these limits.

Twitter imposes many rate limits via its data API. NodeXL Basic and Pro are both affected by these limits.

That said, NodeXL can process data from commercial data providers (like Crimson Hexagon or Radian6). These services are not cheap, but they may be the only way to get historical (archival) data from Twitter.

Please see:

https://nodexl.codeplex.com/discussions/650609

and:

https://nodexl.codeplex.com/discussions/649987

You may be able to get a little more data from the public Twitter API by using the since: and until: operators, for example:

QUERYTERM since:2016-01-21 until:2016-01-27

The since: and until: operators restrict the query to a date range.

Twitter controls its API and throttles it based on parameters we cannot see. We have noticed that the higher the volume of tweets matching a query, the fewer tweets are delivered.

One alternative is to run day-long slices and append the results, in order to maximize the data available from Twitter.
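As a rough sketch of the day-long slicing idea, the following generates one search string per day, which can then be pasted into the search box one at a time. The function name `daily_queries` is made up for this example, and it assumes since: is inclusive and until: is exclusive, so consecutive slices should not overlap.

```python
from datetime import date, timedelta


def daily_queries(term, start, end):
    """Build one query string per day from `start` up to (not including) `end`."""
    queries = []
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        # Dates are formatted as YYYY-MM-DD, matching the example above.
        queries.append(
            f"{term} since:{day.isoformat()} until:{next_day.isoformat()}"
        )
        day = next_day
    return queries


queries = daily_queries("QUERYTERM", date(2016, 1, 21), date(2016, 1, 27))
for q in queries:
    print(q)
```

This yields six queries covering 2016-01-21 through 2016-01-26. When appending the results, it is worth removing duplicate tweets in case slices overlap at the day boundaries.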

You may also be interested in this: http://graphserverimporter.codeplex.com which enables NodeXL to connect to the "STREAM" API from Twitter (which sometimes delivers larger volumes of data).

Regards,

Marc
Marked as answer by MarcSmith on 3/5/2017 at 9:27 PM