Limitations on Importing Social Media Networks

Jul 11, 2012 at 2:31 PM

I'm wondering about the limitations on importing network data from Flickr, Twitter, or YouTube.


With Flickr, I know there are limitations on whose data I can import, due to privacy settings, but are there any other limitations? What about a maximum number of search results?


On Twitter, I know the API limits how much data you can pull in a certain time frame, but I know I can pull larger networks by letting NodeXL stay open and work through the pausing. However, when looking at Twitter search networks, are there any limitations that the Twitter API imposes, such as a maximum number of results? Also, how far back in time will a Twitter search go? I know you can't search hashtags from, say, a year ago, but how far back can you go?


And with YouTube, are there any limitations on the size of the network you can pull? I know privacy settings of users can come into play here, but with video networks, does the YouTube API impose any search limits on the network? If so, does it rank videos in terms of relevance to the search terms or by date?


Are there any other limitations on importing Flickr, Twitter, or YouTube data?

Jul 11, 2012 at 2:35 PM

Twitter has many limits:

> Unauthenticated users = 150 API calls per hour

> Authenticated users = 350 API calls per hour

Twitter will not return more than 1500 tweets from a query.

Twitter will not return tweets more than 7 to 10 days old.
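To put the hourly call limits in perspective, here is a minimal Python sketch of the arithmetic. It only uses the figures quoted in this thread; the function names are hypothetical and it does not call Twitter or NodeXL.

```python
import math

# Hourly call limits as quoted in this thread (2012); treat them as assumptions.
CALLS_PER_HOUR = {"unauthenticated": 150, "authenticated": 350}


def seconds_between_calls(calls_per_hour):
    """Average spacing that keeps a client at or under the hourly limit."""
    return 3600 / calls_per_hour


def hours_needed(total_calls, calls_per_hour):
    """Number of hourly windows needed to make total_calls requests."""
    return math.ceil(total_calls / calls_per_hour)


for kind, limit in CALLS_PER_HOUR.items():
    print(kind, round(seconds_between_calls(limit), 1), "seconds between calls")
```

For instance, a network that needs 1,000 API calls would span three hourly windows for an authenticated user, which is why a large import has to pause and resume.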

For long-term data collection, it is necessary to collect every day, or even more frequently, to get sufficient coverage.
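Repeated collections will overlap, so they need to be merged without counting the same tweet twice. A rough Python sketch of that step follows, assuming each tweet has been exported as a record with a unique "id" and a "created_at" time. Those field names and the function name are hypothetical, not NodeXL's own format.

```python
from datetime import datetime


def merge_collections(*collections):
    """Combine several search snapshots, keeping one copy of each tweet."""
    merged = {}
    for collection in collections:
        for tweet in collection:
            # The first copy seen wins; later duplicates are ignored.
            merged.setdefault(tweet["id"], tweet)
    # Return the combined set in chronological order.
    return sorted(merged.values(), key=lambda t: t["created_at"])


# Two hypothetical daily snapshots that share tweet 2.
monday = [
    {"id": 1, "created_at": datetime(2012, 7, 9, 10, 0)},
    {"id": 2, "created_at": datetime(2012, 7, 9, 18, 30)},
]
tuesday = [
    {"id": 2, "created_at": datetime(2012, 7, 9, 18, 30)},
    {"id": 3, "created_at": datetime(2012, 7, 10, 9, 15)},
]

combined = merge_collections(monday, tuesday)
print([t["id"] for t in combined])  # [1, 2, 3]
```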


Flickr and YouTube are more generous with their authenticated users.  I am not sure what the limits are on these APIs.

Jul 11, 2012 at 4:38 PM

Great, thanks for the info. 


Do you happen to know how Twitter selects which 1,500 tweets to deliver? Is it just the 1,500 most recent? For tweets containing more common hashtags, that limit would be reached fairly quickly.

Jul 11, 2012 at 5:22 PM

Please see this page from Twitter:

It's meant for programmers (the "Twitter Search API" in the page title is the web service that NodeXL uses to get tweets), but it contains some general information on how Twitter determines which tweets it provides.

-- Tony

Jul 11, 2012 at 5:31 PM

You are certainly correct that the time span of the collection of up to 1500 tweets returned by Twitter's search API can vary significantly.

If you have turned on the Import option that allows NodeXL to record information about your query, the Graph Summary text will contain information about the start and end dates of the tweets in the data set.

In some cases a slow-moving topic will generate only a few tweets over 8 or 9 days, while a fast-moving topic (e.g. "Worldcup") may reach the 1,500-tweet limit in a fraction of a second.
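As a rough illustration of how the time span relates to collection frequency, the Python sketch below estimates how long a topic takes to produce 1,500 tweets at the rate seen in one data set. The function names are hypothetical, the timestamps are assumed to have been exported from the workbook as datetime values, and the estimate assumes the tweet rate stays roughly constant, which it often will not.

```python
from datetime import datetime, timedelta

RESULT_CAP = 1500  # per-query limit quoted in this thread


def collection_span(timestamps):
    """Time between the oldest and newest tweet in a data set."""
    return max(timestamps) - min(timestamps)


def hours_to_fill_cap(timestamps, cap=RESULT_CAP):
    """Rough estimate of the hours needed to produce `cap` tweets
    at the average rate observed in this data set."""
    if len(timestamps) < 2:
        return None  # not enough data to estimate a rate
    span_hours = collection_span(timestamps).total_seconds() / 3600
    # rate = len(timestamps) / span_hours, so cap / rate is:
    return cap * span_hours / len(timestamps)
```

If the estimate comes out shorter than the gap between two collections, tweets have probably been missed in between, and collecting more often would help.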

I think of these data sets as similar to "time lapse photography": snapshots capturing events over a varying duration.  They are not complete or exhaustive records. Twitter may, for many reasons, return some tweets and not others. For example, tweets may be prioritized based on the profile and prior behavior of their authors to ensure "quality" (such as by reducing spam).