NodeXL is only giving me tweets for 3.5 hours rather than the whole day

Jun 12, 2014 at 4:49 PM
Hi all,

I've been trying to get Twitter data from the 10th of June for a specific search term, but I'm only importing 219 tweets, all from the last 3 hours and 31 minutes of that day. I've tried importing multiple times (all using the same search) and have yet to get different results. I've set the limit to 18,000, 2,000, and 1,000, but that doesn't change the number of tweets or which tweets I'm importing.

I select "Import from Twitter Search Network"

Search for Tweets that match this query: Noble England since:2014-06-09 until:2014-06-11

Limit: 18,000

I've had success when searching for just June 9th, but as I said, entering this query, even if I change the since date to the 8th, returns only 219 tweets, from 20:29 to 23:57 on June 10th.
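For reference, here is a small sketch of how a single-day window could be written, assuming (as Twitter's search syntax is commonly described) that since: is inclusive and until: is exclusive, with dates interpreted in UTC. The helper name single_day_query is hypothetical and not part of NodeXL. Note that the shape of the window is not the problem here: as this thread shows, widening it did not change how many tweets came back.

```python
from datetime import date, timedelta


def single_day_query(terms: str, day: date) -> str:
    """Hypothetical helper: build a search string covering one calendar day.

    Assumes since: includes the given date and until: excludes it,
    so until: is set to the following day.
    """
    next_day = day + timedelta(days=1)
    return f"{terms} since:{day.isoformat()} until:{next_day.isoformat()}"


print(single_day_query("Noble England", date(2014, 6, 10)))
# Noble England since:2014-06-10 until:2014-06-11
```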

Any ideas?

Jun 12, 2014 at 4:53 PM
Update: I tried importing all of June 11th using "Search for Tweets that match this query: Noble England since:2014-06-10 until:2014-06-12" and got 320 results, once again all from the last 3.5 hours of the day.
Jun 12, 2014 at 5:19 PM
Hello, Jared:

The set of tweets that Twitter provides to NodeXL is entirely up to Twitter, and it's possible that the set will be smaller than what you asked for. The latest version of NodeXL includes some help links in the Import from Twitter Search Network dialog box that explain this, because it's a question that comes up frequently. I'll copy the help text into my next post.

-- Tony
Jun 12, 2014 at 5:20 PM
Here are a few important limitations you should be aware of:
  1. The search results provided by Twitter are often incomplete--you will most likely NOT get all recent tweets that match your search query. The way Twitter puts it is that the results are "focused on relevance and not completeness."
  2. Twitter will not provide NodeXL with tweets older than about a week. It is NOT possible to use NodeXL to get tweets older than that.
  3. The algorithm that Twitter uses to match tweets with a search query is undocumented. It is NOT, however, the same algorithm that Twitter uses on its own search page, so you may get results from NodeXL that differ from what you get directly from Twitter using the same search query.
Jun 12, 2014 at 5:28 PM
I understand all of this; I just can't grasp why it will only send the last 3.5 hours of any given day. For example, if I put England since:2014-06-06 until:2014-06-12 into the search box with an 18,000 limit, I only get ~100 responses, all within the last 2 MINUTES of June 11th. It's almost as if it starts with the newest results, then gets bored, stops, and spits out what it has.
Jun 12, 2014 at 5:54 PM
It does start with the most recent and then it works backwards. I guess it stops when it reaches "relevance," whatever that means to Twitter. The results are certainly incomplete, as you've noticed.

I suspect that Twitter truncates results to limit the load on its servers, which is understandable. Their API service is free, and they can't have an unlimited number of people asking for an unlimited amount of data. Instead, they provide a subset they hope is relevant to some people. Unfortunately, it sounds like that won't give you what you need.
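To illustrate why newest-first retrieval plus a server-side cutoff produces exactly this pattern, here is a toy model. It is a hypothetical sketch, not Twitter's real algorithm: the server_cap value, page_size, and the one-tweet-per-minute data are all made-up assumptions. Results are paged from newest to oldest and collection stops at whichever cap is hit first, so when the server's cutoff is smaller than the user's limit, only the tail end of the date window ever comes back, and raising the user's limit changes nothing.

```python
from datetime import datetime, timedelta


def backward_search(timestamps, client_limit, server_cap, page_size=100):
    """Toy model: page through results newest-first and stop when either
    the user's limit or the (hypothetical) server cutoff is reached."""
    newest_first = sorted(timestamps, reverse=True)
    results = []
    for start in range(0, len(newest_first), page_size):
        remaining = min(client_limit, server_cap) - len(results)
        if remaining <= 0:
            break  # a cap was reached; older tweets are never fetched
        results.extend(newest_first[start:start + page_size][:remaining])
    return results


# Made-up data: one matching tweet per minute from June 9 through June 10.
window_start = datetime(2014, 6, 9)
tweets = [window_start + timedelta(minutes=i) for i in range(48 * 60)]

got = backward_search(tweets, client_limit=18000, server_cap=219)
print(len(got), min(got), max(got))
# 219 2014-06-10 20:21:00 2014-06-10 23:59:00
```

Calling it with client_limit=1000 or 2000 returns the same 219 tweets, which mirrors what Jared saw when changing the limit.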

There are third-party providers of complete Twitter datasets, but they charge for it.

-- Tony
Jun 12, 2014 at 5:56 PM
Ahhhh darn. Thanks for all the help!