Problem using old version for research before official termination date

Jun 6, 2013 at 3:14 PM
I'm currently using NodeXL version 1.0.1.229 to finish gathering data for my Master's thesis. I'm using the older version because the new one (thanks to Twitter's 1.1 API changes) simply cannot gather the data I need (full sample networks of 1000 people for random user-generated trending U.S. hashtags, including all types of relationships). However, I was counting on the old version still working until June 11th and was trying to hurry through my data collection. Unfortunately, two days ago NodeXL became able to pull data only from the first search page, and yesterday I started getting the message "There are no people in that network," regardless of what hashtag I searched for.

Is there any solution to this problem that would help me get more networks before the 11th? I'm really in a bind on this and would appreciate any help.
Coordinator
Jun 6, 2013 at 3:31 PM
Hello David!

I'm seeing the same issue; v.234 stopped working yesterday for me as well.

I suspect that Twitter may let the API 1.0 interface run again for a few hours between now and the 11th. Then again, they may not.

If they do, I suggest that you enlist as many of your friends' Twitter accounts and PCs as you can muster to run queries in parallel.

I have just upgraded to v.238, and the follow edges are still available there, but at a much slower rate. So it is possible that you could upgrade to v.238 (and thus to Twitter API 1.1), run queries with the follower edges turned on, and wait as long as that takes (it will take a long time; Twitter API 1.1 grants a total of 60 follower queries per hour).
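To give a sense of the pacing involved, here is a rough sketch (in Python, not NodeXL's own code) of the kind of sliding-window rate limiter a scraper would need to stay under a budget like 60 follower queries per hour. The class and the numbers are illustrative, not part of NodeXL:

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `limit` calls per rolling `window` of seconds,
    e.g. RateLimiter(60, 3600.0) for 60 queries per hour."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent calls

    def wait_time(self, now=None):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        # Next slot opens when the oldest call leaves the window.
        return self.window - (now - self.calls[0])

    def record(self, now=None):
        """Note that a call was just made."""
        self.calls.append(time.monotonic() if now is None else now)
```

A caller would loop: `time.sleep(limiter.wait_time())`, make the request, then `limiter.record()`. At 60 queries per hour, a 1000-vertex network with follower edges can easily mean many hours of waiting.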

I hope that helps. Good luck with your research project and thank you for your interest in NodeXL.

Regards,
Marc
Jun 6, 2013 at 3:46 PM
Marc,

Thanks so much for your quick reply, but I'm totally going to take advantage of that by asking another question or two (or three...). :)

To start, I was actually just mobilizing all my friends to help out yesterday right before the API 1.0 interface went down, so if it does go up, I have a few people who will be helping out.

I actually did switch to v.238 when it was released, but it ended up being so unreliable (constantly giving me only partial networks, due, I'm sure, to the API and not to NodeXL) that on Monday I had the idea to dig up my installer for the older version and reinstall that. To my surprise, it was still working, and I was able to bang out a few before it went down.

My fear is that a return to v.238 will be a return to never getting a full sample network again. I seriously never got one completed with even close to 1000 vertices. Do you think this is a problem on my end or just incredibly bad luck? I'm fine with it taking a long time; I can leave this running for however long it takes, but it's so unreliable that it just ends up wasting way too much time. I don't know what to do after the switch. Since I'm essentially studying the network structure of the diffusion of memes, I NEED those follower/followed-by relationships. I'm sure that nothing I'm saying is new, and that everyone's complaining about the same thing, but I'd like to know ways I might bolster NodeXL's rate of success in getting the types of relationships I need with the new API.

As one final note, do you have any idea what sample size (how many networks) I'd need for a study like this? Originally, assuming I'd have the summer, I wanted 90. I have about six. :( Do you think it would be feasible to make generalities about how these meme/hashtag networks are structured with N= 10-20 with 1000 vertices in each? Sorry for the homework help request, I'm only about a year into SNA and self-taught (as many here probably are).

Anyway, thanks for your help, sorry for the bother, and I love the program. Too bad Twitter's ruining the fun.

David
Jun 6, 2013 at 4:01 PM
The recent problems with Twitter and older versions of NodeXL are caused by server issues at Twitter:

https://dev.twitter.com/issues/1072

There is no indication yet whether they'll fix it or if they'll just let the old Twitter API die an early death. As you know, it wasn't supposed to be turned off until June 11.

-- Tony
Coordinator
Jun 6, 2013 at 4:23 PM
Hello!

As Tony notes, you may have a brief window to run v.234 before the 11th of June. Good luck!

I agree that the followers edge is useful and the loss of this data damages many research efforts. Sadly, I doubt that will change Twitter's decision.

Your research design question is challenging: what is a sufficient sample for generalizable results? Without broader understanding of the population (which requires that we have all of Twitter's data) it is hard to validate that any sample is representative.

That said, I think there are research contributions to be made based on simply finding and documenting particular patterns in Twitter. Like Darwin on the Beagle, I think we are in the phase of data collection and initial steps towards taxonomy. If your data can demonstrate various patterns of diffusion, I think knowing the rate of incidence of each diffusion pattern is a great goal but not essential. I would be interested in seeing examples of different ways diffusion occurs. For example, there is a great study of the diffusion of breaking news: http://blog.socialflow.com/post/5246404319/breaking-bin-laden-visualizing-the-power-of-a-single

Regards,
Marc
Jun 6, 2013 at 4:30 PM
Edited Jun 6, 2013 at 4:42 PM
Just to clarify: The Twitter 1.0 API hasn't been turned off yet; it's still running but is returning bad (empty) results. When they finally do turn off the API, which is scheduled for June 11, an error message that says "Gone" will pop up in NodeXL.

In practical terms, though, it makes no difference; at the time I'm writing this, you can't get Twitter search networks using older versions of NodeXL.

-- Tony
Jun 6, 2013 at 4:39 PM
Thanks so much for your help, guys. Do you think the problem with v.238 constantly returning only partial networks is on my end, or is it just consistently bad luck? Haha, what bad timing this all is.

David
Jun 6, 2013 at 5:31 PM
Edited Jun 6, 2013 at 5:45 PM
David:

There are at least two problems that lead to the "partial network" messages you were seeing. One of them will be fixed in the next release of NodeXL. Here is the release note for that one:
Bug fix: In the Twitter networks, if you checked "Expand URLs in tweets" and a malformed URL was provided by Twitter, you would get a "Partial Network" message and then the graph would have no edges. The Partial Network details included the text "[UriFormatException]: Invalid URI: The format of the URI could not be determined". Now, NodeXL doesn't attempt to expand the bad URL.
The second problem hasn't been fixed yet because it's intermittent and I haven't been able to track down the cause. When the second problem occurs, the partial network details include this text:
[KeyNotFoundException]: The given key was not present in the dictionary
I suspect that the cause is similar to the malformed URL problem: Twitter provides bad data, and NodeXL doesn't handle it well.
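For anyone post-processing Twitter results in their own scripts, the defensive pattern is the same in any language: validate each value and skip the bad ones rather than letting one malformed entry abort the whole download. A minimal Python sketch (the field names are illustrative, and this is not NodeXL's actual parsing code):

```python
from urllib.parse import urlsplit


def safe_urls(candidates):
    """Keep only entries that parse as absolute http(s) URLs, skipping bad
    values (the UriFormatException case) instead of aborting."""
    good = []
    for u in candidates:
        try:
            parts = urlsplit(u)
        except ValueError:
            continue  # unparseable string from the service
        if parts.scheme in ("http", "https") and parts.netloc:
            good.append(u)
    return good


def screen_name(tweet):
    """Tolerate missing keys (the KeyNotFoundException case) by using
    dict.get with a default instead of direct indexing."""
    return tweet.get("user", {}).get("screen_name", "")
```

The point is simply that every field coming back from the service is untrusted input, even when the service is usually well-behaved.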

If you get a "Partial Network" message, please post the details here. That will tell me if it's the same bug I'm trying to track down.

Thanks,
Tony
Jun 6, 2013 at 8:16 PM
Tony,

Thanks again. I'll be sure to post the details here when the error inevitably happens again once I return to v. 238 on the 11th.

David
Jun 7, 2013 at 3:47 AM
Okay guys, I went back to v. 238 while the older version's functionality was still down. I tried to run a search for 2500 tweets (since I need 1000 vertices and this number seems appropriate), and got the "partial network" error message after the first waiting period. Here are the details of this occurrence:
Getting a network can involve many information requests to a Web service. In this case, 17 requests were made and 1 of them was unsuccessful.

(...)

Here are the details for the most recent unsuccessful request:

The Twitter Web Service refused to provide the requested information. A likely cause is that you have made too many Twitter requests in the last 15 minutes. (Twitter limits information requests to prevent its service from being attacked. Click the 'Why this might take a long time: Twitter rate limiting' link for details.)
Any and all help is appreciated. :)