Changes to how Twitter collection works?

Feb 28, 2012 at 3:36 AM

Hi. I noticed a change in 1.0.1.201 (2012-02-14) to the way Twitter edges are created, and I'm wondering if there's more to the changes than this. 

First, it seems that if the only option selected is "Tweet that is not a replies-to or mentions," not all Tweets that match the search pattern are collected.

Also, below is what was collected from a search for my Twitter ID. I had all three tweet options selected (not "followers"), and you'll notice in lines 3 and 9 that other Twitter IDs (@toddzolecki and @philaphillies) are "mentioned" (in the plain-language sense of "mentioned"), but there are no edges except the self-loop edge that represents my Tweet.

I'm wondering if this is (a) something I'm boneheadedly overlooking, (b) a bug, or (c) a change in the design that I can't find documentation for. I'm really just looking to understand the logic of the collection engine so I can have confidence that the data I expect to be collecting is actually being collected.

Thanks!



clinked warrensallen   Replies to 2/27/2012 10:25 @warrensallen Whilst you are looking at them maybe have a look at Clinked as well and see how  we compare as a #sharepoint #alternative
warrensallen warrensallen   Tweet 2/22/2012 0:46 Chicken cheesesteak on a wrap? Ugh. Nothing going my way today... #FWP (@ Govinda's Gourmet Vegetarian w/ 3 others) http://t.co/bzhLmy71
warrensallen warrensallen   Tweet 2/24/2012 16:07 Well, there goes the season... RT @toddzolecki: Cliff Lee has some midsection soreness. Amaro said he's not concerned.
warrensallen warrensallen   Tweet 2/24/2012 16:55 Reading up on #alfresco - the extensible, open alternative to #sharepoint with add-ons, or the poorly-supported alternative with poor UX?
warrensallen warrensallen   Tweet 2/27/2012 13:31 Ever mistaken your hot coffee for your cold smoothie? I has. Happy Monday, scolded innards.
warrensallen warrensallen   Tweet 2/27/2012 16:02 Are those the only two options?! Can I be a scienartist? RT @ritukhare: What do u consider yourself ? A scientist, an artist, or both?
warrensallen warrensallen   Tweet 2/27/2012 20:12 RT @philaphillies: Charlie Manuel talks hitting at the cage with Jim Thome: http://t.co/vazHU4tc
warrensallen warrensallen   Tweet 2/28/2012 0:48 I'm at Kimmel Center for the Performing Arts (400 S. Broad St., at Spruce St., Philadelphia) w/ 10 others http://t.co/8nQESZzF


Feb 28, 2012 at 3:55 PM
Edited Feb 28, 2012 at 3:55 PM

It's not obvious how the Twitter Search Network feature works.  I have a work item to document it, but I haven't gotten to it yet.  (I'm the NodeXL programmer.)

So here is how it works.  First, NodeXL asks Twitter for the most recent Tweets that include the search term you specify.  NodeXL identifies the unique people in the list that Twitter provides--because the same person may have included the term in more than one tweet and therefore be included in the list more than once--and creates a vertex for each unique person.

The list of most recent tweets is then examined again to create the graph's edges.  For each tweet, depending on the edge options you select, an edge will be created if the tweet is a reply-to another person in the list, if the tweet mentions another person in the list, or if the tweet neither replies-to nor mentions another person in the list.  In the last case, the edge is shown as a self-loop, because there is no other vertex to connect it to.  In the universe of people who have tweeted the search term, this person is, in effect, talking to himself.

So if you ignore the "follows relationship" option, the vertices represent people who have tweeted the search term and the edges represent the tweets.
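The description above can be sketched in a few lines. This is a minimal Python sketch of the logic as described in this post, not NodeXL's actual implementation (that lives in TwitterSearchNetworkAnalyzer.cs). The `Tweet` class, `build_search_network`, and the three option flags are hypothetical names. It assumes the post-1.0.1.201 rule that a reply target is not also counted as a mention, and that a tweet which replies to or mentions only people outside the network becomes a self-loop.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Tweet:
    """Hypothetical, simplified tweet: who wrote it and whom it targets."""
    author: str
    replied_to: Optional[str] = None   # screen name this tweet replies to, if any
    mentioned: Tuple[str, ...] = ()    # other screen names in the text (not the reply target)


# (vertex 1, vertex 2, relationship)
Edge = Tuple[str, str, str]


def build_search_network(
    tweets: List[Tweet],
    add_replies_to: bool = True,
    add_mentions: bool = True,
    add_other_tweets: bool = True,
) -> Tuple[Set[str], List[Edge]]:
    # One vertex per unique person who tweeted the search term.
    vertices = {t.author for t in tweets}
    edges: List[Edge] = []
    for t in tweets:
        # A tweet can only connect to people who are themselves vertices.
        reply_target = (
            t.replied_to
            if t.replied_to in vertices and t.replied_to != t.author
            else None
        )
        mention_targets = [
            m for m in dict.fromkeys(t.mentioned)  # dedupe, keep order
            if m in vertices and m != t.author
        ]
        if add_replies_to and reply_target is not None:
            edges.append((t.author, reply_target, "Replies to"))
        if add_mentions:
            edges.extend((t.author, m, "Mentions") for m in mention_targets)
        # Neither a reply-to nor a mention of anyone in the network:
        # the tweet becomes a self-loop on its author.
        if add_other_tweets and reply_target is None and not mention_targets:
            edges.append((t.author, t.author, "Tweet"))
    return vertices, edges


# Simplified version of Warren's data: Clinked replies to him, and one of
# his tweets mentions @toddzolecki, who did not tweet the search term.
tweets = [
    Tweet("clinked", replied_to="warrensallen"),
    Tweet("warrensallen", mentioned=("toddzolecki",)),
    Tweet("warrensallen"),
]
vertices, edges = build_search_network(tweets)
```

With all three options on, the @toddzolecki tweet ends up as a self-loop because Todd has no vertex, and the Clinked reply becomes a "Replies to" edge. Calling it with `add_replies_to=False, add_mentions=False` keeps only the two self-loops and drops the Clinked reply.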

In light of this, let's get on to your questions:

If you check only "Add an edge for each Tweet that is not a replies-to or mentions," you will indeed not get the tweets (edges) that represent replies-to or mentions.  And if Warren tweets the search term and mentions Todd, but Todd didn't tweet the search term in his own recent tweets, then there will indeed be no edge between Warren and Todd, because there is no vertex for Todd.

-- Tony

Feb 28, 2012 at 3:58 PM
Edited Feb 28, 2012 at 3:58 PM

I forgot to mention: There was a change in the way this works, starting with version 1.0.1.201, although I don't think it's relevant to your question.  Here is the change:

* When you imported a Twitter network (NodeXL, Data, Import, From Twitter...) and you added edges for "replies-to" and "mentions" relationships, a "replies-to" was also considered a "mentions." (The thinking was that if Bob replied to Mary, Bob was also mentioning Mary.) This has been changed so that a "replies-to" is no longer considered a "mentions."
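The 1.0.1.201 change can be pictured as a single flag on how a tweet's screen names are split into a reply-to target and mentions. The following is a minimal Python sketch under a stated assumption: it treats a tweet as a reply when its text starts with "@name" and every "@name" in the text as a candidate mention. That is a hypothetical heuristic, not NodeXL's documented logic, and `classify_tweet` and `reply_counts_as_mention` are illustrative names.

```python
import re
from typing import List, Optional, Tuple

# "@name" not preceded by a word character, so e-mail addresses don't match.
SCREEN_NAME = re.compile(r"(?<!\w)@(\w{1,15})")


def classify_tweet(
    text: str, reply_counts_as_mention: bool
) -> Tuple[Optional[str], List[str]]:
    """Return (replied_to, mentions) for a tweet's text.

    Hypothetical heuristic: a tweet that *starts* with "@name" is a reply
    to that name; every "@name" in the text is a candidate mention.
    """
    # Unique screen names in order of appearance, case-folded.
    names = list(dict.fromkeys(n.lower() for n in SCREEN_NAME.findall(text)))
    leading = SCREEN_NAME.match(text)
    replied_to = leading.group(1).lower() if leading else None
    if reply_counts_as_mention:
        # Before 1.0.1.201: replying to Mary also counted as mentioning her.
        mentions = names
    else:
        # From 1.0.1.201 on: the reply target is no longer a mention.
        mentions = [n for n in names if n != replied_to]
    return replied_to, mentions
```

For Clinked's "@warrensallen Whilst you are looking at them..." the old rule yields a reply to warrensallen plus a mention of warrensallen, while the new rule yields only the reply.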

-- Tony

Feb 28, 2012 at 5:24 PM

Thanks for the prompt reply, Tony.

This is a radical departure from the way NodeXL (or the Twitter collector) previously worked, and honestly it (the design, not your explanation) makes little sense.

You wrote: "If you check only 'Add an edge for each Tweet that is not a replies-to or mentions,' you will indeed not get the tweets (edges) that represent replies-to or mentions."

So, the way this used to work is that one could collect all tweets matching a search string. This is no longer the case.

You wrote: "And if Warren tweets the search term and mentions Todd, but Todd didn't tweet the search term in his own recent tweets, then there will indeed be no edge between Warren and Todd, because there is no vertex for Todd."

It used to be the case that an edge would be created between Warren and Todd (to continue the example) iff @todd shows up in a tweet by @warren. This is no longer the case.

Finally, I still have too much doubt about what the engine collects to use the tool for research, which is a shame, because it is, of course, supposed to be a research tool. Now I have to write some Python.

But before I moved on to that, I rolled back my NodeXL version, and now the option to use the Twitter engine from the spreadsheet is gone! Any ideas on that one?

Feb 28, 2012 at 5:46 PM

I realized as I wrapped up that last message that I could be wrong about point #2. Well, I could be wrong about any of it but anyway...

I just looked at some old data and I noticed that there are indeed rows that suggest that I'm wrong about the way it used to work. My bad. Good thing (for me) I still have the raw data to comb through and create edges from.

To be clear, if @alice mentions @bob in a tweet that contains the search term “#this” but @bob never mentions “#this” in a tweet, then the edge “@alice mentions @bob” is NOT created because no vertex for @bob exists? Really? @alice DOES MENTION @bob.

So, now what am I missing?


Feb 28, 2012 at 6:00 PM

Warren:

You can still get all the recent tweets that match a search string; you just have to check all three tweet-related options in the dialog box.  What I said was that if you check only one of them, you'll get only some of the tweets.

Regarding your doubts, you can always check the source code yourself to see exactly what it's doing.  It's available as the latest "NodeXL Source Code" download on the Downloads tab at http://nodexl.codeplex.com/releases/.  The file is called TwitterSearchNetworkAnalyzer.cs, and it derives from a class defined in TwitterNetworkAnalyzerBase.cs.

If you find that the feature isn't designed to do what you need it to do, then you can make a case here for how it really ought to work.  I can't say that we'll necessarily change it, because we've been through many discussions on it already, but we're interested in hearing about your needs.

Can you tell me which version of NodeXL you were using before?  We've made a series of changes to the Twitter Search Network, and I want to go back and look at what it was doing in your previous version.

If you go backwards, you have to uninstall the newer version before installing an older one.  I don't know if you did that, but in any case you can recover by doing this:

1. In Control Panel, Programs and Features, search for and uninstall all programs that have "NodeXL" in their name.

2. If either of these folders exists, delete it:

C:\Program Files (x86)\Social Media Research Foundation\NodeXL Excel Template
C:\Program Files\Social Media Research Foundation\NodeXL Excel Template

3. Install the older version.

-- Tony

Feb 28, 2012 at 6:04 PM

We're out of sync here.  My previous post was an answer to post #4 in this discussion.  I'll get back to you on your latest question.

-- Tony

Feb 28, 2012 at 6:34 PM

From Warren:

To be clear, if @alice mentions @bob in a tweet that contains the search term “#this” but @bob never mentions “#this” in a tweet, then the edge “@alice mentions @bob” is NOT created because no vertex for @bob exists? Really? @alice DOES MENTION @bob.

From Tony:

That is correct.  alice mentioned bob, but bob didn't tweet the search term.  The network consists of people who have tweeted the search term.  bob didn't, so he is not part of the network.  He therefore has no vertex, and he can't be connected to with an edge.

We could change the definition of the network to "people who have tweeted the search term or have been replied to or mentioned by the people who have tweeted the search term."  In my view, that would make a complicated concept even harder to understand, and would require other changes as well.  For example, if you checked the "add an edge for each follows relationship" checkbox, wouldn't we have to add vertices for all of each person's followers, for consistency?  Right now we show only the follows relationships among the people who have tweeted the search term, which I think is sensible.  We create a set of vertices that satisfy some criterion--people who have tweeted the search term--then analyze the relationships among only those people.
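For comparison, here is a minimal Python sketch of the alternative definition described above (which NodeXL does not use): widen the vertex set to everyone replied to or mentioned by the people who tweeted the term. The names `Tweet`, `tweeters`, `tweeters_and_targets`, and `mention_edges` are hypothetical, and the sketch covers only the mentions relationship, not follows.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Tweet:
    """Hypothetical, simplified tweet: who wrote it and whom it targets."""
    author: str
    replied_to: Optional[str] = None
    mentioned: Tuple[str, ...] = ()


def tweeters(tweets: List[Tweet]) -> Set[str]:
    """Current definition: people who tweeted the search term."""
    return {t.author for t in tweets}


def tweeters_and_targets(tweets: List[Tweet]) -> Set[str]:
    """Alternative definition: tweeters plus anyone they replied to or mentioned."""
    people = tweeters(tweets)
    for t in tweets:
        if t.replied_to is not None:
            people.add(t.replied_to)
        people.update(t.mentioned)
    return people


def mention_edges(tweets: List[Tweet], vertices: Set[str]) -> List[Tuple[str, str]]:
    """Mentions edges, kept only when both endpoints are vertices."""
    return [
        (t.author, m)
        for t in tweets
        for m in t.mentioned
        if m in vertices and m != t.author
    ]


# @alice tweets "#this" and mentions @bob; @bob never tweets "#this".
tweets = [Tweet("alice", mentioned=("bob",))]
current_edges = mention_edges(tweets, tweeters(tweets))               # no edge: bob is not a vertex
expanded_edges = mention_edges(tweets, tweeters_and_targets(tweets))  # alice -> bob
```

The only difference between the two results is the vertex set passed to `mention_edges`, which is the crux of the disagreement in this thread.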