Cleaning social media data in NodeXL

Nov 10, 2016 at 12:32 PM
I wonder if anyone can help me. As a non-programmer, I need to filter out all non-English content from sets of Twitter and Instagram hashtag data before undertaking the semantic visualisations. Does anyone have any advice on this?

Ideally, I would also like to remove all commercial content from both data sets, though I realise that this might be rather too difficult.

All ideas are very welcome. I am a qualitative researcher trying to learn new skills and approaches to data collection and analysis.
Many thanks, Sarah
Nov 10, 2016 at 8:28 PM
Edited Nov 10, 2016 at 8:28 PM
Please have a look at the NodeXL Edges worksheet.

Column AE contains the Language value that Twitter assigns to each Tweet.


You may use the Excel filtering feature to select just the languages you want to include.
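
For anyone who prefers to do the same filtering outside Excel, here is a minimal sketch in Python. It assumes the Edges worksheet has been exported to CSV and that the language column carries the header "Language"; that header, the other column names and the sample rows are all hypothetical stand-ins, so adjust them to match your own export.

```python
import csv
import io


def filter_rows_by_language(rows, language="en", column="Language"):
    """Keep only the rows whose language column equals the wanted code."""
    wanted = language.strip().lower()
    return [
        row for row in rows
        if (row.get(column) or "").strip().lower() == wanted
    ]


# Tiny in-memory stand-in for an exported Edges worksheet (hypothetical data).
sample_csv = io.StringIO(
    "Vertex 1,Vertex 2,Tweet,Language\n"
    "alice,bob,Good morning everyone,en\n"
    "carol,dave,Bonjour tout le monde,fr\n"
    "erin,frank,Hello again,EN\n"
)

rows = list(csv.DictReader(sample_csv))
english_rows = filter_rows_by_language(rows)
print(len(english_rows), "of", len(rows), "rows kept")
```

With a real export, `io.StringIO(...)` would be replaced by `open("your_file.csv", newline="", encoding="utf-8")`. The comparison ignores case and surrounding spaces so that values such as "EN" and "en" are treated alike.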


Marked as answer by MarcSmith on 11/11/2016 at 9:15 AM
Nov 10, 2016 at 9:09 PM
Thanks, Marc, but does this exist for Instagram too? I need to clean both, otherwise the data will be without value.

Nov 10, 2016 at 11:20 PM
I am not sure about the Instagram importer, which comes from a third-party group.

Please see: for details on their importer.


Nov 11, 2016 at 9:05 AM
Yes, but I purchased NodeXL and the Instagram plugin specifically to conduct cross-platform research. If I can't clean the Instagram data efficiently, then the Instagram plugin is worthless. I suggest an English-language content filter would be a common request for many researchers, and I am hoping this can be addressed. Manual cleaning is not an option.
Nov 11, 2016 at 5:13 PM
Understood: this is a great suggestion to direct to the SNATools team!


Nov 12, 2016 at 4:26 PM
Edited Nov 12, 2016 at 4:30 PM
Unfortunately, the Instagram API doesn't support language detection natively. It needs separate code to detect the language. If you are a C# programmer, you can use this DLL:
We can add this to our update list too, but it might take some time.
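
To illustrate what "separate code" for language detection could look like, here is a rough sketch in Python. It is not the DLL mentioned above and not a real language detector: it only checks what share of the words in a caption are common English function words. The word list, the `looks_english` name and the 0.2 threshold are all assumptions chosen for illustration; a dedicated language-detection library would be more reliable, especially on short captions.

```python
import re

# Small, hypothetical list of common English function words; extend as needed.
ENGLISH_STOPWORDS = {
    "the", "and", "is", "in", "to", "of", "a", "for", "on", "with",
    "this", "that", "it", "my", "you", "at", "are", "was", "be", "have",
}


def looks_english(caption, threshold=0.2):
    """Guess whether a caption is English from its share of common words."""
    # Drop hashtags, mentions and links, which carry little language signal.
    text = re.sub(r"(#\w+|@\w+|https?://\S+)", " ", caption.lower())
    words = re.findall(r"[a-z']+", text)
    if not words:
        return False  # nothing left to judge, e.g. a hashtag-only caption
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words) >= threshold


captions = [
    "This is the best day of my life #happy",
    "Das ist der schönste Tag meines Lebens #glücklich",
    "#love #sunset",
]
for caption in captions:
    print(looks_english(caption), caption)
```

Captions that consist only of hashtags are rejected here because nothing remains to judge; whether to keep or drop them is a research decision, not a technical one.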

InstaSearcher Technical Manager (SNA Tools)
Nov 14, 2016 at 11:13 AM
Thanks. I hope other researchers will find it useful to have an English language filter.
Nov 18, 2016 at 9:37 AM
Thanks Marc - I have suggested this to them.

I am now trying to get a word association network visualised for both Twitter and Instagram data, as opposed to a person network. I know you did one as an exemplar back in September, but what process did you use? Did you swap things over in the vertices and edges?
I have been searching for a guide on this but can't find one.


Nov 22, 2016 at 9:33 PM
For a guide to semantic network analysis with NodeXL see:
Marked as answer by MarcSmith on 11/22/2016 at 1:33 PM
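
As a general illustration of the word association idea asked about above, here is a minimal sketch in Python in which words become the vertices and two words are joined by an edge whenever they appear in the same post. This is one common way to build such a network and is not necessarily the process used in NodeXL or in the guide; the `word_pair_edges` name and the sample posts are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def word_pair_edges(posts, min_count=1):
    """Return (word_a, word_b, count) for words that share a post."""
    counts = Counter()
    for post in posts:
        # Unique words per post, sorted so each pair has one fixed order.
        words = sorted(set(re.findall(r"[a-z0-9']+", post.lower())))
        counts.update(combinations(words, 2))
    return [
        (a, b, n) for (a, b), n in counts.most_common() if n >= min_count
    ]


posts = ["coffee and cake", "coffee cake shop"]
for word_a, word_b, count in word_pair_edges(posts, min_count=2):
    print(word_a, word_b, count)
```

Each returned triple can be pasted into an edge list as Vertex 1, Vertex 2 and an edge weight. In practice you would probably remove very common words first, otherwise words like "and" dominate the network.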