One of the things I really liked about the network analysis and visualisation tool NodeXL which I wrote about last week was the built-in tools for grabbing data from Twitter. I said:
The advantage of NodeXL, particularly for graphing Twitter communities, is it has built-in features for grabbing the data for you. Not only that the coding is clever enough to handle the data collection for mere mortals, so when you hit your rate limit NodeXL waits until it should be able to get more data.
What I didn’t mention at the time was that it can take a long time to get complex network data (as in set it running overnight) and also I was having problems getting this to work. I haven’t looked closely at how NodeXL is generating the data but say for example I wanted to find out if the people I followed also followed each other. So I can get all the user ids of the people I follow using https://dev.twitter.com/docs/api/1/get/friends/ids which give me 497 ids.
To see if @psychemedia also follows any of these I could get his list of friend ids and see how many id’s match. Assuming everyone I follow has less than 5000 friends (this is the maximum the Twitter API can return in one call) I can do this in 496 API calls. NodeXL also captures more user information (friend/follower counts, bio etc) which it can do in batches of 100 using https://dev.twitter.com/docs/api/1/get/users/lookup
So in summary:
- 1x my following ids
- 496 x who are my friends following (but more if any of my friends follow more than 5000)
- 4 x details user info
Which means at least 501 calls (with a 350 api calls per hour this has to be done in two batches). And if you want to look beyond that and seeing who are the friends of your friends it’s a lot more.
Fortunately there is a way to get this information a lot quicker. A separate API which is mentioned at the very end of Tony’s Visualising Twitter Friend Connections Using Gephi is the Google Social Graph API. The Social Graph API “makes information about public connections between people easily available and useful” and importantly includes connection information from Twitter. Here’s a demo page from Social Graph using my Twitter id as a starting point.
As I’ve already Ported: Tony Hirst’s Using Protovis to Visualise Twitter Connections which uses Social Graph to a Google Spreadsheet it was a quick hack to modify this to take a list of twitter usernames your interested in and create a two column edge list required by NodeXL, Gephi and other network visualisation tools.
How-to get Twitter edges using Google Social Graph
Before I start I should say the endpoint in this is using NodeXL but Gephi is also able to import the edge csv file generated with this spreadsheet.
- Grab a copy of this Google Spreadsheet template for Get Twitter Edges from Screen Names
If you are having problems getting a copy of the template try opening this spreadsheet and File > Make a copy
- Paste a list of twitter usernames you want to graph connections for in the source sheet (I’ve a collection of Google Spreadsheets I’ve developed which can be used to get friends/follows, search terms and more. I also recently wrote about SocialBro which can do a .csv export of usernames (but as pointed out by Angelo you can’t export other peoples data)) If you want to play along here’s a list from #eas11
- In the Twitter Edges spreadsheet open Tools > Script editor… and follow the instructions for registering for a Twitter API key and secret (if you have used any of my other Twitter/Google Spreadsheet mashups you can use the same key/secret).
- Once you’ve entered the key and secret don’t forget to Run > authorize
- Next on line 24 you need to enter the level of data. For this example lets stick with 1.5
- Now Run > getConnections.
- Once the the yellow status bar with Running getConnections disappears close the script window.
- Back in the spreadsheet view open the ‘Edges’ sheet which should be populated with data
- Select File > Download as … > CSV (Current Sheet)
- Start a new NodeXL template
- Open the CSV in Excel and copy the edges data to your new NodeXL template Edges sheet
- From the NodeXL ribbon menu click on Prepare Data (left hand side) and select ‘Merge Duplicate Edges’ and then again from Prepare Data select ‘Get Vertices from Edge List’
And now you can do the rest of your analysis of the data (if you haven’t generated Twitter network visualisations in NodeXL before you can follow parts of the the basic recipe here).
and here’s one I made earlier
One very important caveat – the data from Social Graph isn’t 100% accurate and as I posted in Social media wars: Measuring the battle lines since Google+ has come along this data might be becoming less reliable.
PS In making this I found that when I passed 50 usernames to the Social Graph API I only got data back for 15 usernames. I wasn’t able to find any documentation on why. Reducing the call to 10 names at a time seemed to work (but means there are bugs in my other Social Graphs bits and pieces:(