IWMW12 Data Hacks

For the last couple of days I’ve been at IWMW12, hosted this year at the University of Edinburgh. I’ve already posted Data Visualisation Plenary/Workshop Resources, which has my slides from the plenary. I was teaming up with Tony Hirst (OU) and have added his slides to the page.

Because of living 'almost locally' and other family commitments I missed out on most of the social events. Instead I got drunk on data, working into the early hours to find what stories I could uncover from the #IWMW12 stream. In this post I’ll show you what I’ve come up with and some of the highlights in trying to turn raw data into something interesting/meaningful (or pointless if you prefer). Interestingly, a lot of what I cover here uses the same techniques used in my recent The story data tells us about #CitizenRelay guest post, so I’ve got a templated workflow emerging which I can deploy at events, which makes me wonder if I should be getting organisers to pay my travel/accommodation as an event data amplifier?

UK University Twitter Account Community

On day one Brian Kelly mentioned some work by Craig Russell to collate a table of UK University Social Media accounts, which featured in a guest post on Brian’s blog titled Further Evidence of Use of Social Networks in the UK Higher Education Sector. You can get the data Craig has compiled from a Google Spreadsheet. Looking at this, two things immediately sprang to mind. First, that the document could be made more ‘glanceable’ just using some simple conditional formatting, and second, there was a nice list of Twitter accounts to do something with.


Here’s a link to my modified version of Craig’s spreadsheet. It uses the importRange formula to pull the data in, so it creates a live link to the source document. For the conditional formatting I looked for text containing ‘http’, turning the cell text and background green. The HTML view of this is a lot cleaner looking.

On the Twitter Accounts sheet I extract the account screen names by pulling everything after the last ‘/’ and remove most of the blank rows using a unique formula.
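For anyone who would rather do this step outside the spreadsheet, here is a minimal Python sketch of the same idea: take everything after the last ‘/’, skip blank rows and keep only unique names. It is not the formula used in the sheet; the function name and the sample URLs are made up for illustration, and folding case when de-duplicating is my own assumption.

```python
def screen_names(urls):
    """Return unique Twitter screen names, in first-seen order."""
    seen = set()
    names = []
    for url in urls:
        # Treat empty cells (None or whitespace) as blank rows.
        url = (url or "").strip().rstrip("/")
        if not url:
            continue
        # Everything after the last '/' is taken as the screen name.
        name = url.rsplit("/", 1)[-1].lstrip("@")
        key = name.lower()  # assumption: compare names case-insensitively
        if name and key not in seen:
            seen.add(key)
            names.append(name)
    return names


# Hypothetical sample rows, not taken from Craig's spreadsheet.
sample = [
    "https://twitter.com/example_uni",
    "",
    "http://twitter.com/#!/Another_Uni",
    "https://twitter.com/example_uni/",
]
print(screen_names(sample))
```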

Putting this list into the free MS Excel add-in NodeXL and using Import > From Twitter List Network lets you get data on which of these accounts follow each other. I played around with visualising the network in NodeXL but found it easier in the end to put the data into Gephi, getting the image below. These ‘hairballs’ have limited value and you’re best having a play with the interactive version, which is an export of Gephi visualised using the gexf-js tool by Raphaël Velt. (De-hairballing is something Clement Levallois (@seinecle) is working on, and he kindly sent me a link to a new tool he’s creating called Gaze.)

UK HEI Twitter Accounts
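If you want a quick look at a follow network before opening it in NodeXL or Gephi, something like the sketch below might help. It assumes you have already exported the edges as (follower, followed) pairs; the function name and the account names are hypothetical, and it only counts followers within the list and finds mutual follows, nothing like a full layout.

```python
from collections import Counter


def summarise_follows(edges):
    """edges: iterable of (follower, followed) screen-name pairs."""
    # Normalise case and drop self-loops and duplicate edges.
    edge_set = {
        (a.lower(), b.lower()) for a, b in edges if a.lower() != b.lower()
    }
    # In-degree: how many accounts in the list follow each account.
    in_degree = Counter(followed for _, followed in edge_set)
    # Mutual follows: pairs where the edge exists in both directions.
    mutual = sorted(
        {tuple(sorted(e)) for e in edge_set if (e[1], e[0]) in edge_set}
    )
    return in_degree, mutual


# Hypothetical edge list, not the real UK HEI data.
edges = [
    ("uni_a", "uni_b"),
    ("uni_b", "uni_a"),
    ("uni_c", "uni_a"),
    ("uni_c", "uni_b"),
]
in_degree, mutual = summarise_follows(edges)
print(in_degree.most_common())
print(mutual)
```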

The #IWMW12 Twitter Archive Two More Ways

Timeline

As part of #iwmw12 I was collecting an archive of tweets, which already gives you the TAGSExplorer view. I also use the Topsy API and a Google Spreadsheet to extract tweets, which are then passed into Timeline by Vérité, which gives you a nice sense of the event. [If anyone else would like to make their own Twitter media timeline there is a template in this post (it is as easy as making a copy of the template, entering your search terms and publishing the sheet).]

Searchable archive

New way number one is a filterable/searchable archive of IWMW12 tweets. Using the Google Visualisation API I can create a custom interface to the Google Spreadsheet of tweets. This solution uses some out-of-the-box functionality including table paging, string filtering and pattern formatting. Using the pattern formatter was the biggest achievement, as it allows you to insert Twitter Web Intents functionality (so if you click to reply to a tweet it pulls up Twitter’s reply box).
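The Web Intents part boils down to building a link from the tweet id. The archive page itself does this in the browser with the pattern formatter; the Python below is only a rough sketch of the link patterns, assuming Twitter's documented intent URLs for reply, retweet and favourite. The function name and the example id are made up.

```python
from urllib.parse import urlencode

INTENT_BASE = "https://twitter.com/intent"


def intent_links(tweet_id):
    """Build Web Intents links for one tweet id (illustrative helper)."""
    tweet_id = str(tweet_id)
    return {
        # Opens Twitter's compose box as a reply to the tweet.
        "reply": f"{INTENT_BASE}/tweet?" + urlencode({"in_reply_to": tweet_id}),
        "retweet": f"{INTENT_BASE}/retweet?" + urlencode({"tweet_id": tweet_id}),
        "favorite": f"{INTENT_BASE}/favorite?" + urlencode({"tweet_id": tweet_id}),
    }


# Hypothetical tweet id.
for action, url in intent_links(123456789).items():
    print(action, url)
```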

I also processed the archive using R to get a term frequency to make a d3 based wordcloud (I’ve started looking at how this can be put into a more general tool. Here’s my current draft which you should be able to point any TAGS spreadsheet at (this version also includes a Chart Range Filter letting you view a time range). I definitely need to write more about how this was done!)

Filter by time
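I did the term frequency in R, but the idea is simple enough to sketch in a few lines of Python for anyone who wants to try it on their own archive. This is not my R script: the stopword list, the tokenising rule and the sample tweets are all assumptions made for the example, and the output would still need passing to something like d3 to draw the cloud.

```python
import re
from collections import Counter

# A deliberately tiny, made-up stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at", "for", "on", "rt"}
TOKEN = re.compile(r"[#@]?\w+")  # keeps #hashtags and @mentions intact


def term_frequency(tweets, top=None):
    """Return (term, count) pairs, most frequent first."""
    counts = Counter()
    for text in tweets:
        # Lower-case and strip links before tokenising.
        text = re.sub(r"https?://\S+", " ", text.lower())
        for token in TOKEN.findall(text):
            if token not in STOPWORDS and len(token) > 1:
                counts[token] += 1
    return counts.most_common(top)


# Hypothetical tweets, not taken from the #iwmw12 archive.
tweets = [
    "RT @someone: Great data talk at #iwmw12 http://example.com/x",
    "Data visualisation workshop at #iwmw12",
]
print(term_frequency(tweets, top=5))
```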

Mappable archive

One of the last things I did was to filter the Twitter archive for tweets with geo-location. Using the Spreadsheet Mapper 3.0 template I was able to dynamically pull the data to generate a time-stamped KML file. The timestamps are ignored when you view in Google Maps, but if you download the KML file it can be played in Google Earth (you’ll have to adjust the playback control to separate the playback heads – I tried doing this in the code but the documentation is awful!)

Google Earth playback
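If you are curious what a time-stamped KML file looks like without going through Spreadsheet Mapper, here is a minimal Python sketch. It is not what the template generates: it just writes one Placemark per geo-located tweet with a TimeStamp, and the field names (user, text, time, lat, lon) and sample values are assumptions for the example. Note that KML wants coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def tweets_to_kml(tweets):
    """Return a KML string with one time-stamped Placemark per geo-tweet."""
    ET.register_namespace("", KML_NS)  # serialise without a prefix
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    for t in tweets:
        # Skip tweets that carry no geo-location.
        if t.get("lat") is None or t.get("lon") is None:
            continue
        pm = ET.SubElement(doc, q("Placemark"))
        ET.SubElement(pm, q("name")).text = t["user"]
        ET.SubElement(pm, q("description")).text = t["text"]
        ts = ET.SubElement(pm, q("TimeStamp"))
        ET.SubElement(ts, q("when")).text = t["time"]  # ISO 8601 date-time
        point = ET.SubElement(pm, q("Point"))
        # KML coordinate order is longitude,latitude.
        ET.SubElement(point, q("coordinates")).text = f'{t["lon"]},{t["lat"]}'
    return ET.tostring(kml, encoding="unicode")


# Hypothetical rows; the values are invented for illustration.
geo_sample = [
    {"user": "someone", "text": "At #iwmw12", "time": "2012-06-19T10:00:00Z",
     "lat": 55.9445, "lon": -3.1883},
    {"user": "other", "text": "No location", "time": "2012-06-19T10:05:00Z",
     "lat": None, "lon": None},
]
print(tweets_to_kml(geo_sample))
```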

Or, if you have the Google Earth browser plugin, a web version of IWMW12 geo-tweets is here (also embedded below):

So there you go … or as said by Sharon Steeples