Experiment with interactive wordcloud of #smwgla Twitter community profile descriptions (making a JSON query on Google Spreadsheet data)

I should say this post contains a lot of technical information, doesn't give much background and is mainly for my hard-core followers

This is a very lose sketch of an experiment I might refine which uses Jason Davies wordcloud script (add-on for d3.js) as a way to filter data hosted in a Google Spreadsheet. I was essentially interested in the Twitter account descriptions of the community using the the Social Media Week – Glasgow hashtag, but a minor detour has reminded me you can:

  • get json data straight from a Google Spreadsheet
  • you can build dynamic queries to get what you want

So I fired up NodeXL this morning and got this pretty graph of how people using the #smwgla hashtag at least twice follow each other.

people using the #smwgla hashtag at least twice follow each other

One of the features of NodeXL is to add stats columns to your data which includes friend/follower counts, location and profile descriptions.

NodeXL Stats

Uploading the data from NodeXL (Excel) to Google Spreadsheets allows me to render an interactive version of the community graph using my NodeXL Google Spreadsheet Graph Viewer.

interactive version of the #smwgla community graph

All this is doing is grabbing data from Google Spreadsheets using their Visualization API and rendering it visually using javascript/HTML5 canvas. You can use the query language part of this API to get very specific data back (if you want a play try Tony Hirst’s Guardian Datastore Explorer). Using Tony’s tool I got this query built. One thing you might notice is I’m selecting a column of twitter description WHERE it contains(‘’) <- a blank – if it’s a blank why did I bother with the WHERE statement?

Switching to Jason Davies wordcloud demo we can play with custom data sources if you have some JSON. In Tony’s tool you have options to get the data in html (tqx=out:html) and csv (tqx=out:csv). There is a third undocumented option for json tqx=out:json. Using this we can get a url for the wordcloud generator https://spreadsheets.google.com/tq?tqx=out:json&tq=select%20AH%20where%20AH%20contains%28%2727%29&key=0Ak-iDQSojJ9adGNUUXZnU2k3V1FRTjR3eFp0RmRNZWc&gid=118

To make the wordcloud interactive, so that when you click on a term it filters the data on that term was can use the option to include {word} in our source url e.g. https://spreadsheets.google.com/tq?tqx=out:json&tq=select%20AH%20where%20AH%20contains%28%27{word}%27%29&key=0Ak-iDQSojJ9adGNUUXZnU2k3V1FRTjR3eFp0RmRNZWc&gid=118

And here is the final result, an interactive wordcloud of #smwgla Twitter account descriptions [Note: you need to hit the Go button when you click-through]:

interactive wordcloud of #smwgla Twitter account descriptions

The end result useful? Not sure, but how the data is extracted is (to me anyway).