importHTML is a fantastic formula you can use in Google Spreadsheets. Here’s Google’s support documentation for importHTML:


Syntax: ImportHtml(URL, query, index)

URL is the URL of the HTML page. Either "list" or "table" indicates what type of structure to pull in from the webpage. If it's "list," the function looks for the contents of <UL>, <OL>, or <DL> tags; if it's "table," it just looks for <TABLE> tags. Index is the 1-based index of the table or the list on the source web page. The indices are maintained separately so there might be both a list #1 and a table #1.

Example: =ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4). This function returns demographic information for the population of India.

Note: The limit on the number of ImportHtml functions per spreadsheet is 50.

What’s even better is that you can wrap this formula in other formulas to get the data in the shape you want. A case in point: I was recently asked:


The answer is yes, you can TRANSPOSE an importHTML. Let's use the Demographics of India table from the support documentation as an example. To switch columns into rows we can use =TRANSPOSE(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4))

This lets us change the way the data is imported from this:

"=ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4)"

to this:

"=TRANSPOSE(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4))"


Let's now say we are only interested in the population figures for 1991 and 2001. You could always just import all the data then pull it using a cell reference. Another way of doing this is to wrap our data in a QUERY formula.

The QUERY function is a built-in function that allows you to perform a query over an array of values using the Google Visualization API Query Language.

Anyone used to tinkering with databases will recognise the query language, which uses clauses like SELECT, WHERE, GROUP BY etc.

There are a couple of ways to query our data for the population of India in 1991 and 2001.


  • Limit - Limits the number of returned rows.
  • Offset - Skips a given number of first rows.

Using these we could use the query "SELECT * LIMIT 2 OFFSET 4". This selects all the columns (using *), skips the first 4 rows and then returns the next 2 rows. The order of limit/offset is important: using them the other way around won’t return any results.

"=QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT * LIMIT 2 OFFSET 4 ")"
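To make the LIMIT/OFFSET behaviour concrete, here is a minimal JavaScript sketch that mimics it on a plain array of rows. The function name limitOffset and the sample rows are made up for illustration. This is not how QUERY is implemented, just the same arithmetic: skip `offset` rows, then keep the next `limit` rows.

```javascript
// Hypothetical helper: mimics "LIMIT n OFFSET m" on an array of data rows
// (header row excluded). Not Google's implementation, only the same idea.
function limitOffset(rows, limit, offset) {
  return rows.slice(offset, offset + limit);
}

// Invented sample rows standing in for an imported table: [label, value].
var sampleRows = [
  ["row 1", 10], ["row 2", 20], ["row 3", 30],
  ["row 4", 40], ["row 5", 50], ["row 6", 60], ["row 7", 70]
];

// Equivalent of "SELECT * LIMIT 2 OFFSET 4": skip 4 rows, keep the next 2.
var picked = limitOffset(sampleRows, 2, 4);
console.log(picked);
```

With these sample rows the sketch keeps "row 5" and "row 6", which is why an offset of 4 lands on the fifth data row.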

SELECT columns

  • Select - Selects which columns to return, and in what order. If omitted, all of the table's columns are returned, in their default order.

Because we are using importHTML as our datasource, when selecting the columns we need to use the syntax Col1, Col2, Col3 …. So if you just want the year and population our query could be "SELECT Col1, Col2 LIMIT 2 OFFSET 4"

"=QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT Col1, Col2 LIMIT 2 OFFSET 4 ")"

WHERE rows

  • Where - Returns only rows that match a condition. If omitted, all rows are returned.

One issue with using limit/offset is that if more data is inserted into the source table it might push your results out of the range. A way around this is to include a WHERE clause to only include data on certain conditions. WHERE allows various comparison operators like <=, =, >, multiple conditions (‘and’, ‘or’ and ‘not’) and more complex string comparisons like ‘contains’. More information on WHERE conditions here. So if we only want the population where the year is 1991 or 2001 we can use the query "SELECT Col1, Col2 where Col1='*1991*' or Col1='*2001*'"

For this last example let's also TRANSPOSE the result and remove the table header:

"=TRANSPOSE(QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT Col1, Col2 WHERE Col1='*1991*' or Col1='*2001*'",0))"
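If it helps to see what SELECT, WHERE and TRANSPOSE are doing to the data, here is a rough JavaScript sketch of the same three steps on an in-memory table. All the function names and the sample figures are invented for illustration. It only imitates the idea of the formula, not the query language itself.

```javascript
// Invented sample data: first row is the header, values are made up.
var table = [
  ["Year", "Population", "Change"],
  ["1981", 100, 1],
  ["1991", 200, 2],
  ["2001", 300, 3],
  ["2011", 400, 4]
];

// Keep only the columns at the given 1-based positions (Col1, Col2, ...).
function selectCols(rows, cols) {
  return rows.map(function (row) {
    return cols.map(function (c) { return row[c - 1]; });
  });
}

// Keep only rows whose first column is one of the wanted values.
function whereCol1In(rows, wanted) {
  return rows.filter(function (row) {
    return wanted.indexOf(String(row[0])) !== -1;
  });
}

// Swap rows and columns, like TRANSPOSE.
function transpose(rows) {
  if (rows.length === 0) { return []; }
  return rows[0].map(function (_, i) {
    return rows.map(function (row) { return row[i]; });
  });
}

var dataRows = table.slice(1); // work on the data rows only
var result = transpose(selectCols(whereCol1In(dataRows, ["1991", "2001"]), [1, 2]));
console.log(result);
```

Because the rows are picked by value rather than by position, extra rows added to the source table would not shift the result, which is the advantage of WHERE over limit/offset.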

So there you go: using the QUERY formula to be more selective about your HTML import to Google Spreadsheets. Here is a copy of the spreadsheet with all the examples I’ve used in this post. Any questions/clarifications, leave a comment.

PS Tony Hirst has also written about Using Google Spreadsheets Like a Database – The QUERY Formula and this is a good place to go if you want some more query examples.

PPS I’m on leave now which is why this post has very little to do with CETIS or OER.


I came, I saw, I failed. This was a potentially promising hack that didn’t work out. Hopefully you’ll get as much benefit from failure, as from success.

Today I came across oomfo (from the same makers as FusionCharts):

oomfo is a plug-in for Microsoft PowerPoint that brings all the awesomeness of FusionCharts Suite XT to PowerPoint. Its wizard-based interface helps you create great-looking animated and interactive charts in minutes.

Using oomfo, you can create specialized charts like Waterfall, Pareto, Marimekko, Funnel and Pyramid, which PowerPoint forgot to add. Additionally, you can connect to live data sources like Excel, SalesForce, Google Docs and your own back-end systems

I was interested in the Google Docs integration but so far I can only find a Google Analytics connector. It was disappointing to discover that this relied on the user hosting a PHP file on their own webserver. Disappointment turned into shock when I then discovered that getting even this to work required the user to pass unencrypted Google usernames and passwords in plaintext!

WTF unencrypted passwords

All the connector file is doing is formatting data from the Google Analytics API in an oomfo/FusionChart XML format. Below is an example for a single series bar chart:

oomfo xml

My thought was that if I wrap data from a Google Spreadsheet in the Google Apps Script ContentService I could generate the required XML for oomfo to generate the chart in PowerPoint: no hosting of files, no passing of passwords.

Using my simple electronic voting system hack as a data source I was able to reuse this example on Stack Overflow on how to create an RSS feed using class ContentService to create a template and code shown here. Deploying this code as a service/web app gives me a url I can query to get oomfo formatted xml. So if I want responses tagged ‘dev1’ I use:
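For anyone wanting to try the same idea, here is a minimal sketch of the approach rather than the actual template code. It assumes a FusionCharts-style single-series layout (a chart element containing set elements), a hypothetical sheet called "Summary" with label/value columns, and a hypothetical "tag" URL parameter that is only used in the caption. The ContentService calls are the standard ones for returning XML from a web app.

```javascript
// Escape characters that are special inside XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build single-series chart XML from [label, value] rows.
// The <chart>/<set> layout is an assumption based on FusionCharts-style XML.
function buildChartXml(caption, rows) {
  var sets = rows.map(function (row) {
    return '  <set label="' + escapeXml(row[0]) + '" value="' + escapeXml(row[1]) + '" />';
  });
  return '<chart caption="' + escapeXml(caption) + '">\n' + sets.join("\n") + "\n</chart>";
}

// Apps Script web app entry point (only runs inside Apps Script).
// The sheet name "Summary" and the "tag" parameter are hypothetical.
function doGet(e) {
  var tag = (e && e.parameter && e.parameter.tag) || "all";
  var rows = SpreadsheetApp.getActiveSpreadsheet()
    .getSheetByName("Summary")
    .getDataRange()
    .getValues()
    .slice(1); // skip the header row
  return ContentService.createTextOutput(buildChartXml("Responses: " + tag, rows))
    .setMimeType(ContentService.MimeType.XML);
}

// Local check of the XML builder with invented data.
console.log(buildChartXml("Votes", [["A", 3], ["B & C", 5]]));
```

Keeping the XML building in its own function means it can be checked outside Apps Script before the web app is deployed.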


Unfortunately when I try to use this as an external data source for oomfo I get ‘Unable to retrieve data from the specified URL’:


To check it’s not malformed xml I’ve downloaded the generated markup and uploaded to dropbox, which does work. So I’m not sure if oomfo is unable to follow query redirection or if Apps Script is preventing the data from being used by oomfo (if anyone has any suggestions, that would be great).

There you go. How you can’t embed live data from Google Spreadsheet with Apps Script ContentService in PowerPoint using oomfo.


Poh, Swenson & Picard (2010) Brain Activity

The graph on the right is taken from a paper mentioned in Eric Mazur's keynote from the first day of the ALT* Conference 2012. The paper, A Wearable Sensor for Unobtrusive, Long-Term Assessment of Electrodermal Activity (Poh, Swenson & Picard, 2010), reports the study of an experimental wristband which can record brain activity. Mazur used the paper to highlight that often being in class generates less brain activity than when asleep and similar levels to when watching TV. Mazur went on to describe the theory and techniques, including Peer Instruction, for moving lectures away from a broadcast mode into a richer interactive experience.

Unfortunately Mazur was unable to incorporate some of these techniques into his keynote and so, enjoyable as it was, I found myself on the verge of a TV watching state. As is becoming increasingly common when watching the box this was augmented by the ‘second screen’, in this case the #altc2012 twitter stream.

Had I been hooked up to a brain monitor I’m sure it would have recorded frantic activity trying to report some of the c.170 spam tweets (over 20%) pushed into a UK trending stream. But did I learn anything from the remaining c.600 legitimate tweets? On reflection I don’t recall ‘learning’ anything from the backchannel. One theory is that the backchannel is just an amplifier or repeater. As I recently noted in Notes from the Twitter backchannel at eAssessment Scotland 2012 #eas12 the audience is largely in a rebroadcast or note collection mode, which is evident in the lack of @replies. So there is less peer dialogue, but this doesn’t mean other processes aren’t at work. For example, there may be some level of cognition in forming a 140 character tweet which provides the opportunity for internal self dialogue.

So I think I’m adjusting my expectation of the backchannel and taking a leaf out of Sheila MacNeill’s Confessions of a selfish conference tweeter. I still think there are opportunities to unpick the discourse from Twitter communities, but just when people are in a different mode like in #moocmooc.

[I really need to blog about how I calculated the number of spam tweets. In the meantime here is a graph of Twitter activity during Eric’s keynote]

Tweets, Replies and Spam for #altc2012 during Eric Mazur's Keynote
Interactive version of Tweets, Replies and Spam for #altc2012 during Eric Mazur's Keynote

*ALT is the Association for Learning Technology


New item starred in Google Reader then add to Diigo

IFTTT is a web service that lets you do some basic plumbing between web services. The idea is that if something happens on one service (referred to by IFTTT as ‘channels’) then you do something else on another (options also exist for other triggers including time, SMS and external sensors). To illustrate this I use IFTTT to bookmark anything I star in Google Reader in my Diigo account. Setting up these recipes takes no more than selecting a trigger and assigning an action, so no coding required.

There are currently 41 channels in IFTTT (including Twitter, Facebook and RSS feeds) users can link together in this way. One of the most recent additions is Google Drive. With Google Drive integration you can store files or, the one that interests me more, insert a new row in a Google Spreadsheet.

A nice feature of IFTTT is the ability to share and reuse recipes. Here’s a selection of recipes that use Google Spreadsheets. You’ll see there is already a range of quick hacks people have shared including things like:

and my contribution:

Backup RSS Feed to Google Spreadsheet

Examples of RSS backup

RSS is great for accessing machine readable data but often one of the limitations, particularly with blogs, is results are limited to the last 10 or so items.  This has created problems in some of my projects like the OERRI Projects Dashboard where I need all the posts. The solution to date has been to rely on 3rd party services like the Google Feed API.

Blog posts

By using IFTTT I’m able to easily capture blog posts as they are written.  Having the data stored in a Google Spreadsheet makes it easy for me to query and export to other services (here’s an example from my blog) [this might be something JISC might want to do to capture and mine funded project feeds – other solutions/techniques already exist]


Scoop.it is a service which lets users collect posts around topics and display them in an online magazine format. I use scoop.it to collect examples of how people are using Google Apps Script. Since June last year I’ve collected over 200 examples. I mainly do this for my own benefit, in part as a directed task to make sure I’m up-to-date with Apps Script developments, but also as a reference source of apps script examples. I can query these examples using a Google search but it concerns me that I’ve got no way to bulk export the data. Recently Scoop.it added an RSS option so I use this recipe to Backup Scoop.it to Google Spreadsheet.


There’s lots more you could do with the data now it’s in a spreadsheet and once I’ve built up a corpus I’ll have a play. One thing to note, which might put some people off, is that to allow IFTTT to add data for you, you need to give them authenticated access to your Google Drive. I can live with that risk, but you might not.


If you haven’t already you should check out Jorum's 2012 Summer of Enhancements and you’ll see it’s a lot more than a spring clean. In summary there are 4 major projects going on:

  • JDEP - Improving discoverability through semantic technology
  • JEAP - Expanding Jorum’s collection through aggregation projects
  • JPEP - Exposing activity data and paradata
  • JUEP - Improving the front-end UI and user experience (UI/UX)
SEO the Game by Subtle Network Design - The Apprentice Card
Image Copyright subtlenetwork.com

As I was tasked to write the chapter on OER Search Engine Optimisation (SEO) and Discoverability as part of our recent OER Booksprint I thought I’d share some personal reflections on the JDEP - Improving discoverability through semantic technology project (touching upon JEAP - Expanding Jorum’s collection through aggregation projects).

Looking through JDEP the focus appears to be mainly improving internal discoverability within Jorum with better indexing. There are some very interesting developments in this area most of which are beyond my realm of expertise.

Autonomy IDOL

The first aspect is deploying Autonomy IDOL which uses “meaning-based search to unlock significant research material”. Autonomy is an HP-owned company and IDOL (Intelligent Data Operating Layer) was recently used in a project by Mimas, JISC Collections and the British Library to unlock hidden collections. With Autonomy IDOL it means that:

rather than searching simply by a specific keyword or phrase that could have a number of definitions or interpretations, our interface aims to understand relationships between documents and information and recognize the meaning behind the search query.

This is achieved by:

  • cluster search results around related conceptual themes
  • full-text indexing of documents and associated materials
  • text-mining of full-text documents
  • dynamic clustering and serendipitous browsing
  • visualisation approaches to search results

An aspect of Autonomy IDOL that caught my eye was:

 conceptual clustering capability of text, video and speech

Will Jorum be able to index resources using Autonomy's Speech Analytics solution?

If so, that would be very useful; the issue may be how Jorum resources are packaged and where resources are hosted. If you would like to see Autonomy IDOL in action you can try the Institutional Repository Search which searches across 160 UK repositories.

Will Jorum be implementing an Amazon style recommendation system?

One thing it’ll be interesting to see (and this is perhaps more of a future aspiration) is the integration of an Amazon-style recommendation system. The CORE project has already published a similar documents plugin, but given Jorum already has single sign-on I wonder how easy it would be to integrate a solution to make resource recommendations based on usage data (here’s a paper on A Recommender System for the DSpace Open Repository Platform).


This is a term I’ve heard of but don’t really know enough to comment on. I’m mentioning it here mainly to highlight the report Cottage Labs prepared Investigating the suitability of Apache Solr and Elasticsearch for Mimas Jorum / Dashboard, which outlines the problem and solution for indexing and statistical querying.

External discoverability and SEO

Will Jorum be improving search engine optimisation?

From the forthcoming chapter on OER SEO and Discoverability:

Why SEO and discoverability are important

In common with other types of web resources, the majority of people will use a search engine to find open educational resources, therefore it is important to ensure that OERs feature prominently in search engine results. In addition to ensuring that resources can be found by general search engines it is also important to make sure they are also easily discoverable in sites that are content or type specific, e.g. iTunes, YouTube, Flickr.

Although search engine optimisation can be complex, particularly given that search engines may change their algorithms with little or no prior warning or documentation, there is growing awareness that if institutions, projects or individuals wish to have a visible web presence and to disseminate their resources efficiently and effectively search engine optimisation and ranking can not be ignored1.

The statistics are compelling:

  • Over 80% of web searches are performed using Google [Ref 1]
  • Traffic from Google searches varies from repository to repository but ranges between 50-80% are not uncommon [Ref 2]
  • As an indication 83% of college students begin their information search in a search engine [Ref 3]

Given the current dominance of Google as the preferred search engine, it is important to understand how to optimise open educational resources to be discovered via Google Search. However SEO techniques are not specific to Google and are applicable to optimise resource discovery by other search engines.

By all accounts the only way for Jorum is up as it was recently reported in the JISCMail REPOSITORIES-LIST that “just over 5% of Jorum traffic comes directly from Google referrals”. So what is going wrong?

I’m not an SEO expert but a quick check using a search for site:dspace.jorum.ac.uk returns 135,000 results so content is being indexed (Jorum should have access to Google Webmaster Tools to get detailed index and ranking data). Resource pages include metadata including DC.creator, DC.subject and more. One thing I noticed was missing from Jorum resource pages was <meta name="description" content="A description of the page" />. Why might this be important? Google will ignore meta tags it doesn't know (and here is the list of metatags Google knows).

Another factor might be that Google, apparently (can’t find a reference), trusts metadata that is human readable by using RDFa markup. So instead of hiding meta tags in the <head> of a page Google might weight the data better if it was inline markup:

Current Jorum resource html source

With example of RDFa markup

[Taking this one step further Jorum might want to use schema.org to improve how resources are displayed in search results]

It will be interesting to see if JEAP - Expanding Jorum’s collection through aggregation projects will improve SEO because of backlink love.

Looking further ahead

Will there be an LTI interface to allow institutions to integrate Jorum into their VLE?

Final thought. It's been interesting to see Blackboard enter the repository marketplace with xpLor (see Michael Feldstein’s Blackboard’s New Platform Strategy for details). A feature of this cloud service that particularly caught my eye was the use of IMS Learning Tools Interoperability (LTI) to allow institutions to integrate a repository within their existing VLE (CETIS IMS Learning Tools Interoperability Briefing paper). As I understand it, with this institutions would be able to seamlessly deposit and search for resources. I wonder: is this type of solution on the Jorum roadmap, or do you feel there would be a lack of appetite within the sector for such a solution?


Those are my thoughts anyway. I know Jorum would welcome additional feedback on their Summer of Enhancements. I also welcome any thoughts on my thoughts ;)

BTW Here's a nice presentation on Improving Institutional Repository Search Engine Visibility in Google and Google Scholar


On Friday (31st August) I was at eAssessment Scotland 2012. This is an event I’ve had a long running involvement with and it’s always a pleasure to travel up to Dundee with around 300 other delegates to attend one of the premier eAssessment events in the UK (if not Europe the world). Sheila was also at the conference and has already posted some notes and there isn’t much for me to add (although I do want to write something on Open Badges after attending Doug Belshaw's session). Instead I wanted to have a quick look at the conference twitter backchannel.

The archive of 1196 tweets taken between 23rd August and 3rd September has contributions from 184 twitter accounts. The median number of tweets per account was 2.
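For anyone wanting to reproduce this kind of summary from their own archive, here is a small sketch of counting tweets per account and taking the median. The function names and the sample screen names are invented. It is not the code used to produce the figures above.

```javascript
// Count tweets per account from a list of screen names (one entry per tweet).
function tweetsPerAccount(screenNames) {
  var tally = {};
  screenNames.forEach(function (name) {
    var key = name.toLowerCase(); // treat "Alice" and "alice" as one account
    tally[key] = (tally[key] || 0) + 1;
  });
  return tally;
}

// Median of a list of numbers.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Invented sample: five tweets from three accounts.
var sampleNames = ["alice", "bob", "alice", "carol", "Alice"];
var tally = tweetsPerAccount(sampleNames);
var perAccount = Object.keys(tally).map(function (k) { return tally[k]; });
console.log(Object.keys(tally).length + " accounts, median " + median(perAccount));
```

A low median alongside a high total is a quick sign that a handful of accounts are doing most of the tweeting.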


As well as monitoring the number of tweets being posted during the day, following on from my work on analysing threaded Twitter discussions from large archives using NodeXL #moocmooc, I included the number of threaded replies* in the archive. Of the 1,196 #eas12 tweets 10% (n.115) were threaded replies. I was able to monitor this in real-time with the graph below which is embedded in the dashboard.

*the metadata from Twitter includes an ‘in reply to id’ field which identifies if the tweet is part of a thread. This isn’t 100% accurate as it is only recorded if the user uses a reply button.
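The reply share is simple arithmetic, sketched below. The field name in_reply_to_status_id_str is an assumption about how the archive labels the ‘in reply to id’ column, and the sample archive is invented.

```javascript
// Share of tweets that are threaded replies, as a percentage.
// "in_reply_to_status_id_str" is an assumed field name.
function replyShare(tweets) {
  var replies = tweets.filter(function (t) {
    return Boolean(t.in_reply_to_status_id_str);
  }).length;
  return { replies: replies, total: tweets.length, percent: 100 * replies / tweets.length };
}

// Invented sample archive: one reply out of four tweets.
var archive = [
  { id_str: "1", in_reply_to_status_id_str: "" },
  { id_str: "2", in_reply_to_status_id_str: "1" },
  { id_str: "3", in_reply_to_status_id_str: "" },
  { id_str: "4", in_reply_to_status_id_str: "" }
];
console.log(replyShare(archive).percent + "%");

// The #eas12 figures quoted above: 115 replies out of 1,196 tweets.
console.log((100 * 115 / 1196).toFixed(1) + "%");
```

For the #eas12 numbers this works out at roughly 9.6%, which is the 10% quoted above after rounding.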

Graph of tweets and threaded replies at #eas12

At the time I commented:

interesting that #eas12 tweeters are in broadcast mode (red line bottom left graph) docs.google.com/spreadsheet/cc…

— Martin Hawksey (@mhawksey) August 31, 2012

which was responded to (as a threaded reply which creates all sorts of problems from Twitter’s embedded tweet option):

@mhawksey What does that say about us then... (/makes effort to think, edit and RT...) :D #eAS12

— Derek Jones (@plug103) August 31, 2012

@plug103 perhaps if the person at the front is in broadcast mode then it's okay for us to be our own individual mini-broadcasters ;) #eas12

— Martin Hawksey (@mhawksey) August 31, 2012

So how does this compare with #moocmooc? Hopefully the graph below illustrates this (interactive version here), but in numbers the #moocmooc archive contains 6,883 tweets where 44% (n.3046) are conversation threads.

Graph of tweets and threaded replies at #moocmooc

It’s perhaps not surprising that different modes of twitter usage produce different conversation patterns. Next week I’ll be at ALT-C 2012 in Manchester where I’ll also be monitoring the twitter backchannel. This is a 3 day event aimed at learning technologists so it’ll be interesting to see if a different pattern of usage emerges.

Something else I’m reminded of while preparing the TAGSExplorer view of eas12 (graph of replies – solid lines, and in this case mentions – dashed lines, between twitter accounts using the eas12 hashtag) is that there is more interaction going on that could be analysed (there’s a PhD in all of this – if Twitter keep the data flowing).


Other links


I noticed a couple of search referrals around scheduling timed triggers in Google Apps Script so here is what I know and how to do it. Within Google Apps Script you can create basic time-based triggers (poor man's cron) to run functions on specific date/time intervals. Using the ‘Current Script’s triggers’ dialog (accessed from Script Editor > Resources > ‘Current Script’s triggers’) you can set time-based triggers by:

  • minute (1, 5, 10, 15 or 30)
  • hour (2, 4, 6, 8 or 12)
  • day (hour of day intervals)
  • week (day of week and hour of day interval)
  • specific date/time (in YYYY-MM-DD HH:MM format)

Current project's triggers dialog

Recently on one of my posts someone asked: ‘Is there a way to schedule when the script runs?’. In particular they were interested in running a particular function every 10 minutes for a set period.

I did briefly look at scripting time-based triggers, but quickly realised that my original plan to control a number of timed triggers from a central spreadsheet wasn’t possible because the class TriggerBuilder doesn’t allow forSpreadsheet on TimeBased triggers. Instead I came up with this code snippet:

function scheduledCollection(){
  var schedule = [];
  // dates/times in mm/dd/yyyy hh:mm - timezone matches settings in File > Project properties
  schedule.push({start:"08/29/2012 15:00", end:"08/29/2012 16:00"});
  schedule.push({start:"08/29/2012 20:00", end:"08/29/2012 22:00"});
  checkCollect(schedule);
}

function checkCollect(schedule){
  var now = new Date();
  // check each scheduled window and run the task if we are inside one
  for (var i = 0; i < schedule.length; i++){
    var start = new Date(schedule[i].start);
    var end = new Date(schedule[i].end);
    if (now > start && now < end){
      // insert function you want to run here
    }
  }
}

To use it, enter the time ranges you want in the first function, put the function you want to run at ‘insert function you want to run here’, and then create a basic time-based trigger at a low interval (in the example above every 10 minutes) calling scheduledCollection. Enjoy!
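If you would rather create the low-interval trigger from code than from the dialog, the following is a minimal sketch using the standard ScriptApp trigger builder. The helper names isWithinSchedule, createTenMinuteTrigger and removeCollectionTriggers are my own for illustration. The two trigger functions only work inside Apps Script, while the schedule check is plain JavaScript.

```javascript
// Pure helper: is `now` inside any {start, end} window in the schedule?
function isWithinSchedule(now, schedule) {
  return schedule.some(function (slot) {
    return now > new Date(slot.start) && now < new Date(slot.end);
  });
}

// Apps Script only: create a trigger calling scheduledCollection every 10 minutes.
function createTenMinuteTrigger() {
  ScriptApp.newTrigger("scheduledCollection")
    .timeBased()
    .everyMinutes(10)
    .create();
}

// Apps Script only: remove the scheduledCollection triggers once you are done.
function removeCollectionTriggers() {
  ScriptApp.getProjectTriggers().forEach(function (trigger) {
    if (trigger.getHandlerFunction() === "scheduledCollection") {
      ScriptApp.deleteTrigger(trigger);
    }
  });
}

// Local check of the schedule logic with one of the example windows.
var demoSchedule = [{ start: "08/29/2012 15:00", end: "08/29/2012 16:00" }];
console.log(isWithinSchedule(new Date("08/29/2012 15:30"), demoSchedule));
console.log(isWithinSchedule(new Date("08/29/2012 17:00"), demoSchedule));
```

Removing the trigger after the last window has passed stops the script from waking up every 10 minutes for nothing.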

Here's some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.


This method uses a UK based SMS gateway and is most likely not suitable for international use

Two years ago I wrote how you could have free SMS voting using the intelliSoftware SMS Gateway service. This recipe automatically forwarded text messages from the IntelliSoftware SMS gateway to a Blogger account by posting via email. Votes were then extracted from messages from the blog's RSS feed using some PHP code on my server.

Last year a modified version of this was used to collect votes for the poster competition at eAssessment Scotland 2011. I was recently asked if the recipe would still work for this year’s conference. It does but I thought I could make it better.

30 lines of code, source is in the template

The main change is to directly ingest SMS messages into a Google Spreadsheet (using 30 lines of code) which makes it easier for manipulation and presentation. The method for doing this is relatively simple because the IntelliSoftware gateway has an HTTP interface and you can also use Google Spreadsheets as a Database – INSERT with Apps Script form POST/GET submit method.

If you would like to do this yourself here’s how:

  1. Signup for an account at intelliSoftware (it’s free!)
    (Note: the username you select is also used to direct texts so you might want to use a class or course name)
  2. Open a copy of this Google Spreadsheet template (also free)
  3. Open Tools > Script editor...
  4. Select Run > setup and okay, then Publish > Deploy as web app.. and:
    - enter Project Version name and click 'Save New Version'
    - set execute web app 'as me'
    - security level as 'anyone, even anonymously'
  5. Click Update and copy the service url you are given (it will look like https://script.google.com/macros/s/[random_characters]/exec)
  6. Now open your IntelliSoftware control panel
  7. Click on Forwarding, tick 'Enable incoming message forwarding' and change forwarding type to http
  8. Copy the web app url into the address field and click Save

To receive messages tell users to send a text message to 07786 XXX XXX with ‘xyz and their message’ (where 07786 XXX XXX is the mobile number found in the Trial Service section and xyz is your username created with intelliSoftware).
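To give a feel for what the web app does with a forwarded message, here is a rough sketch rather than the template's actual source. The parameter names from and text, the sheet name "Messages" and the username xyz are all assumptions for illustration, so check the gateway's forwarding documentation and the template before relying on them.

```javascript
// Strip the account username prefix ("xyz ...") from an incoming text.
function extractMessage(text, username) {
  var trimmed = String(text || "").trim();
  var prefix = username.toLowerCase();
  if (trimmed.toLowerCase().indexOf(prefix) === 0) {
    trimmed = trimmed.slice(prefix.length).trim();
  }
  return trimmed;
}

// Build the spreadsheet row to store: timestamp, sender, cleaned message.
function buildRow(params, username, now) {
  return [now, params.from || "", extractMessage(params.text, username)];
}

// Apps Script web app entry points (only run inside Apps Script).
// "from", "text", "Messages" and "xyz" are assumed names, not confirmed ones.
function doPost(e) {
  var row = buildRow(e.parameter, "xyz", new Date());
  SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Messages").appendRow(row);
  return ContentService.createTextOutput("OK");
}

// Handle GET the same way, in case the gateway forwards with GET.
function doGet(e) {
  return doPost(e);
}

// Local check with an invented message.
console.log(buildRow({ from: "447700900000", text: "xyz Poster 3" }, "xyz", new Date(0)));
```

Storing the cleaned message in its own column makes it straightforward to count votes with ordinary spreadsheet formulas.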

Simple response graph

In the example template I show how you can parse messages to generate a response graph. You might also want to look at how I’ve used a Google Form for Hacking stuff together with Google Spreadsheets: A simple electronic voting system. At the very basic level you’ve got a free SMS textwall to play with. If you do come up with any interesting mashups please leave a note in the comments :)


This post is a bit messy. I got caught trying out too many ideas at once, but hopefully you'll still find it useful

Sheila recently posted Analytics and #moocmooc in which she collects some thoughts on the role of analytics in courses and how some of the templates I’ve developed can give you an overview of what is going on.  As I commented in the post I still think there is more work to make archives from event hashtags more useful even if just surfacing tweets that got most ‘reaction’.

There are three main reactions that are relatively easy to extract from twitter: retweets, favouring and replies. There are issues with what these actions actually indicate as well as the reliability of the data. For example users will use ‘favouring’ in different ways, and not everyone uses a twitter client that can send, or chooses to send, a reply tweet (if you start a message @reply without clicking a reply button Twitter loses the thread).

But let's ignore these issues for now and start with the hypothesis that a reaction to a tweet is worth further study. Let's also, for now, narrow down on threaded discussions. How might we do this? As mentioned in Sheila's post we’ve been archiving #moocmooc tweets using Twitter Archiving Google Spreadsheet TAGS v3. As well as the tweet text other metadata is recorded including a tweet unique identifier and, where available, the id of the tweet it is replying to.

Google Spreadsheet columns

We could just filter the spreadsheet for rows with reply ids but let's take a visual approach. Downloading the data as an Excel file we can open it using the free add-in NodeXL.
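For comparison, the non-visual route of filtering on reply ids can be sketched in a few lines of JavaScript. This is only an illustration: the field names id_str and in_reply_to_status_id_str are assumed, the sample tweets are invented, and the grouping simply follows each reply back to the earliest tweet it can find in the archive.

```javascript
// Group tweets into conversation threads by following reply ids to a root.
// Field names "id_str" and "in_reply_to_status_id_str" are assumed.
function groupThreads(tweets) {
  var parentOf = {};
  tweets.forEach(function (t) {
    if (t.in_reply_to_status_id_str) {
      parentOf[t.id_str] = t.in_reply_to_status_id_str;
    }
  });

  // Walk up the reply chain until reaching an id with no recorded parent.
  function rootOf(id) {
    var seen = {};
    while (parentOf[id] && !seen[id]) {
      seen[id] = true;
      id = parentOf[id];
    }
    return id;
  }

  var threads = {};
  tweets.forEach(function (t) {
    var root = rootOf(t.id_str);
    (threads[root] = threads[root] || []).push(t.id_str);
  });

  // Keep only groups with more than one archived tweet.
  var conversations = {};
  Object.keys(threads).forEach(function (root) {
    if (threads[root].length > 1) { conversations[root] = threads[root]; }
  });
  return conversations;
}

// Invented sample archive.
var sampleTweets = [
  { id_str: "1", in_reply_to_status_id_str: "" },
  { id_str: "2", in_reply_to_status_id_str: "1" },
  { id_str: "3", in_reply_to_status_id_str: "2" },
  { id_str: "4", in_reply_to_status_id_str: "" },
  { id_str: "5", in_reply_to_status_id_str: "99" } // parent not in the archive
];
console.log(groupThreads(sampleTweets));
```

Note that tweet 5 is dropped because the tweet it replies to is not in the archive, which is one reason counts from this kind of grouping will be on the low side.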

NodeXL allows us to graph connections, in this case conversation threads. NodeXL allows us to do other useful things like group conversations together to make further analysis easier. Skipping over the detail here’s what you get if you condense 6,500 #moocmooc tweets into grouped conversations.

moocmooc grouped conversations

This is more than just a pretty picture. In NodeXL I’ve configured it so that when I hover over each dot, which represents an individual tweet, I get a summary of what was said by who and when (shown below).

NodeXL being used to examine nodes

It’s probably not too surprising to see strings of conversations, but by graphing what was an archive of over 6500 tweets we can start focusing on what might be interesting subsets and conversation shapes. There are some interesting patterns that emerge:

conversation group 1 conversation group 2 conversation group 3

Within NodeXL I can extract these for further analysis. So the middle image can be viewed as:

Examination of conversation group 2

There’s a lot more you can do with this type of data: you can start looking at how many people are involved in conversations, the number of questions per conversation and lots more. I should also say before I forget that NodeXL can be configured to collect twitter search results with its built-in twitter search tool. It can also be configured to do the collection on a regular basis (hmm I should really have a go at doing that myself). So potentially you’ve got a nice little tool to analyse twitter conversations in real-time …

If you’d like to explore the data more it’s available from the NodeXLGraphGallery. I’m going off to play some more ;)