Here's some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.
Share this post on:
| | |
Posted in Starred on by .


Last couple of days I’ve been at IWMW12 hosted this year at University of Edinburgh. I’ve already posted Data Visualisation Plenary/Workshop Resources which has my slides from the plenary. I was teaming up with Tony Hirst (OU) and have included his slides to the page.

Because of living 'almost locally' and other family commitments I missed out on most of the social events, instead I got drunk on data working into the early hours to find what stories I could uncover from the #IWMW12 stream. In this post I’ll show you what I’ve come up with and some of the highlights in trying to turn raw data into something interesting/meaningful (or pointless if you prefer). Interestingly a lot of what I cover here uses the same techniques used in my recent The story data tells us about #CitizenRelay guest post, so I’ve got an emerging templated workflow emerging which I can deploy at events which makes me wonder if I should be getting organisers pay my travel/accommodation as an event data amplifier?

UK University Twitter Account Community

On day one Brian Kelly mentioned some work by Craig Russell to collate a table of UK University Social Media accounts which featured in a guest post on Brian’s blog titled Further Evidence of Use of Social Networks in the UK Higher Education Sector. You can get the data Craig has compiled from a Google Spreadsheet. Looking at this two things immediately sprung to mind. First that the document could be made more ‘glanceable’ just using some simple conditional formatting, and second there was a nice list of Twitter accounts to do something with.


Here’s a link to my modified version Craig’s spreadsheet. It uses the importRange formula to pull the data in so it creates a live link to the source document. For the conditional formatting I looked for text containing ‘http’ turning the cell text and background green. The HTML view of this is a lot cleaner looking.

On  the Twitter Accounts sheet extract the account screen names by pulling everything after the last ‘/’ and remove most of the blank rows using a unique formula.

Putting this list into the free MS Excel add-in NodeXL and using the Import > From Twitter List Network lets you get data on which of these accounts follow each other. I played around with visualising the network in NodeXL but found it easier in the end to put the data into Gephi getting the image below. These ‘hairballs’ have limited value and you’re best having a play with the interactive version, which is an export of Gephi visualised using the gexf-js tool by Raphaël Velt (De-hairballing is something Clement Levallois (‏@seinecle) and he kindly sent me a post to a new tool he’s creating called Gaze).

UK HEI Twitter Accounts

The #IWMW12 Twitter Archive Two More Ways

TimelineAs part of #iwmw12 I was collecting an archive of tweets which already gives you the TAGSExplorer view. I also use the Topsy API and Google Spreadsheet to extract tweets which is then passed into Timeline by Vérité which gives you a nice sense of the event. [If anyone else would like to make their own twitter media timeline there is a template in this post  (it is easy as make a copy of the template, enter your search terms and publish the sheet).]

Searchable archive

Searchable archiveNew way number one is a filterable/searchable archive of IWMW12 tweets. Using the Google Visualisation API I can create a custom interface to the Google Spreadsheet of tweets. This solution uses some out-of-the-box functionality including table paging, string filtering and pattern formatting. Using the pattern formatter was the biggest achievement as it allows you to insert Twitter Web Intents functionality (so if you click to reply to a tweet it pulls up Twitter’s reply box.

I also processed the archive using R to get a term frequency to make a d3 based wordcloud (I’ve started looking at how this can be put into a more general tool. Here’s my current draft which you should be able to point any TAGS spreadsheet at (this version also includes a Chart Range Filter letting you view a time range). I definitely need to write more about how this was done!)

Filter by time

Mappable archive

One of the last things I did was to filter the twitter archive for tweets with geo-location. Using the Spreadsheet Mapper 3.0 template I was able to dynamically pull the data to generate a time stamped KML file. The timestamps are ignored when you view in Google Maps, but if you download the kml file it can be played in Google Earth (you’ll have to adjust the playback control to separate the playback heads – I tried doing this in the code but the documentation is awful!)

Google Earth playback

Or if you have the Google Earth browser plugin a web version of IWMW12 geo-tweets is here (also embeded below):

So there you go … or as said by Sharon Steeples

Originally posted on CitizenRelay

Telling stories with data is a growing area for journalism and there is already a strong community around Data Driven Journalism (DDJ). I’m not journalist, by day I’m a Learning Technology Advisor for JISC CETIS, but my role does allow me to explore how data can be used within education. Often this interest spills into the evenings where I ‘play’ with data and ways to visualise the hidden stories. Here are a couple of ways I’ve been playing with data from the CitizenRelay:

A time

One of the first things I did was produce a Timeline of the CitizenRelay videos and images shared on Twitter. This uses the Topsy web service to find appropriate tweets which are stored in this Google Spreadsheet template which are then displayed in the Timeline by Vérité tool (an open source tool for displaying media in a timeline). The result is a nice way to navigate material shared as part of CitizenRelay and an indication of the amount of media shared by people.

 Timeline of the CitizenRelay videos and images shared on Twitter

A time and place

As part of the CitizenRelay Audioboo was used to record and share interviews. For a data wrangler like myself Audioboo is a nice service to work with because they provide a way to extract data from their service in a readable format. One of the undocumented options is to extract all the clips with a certain tag in a format which includes data about where the clip was recorded. Furthermore this format is readable for other services so with a couple of clicks with we can get a Google Map of CitizenRelay Boos which you can click on and find the audio clips.

 Google Map of CitizenRelay Boos

One experiment I tried which didn’t entirely work out the way I wanted was to add date/time to the Audioboo data and also embed the audio player. This datafile (generated using this modified Google Spreadsheet template) can be played in Google Earth allowing to see where Boos were created, when they were created with a timeslider animation and directly playback the clips. This experiment was partially successful because I would prefer the embedded player worked  without having to download Google Earth.

 Google Earth of CitizenRelay Boos

A look at who #CitizenRelay reporters were

So far we have mainly focused on the content but lets now look at the many eyes and ears of the CitizenRelay who helped share and create stories on Twitter.

CitizenRelay Many eyes

The image shows the profile pictures of over 600 people who used the #CitizenRelay tag on Twitter so far this year. This image was generated using a free add-in for Microsoft Excel called NodeXL, read more about getting started with NodeXL. What that image doesn’t show you is how interconnected this community is. Using another free tool called Gephi and with the right data we can plot the relationships in this twitter community, who is friends with who (read more about getting started with Gephi). In the image below pictures of people are replaced with circles and friendships are depicted by drawing a line between circles.

CitizenRelay Community

There are almost 7,000 relationships shown in the image so it can be a bit overwhelming to digest. Using Gephi it is possible to interactively explorer individual relationships. For, example the image below shows the people I’m friends with who used the #CitizenRelay tag.

CitizenRelay Sub-community

A look at what #CitizenRelay reporters said

Using the same technique for plotting relationships it’s also possible to do something similar with what people said using the #CitizenRelay tag. By plotting tweets that mention or reply to other people we get:


This image is evidence that #CitizenRelay wasn’t just a broadcast, but a community of people sharing their stories. Visualising Twitter conversations is one of my interests and I’ve developed this interactive tool which lets you explore the #CitizenRelay tweets.

CitizenRelay Interactive Archive

So there you go some examples of what you can do with free tools and a bit of data, I’m sure there are many more stories to be found in CitizenRelay.

Share this post on:
| | |
Posted in Data, Twitter, Visualisation on by .


On Tuesday 19th June I’ll be presenting at the Institutional Web Manager Workshop (IWMW) in Edinburgh … twice! Tony Hirst and I are continuing our tour, which started at the JISC CETIS Conference 2012, before hitting the stage at GEUG12. For IWMW12 we are doing a plenary and workshop around data visualisation (the plenary being a taster for our masterclass workshop). I’ll be using this post as a holder for all the session resources.

Update: I've also added Tony Hirst's (OU) slides. Tony went on first to introduce some broad data visualisation themes before I went into a specific case study.

The draft slides for my part of the plenary are embedded below and available from Slideshare and Google Presentation (the slides are designed for use with pptPlex, but hopefully they still make sense). For the session I’m going to use the OER Visualisation Project to illustrate the processes required to get a useful dataset and how the same data can be visualised in a number of ways depending on audience and purpose. Update: I should have said the session should be streamed live, details will appear on IWMW site.

Update: As a small aside I've come up with a modified version of Craig Russell's UK Universities Social Media table as mentioned in Further Evidence of Use of Social Networks in the UK Higher Education Sector guest post on UKWebFocus (something more 'glanceable'). Using the Twitter account list as a starting point I've looked at how University accounts follow each other and come up with this (click on the image for an interactive version).

If you have any questions feel free to leave a comment or get in touch.

Today I’m presenting at Google Apps for EDU European User Group meeting (GEUG12) at the University of Portsmouth. I’m doing the session with Tony Hirst (@psychemedia) on Creative Solutions with Google Products which is deigned to show people some of the ways we use Google products. Current slides for the session are below and along with the other parallel session should be streamed using Google Hangouts, so follow GEUG12 or #GEUG12 on Google+ (We’re on at 12:10).

Recording of the session now available (unfortunately beginning is missing – didn’t hit the record button, oops):


Creative Solutions with Google Products

Share this post on:
| | |
Posted in Google, Presentation and tagged on by .


Sheila MacNeill recently shared Some thoughts on web analytics using our work on analytics in which she mentioned a tool I developed which allows us to quickly get social share and page view counts for out blog, category or project feeds. The solution was a quick add-on to Using Google Spreadsheets to combine Twitter and Google Analytics data to find your top content distributors, recycling some other bits of code I had lying around (mostly from Introducing a RSS social engagement tracker in Google Apps Script #dev8d, which has never really worked). The reworking is much simpler and more reliable so if you would like to extract social counts from your own (or other people’s) RSS feeds here’s:

*** RSS Feed Social Share Counter ***

If you are using this with your Google Analytics account you’ll need to authenticate access, otherwise you can just enter a RSS feed in cell B5 and see what you get back.

Social Counts/Page Views for OOH-ER

Limitations/usage notes

As JISC CETIS uses WordPress for it’s blogs it’s easy to get feeds for particular projects. For example I can get a RSS feed for the OER Visualisation Project using The spreadsheet is using the importFeed formula you’re limited to 20 items. Usually this is enough for us as most project feeds fairly go over that.  How meaningful are the numbers? At the end of the day page views probably have the most weight (tweets are cheap ;), I do find social counts useful as a way to find posts worth further investigation to find out who said what when – further work is required though to turn that data into actionable decisions.


Reader Stats: From your 353 subscriptions, over the last 30 days you read 4,721 items, clicked 68 items, starred 9 items, and emailed 55 items.I’ve been a long time user of Google Reader. It’s been an up and down relationship even going through a trial separation, but I always come back. One of the things I’ve struggled with is a workflow that allows me to share and file interesting posts on the mobile version of Google Reader. I’ve previously experimented with custom Yahoo Pipes and services like but find these don’t give me the flexibility in editing tweet text or tags used to bookmark links. The slides below highlight the problem, if I want to personalise how a post is saved/shared you have to navigate out of Google Reader and then you are on reliant on share features of the source post or apps available on your phone.

Send email from Google Reader MobileHere’s my solution. As Google Reader mobile has an email option it’s possible to push posts to different services without navigating away and still let you have control over what is saved/tweeted. Initially I considered using existing services which let you create your own workflows by sending stuff to a dedicated email, example services which let you do this that I already use are Evernote and Instead as there have been a number of Google Apps Script Gmail productivity solutions like Drew Csillag’s (sounds like cheese-log ;) Calorie Counting with Google Apps Script I thought I’d give that technique a go.

So I’ve come up with this:

*** GmailProductivitySheet v1.0 Spreadsheet ***

This solution uses the old Gmail trick of adding +whateveryouwant to your email address to allow you to do custom filtering. For example, if I email m.hawksey+twitter[at] the message still ends up in my inbox but it makes it easy for me to setup a filter rule. So using Google Apps Script I can search my Gmail inbox for messages from me with +twitter, +delicious or +diigo and then using oAuth or basic authentication pass links/messages through those service’s API’s, I hope you agree Google Apps Script at it’s finest.


Here’s a video to help you with setup and usage (recorded in a Google Hangout On Air – still experimenting to get the best quality)

In summary, if I want to tweet something then I email the link to mygmailaccount[email protected], to save a bookmark either mygmailaccount[email protected] or mygmailaccount[email protected], or any combination of the three by sending the message to the email address.

Google Reader Productivity Flow

BTW if you are worried I’m wasting JISC CETIS time to make this fear not as this was an out of hours project. I also feel slightly annoyed with myself for spending so much time taking what was a functional solution (couple of hours work) and trying to make it more user-friendly for re-use particularly as the payback is minimal. Ho-hum.

For the Analytics Reconnoitre I’ve been trying to get my head around ‘Analytics as a service’ asking myself what new “as-a-service” offerings are emerging. Let start by defining what ‘as-a-service’ is before looking at some of the analytics offering. For this I’m going to use the five key characteristics used in the JISC CETIS Cloud Computing in Institutions briefing paper:

As a service: Key characteristics

Rapid elasticity: Capabilities can be rapidly and elastically provisioned to quickly scale up and rapidly released to quickly scale down.

Ubiquitous network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones and laptops, etc.).

Pay per use: Capabilities are charged using a metered, fee-for-service, or advertising based billing model to promote optimisation of resource use.

On-demand self-service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed without requiring human interaction with each service’s provider.

Location independent data centres: The provider’s computing resources are usually pooled to serve all consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.

Delivery Levels

The JISC CETIS briefing then goes on to name three delivery levels for as a service offerings in software (SaaS), platform (PaaS) and infrastructure (IaaS). Here are my suggestions for Analytics as a Service and Data as a Service:

Analytics as a Service (AaaS): The capability provided to the consumer is to use the providers applications running on a cloud infrastructure to extract “actionable insights through problem definition and the application of statistical models and analysis against existing and/or simulated future data”* Examples include application specific solutions like Google Analytics and more general offering like Amazon AWS.

*definition proposed by Adam Cooper in Analytics and Big Data. In many instances AaaS is a subset of SaaS

Data as a Service (DaaS): The capability provided to the consumer is to use the provider’s data on demand regardless of location and affiliation. Education specific data services are provided by HESA, UCAS and others (more examples in the JISC infoNet BI infoKit). Cost models include subscription and volume based. As well as DaaS option there are a growing number of Open Data providers including Government initiatives like These fall outside the definition used here of ‘as-a-service’ offering.


Web Analytics: Web analytics as a service is not a new phenomenon and the current market leader Google Analytics has been around since 2005. Google’s ‘as a service’ offering is available for free or as a paid for premium service. The service provides a number of standard web-based dashboards which allow administrators to analyses website traffic. Recently Google have also start recording and reporting social referrals from networks like Facebook, Twitter and their own Google+. Detailed social activity streams are also available from Google’s Social Data Hub partners. These streams extract conversations and social actions like bookmarking around website resources. As well as the web interface Google have options for downloading processed data and API access for use in other applications and services.

Customer Relationship Management: As part of the CETIS Cloud Computing briefing Enrollment Rx was used to illustrate how their CRM solution offered as Software as a Service in turn build upon the Platform as a Service offered by Salesforce. As part of this Enrollment Rx integrate Salesforce’s analytics tools and dashboards within their own product. Within Salesforce’s appexchange there are over 100 other applications tagged with ‘analytics’, including SvimEdu which is a complete enterprise resource planning package targeted at the education sector.

BenchmarkinginHE:  Benchmarking In HE is a HEFCE funded project which aims to offer benchmarking tools and data for universities and colleges. Many of the data sources (listed here) are Open Data provided by organisations like HESA but some are only available on a subscription basis. For example, the Higher Education Information Database for Institutions (heidi) which is managed by HESA is operated on a subscription basis and operated on a not-for-profit basis. The current tool available to institutions via BenchmarkinginHE is BenchmarkerHE, an online database of shared financial data with reporting options.

Big Data Analytics: Similar to the CRM illustration there are other examples of raw analytics services that are also relayered with 3rd party applications. An example of this is Amazon’s Elastic MapReduce (EMR). MapReduce is a programming framework for processing large datasets using multiple computers originally developed by Google and now features in open source frameworks like Apache Hadoop. Elastic MapReduce was developed as part of one of the offering in Amazon Web Services (AWS) based on Hadoop and is ‘elastic’ because it can easily scale.  Karmasphere Analytics for Amazon EMR is a service which provides a graphical layer to interface Amazon EMR providing tools to create queries to generate reduced datasets which can be visually viewed or exported into other tools like MS Excel.

Spare notes

There is one more illustration I have in mind but doesn’t entirely fit with the ‘as-a-service’ ethos. There are a growing number of sites that let you publish datasets for analysis. These services don’t include tools to process the data, instead they provide an infrastructure to set bounties. Examples include Kaggle and Amazon Mechanical Turk, the later being a component of UC Berkeley’s AMPLab, which I’ve written about here.

Risk and Opportunities

A number of risks and opportunities are identified in the JISC CETIS Cloud Computing in Institutions briefing paper. One additional opportunity offered by analytics as a service is the argument that ‘as-a-service’ offering can, to a degree, remove the reliance on the need to have a dedicated data scientist. For example, a recent NY Times article asked ‘Will Amazon Offer Analytics as a Service?’, in which they speculate if Amazon will make and sell pattern-finding algorithms, removing the burden from the customer to develop their own.

Available Products and Services

A range of analytics and data services are available. Here are a couple I’ve mentioned in this post topped up with some more.

Google Analytics: A free Google product that provides website analytics. Standard reporting includes analysis of: audience; advertising; traffic sources; content; and conversions. Data can be analysed via the Google Analytics web interface or downloaded/emailed to users. Analytics also has a Data API allowing which can be used by 3rd party web services or in desktop applications. Website visitors are tracked in Google Analytics using a combination of cookies (rather than server logs) and most recently social activity. Google market share is reported to be around 50% but in a recent survey of 134 Universities UK websites 88% (n.118) were using Google Analytics.

Enrollment Rx (text from CETIS briefing): Is a relatively small company in the US that offers a Customer Relationship Management solution as Software as a Service. The service allows institutions to track prospective students through the application and enrollment process. The system is not free, but the combination of web delivery on the user end, and Platform as a Service at the backend, are intended to keep prices competitive.

Salesforce for Higher Education: Higher education institutions are using the platform for its instant scalability, ease of configuration, and support for multiple functional roles. Imagine a unified view of every interaction prospects, students, alumni, donors and affiliates have with your department or institution. Combine this with all of the tools you need to drive growth and success – campaign management, real-time analytics, web portals, and the ability to build custom applications without having to code – and you’re well on your way to getting your school to work smarter.

Karmasphere Analytics for Amazon EMR: Karmasphere provides a graphical, high productivity solution for working with large structured and unstructured data sets on Amazon Elastic MapReduce. By combining the scalability and flexibility of Amazon Elastic MapReduce with the ease-of-use and graphical interface of Karmasphere desktop tools, you can quickly and cost-effectively build powerful Apache Hadoop-based applications to generate insights from your data. Launch new or access existing Amazon Elastic MapReduce job flows directly from the Karmasphere Analyst or Karmasphere Studio desktop tools, all with hourly pricing and no upfront fees or long-term commitments.

Kaggle: Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading platform for predictive modeling competitions. Companies, governments and researchers present datasets and problems - the world's best data scientists then compete to produce the best solutions. At the end of a competition, the competition host pays prize money in exchange for the intellectual property behind the winning model.

Share this post on:
| | |
Posted in Analytics and tagged on by .

  1. Hangouts On Air are broadcasted live to Google+ and YouTube so anyone can watch even if they don’t have a G+ account
  2. Participants who can join the hangout for On Air are limited to people in your circles. So if you are planning on having guests you need to make sure they are in your circles before you start.
  3. Google+ users can comment on an On Air broadcast whist watching the feed in the Google+ site.
  4. If you are demoing Google products with multiple accounts best to log in on separate browser (first time around I got chucked out of my own hangout when switching accounts).
  5. Resolution of recording is only 480p which probably not high enough if you want people to see text in browser. I’ve experimented with ManyCams (CamTwist is a Mac equivalent) which allows you to stream your desktop as if it was a webcam. ManyCams also allows you to do picture-in-picture so you can have your desktop and talking head. Resolution still isn’t great so you might want to use partial desktop. Here’s an example of switching from a ManyCams source to Hangout shared desktop:

Here’s an extended example of a Hangout I did recently (my first) demoing a new tool I’ working on.

Final thoughts

Hangouts On Air look like a good way to quickly setup a streamed event. If you are doing desktop sharing some juggling is required if you are using standard desktop applications. If you are planning on interacting with the audience some protocols need to be established and separate channels used (you you use the Google+ comment stream, Twitter, something else). Because a Google+ account isn’t required to view when viewing the YouTube feed it opens it up to a wide community. I think I’ll be hanging out more.

Are there any other tips you’ve got?

Share this post on:
| | |
Posted in Google, GooglePlus on by .


Writing another blog post today which included reference to Google Analytics I pondered:

The response wasn’t promising:

My thought was to detect Google Analytics urchin code from website homepages. Knowing Tony Hirst had done something I asked and at 4:08pm the response was:

At 4:17pm

So how was it done?

I didn’t like the prospect of tweaking Tony’s scraperwiki code but spotted he was getting a list of institutions from Universities UK. Using the Scraper Chrome Extension I was able to export all the institution urls to a Google Spreadsheet:

Scraper Window

Having played around with Google Analytics before I knew if the site was using Google Analytics it would have a unique profile id in the source in the format UA-XXXXXX-X and found this regular expression to extract it using the following Google Apps Script:

function getUA(url) {
  var requestData = {
          method : "get",
          headers: { "User-Agent":""}
  var html = UrlFetchApp.fetch(url,requestData).getContentText();
  var urlPattern = /\bUA-\d{4,10}-\d{1,4}\b/ig;
  return html.match(urlPattern)[0];

I could then use a custom formula in column C to extract an urchin code from a website. This worked for most sites but I got a couple of errors for sites not using Google Analytics. Validating some of the results I noticed that it was because the UrlFetchApp wasn’t following browser redirects e.g. redirects to This is a problem I’ve had before so recycled the code below which uses to follow a link to the destination.

function extractLink(text){
  // create a url pattern
  var urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  var feedproxyPattern = /(\b(http:\/\/;
  // extract link from email msg
  var url = text.match(urlPattern)[0];
  //if (feedproxyPattern.test(url)){
   // if feedproxy url see if cached (or resolve end url)
   var cache = CacheService.getPublicCache(); // using Cache service to prevent too many urlfetch
    var cached = cache.get(url);
    if (cached != null) { // if value in cache return it
      return cached;
    var requestData = {
                        method : "get",
                        headers: { "User-Agent":"GmailProductivitySheet - Google Apps Script"}
    try {
      // try and get link endpoint using
      var result = UrlFetchApp.fetch(""+encodeURIComponent(url), requestData);
      var j = Utilities.jsonParse(result.getContentText());
      var link = (result.getResponseCode()===200)? Utilities.jsonParse(result.getContentText()).end_url:url;
    } catch(e) {
      // if doesn't work just return extracted url
      var link = url;
    cache.put(url, link, 3600);
    return link;
  return url;

Using this formula in column D for the error results I got a fresh url to point the getUA function. Here’s the final spreadsheet (I’ve copied/pasted as values some of the formula results to save my quota) and the answer to my question:

134 institutional websites, 118 (88%) with Google Analytics code

But as Ranjit Sidhu reminded me