The 2013 New Year's greeting from the French advertising and public relations company Publicis Groupe, delivered by CEO Maurice Lévy, has been making multiple appearances in my circles.

When I saw it, my immediate thought was: how did they do it? In the official press release they say:

It’s just one of the many very human interpretations of the standard YouTube functionality taken for granted today.

Looking for YouTube player functionality left me scratching my head. Looking at the source gave a big clue:

Publicis Groupe custom tab code (click to enlarge)

… basically the entire video area, including title and controls, is an embedded iframe. This allows Publicis Groupe to create their own Flash-based player with custom functionality for pause, play etc.

To get the iframe into their YouTube channel they used YouTube’s Custom tab settings which are available on Branded Channels.

Simple but very effective.

Alistair Brown has written an interesting post on the LSE Impact of Social Sciences blog - Proving dissemination is only one half of your impact story: Twitter provides proof of real-time engagement with the public. The post highlights how a journal paper was picked up by a university media office, which ended in the author being interviewed on BBC Radio 4's Today programme. As part of this Alistair notes that:

REF impact involves an assessment of “significance” as well as “reach,” so the mere fact that research has been disseminated to a wide audience does not constitute an impact by itself; one has also to show the effect it has on those to whom it is disseminated. For this reason, citing the fact that a researcher has appeared on a primetime radio show with several million potential listeners might be one element of an impact statement, but one needs also to evidence that the audience has actively listened to what was being put out, and that it has affected, changed or benefitted them in some way

In the age of the second screen, Alistair goes on to highlight how Twitter can be used as evidence of engagement: listeners tweeting personal reflections, feedback or just disseminating the information more widely. But as Alistair points out:

When a piece of academic work receives broadcast media coverage, then, it is useful to have a strategy in place to gather emerging responses, and it is also far easier to do this as it happens rather than retrospectively.

A strategy is required because, as Alistair points out, Twitter search is limited to the last 7 days. While there are ways to view this activity in real time, how do you capture the evidence? Here's my response to the problem:

Here is the spreadsheet template I mention. So have you got a strategy for recording impact evidence from Twitter?


Getting Creative with Big Data and Google Apps

I was recently asked to write a guest post for Big Data Week on using Google Apps as an interface for Big Data. For the post I decided to revisit an old recipe which uses Google Sheets (Spreadsheets) and Google Apps Script to interface with the Twitter and Google Analytics APIs. One of the results is the bubble graph shown below, which shows who has been tweeting my blog posts, how many visits their tweets generated, the number of retweets and how many followers each person has (click on the image for the interactive version). You can read more about how this was done and get a copy of the template in Getting Creative with Big Data and Google Apps.

Visits, retweets bubble graph


Update 16/06/2014: This idea has been revisited by the Elevate team at University Campus Suffolk. You can read more about it and get a copy of the code here.

Update 18/02/2015: I've revisited the idea of issuing Open Badges with Google Apps Script and embedding in Google Sites which improves this idea further. You can read more about this here.

Open Badges Issuer Gadget for Google Sites

Want to issue badges in Google Sites? That was the challenge I set myself. My solution is the Open Badges Issuer Gadget for Google Sites. This gadget wraps the Mozilla Issuer API to allow you to issue badges from a Google Site. To use the gadget, insert it into a Google Site and set a prefix (base url) for your Assertion JSON. To allow users to collect their badges, direct them to the web address of your Site page containing the gadget, adding ?claim_code={assertion url post fix}.

For example, if my Site page with the Issuer Gadget is found at and my Assertion files are all found in the directory this would be my base url. If one of my Assertion files in this directory was v1mhaws.json, to claim the badge for that person I'd send them a link to

Get the Open Badges Issuer Gadget for Google Sites

The Open Badges Issuer Gadget for Google Sites is located here:

To add to your own site

  1. Open/create a page on your Google Site
  2. Select Insert > More gadgets
  3. Add the gadget by URL, inserting the address
  4. Insert a prefix (base url) for your Assertion JSON files (you can leave this blank if the host site varies) and click ‘Ok’
  5. For each Assertion, or collection of Assertions, direct users to visit the page your gadget is hosted on, adding ?claim_code= followed by a comma-separated list of the remainder of your Assertion file locations

Try it

If you’d like to try the gadget complete this form and you’ll be issued with a Gadget Issuer User Badge. Get the question right and you’ll also get the Gadget Issuer Gold Badge.

How the gadget works

For those of you unfamiliar with gadgets/widgets, they are an easy way to embed content in other gadget/widget compatible sites. The majority of gadgets are simply XML wrappers for HTML content. The great thing is that gadgets can include JavaScript that doesn't get sanitized/stripped out. If you want more information about making gadgets see My First Google Gadget. The source code for the gadget is linked to above but can also be viewed on GitHub. Essentially it's a wrapper for Mozilla's Issuer API:

The Issuer API is a [java]script that can be dropped-in to any badge issuer's website to provide a way for users to add an issuer's badges to their backpack.

Feel free to modify the gadget code to handle the success and error callbacks.
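To give a feel for how that fits together, here is a minimal sketch (not the actual gadget source; the function name and the way the claim code reaches the script are simplifications) of turning a ?claim_code value into an Issuer API call:

// Minimal sketch: turn the comma-separated claim code into assertion URLs
// and pass them to Mozilla's Issuer API (issuer.js must be loaded on the page).
function claimBadges(baseUrl, claimCode) {
  var assertions = claimCode.split(",").map(function (code) {
    return baseUrl + code;  // e.g. baseUrl + "v1mhaws.json"
  });
  OpenBadges.issue(assertions, function (errors, successes) {
    // adjust these callbacks to suit, e.g. show a thank-you or report failures
    if (errors.length) console.log("Not issued:", errors);
    if (successes.length) console.log("Added to backpack:", successes);
  });
}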

Yep I’m issuing badges from a Google Form/Spreadsheet, here’s how

If you tried the demo you might be wondering how I went from a Google Form to issuing some badges. Here's how. Google Spreadsheets includes Google Apps Script, a cloud scripting language which uses the JavaScript syntax to automate processes across Google products and third-party services and to deploy/publish custom applications and data. Apps Script includes a Content Service which, amongst other things, lets you publish JSON data. As the metadata blobs behind Open Badges are JSON based, we can use Apps Script to process the form responses, email the recipient and create the JSON … well almost.
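As a rough illustration of that last part (a simplified sketch rather than the code behind my form; the badge URL and field values are placeholders), a published Apps Script web app can return an assertion as JSON like so:

// Sketch: serve a badge assertion as JSON from an Apps Script web app.
// In practice the values would be looked up from the form response sheet.
function doGet(e) {
  var assertion = {
    recipient: "sha256$...",  // hashed, salted email (see the hashing note below)
    badge: "http://example.com/badgeclass.json",  // placeholder badge class URL
    issued_on: "2013-02-01"
  };
  return ContentService.createTextOutput(JSON.stringify(assertion))
    .setMimeType(ContentService.MimeType.JSON);
}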

An issue with JSON files generated by Apps Script is that security measures put in place by Google prevent cross-domain use when the files are called by AJAX, as the Issuer API does. So currently I have to proxy the JSON via my webhost (Mozilla could fix this by also permitting JSONP, which can also be produced by Apps Script. I imagine this is however low priority. If you have any thoughts on other ways leave a comment).
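For completeness, producing JSONP from Apps Script is only a small change to the same idea. A sketch, where buildAssertion is a hypothetical helper standing in for the lookup above:

// Sketch: return JSONP when a callback parameter is supplied, plain JSON otherwise.
function doGet(e) {
  var payload = JSON.stringify(buildAssertion(e));  // buildAssertion is hypothetical
  if (e.parameter.callback) {
    return ContentService.createTextOutput(e.parameter.callback + "(" + payload + ")")
      .setMimeType(ContentService.MimeType.JAVASCRIPT);
  }
  return ContentService.createTextOutput(payload)
    .setMimeType(ContentService.MimeType.JSON);
}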

Here’s a copy of the code running behind my Google Form (you’ll also need to include a rollup of the CryptoJS sha256 library to hash and salt the recipient’s email address).
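The hashing itself is only a couple of lines, something like this assuming the CryptoJS rollup is included in the script project (the salt here is just an example value):

// Sketch: produce the hashed, salted recipient identity used in the assertion.
function hashRecipient(email) {
  var salt = "examplesalt";  // use your own salt and keep it consistent per badge
  return "sha256$" + CryptoJS.SHA256(email + salt).toString();
}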

[A pointer if you want to extend this work: you might want to use the native NoSQL-style ScriptDb Service, part of Google Apps Script, to prepare and serve assertions. I also found the Open Badges Validator very useful for testing your Assertions.]

Related work

Some recent posts/upcoming events that have influenced this post are:

So what do you think? Inspired to issue badges from Google Apps? Have I missed anything? What isn’t clear? I look forward to your thoughts ;)


Amber (@ambrouk/at JISC/of UKOER) is leaving JISC today and Lorna has written a lovely post wishing her goodbye. I've known Amber since my days in JISC RSC Scotland North & East and she has been a fantastic supporter of my work, forever 'pimping my shit'.

Amber, thank you and good luck in the new job. Here are some flowers for you (made from your Twitter friend network, degree 1.5, nodes scaled by betweenness centrality and clustered with the Spin-glass community finding algorithm ;)

Here are some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.


You may have noticed I missed an analysis of week 5 of CFHE12, but hopefully I'll capture what I wanted to say and more in this post. Here I want to pull together a couple of ideas around some of the measurable user activity generated as part of CFHE12. This will mainly focus on Twitter and blogs and will ignore other channels like the course discussion forum, webinar chat or other spaces participants might have discovered or created. I conclude that there are some simple opportunities to incorporate data from Twitter into other channels, such as a summary of questions and retweets.

Twitter Activity Overview

The headline figures for tweets matching the search ‘#CFHE12 OR OR OR’  for 8th October to 18th November 2012 (GMT):

  • 1,914 Tweets from 489 different Twitter accounts
  • 1,066 links shared
  • 10% (n=206) of the tweets were in @reply to another Twitter account
  • Contributions from accounts in 45 countries (top 5  United States of America – 166; Canada – 50; United Kingdom – 43; Australia – 28; Germany – 12) [1]

Map of #cfhe12 participants

Looking at the week-by-week distribution of contributors and contributions, it can be seen that after the first 2 weeks the number of tweets posted each week remained consistent at around 200. 75% of the Twitter accounts (n=374) contributed tweets in only 1 week of CFHE12.

Twitter contributors and tweets
Number of weeks participants contributed in

Comparing Twitter activity with blog posts aggregated with gRSSHopper doesn't reveal any correlation, but it's interesting to note that whilst the volume of tweets remains relatively consistent for weeks 3 to 6, there is a significant drop in blog posts between weeks 4 and 5.

CFHE12 number of tweets and blogs

Looking at the distribution of tweets over day and time shows a lull on Thursdays, but a peak around 1900hrs GMT which would appear to coincide with the usual time slot used for tweetchats.

CFHE12 Tweets by day of week
CFHE12 Tweets by time of day

Getting more technical, having collected the friend/follower relationships for Twitter accounts using #CFHE12 for each of the weeks, it is possible to analyse new connections between community members. At the end of week 1 the top 52 contributors were joined by 286 follow relationships. By the end of the course 45 new follow relationships had been created, increasing the graph density [0.103774 –> 0.120101597] and reducing geodesic distance [2.008136 –> 1.925296].

The graph below highlights the new relationships (bold lines). Nodes are sized by the number of new connections (as part of the archive the friend/follower count is captured with each tweet, so it may be possible to do further analysis). It's interesting to note that BarnetteAndrew is an isolated node in G7.

cfhe12 interconnection growth

Refining the signals from the Twitter feed

Adam Cooper (CETIS) has recently posted some tips from a presentation by John Campbell on the development of Signals at Purdue, which includes using a spreadsheet as a starting point as a way to find out what you need. I've already got some basic tools to overview a Twitter archive, but used the CFHE12 data to experiment with some more.

By week

Adding to the existing Twitter Activity sparklines, it's been possible to extract a summary of week-by-week activity (a basic traffic light). Whilst this in a way duplicates the data rendered in the sparkline, it has been useful to filter the participants based on queries like 'who has contributed in week 1' and 'who has contributed in all the weeks'. If you wanted to take this to the next level you'd combine it with the community friendship graph and pay extra attention to the activity of your sub-community bridges (for more info on this see Visualizing Threaded Conversation Networks: Mining Message Boards and Email Lists for Actionable Insights).

CFHE12 Activity data

Conversation matrix

Graphing the conversations (@reply, @mentions and retweets) using TAGSExplorer gives this ball of mess.

CFHE12 TAGSExplorer View

Trying to find a more structured way of presenting the data, I've experimented with an adjacency matrix (shown below, or interactive version here)[2]. Each cell is colour coded to indicate the number of interactions (replies, mentions and retweets) between users. For example, we can see that gsiemens has had the most interactions with barrydahl. Scanning along the rows gives you a sense of whether a person was interacting with a large number or a select few other accounts. For example, pgsimoes has interactions mostly with suifaijohnmak. Filtering tweets for pgsimoes (possible from the link in the left column) it looks like it's an automatic syndication of suifaijohnmak's blog posts.

CFHE12 conversation adjacency matrix

Do you have any questions?

CFHE12 Any Questions?

At the beginning of CFHE12 I posted Any Questions? Filtering a Twitter hashtag community for questions and responses. This is a crude tool which filters for tweets containing '?', which might indicate they are questions. Counting the number of tweets in the archive which reply to a question gives the following breakdown (a sketch of the filtering idea follows the figures):

  • Total questions 309
  • Questions with replies 44
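
For anyone curious, the filtering idea boils down to something like the sketch below (the field names assume raw Twitter API style data; a TAGS archive stores the equivalent values as spreadsheet columns):

// Sketch: treat tweets containing '?' as questions, then count how many
// of those questions received at least one reply within the archive.
function summariseQuestions(tweets) {
  var questionIds = {};
  var answered = {};
  var totalQuestions = 0;
  var questionsWithReplies = 0;
  tweets.forEach(function (t) {
    if (t.text.indexOf("?") > -1) {
      questionIds[t.id_str] = true;
      totalQuestions++;
    }
  });
  tweets.forEach(function (t) {
    var replyTo = t.in_reply_to_status_id_str;
    if (replyTo && questionIds[replyTo] && !answered[replyTo]) {
      answered[replyTo] = true;
      questionsWithReplies++;
    }
  });
  return { totalQuestions: totalQuestions, questionsWithReplies: questionsWithReplies };
}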

An important point to note is that as only tweets that meet the search criteria are archived there may be more responses ‘off tag’. The danger in a medium like Twitter used in courses like CFHE12 is that questions may go unanswered, misconceptions are not corrected, feedback is never given.

Part of the issue in the case of CFHE12 is that participants are allowed to use tools like Twitter as they please. While this suits 'visitors and residents', some additional structure may be beneficial. Two simple approaches would be to direct participants to include the course tag in their replies, and to highlight current questions either by using the 'Any Questions' tool or, in the case of courses using gRSSHopper, by filtering the daily email alert search for 'CFHE12 AND ?' instead of including all the latest course tweets.


Retweets

For a while I've had a basic function that extracted the tweets with the most retweets, but <sigh> I think it's on the list of developments I've never got around to blogging about. The routine is a relatively simple bean counter that goes through a list of tweets, removes any hyperlinks, crudely shortens the text by 90% (to account for any annotations) and counts the number of matches before returning a list of the top 12. Slicing the data for each week I get these tables. There is probably more analysis required of what is being retweeted before making a decision about how this data could be used. My main question for the #cfhe12 Twitter community is: do they have a sense of what is being retweeted the most? The other angle is pushing some of this data into other communication channels like the Daily Newsletter or discussion forums.
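
As a rough sketch of that routine (not the actual function; the names are mine, and I'm reading 'shortens the text by 90%' as keeping the first ~90% of each tweet):

// Sketch of the bean counter: strip links, crudely shorten each tweet's text
// (to allow for annotations added when retweeting), count how many archived
// tweets contain that shortened text, and return the top 12.
function topRetweets(tweets) {
  var counts = {};
  tweets.forEach(function (t) {
    var text = t.text.replace(/http\S+/g, "").trim();
    var stub = text.substring(0, Math.floor(text.length * 0.9));
    if (!stub || counts[stub] !== undefined) return;
    var matches = 0;
    tweets.forEach(function (other) {
      if (other.text.indexOf(stub) > -1) matches++;
    });
    counts[stub] = matches;
  });
  var ranked = [];
  for (var key in counts) ranked.push([key, counts[key]]);
  ranked.sort(function (a, b) { return b[1] - a[1]; });
  return ranked.slice(0, 12);
}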


So hopefully the summary data I extracted and the experimentation with new data views have been useful. Something that has been reinforced in my own mind is that more value could easily be gained from Twitter by either providing guidance on its use and/or incorporating data from Twitter in other channels more effectively (rather than dumping everything into a daily email, select some key data). Now that I've got a template which splits some of the data into weekly slices, it should be easier to deploy and pass data into other systems by changing some of the dates on the summary sheet.

But what do you think? Are there any particular data views you found useful? If you’ve participated in a community that uses Twitter a lot what additional tools would you find useful to keep track of what is going on?

Get the Data


[1] Countries were reconciled by extracting the location recorded in each Twitter account profile and generating geo-coordinates using the recipe here. Locations were extracted for 404 accounts. Co-ordinates were uploaded to GeoCommons and analysed with a boundary aggregation to produce this dataset.

[2] The matrix was generated by exporting conversation data from TAGSExplorer by adding the query &output=true to the url (e.g. like this), importing into NodeXL, then filtering vertices based on a list of top contributors. This was exported as a Matrix Workbook and imported into the Google Spreadsheet. Conditional formatting was used to heatmap the cells.


I started writing this last week so the intro doesn't make sense. Slides from the presentation I did are here (all pictures, so they probably also make very little sense).

To paraphrase Stephen Heppell (I’ve misquoted this before):

Content is king, but community is sovereign

The UKOER Programme is having its final meeting next week, and while the final set of projects comes to a close, a strong community has formed and I'm sure will continue. Something I was interested in doing is looking at how the community has evolved over time. I've previously looked at data around the #ukoer hashtag, but not everyone uses Twitter so I thought I'd look for another data source. As email is still a strong component of most people's everyday lives, I started poking around the OER-DISCUSS JISCMail (Listserv) list:

A public list for discussion about the release, use, remix and discovery of Open Educational Resources (OER). Managed jointly by OU SCORE Project, and the JISC / HE Academy OER Programme.

As far as I could see there are limited options for getting data out of JISCMail (some limited RSS/Atom feeds), so cue the music for a good old-fashioned scrape and refine. Whilst I'll walk you through this for OER-DISCUSS, the same recipe can be used for other public lists.

Source survey

Instead of going straight into the recipe I wanted to record some of the initial thought processes in tackling the problem. This usually begins with looking at what you've got to work with. Starting with the list homepage I can see some RSS/Atom feeds that don't take me far, so instead I turn my attention to the list of links for each month's archives. Clicking through to one of these and poking around the HTML source (I mainly use Chrome so a right click in the page gives the option to Inspect Element) I can see that the page uses a table template structure to render the results – good. Next I checked if the page would render even when I was logged out of JISCMail, which it does – good. So far so good.

page source

Next, a step back. This looks scrapable, so has anyone done this before? A look on Scraperwiki turns up nothing on Listserv or JISCMail, so next a general Google search. Searching for terms like 'listserv data scrape' is problematic because there are lots of listserv lists about data scraping in general. So we push on. We've got a page with links to each month's archives and we know each archive uses a table to lay out results. Next it's time to start thinking about how we get data out of the tables. Back in the Chrome Element Inspector we can see that the source contains a lot of additional markup for each table cell and in places cells contain tables within them. At this point I'm thinking OpenRefine (née Google Refine).

Scraping list of archive links

A feature of OpenRefine I use a lot is fetching data from a URL. To do this we need a list of URLs to hit. Back on the list homepage I start looking at how to get the links for each month's archive. Hovering over the links I can see they use a standard sequence with a 2-digit year {yy} and month {mm}


I could easily generate these in a spreadsheet, but I'm lazy so I just point a Chrome extension I use called Scraper at the part of the page I want and import it into a Google Spreadsheet.


[Another way of doing this is creating a Google Spreadsheet and in this case entering the formula =ImportXml("","//tr[2]/td/ul/li/a/@href")]
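
If you would rather generate the sequence than scrape it, a few lines of Apps Script will build the {yy}{mm} URLs. A sketch only, with the archive URL stem passed in as a placeholder since it varies by list:

// Sketch: build monthly archive URLs from the 2-digit year/month pattern.
function buildArchiveUrls(urlStem, startYear, startMonth, numberOfMonths) {
  var urls = [];
  var d = new Date(startYear, startMonth - 1, 1);
  for (var i = 0; i < numberOfMonths; i++) {
    var yy = ("0" + (d.getFullYear() % 100)).slice(-2);
    var mm = ("0" + (d.getMonth() + 1)).slice(-2);
    urls.push(urlStem + yy + mm);  // e.g. urlStem + "1210" for October 2012
    d.setMonth(d.getMonth() + 1);
  }
  return urls;
}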

Fetching and Refining the Data

Import project

Finally we can fire up OpenRefine. You could create a project using the Google Data option, which imports data from your Google Docs; instead, as it's not a huge amount of data, I use the Clipboard option. At this point the preview will probably separate the data using '/' and use the first row as a column heading, so you'll want to switch the separator to comma and de-select 'Parse next'.

  1. Next we want to fetch each month’s archive page by using the Column 1 dropdown to Edit column > Add column by fetching url, using the GREL expression ""+value with the column name month_raw
  2. This pulls in each month’s archive page in raw html. Now we want to parse out each row of data in a new column by selecting the dropdown from month_raw and selecting Edit column > Add column based on this column  using the GREL expression forEach(value.parseHtml().select("table.tableframe")[1].select("tr"),v,v).join(";;;") with the column name rows_raw – this selects the second table with class ‘tableframe’ and joins each row with a ‘;;;’
  3. Next from the rows_raw column use Edit cells > Split multi-valued cells using ;;; as the separator
  4. Again from the rows_raw column dropdown select Edit column > Add column based on this column using the GREL expression forEach(value.parseHtml().select("td"),v,v).join(";;;") with the column name rows_parsed – this joins each <td> with a ;;; which will let us split the values into new columns in the next step
  5. Now from the rows_parsed column select Edit column > Split into several columns using the separator ;;;

Column split

You should now have something similar to the above, with columns and rows split out but still messy with HTML in the cells. We can clean these up using Edit cells > Transform using variations of value.parseHtml().htmlText()

Here are the steps I used (the complete operation history you can use in Undo/Redo is here – using this apply all the actions starting with the list of monthly urls)

  1. Text transform on cells in column rows_parsed 4 using expression grel:value.parseHtml().htmlText().replace(" lines","").toNumber()
  2. Rename column rows_parsed 4 to lines
  3. Text transform on cells in column rows_parsed 3 using expression grel:value.parseHtml().htmlText().toDate("EEE, dd MMM y H:m:s")
  4. Rename column rows_parsed 3 to date
  5. Text transform on cells in column rows_parsed 2 using expression grel:value.parseHtml().htmlText().replace(" <[log in to unmask]>","")
  6. Rename column rows_parsed 2 to from
  7. Create column snippet at index 4 based on column rows_parsed 1 using expression grel:value.split("showDesc('")[1].split("','")[0].unescape("html").parseHtml().htmlText()
  8. Create column link at index 4 based on column rows_parsed 1 using expression grel:""+value.parseHtml().select("a")[0].htmlAttr("href")
  9. Text transform on cells in column rows_parsed 1 using expression grel:value.parseHtml().htmlText()
  10. Rename column rows_parsed 1 to subject
  11. Create column subject_normal at index 4 based on column subject using expression grel:trim(value.replace(/^Re:|^Fwd:/i,""))

You'll probably notice some of the rows don't contain the data we need. An easy way to remove these is to use a timeline facet on the date column, selecting non-time, blank and error, and then from the All column dropdown menu select Edit rows > Remove all matching rows.

Tony Hirst has a great post on faceting tricks. Something not covered is clustering data using facets. We can use this as a way to join rows where authors have multiple logins, e.g. Tony Hirst and Tony.Hirst.

To do this add a text facet to the author/from column and click Cluster:

Facet view

Finding the right settings is a bit of trial and error combined with a bit of knowledge about your dataset. In this example I saw that there were rows for Pat Lockley and Patrick Lockley, so I tried some settings until I got a hit (in this case using nearest neighbour and PPM – which by all accounts is the last resort). You might also need to run clustering a couple of times to catch most of the variations.

Clustering columns

What can we do with this data?

In Dashboarding activity on public JISCMail lists using Google Sheets (Spreadsheets) I was able to get an indication of the overall activity of the list. Now that author names are clustered I can get a more accurate picture of the top contributors using a text facet on the author column:

Pat Lockley 269, Amber Thomas 122, Phil Barker 82 

I was tempted to mine the individual posts further using the techniques and code posted by Adam Cooper (CETIS), but it didn’t look like I could easily swap the data source. A couple of questions posed by Phil Barker (CETIS) were:

The easiest way I found to get this was to use the Export > Custom tabular exporter (using this settings file), import into Google Sheets (Spreadsheets) and use a couple of formulas to get this summary page (opening the summary page will let you see the formulas I've used):

And there is much more you could do, but I’ll leave it there. If nothing else I hope you get an idea of some of the thought processes involved in extracting data. As always if something is unclear get in touch.


I've written a very long blog post, which I'll publish soon, on text-mining public JISCMail (Listserv) lists using OpenRefine. It concludes with displaying list activity, posts over time and individual activity. The technique I used isn't straightforward, but as the output might be of benefit to other people like Brian Kelly, who reported the Decline in JISCMail Use Across the Web Management Community, I wondered if there was a better way of doing it. Here's my answer:

*** JISCMail Public List Activity Overview Template ***
[Give it 30 seconds to render the results] 

JISCMail Public List Activity Overview Template

By making a copy of this spreadsheet and entering the url of the homepage of a public JISCMail list like OER-DISCUSS, it goes off and collects each month's archives for almost the last 3 years, graphs the overall list activity and lets you see individual contributions (a limitation is matching variations in display names, so in the OER-DISCUSS example Pat Lockley and Patrick Lockley get counted separately even though they are the same person).

How it works

On the data sheet, cell A2 uses importXML to grab all the archive links. In cell B2 the importHTML function is used to grab the table of posts on each month's archive page and a QUERY returns the post author names, the values being turned into a string from an array using JOIN. In cell A53 a UNIQUE list of author names (minus ') is generated using a combination of SPLIT and JOINs. This data is then used on the Dashboard sheet (to get the SPARKLINES I had to write a custom function using Google Apps Script):

function getActivity(user, source) {
  // For each month's row of joined author names, count how many times
  // 'user' appears and return the counts as a column of values.
  var output = [];
  for (var i in source) {
    var rows = source[i][0];
    var count = rows.match(new RegExp(user, "g"));
    if (count) {
      output.push([count.length]);
    } else if (source[i][0] != "") {
      output.push([0]);
    }
  }
  return output;
}
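
If you are rebuilding this yourself, the custom function is then used from a cell much like a built-in formula, along the lines of =getActivity(A53, data!$B$2:$B$40) (the cell references here are illustrative rather than the exact ones in the template), returning a column of per-month counts for that author which the sparkline is drawn from.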

If you are interested in learning more about the functions used I recently posted Feeding Google Spreadsheets: Exercises in using importHTML, importFeed, importXML, importRange and importData (with some QUERY too). You should be able to use this template with any other public JISCMail list. Any questions get in touch.

Search Archive feature in TAGSExplorer

At IWMW12 I made a searchable/filterable version of TAGS Spreadsheets. This feature lets you use the Google Visualisation API to filter tweets stored in a Google Spreadsheet (more about TAGS). It has been available via a separate web interface for some time but I've never got around to publicising it. As TAGSExplorer also uses the Google Visualisation API to wrap the same data in a different visualisation tool (predominantly d3.js), it made sense to merge the two. So now any existing TAGSExplorer archive (like this one for #jiscel12) should also have a 'Search Archive' button.
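Stripped right down, the pattern both views rely on looks something like the sketch below (the spreadsheet key, column letters and query are placeholders rather than TAGSExplorer's actual code):

// Sketch: query a Google Spreadsheet via the Visualization API and filter
// tweets whose text contains a search term (no quote escaping for brevity).
function searchArchive(spreadsheetKey, term, onResult) {
  var dataSourceUrl = "https://spreadsheets.google.com/tq?key=" + spreadsheetKey;
  var query = new google.visualization.Query(dataSourceUrl);
  query.setQuery("select A, B, C where lower(B) contains '" + term.toLowerCase() + "'");
  query.send(function (response) {
    if (response.isError()) return;
    onResult(response.getDataTable());  // rows matching the search term
  });
}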

The archive view has some basic text filtering on the tweeted text and who tweeted the message, as well as a time range filter (dragging the handles indicated). The scattered dots indicate when messages were tweeted. The denser the dots, the more tweets made.

I've hastily thrown this together so feedback is very welcome.