Tag Archives: #cfhe12


You may have noticed I missed an analysis of week 5 of CFHE12, but hopefully I’ll capture what I wanted to say and more in this post. Here I want to pull together a couple of ideas around some of the measurable user activity generated as part of CFHE12. This will mainly focus on Twitter and blogs, ignoring other channels like the course discussion forum, webinar chat or other spaces participants might have discovered or created. I conclude that there are some simple opportunities to incorporate data from Twitter into other channels, such as summaries of questions and retweets.

Twitter Activity Overview

The headline figures for tweets matching the search ‘#CFHE12 OR edfuture.net OR edfuture.mooc.ca OR edfuture.desire2learn.com’  for 8th October to 18th November 2012 (GMT):

  • 1,914 Tweets from 489 different Twitter accounts
  • 1,066 links shared
  • 10% (n=206) of the tweets were in @reply to another Twitter account
  • Contributions from accounts in 45 countries (top 5: United States of America – 166; Canada – 50; United Kingdom – 43; Australia – 28; Germany – 12) [1]
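Headline figures like these can be derived from a tweet archive with a few lines of code. Below is a minimal sketch; the field names (`from_user`, `text`, `reply_to`) and the sample tweets are illustrative, not the actual TAGS spreadsheet columns.

```python
# Sketch: deriving headline figures (tweet count, accounts, links, % replies)
# from a list of archived tweets. Sample data is invented.
import re

tweets = [
    {"from_user": "alice", "text": "Reading #CFHE12 http://t.co/abc123", "reply_to": None},
    {"from_user": "bob",   "text": "@alice agreed #CFHE12",             "reply_to": "alice"},
    {"from_user": "alice", "text": "Week 2 thoughts #CFHE12",           "reply_to": None},
]

total = len(tweets)
accounts = len({t["from_user"] for t in tweets})
links = sum(len(re.findall(r"https?://\S+", t["text"])) for t in tweets)
replies = sum(1 for t in tweets if t["reply_to"])
pct_replies = round(100 * replies / total)
print(total, accounts, links, f"{pct_replies}%")
```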

Map of #cfhe12 participants

Looking at the week-by-week distribution of contributors and contributions, it can be seen that after the initial two weeks the number of tweets posted each week remained consistent at around 200. 75% of the Twitter accounts (n=374) contributed tweets in only one week of CFHE12.

Twitter contributors and tweets Number of weeks participants contributed in

Comparing Twitter activity with blog posts aggregated with gRSSHopper doesn’t reveal any correlation, but it’s interesting to note that whilst the volume of tweets remains relatively consistent for weeks 3 to 6 there is a significant drop in blog posts between weeks 4 and 5.

CFHE12 number of tweets and blogs

Looking at the distribution of tweets over day and time shows a lull on Thursdays, but a peak around 1900hrs GMT, which would appear to coincide with the usual time slot used for tweetchats.

CFHE12 Tweets by day of week CFHE12 Tweets by time of day
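The day/time distributions are simple frequency counts over the archive’s timestamps. A sketch, with made-up timestamps:

```python
# Sketch: bucketing archived tweets by day of week and hour of day,
# as in the charts above. Timestamps are invented sample data.
from collections import Counter
from datetime import datetime

created = ["2012-10-10 19:05", "2012-10-10 19:40", "2012-10-09 09:15"]
stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in created]

by_day = Counter(d.strftime("%A") for d in stamps)
by_hour = Counter(d.hour for d in stamps)
print(by_day.most_common(1), by_hour.most_common(1))  # busiest day and hour
```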

Getting more technical: having collected the friend/follower relationships for Twitter accounts using #CFHE12 in each of the weeks, it is possible to analyse new connections between community members. At the end of week 1 the top 52 contributors were joined by 286 follow relationships. By the end of the course 45 new follow relationships had been created, increasing the graph density (0.103774 → 0.120102) and reducing the mean geodesic distance (2.008136 → 1.925296).

The graph below highlights the new relationships (bold lines). Nodes are sized by the number of new connections (as part of the archive the friend/follower count is captured with each tweet, so it may be possible to do further analysis). It’s interesting to note that BarnetteAndrew is an isolated node in G7.

cfhe12 interconnection growth
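For the curious, both metrics are easy to compute without NodeXL. Here’s a plain-Python sketch on a toy follow network (the edges are invented; density is edges over possible edges, and geodesic distance is the shortest path between a pair, averaged over all pairs):

```python
# Sketch: graph density and mean geodesic distance for an undirected
# follow network, using breadth-first search. Toy data only.
from collections import deque
from itertools import combinations

edges = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
nodes = {n for e in edges for n in e}
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

n = len(nodes)
density = 2 * len(edges) / (n * (n - 1))

def dist(src, dst):
    """Shortest-path length between two nodes via BFS."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nb in adj[node] - seen:
            seen.add(nb)
            queue.append((nb, d + 1))

pairs = list(combinations(nodes, 2))
mean_geodesic = sum(dist(u, v) for u, v in pairs) / len(pairs)
print(round(density, 3), round(mean_geodesic, 3))
```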

Refining the signals from the Twitter feed

Adam Cooper (CETIS) has recently posted some tips from a presentation by John Campbell on the development of Signals at Purdue, which include using a spreadsheet as a starting point to find out what you need. I’ve already got some basic tools to overview a Twitter archive, but used the CFHE12 data to experiment with some more.

By week

Adding to the existing Twitter Activity sparklines, it’s been possible to extract a summary of week-by-week activity (a basic traffic light). Whilst this in a way duplicates the data rendered in the sparkline, it has been useful for filtering the participants with queries like ‘who has contributed in week 1’ and ‘who has contributed in all the weeks’. If you wanted to take this to the next level you’d combine it with the community friendship graph and pay extra attention to the activity of your sub-community bridges (for more info on this see Visualizing Threaded Conversation Networks: Mining Message Boards and Email Lists for Actionable Insights).

CFHE12 Activity data

Conversation matrix

Graphing the conversations (@reply, @mentions and retweets) using TAGSExplorer gives this ball of mess.

CFHE12 TAGSExplorer View

Trying to find a more structured way of presenting the data, I’ve experimented with an adjacency matrix (shown below, or interactive version here) [2]. Each cell is colour coded to indicate the number of interactions (replies, mentions and retweets) between users. For example, we can see that gsiemens has had the most interactions with barrydahl. Scanning along the rows gives you a sense of whether a person was interacting with a large number of other accounts or a select few. For example, pgsimoes interacts mostly with suifaijohnmak. Filtering tweets for pgsimoes (possible from the link in the left column), it looks like an automatic syndication of suifaijohnmak’s blog posts.

CFHE12 conversation adjacency matrix
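The matrix itself is just a tally of @-interactions per sender/target pair. A minimal sketch (usernames from the archive, tweet texts invented):

```python
# Sketch: building a conversation matrix by counting @mentions
# (replies, mentions and retweets all surface as @username in the text).
import re
from collections import Counter

tweets = [
    ("gsiemens", "@barrydahl good point"),
    ("gsiemens", "RT @barrydahl: slides here"),
    ("pgsimoes", "New post by @suifaijohnmak"),
]

matrix = Counter()
for sender, text in tweets:
    for target in re.findall(r"@(\w+)", text):
        matrix[(sender, target)] += 1

print(matrix[("gsiemens", "barrydahl")])
```

Each `(sender, target)` count would then become one colour-coded cell.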

Do you have any questions?

CFHE12 Any Questions?

At the beginning of CFHE12 I posted Any Questions? Filtering a Twitter hashtag community for questions and responses. This is a crude tool which filters out tweets with ‘?’, which might indicate they are a question. By counting the number of tweets in the archive which reply to a question you get the following breakdown:

  • Total questions 309
  • Questions with replies 44
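The filter really is that crude. A sketch of the logic, with invented tweet IDs:

```python
# Sketch: mark tweets containing '?' as possible questions, then count
# which of those have at least one archived reply. Sample data invented.
tweets = [
    {"id": 1, "text": "Is xMOOC pedagogy scalable? #cfhe12", "reply_to_id": None},
    {"id": 2, "text": "Week 2 reflections #cfhe12",          "reply_to_id": None},
    {"id": 3, "text": "@someone depends on the design",      "reply_to_id": 1},
]

questions = [t for t in tweets if "?" in t["text"]]
q_ids = {t["id"] for t in questions}
answered = {t["reply_to_id"] for t in tweets if t["reply_to_id"] in q_ids}
print(len(questions), len(answered))
```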

An important point to note is that, as only tweets that meet the search criteria are archived, there may be more responses ‘off tag’. The danger in a medium like Twitter used in courses like CFHE12 is that questions may go unanswered, misconceptions are not corrected, and feedback is never given.

Part of the issue in the case of CFHE12 is that participants are allowed to use tools like Twitter as they please. While this suits ‘visitors and residents’, some additional structure may be beneficial. Two simple approaches would be to direct participants to include the course tag in their replies, and to highlight current questions, either by using the ‘Any Questions’ tool or, in the case of courses using gRSSHopper, by filtering the search for ‘CFHE12 AND ?’ in the daily email alert instead of including all the latest course tweets.


Retweets

For a while I’ve had a basic function that extracts the tweets with the most retweets, but <sigh> I think it’s on the list of developments I’ve never got around to blogging about. The routine is a relatively simple bean counter that goes through a list of tweets, removes any hyperlinks, crudely truncates the text to 90% of its length (to account for any annotations) and counts the number of matches before returning a list of the top 12. Slicing the data for each week I get these tables. There is probably more analysis required of what is being retweeted before making a decision about how this data could be used. My main question for the #cfhe12 Twitter community is whether they have a sense of what is being retweeted the most. The other angle is pushing some of this data into other communication channels like the Daily Newsletter or discussion forums.
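The bean counter described above might look something like this in Python (the original lives in the spreadsheet; tweet texts here are invented, and the 90% truncation is my reading of the routine):

```python
# Sketch of the retweet "bean counter": strip links and RT prefixes,
# crudely truncate each text to 90% of its length so trailing annotations
# don't break matching, then count occurrences across the archive.
import re

def clean(text):
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"^RT @\w+: ", "", text).strip()

tweets = [
    "Great week 5 summary http://t.co/x #cfhe12",
    "RT @someone: Great week 5 summary http://t.co/x #cfhe12",
    "RT @someone: Great week 5 summary http://t.co/x #cfhe12 <- this",
    "Unrelated tweet #cfhe12",
]

cleaned = [clean(t) for t in tweets]
counts = {}
for c in cleaned:
    key = c[: int(len(c) * 0.9)]  # crude truncation to tolerate annotations
    counts[c] = sum(1 for other in cleaned if key in other)

top = sorted(counts.items(), key=lambda kv: -kv[1])[:12]
print(top[0])
```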


So hopefully the summary data I extracted and the experimentation with new data views have been useful. Something that has been reinforced in my own mind is that more value could easily be gained from Twitter either by providing guidance on use and/or by incorporating data from Twitter in other channels more effectively (rather than dumping everything into a daily email, select some key data). Now that I’ve got a template which splits some of the data into weekly slices it should be easier to deploy and pass data into other systems by changing some of the dates on the summary sheet.

But what do you think? Are there any particular data views you found useful? If you’ve participated in a community that uses Twitter a lot what additional tools would you find useful to keep track of what is going on?

Get the Data


[1] Countries were reconciled by extracting location recorded in twitter account profile and generating geo-coordinates using recipe here. Locations were extracted for 404 accounts. Co-ordinates were uploaded to GeoCommons and analysed with a boundary aggregation to produce this dataset.

[2] The matrix was generated by exporting conversation data from TAGSExplorer by adding the query &output=true to the URL (e.g. like this), importing into NodeXL, then filtering vertices based on a list of top contributors. This was exported as a Matrix Workbook and imported into the Google Spreadsheet. Conditional formatting was used to heatmap the cells.


This week saw me submit my application to the Shuttleworth Foundation to investigate and implement cMOOC architecture solutions. It seems rather fitting that an element of this is included in this week’s CFHE12 exploration. It’s in part inspired by a chance visit from Professor David Nicol to our office on Friday. It didn’t take long for the conversation to get around to assessment and David’s latest work on the cognitive processes of feedback, particularly student generated, which sounds like it directly aligns with cMOOCs. It was David’s earlier work on assessment and feedback principles, which I was involved in, that got me thinking about closing the feedback loop. In particular, the cMOOC model promotes participants working in their own space; the danger is that in this distributed network participants can potentially become isolated nodes, producing content but not receiving any feedback from the rest of the network.

Currently within gRSSHopper course participants are directed to new blog posts from registered feeds via the Daily Newsletter. Below is a typical entry:

CFHE12 Week 3 Analysis: Exploring the Twitter network through tweets
Martin Hawksey, JISC CETIS MASHe
Taking an ego-centric approach to Twitter contributions to CFHE12 looking at how activity data can be extracted and used [Link] Sun, 28 Oct 2012 15:35:17 +0000 [Comment]

One of the big advantages of blogging is that most platforms provide an easy way for readers to feed back their own views via comments. In my opinion this is slightly complicated when using gRSSHopper as it provides its own commenting facility, the danger being that discussions can get broken (I imagine what gRSSHopper is trying to do is cover the situation when you can’t comment at source).

Even so, commenting activity, either from source posts or within gRSSHopper itself, isn’t included in the daily gRSSHopper email. This means it’s difficult for participants to know where the active nodes are: the posts receiving lots of comments, which could be useful for vicarious learning or for making their own contributions. Likewise it might be useful to know where the inactive nodes are, so that moderators can either respond or direct others to comment.

[One of the dangers here is information overload, which is why I think it’s going to start being important to personalise daily summaries, either by profiling or some other recommendation type system. One for another day.]

To get a feel for blog post comment activity I thought I’d have a look at what data is available and possible trends, and provide some notes on how this data could be systematically collected and used.

Overview of cfhe12 blog post comment activity

Before I go into the results it’s worth saying how the data was collected. I need to write this up as a full tutorial, but for now I’ll just give an outline and highlight some of the limitations.

Data source

An OPML bundle of feeds extracted in week 2 was added to an installation of FeedWordPress. This has been collecting posts from 71 feeds, filtering for posts that contain ‘cfhe12’ using the Ada FeedWordPress Keyword Filters plugin. In total 120 posts were collected between 5th October and 3rd November 2012 (this compares to the 143 links included in Daily Newsletters). Data from FeedWordPress was extracted from the MySQL database as a .csv file using the same query used in the ds106 data extraction.

This was imported into Open (née Google) Refine. As part of the data FeedWordPress collects a comment RSS feed per post (a dedicated feed for comments made on a particular post; a number of blogging platforms also have a general comment feed which outputs comments for all posts). 31 records from FeedWordPress included ‘NULL’ values (this appears to happen if FeedWordPress cannot detect a comment feed, or the original feed comes from a Feedburner feed with links converted to feedproxy). Using Refine the comment feeds were fetched and then comment authors and post dates were extracted. In total 161 comments were extracted and downloaded into MS Excel for analysis.
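Extracting authors and dates from a per-post comment feed is straightforward once you have the XML. A sketch using Python’s standard library (the feed content is inlined and invented; Refine’s ‘fetch URL’ step would supply the real thing):

```python
# Sketch: pulling comment authors and dates out of a WordPress-style
# per-post comment RSS feed. Feed content is invented sample data.
import xml.etree.ElementTree as ET

rss = """<rss xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
  <item><dc:creator>Alice</dc:creator>
        <pubDate>Sun, 28 Oct 2012 15:35:17 +0000</pubDate></item>
  <item><dc:creator>Bob</dc:creator>
        <pubDate>Mon, 29 Oct 2012 09:00:00 +0000</pubDate></item>
</channel></rss>"""

root = ET.fromstring(rss)
ns = {"dc": "http://purl.org/dc/elements/1.1/"}
authors = [i.findtext("dc:creator", namespaces=ns) for i in root.iter("item")]
first_date = root.find("./channel/item/pubDate").text
print(authors, first_date)
```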


Below is a graph of cfhe12 posts and comments (the Excel file is also available on Skydrive). Not surprisingly there’s a tail off in blog posts.

CFHE12 Blog Posts and Comments

Initially looking at this on a per-post basis (shown below left) showed that three of the posts were being commented on for over 15 days. On closer inspection it was apparent this was due to pingbacks (comments automatically left on a post as a result of it being linked from another post). Filtering out pingbacks produced the graph shown on the bottom right.

CFHE12 blog comments timeline  CFHE12 blog comments timeline (without pingbacks)

Removing pingbacks, on average comments stopped 3.5 days after a post was published, but in this data there is a wide range, from 0.2 days to 17 days. It was also interesting to note that some of the posts had high velocity: Why #CFHE12 is not a MOOC! received 8 comments in 1.3 days and Unfit for purpose – adapting to an open, digital and mobile world (#oped12) (#CFHE12) gained 7 comments in 17 days (in part because the post author took 11 days to respond to a comment).
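The ‘comment window’ figure is just the gap between publication and the last non-pingback comment. A sketch with invented dates:

```python
# Sketch: days between a post's publication and its last real
# (non-pingback) comment. Dates are invented sample data.
from datetime import datetime

def days_open(post_date, comments):
    real = [c for c in comments if not c["is_pingback"]]
    if not real:
        return 0.0
    last = max(datetime.fromisoformat(c["date"]) for c in real)
    return (last - datetime.fromisoformat(post_date)).total_seconds() / 86400

comments = [
    {"date": "2012-10-20T12:00:00", "is_pingback": False},
    {"date": "2012-11-05T12:00:00", "is_pingback": True},   # ignored
]
result = days_open("2012-10-19T12:00:00", comments)
print(round(result, 1))
```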

Looking at who the comment authors are is also interesting. Whilst initially it appears 70 authors have made comments, it’s apparent that some of these are the same author using different credentials, making them ‘analytically cloaked’ (H/T @gsiemens).

analytically cloaked

Technical considerations when capturing comments

There are technical considerations when monitoring blog post comments and my little exploration around #cfhe12 data has highlighted a couple:

  • multiple personas – analytically cloaked
  • pingbacks in comments – there are a couple of patterns you could use to extract these, but I’m not sure there is a 100% reliable technique
  • comment feed availability – FeedWordPress appears to happily detect WordPress and Blogger comment feeds if they are not passed through a Feedburner feedproxy. Other blogging platforms look problematic. Also, not all platforms provide a facility to comment
  • 3rd party commenting tools – commenting tools like Disqus provide options to maintain a comment RSS feed, but it may be down to the owner to implement and it’s unclear whether FeedWordPress registers the URL
  • maximum number of comments – most feeds are limited to the last 10 items. Reliable collection would require aggregating data on a regular basis.

This last point also opens the question of whether it would be better to regularly collect all comments from a target blog and do some post-processing to match comments to the posts you’re tracking, rather than hitting a lot of individual comment feed URLs. This is key if you want to reliably track and reuse comment data both during and after a cMOOC course. You might want to refine this and extract comments for specific tags using the endpoints outlined by Jim Groom, but my experience from the OERRI programme is that getting consistent use of tags by others is very difficult.

Discovering TEL-Map Mediabase

Twitter hasn’t completely abolished 3rd party clients just yet. The text in the red circle is generated from the details a user/company submits when they create an application that uses the Twitter API. As part of the registration the user has to provide a URL for the application. In this example ‘TEL-Map Mediabase’ redirects to Learning Frontiers, which is powered by TEL-Map. I should probably know more about TEL-Map because one of the partners is CETIS (before my time).

But what about ‘Mediabase’. Well a search of ‘TEL-Map Mediabase’ returns the page ‘MediaBase - Collecting, Analysing and Visualizing Large Collections of Social Software Artifacts’ which contains the presentation by Ralf Klamma.

So you basically have a system that can crawl predefined source feeds, analyse the collected data and either manually or automatically tag links, which can be pushed to a portal or tweeted from the system (and much more). Anyone else thinking cMOOC infrastructure?

[If the system is as good as I think it is I’m expecting another tweet from TEL-Map Mediabase]


For week 3 of cfhe12 analysis I thought I’d turn back to the Twitter data. I’m currently trying to prepare a Shuttleworth Fellowship application which has got me thinking more about the general premise of cMOOCs that “knowledge is distributed across a network of connections, and therefore that learning consists of the ability to construct and traverse those networks”  (from week 1 of cck11).

The aspect which features in my Shuttleworth application is providing mechanisms that aggregate data from distributed sub-networks, which can then be processed to produce actionable insights for tutors or participants. The process I plan to adopt is to look at the data using heavyweight tools, like NodeXL, or just by applying a bit of curiosity (this person has stopped tweeting, why? etc.), and then convert some of these patterns into very lightweight applications or views to remove the complexity and highlight key parts of the data.

Some examples for you:

Summary of CFHE12 participant activity


Tweets from CFHE12 are being collected in this Google Spreadsheet. As part of this template there are a number of summary views, one of these being a breakdown of individual participant activity, with sparklines used to display each person’s Twitter activity. Looking at gsiemens you can see steady activity, posting 45 tweets tagged #cfhe12. Towards the bottom of the graph is ViplavBaxi, who after initial high activity is no longer contributing to the hashtag. So what has happened to ViplavBaxi? There are a number of possible answers, but let me highlight a couple which also illustrate the limitations of the technique:

  • they have lost interest in the course or time commitments prevent them from contributing (high drop-out rates aren’t unexpected in MOOCs)
  • no longer using the #cfhe12 hashtag – the archive is only of #cfhe12, so if they have joined a sub-community communicating without the hashtag it’s not recorded
  • found a different communication channel – this technique only looks at Twitter activity; the person may have moved to another channel like the discussion forum

Another interesting activity summary is for dieGoerelebt. They are one of the top 5 contributors in terms of number of tweets, but recently their activity has trailed off. You can also see that the ‘@s’ column, which is the number of times they’ve been mentioned in tweets, is one of the lowest. Is the decline in activity a result of the lack of engagement?

The next question that springs to my mind is what did these people say. Within the spreadsheet it’s easy to filter what they said. To let you see too, I’ve got this simple web interface primed with filtered tweets (I modified an existing tool I’ve developed to do this – unfortunately I’ve never documented it, but as I use it more and more I must get around to it):

Summary of CFHE12 participant activity with RT percentage

From visual inspection dieGoerelebt had a high proportion of retweets. This is confirmed when I added a percentage of tweets that are retweets.
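The RT percentage column is a simple ratio per account. A sketch, detecting retweets by the old-style ‘RT @’ prefix (tweet texts invented):

```python
# Sketch: per-account retweet percentage - the share of an account's
# tweets that are retweets, detected by an 'RT @' prefix. Sample data.
tweets = [
    ("dieGoerelebt", "RT @gsiemens: slides"),
    ("dieGoerelebt", "RT @someone: worth a read"),
    ("dieGoerelebt", "my own thought #cfhe12"),
]

def rt_percentage(user, tweets):
    mine = [text for u, text in tweets if u == user]
    rts = [text for text in mine if text.startswith("RT @")]
    return 100 * len(rts) / len(mine)

pct = round(rt_percentage("dieGoerelebt", tweets))
print(pct)
```

Note this misses ‘native’ retweets that the API reports as metadata rather than text, so it undercounts in practice.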

Something I noted in the filtered view of a person’s tweets was that a lot of the context is lost (I can see they are @replying to someone, but I don’t know what that person said).

To help with this I started looking at modifying the twitter questions filter I built to enable a view of the conversation.

This is a start but, as I noted when I published the question filter, clicking through messages like the one shown below reveals there is more of the conversation that is missing.

 Part of the conversation

Bigger picture


So again I start exploring some ideas that branch off into many more avenues to follow. One thought is that the micro analysis of tweets might not be beneficial or practical, and given the issues with extracting a full conversation from Twitter a macro view might be better. Providing a summary of overall activity and the mode in which Twitter is being used by people may be of the most use to tutors and participants in identifying people they might want to connect with. As always your thoughts are greatly appreciated.

In this post I’ve taken an ego-centric approach to contributions. In the next couple of days I’ll share an ego-centric approach to community connections.


Focusing on some of the data behind Current/Future State of Higher Education (CFHE12) has been very stimulating and has got me thinking about the generation and flow of data on many levels. Having recently produced some data visualisations of ds106 for OpenEd12 it was reassuring that one of the first questions was “is the #ds106 data openly licensed?”. Reassuring because it is good to see that people are thinking beyond open content and open learning experiences and thinking about open data as well. So what data is available around CFHE12? In this post I look at data feeds available from CFHE12, see what we can get and suggest some alternative ways of getting the data and pulling it in to other services for analysis. Finally I highlight the issues with collecting participant feeds filtered by tag/category/label.

edfuture homepage sidebar

Working down some of the menu options on the CFHE12 home page, let’s see what we’ve got and how easy it is to digest.

Newsletter Archives

This page contains a link to each ‘Daily Newsletter’ sent out by gRSShopper (Stephen Downes’ RSS/content collection, remix and distribution system). I’m not familiar with how/if the Daily data can be exported via an official API, but with tools like Google Spreadsheets, Yahoo Pipes and others it’s possible to extract a link to each edition of the Daily using some basic screen-scraping techniques such as XPath. So in this sheet, in cell A2, I can get a list of archive pages using =ImportXML("http://edfuture.mooc.ca/cgi-bin/archive.cgi?page=newsletter.htm","//a/@href"). Using ImportXML on the resulting list of pages it’s possible to get a comma-separated list of all the posts in each Daily (column B).

The formula in column B includes a QUERY statement and is perhaps worthy of a blog post in its own right. Here it is: =IF(ISBLANK(A2),"",JOIN(",",QUERY(ImportXML(A2,"//a[.='Link']/@href"),"Select * WHERE not(Col1 contains('twitter.com'))"))). In brief the pseudocode is: if the cell is blank return nothing, otherwise comma-join the array of results from ImportXML for links which use the text ‘Link’ and don’t point to twitter.com. Note: there is a limit of 50 ImportXML formulas per spreadsheet, so I’ll either have to flatten the data or switch to Yahoo Pipes.
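The same scrape is easy to reproduce outside a spreadsheet. Here’s a sketch with Python’s standard-library HTML parser: grab the href of every anchor whose link text is ‘Link’, dropping twitter.com links, mirroring the ImportXML + QUERY combination (the page HTML is inlined and invented):

```python
# Sketch: Python equivalent of ImportXML("...","//a[.='Link']/@href")
# plus the QUERY filter excluding twitter.com links. Sample HTML only.
from html.parser import HTMLParser

class LinkGrabber(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self._href, self._text = [], None, ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href, self._text = dict(attrs).get("href"), ""

    def handle_data(self, data):
        self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._text.strip() == "Link" and self._href:
            if "twitter.com" not in self._href:
                self.links.append(self._href)

page = ('<p><a href="http://example.com/post">Link</a> '
        '<a href="http://twitter.com/x">Link</a></p>')
grabber = LinkGrabber()
grabber.feed(page)
print(",".join(grabber.links))
```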

The resulting data is of limited use, but it’s useful to see how many posts have been published via gRSShopper and by whom:

CFHE12 posts per day CFHE12 posts per domain

It’s at this point that we start to get a sign that the data isn’t entirely clean. For example, in cell A153 I’ve queried the Daily Newsletter posts from this blog and I get the four results shown below:

You can see there is a double post and actually, at the time of writing, I’ve only made two posts tagged with cfhe12. Moving on (but coming back to this point later), let’s now look at the feed options.


The course has feed options for: Announcements RSS; Blog Posts RSS; and an OPML List of Feeds. I didn’t have much luck with any of these. The RSS feeds have data but aren’t valid RSS, and the OPML (a file format which can be used for bundling lots of blog feeds together) only had 16 items and was also not valid (I should really have a peek at the source for gRSShopper and suggest fixes, but time and lack of Perl knowledge have prevented that so far). I did attempt some custom Apps Script to capture the Blog Posts RSS to this sheet, but I’m not convinced it’s working properly, in part because the source feed is not valid. There are other feeds not listed on the home page I might dig into, like the Diigo CFHE12 Group, which I’m collecting data from in a Google Spreadsheet using this IFTTT recipe.

Generating an OPML and Blog Post RSS from ‘View List of Blogs’ data


All is not lost. gRSShopper also generates a page of blogs it’s aggregating. With a bit more XPath magic (=ImportXML("http://edfuture.mooc.ca/feeds.htm","//a[.='XML']/@href")) I can scrape the XML links for each registered blog into this sheet. Using the Spreadsheet -> OPML Generator I get this OPML file of CFHE12 blogs (because the spreadsheet and OPML generator sit in the cloud this link will automatically update as blogs are added or removed from the Participant Feeds page). For more details on this recipe see Generating an OPML RSS bundle from a page of links using Google Spreadsheets.
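For anyone wanting to do this without the spreadsheet recipe, the OPML output is a very small document. A sketch of generating it from a list of scraped feed URLs (the URLs and title are placeholders):

```python
# Sketch: bundling scraped feed URLs into a minimal OPML file, roughly
# what the Spreadsheet -> OPML Generator recipe produces. URLs invented.
from xml.sax.saxutils import quoteattr

feeds = ["http://example1.net/feed", "http://example2.org/rss"]
outlines = "\n".join(
    f"    <outline type='rss' xmlUrl={quoteattr(url)} />" for url in feeds
)
opml = ("<opml version='1.0'>\n"
        "  <head><title>CFHE12 blogs</title></head>\n"
        "  <body>\n" + outlines + "\n  </body>\n</opml>")
print(opml)
```

`quoteattr` handles any awkward characters in the URLs, which scraped data tends to contain.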

Blog posts RSS

Earlier I highlighted a possible issue with posts being included in the Daily Newsletter. This is because it can be very tricky to get an RSS feed for a particular tag/category/label from someone’s blog. You only need to look at Jim Groom’s post on Tag Feeds for a variety of Blogging Platforms to get an idea of the variation between platforms. It’s worth underlining the issue here: each blogging platform has a different way of getting a filtered RSS feed for a specific tag/category/label, and in certain cases it’s not possible to get a filtered RSS feed at all. When a student registers a feed for an online course it can be difficult for them to identify their own blog’s RSS feed, let alone a filtered feed.

CFHE12 Pipework

As an experiment reusing the Participant Feeds page as a data source, I’ve come up with this Yahoo Pipe which fetches all the feeds and tries to filter the results. It’s slightly crude in the way it filters posts, looking for the course tag: as a category/tag (if exposed in the source feed), or in the post title, or in the post content. The pipe is currently returning 48 items (although it says 36 when outputting in a different file format) compared to the 77 from the Daily Newsletter Archives. The nice thing about Pipes is I can get the data in different formats (e.g. RSS, JSON, CSV) so more mashing up is possible.
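The pipe’s filter logic boils down to three checks per post. A sketch (post fields are illustrative, not Pipes’ actual data model):

```python
# Sketch: the pipe's filter - keep a post if the course tag appears in
# its categories (when exposed), its title, or its body. Sample data.
def matches(post, tag="cfhe12"):
    tag = tag.lower()
    return (tag in (c.lower() for c in post.get("categories", []))
            or tag in post.get("title", "").lower()
            or tag in post.get("content", "").lower())

posts = [
    {"title": "Week 3 thoughts", "categories": ["CFHE12"], "content": "..."},
    {"title": "Holiday photos",  "categories": [],         "content": "no tag"},
]
kept = [p["title"] for p in posts if matches(p)]
print(kept)
```

The title/content fallbacks are what make it crude: a post that merely mentions the tag in passing still gets through.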

Before you go off thinking Yahoo Pipes is the answer for all your open course RSS aggregation there are some big questions over reliability and how this solution would scale. It’s also interesting to note all the error messages because of bad source feeds:

This Pipe ran successfully but encountered some problems:

warning Error fetching http://edfuture.mooc.ca/feed://bigedanalytics.blogspot.com/feeds/posts/default?alt=rss. Response: Not Found (404)

warning Error fetching http://florianmeyer.blogspot.com. Response: OK (200). Error: Invalid XML document. Root cause: org.xml.sax.SAXParseException; lineNumber: 1483; columnNumber: 5; The element type "meta" must be terminated by the matching end-tag " ".

warning Error fetching http://edfuture.mooc.ca/cain.blogspot.com. Response: Not Found (404)

warning Error fetching http://larrylugo.blogspot.com/feeds/comments/default?alt=rss/-/CFHE12. Response: Bad Request (400)

warning Error fetching http://edfuture.mooc.ca/taiarnold.wordpress.com/feed. Response: Not Found (404)

warning Error fetching http://futuristiceducation.wordpress.com/wp-admin/index.php?page=my-blogs. Response: OK (200). Error: Invalid XML document. Root cause: org.xml.sax.SAXParseException; lineNumber: 5; columnNumber: 37; The entity "rsaquo" was referenced, but not declared.

warning Error fetching http://edtech.vccs.edu/feed/: Results may be truncated because the run exceeded the allowed max response timeout of 30000ms.

Rather than trying to work with messy data one strategy would be to start with a better data source. I have a couple of thoughts on this I’ve shared in Sketch of a cMOOC registration system.


So what have I shown? Data is messy. Data use often leads to data validation. Making any data available means someone else might be able to do something useful with it. In the context of cMOOCs, getting a filtered feed of content isn’t easy.

Something I haven’t touched upon is how the data is licensed. There are lots of issues with embedding licence information in data files. For example, I’m sure that technically the OPML file I generated should be licensed CC-BY-NC-SA CFHE12, because this is the licence of the source data? I’m going to skip over this point but welcome your comments (you might also want to check the Open Data Licensing Animation from the OERIPR Support Project).

[I’ve also quietly ignored getting data from the course discussion forums and the Twitter archive (the latter is here)]

PS Looking forward to next week’s CFHE12 topic, Big Data and Analytics ;)


As I mentioned in Filtering a Twitter hashtag community for questions and responses, I’ve been asked to do some analysis of the Current/Future State of Higher Education (CFHE12) course. Week 1 has mainly been about creating a toolchain that makes it easier to hit a button and get some insight. The focus has mainly been on tweets with the #cfhe12 hashtag. I’m still scratching my head as to what this all means, but there are already discussions about extending the scope to establish more context by also looking at blog and discussion forum posts. The danger I have as a ‘maker of things’ is that as questions emerge I want to make things to help find the answers.

To ease into this, let’s start with an overview. Here are some key stats for 7th–13th October 2012 (BST) (and already I’m resisting the temptation to create an overview template):

  • 762 Tweets
  • 305 Links
  • 172 RTs
  • 244 Unique twitter accounts
  • 14% (n=104) of tweets were in @reply to another person using #cfhe12

This sheet contains more details, including a summary of who tweeted the most and got the most @mentions, and the ‘Dashboard’ sheet which let me know that this was the most retweeted tweet:

Below are two graphs summarising the Twitter activity for week 1 of #cfhe12 (LHS) and another course earlier in the year #moocmooc (you can click on both of these for interactive versions).

summary of #cfhe12 tweets for week 1
#cfhe12 week 1 tweets

Summary of tweets from #moocmooc
#moocmooc tweets

It’s notable that the volume and proportion of tweets and @replies is higher in #moocmooc. Part of this could be down to the fact that #moocmooc was a condensed course that was one week long. Other factors may include the chosen infrastructure and how this was promoted, the size of the course and who was participating.

Extracting a conversation graph, which is shown below, there isn’t a great deal of @replying for week 1. In the graph each dot represents a single tweet and dots are joined if one tweet is an @reply to the other. I probably need to find a way for you to interact with this graph, but for now I’ve prepared these pages with conversations for groups G1-G4:

cfhe12 week 1 conversation graph
[The above graph data can be downloaded from the NodeXL Graph Gallery]
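The G1, G2… groups fall out of treating tweets as nodes and replies as edges, then taking connected components. A sketch with a small union-find and invented tweet IDs:

```python
# Sketch: grouping tweets into conversations - tweets are nodes, each
# reply links a tweet to the tweet it replies to, and connected
# components form the conversation groups. IDs are invented.
def components(edges, nodes):
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

nodes = {1, 2, 3, 4, 5}
replies = [(2, 1), (3, 2), (5, 4)]  # (tweet, tweet replied to)
comps = components(replies, nodes)
print(sorted(len(g) for g in comps))
```

This also shows why G3 and G4 split incorrectly: if a linking reply is missing from the archive, one real conversation appears as two components.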

Exploring G3 and G4, some of the limitations of this technique become apparent. For example, clicking on the date in the first tweet in G4 reveals the full text from Twitter, which includes text from G3, i.e. they are the same conversation and should be grouped together.

So more work to do, more things to think about, and more tools needed to make sense of this more easily. In the meantime any of your observations are greatly welcome.


In Notes on technology behind cMOOCs: Show me your aggregation architecture and I’ll show you mine I reached the point in my own mind that the key behind cMOOCs was how you aggregated and shared dispersed activity. At the time I also asked “Given the widespread use of Twitter in MOOCs are there tools/techniques required to aggregate and disseminate the course discussions?” and started looking at techniques to retrospectively analyse Twitter-based discussions. This activity hasn’t gone unnoticed and I was very grateful to be asked by Dave Cormier and George Siemens to do a weekly summary of Twitter data from their latest course Current/Future State of Higher Education (CFHE12) which started this week. This will be outwith my official CETIS work but given the increasing number of enquiries we are getting in this area it will undoubtedly feed in.

As I’ll be reporting on this course it made sense to sign up. On one of the registration pages I noticed a couple of different hashtags left over from an earlier course, so I asked the question:

Twitter status page

If you visit the Twitter status page for this tweet you’ll see I got a couple of responses from AJCann and Jez Cope. If I hadn’t sent you to that page, how would you have known I got an answer? Did Jez know that Alan had already responded to me?

This type of dialogue, at a higher level, is a key aspect of learning (many a Greek has dined out on ‘knowing that they know nothing’), and I started wondering how this activity could be aggregated, whether that aggregation would increase the situational awareness of participants, and whether it would cause a shift in how the course community interacted with each other. (I had recently read Tony Hirst’s post on Conference Situational Awareness, and the example from the “London 2012 Olympic Games where it was identified that tweets relating to the congestion of the Olympic park entrances had a direct effect on crowd flow through the site” was still on my mind.)

So after some late-night code bashing here’s what I’ve come up with (this is very beta so your feedback is welcome – particularly if it doesn’t work): a Filtered Aggregation of #CFHE12 questions and responses (embedded below if you are viewing this post on my site):

What you have here is an aggregation of possible questions from #cfhe12 with buttons to filter for messages with and without replies. Because it’s linked to Twitter’s own embed code users can do the usual Twitter actions (reply, retweet etc.). As noted there are some limitations; perhaps the biggest is that it isn’t 100% reliable, in that I’ve got no way to include replies made without the #cfhe12 hashtag … in this version anyway.

I’ll let you go and play with it, and hopefully you’ll share your thoughts. Two things spring to mind for me: it would be nice if this page had RSS feeds, just to keep the aggregation juices flowing; and wouldn’t it be interesting to use tweet favouriting to let the community curate questions/answers, a favourite representing an upvote (see Techniques for Live Tweet Curation).

Make your own

*** Open and copy TAGS v3.1Q ***

Run through the Basic and Advanced setup used in TAGS v3.1 (you need to authenticate with Twitter).

In the spreadsheet open Tools > Script editor and follow the ‘To use Filter Questions Interface’ instructions

Upgrading an existing TAGS v3.1+ Archive

  1. Open and copy TAGS v3.1Q and make the ‘questionsFilter’ sheet active.
  2. Activate the sheet tab menu and choose ‘Copy to…’.
  3. Now find your existing TAGS archive spreadsheet and copy the sheet to it.
  4. Once it has copied, open the destination and rename the new sheet from ‘Copy of questionsFilter’ to ‘questionsFilter’
  5. Open Tools > Script editor… in your old archive and select New > File > Script file. Call the new file TAGSExtras
  6. In the new script tab copy and paste the code from here, then save
  7. Run > setup twice (first time to authorise, second to run the function)
  8. File > Manage Versions and enter any description you like and Save New Version
  9. Publish > Deploy as web app... and click Update
  10. Run > getUrl and then open View > Logs... and copy the url into your browser address bar to view the result

How it was made (Non-techies you are free to leave ;)

The starting point was Twitter Archiving Google Spreadsheet TAGS v3. A hidden feature of this is to add a column to your Archive sheet called ‘possible_question’. When the archive collects tweets it looks for the text ‘? ’ or a ‘?’ at the end to identify tweets that might be questions, and if so ‘TRUE’ is put in the archive column.
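The heuristic can be sketched in plain JavaScript (my reconstruction of the rule just described, not the TAGS source):

```javascript
// Flag a tweet as a possible question if it contains '? '
// anywhere or ends with a '?'.
function isPossibleQuestion(text) {
  return text.indexOf('? ') !== -1 || /\?$/.test(text.trim());
}
```

It’s deliberately crude – rhetorical questions and ‘?!’ slip through the net either way – but it’s cheap enough to run on every collected tweet.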

Having got a list of potential questions and associated tweet ids I could have put them in my failed lab experiment (and unfortunately titled) SpreadEmbed, but noticed that the embed.ly API doesn’t return an in-reply-to message with its embed code. To expand on this, because it’s quite important: currently when you embed a tweet which is a reply you use something like this:

@mhawksey Most of us are using #cfhe12 ?

— AJCann (@AJCann) October 8, 2012

Although this text doesn’t include the text of the message it is replying to, Twitter’s clever bit of JavaScript renders it like this:


re-writing our little <blockquote> as:

Now you know why the page takes so long to render ;)

With this extra data we can use jQuery to find and filter tweets that have the class ‘twt-reply’.
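In the page this is a one-line jQuery selector (`$('.twt-reply')`); the same test can be sketched in plain JavaScript on the rendered HTML (the snippet strings below are illustrative, not Twitter’s actual markup):

```javascript
// Does Twitter's widget code mark this embed as a reply,
// i.e. did it add the 'twt-reply' class?
function hasReply(html) {
  return /class="[^"]*\btwt-reply\b[^"]*"/.test(html);
}

// Keep only embeds with (or without) replies, mirroring the
// filter buttons on the aggregation page.
function filterTweets(htmlSnippets, withReplies) {
  return htmlSnippets.filter((h) => hasReply(h) === withReplies);
}
```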

To recap: using TAGS we can identify tweets that might be questions, and using a Twitter embed we can also automatically get the message a tweet is in reply to. So to display a question and answer together we only need to find the answer, and Twitter will render the question it is in reply to (still with me?). The problem we’ve got is we can easily filter for questions (possible_question == TRUE) but not for answers. To do this I created a sheet of all the tweet id_strings that are questions (=QUERY(Archive!A:N,"select A WHERE N is not null LIMIT 50",FALSE)) and another where we know the tweet is in reply to something (=QUERY(Archive!A:N,"select A, K WHERE K starts with '2' LIMIT 50",FALSE)). For the last bit I needed to write some Google Apps Script which replaced any question tweet ids with the answer id, giving us the ‘Combination of Qs and As’ column.
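That last Apps Script step can be sketched like this (a simplified stand-in for the real script, which reads its inputs from the sheet ranges produced by the two QUERY formulas; the data shapes are hypothetical):

```javascript
// Given question tweet ids and [replyId, inReplyToId] pairs,
// replace each question id with the id of a reply to it, so
// embedding the reply also renders the question it answers.
// Unanswered questions keep their own id.
function combineQsAndAs(questionIds, replies) {
  // Map each replied-to tweet id to the id of one reply
  const answerFor = {};
  replies.forEach(([replyId, inReplyToId]) => {
    answerFor[inReplyToId] = replyId;
  });
  return questionIds.map((qid) => answerFor[qid] || qid);
}
```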

Extracting question and answer ids

To render the tweets on a page we need to get the embed snippet using Twitter’s official oembed endpoint. Because getting the embed code needs authenticated access I again used Google Apps Script to fetch this data and cache the result. Using the Apps Script ContentService I can expose this by publishing the spreadsheet as a web app and serving up each tweet’s embed code as JSONP. For example, here’s the JSONP-wrapped embed code for #CFHE12. The last part of the puzzle is some good old-fashioned HTML/JavaScript which renders the Twitter embed code and adds some UI (the code is here).
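The JSONP wrapping itself is trivial; in the Apps Script web app it’s ContentService doing this, but the transformation amounts to (callback name and payload shape are illustrative):

```javascript
// Wrap a payload (e.g. a tweet's oembed result) in a JSONP
// callback so a page on another domain can consume it via a
// plain <script> tag, sidestepping same-origin restrictions.
function toJsonp(callbackName, payload) {
  return callbackName + '(' + JSON.stringify(payload) + ');';
}
```

The consuming page just defines a global function with the same name; when the script loads, the browser calls it with the embed code.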