Livros de Redes Sociais, SEO e Web 2.0 (Books on Social Networks, SEO and Web 2.0)

Perhaps the worst SEO post title I could possibly use. If you are still wondering what SEO is, ask Nitin Parmar ;)

Still lost? SEO is Search Engine Optimisation. I’ve had a long interest in SEO, primarily in selfish terms to try and get this blog more widely read, but also in a wider ‘educational resource discovery’ context. I was the author of the ‘SEO and discoverability’ chapter in Into the wild – Technology for open educational resources, which highlights the importance of SEO and the UKOER programme’s experiments with it.

So it’s perhaps no surprise that I agree with Tony:

Something I’m increasingly becoming aware of is that SEO is not just about having the right metadata on your webpage; arguably this is the least important aspect. The area I’m particularly interested in is the tools/techniques the SEO community use to gain ‘actionable insight’.

Ironically this is an area I’ve been inadvertently contributing to without really knowing it. Someone who spotted this early was Wil Reynolds founder of SEER Interactive:

SEER Interactive offer services in Paid Search Marketing and Search Engine Optimization but what’s particularly interesting is their commitment to being “an Analytics first company, and we will not take on projects where we can’t analyze our impact on your business”.

So what do I do that’s of interest to the SEO community? Well, it seems that, like me, SEOers like a good old-fashioned spreadsheet. They also like a good old-fashioned spreadsheet that they can hook into social network channels. A recent example of this is the work Richard Baxter (CEO and founder of SEOgadget) presented at MOZCon, which extends TAGS (my Twitter Archiving Google Spreadsheet solution), demonstrating How To Use Twitter Data for Really Targeted Outreach. The general synopsis is:

an alternative method to find sites that our target audiences may be sharing on Twitter. With that data, you can build content strategy, understand your market a little better, and construct an alternative outreach plan based on what real people are sharing and engaging with, rather than starting with sites that just rank for guest post queries.

It was really interesting to read how Richard had used the output from TAGS, ingesting it into Excel where additional free Excel-based SEO tools could be used to gain that all-important ‘actionable insight’.

So ‘learning tech and library folk’, if you are planning your next phase of CPD, maybe you should be looking at some SEO training, and perhaps I’ll see you at MOZCon next year ;)


This is my first ever attempt at 'live blogging' and I haven't done any tidying other than cropping images. You might want to explore the LASI-UK Twitter summary by @sheilmcn (complete #lasiuk Twitter archive here).

Giles Carden (University of Warwick)
Good data visualisations solve real business problems

[Slides here]

Covering key components for effective data visualisation. Informed by Stephen Few's work using the following principles

Clarity important (forget the 3d charts). Using colour to indicate type of document.

Using different chart types for different audiences (senior management and individual academics seem to get simpler charts)

Warwick design framework



Some examples of charts used at Warwick (not commonly used elsewhere)


Using 'tadpole' chart to add time dimension (tails = older results)


Bullet chart


Lollipop chart used to show student losses/gains


Visualizing estates usage (red = underutilized), limited to teaching and learning. A lack of estate usage in term 3 became very apparent, in part a consequence of final exam scheduling.

"good data visualization removes the need for extensive narrative"

Q: how do you get your data?
A: national resources and data warehouse. Culturally there hasn't been much resistance. Also, as a known face, people seem more willing to share their data.

Better to have 80% of the data now than 100% in a year's time

Q: visual literacy of senior management
A: thirst for data, variations as some academic backgrounds already have these literacies. Most people seem to get it. Mantra is to make it simple

Chris Ballard (Tribal Labs)
Data visualisation with predictive learning analytics

[Slides here]

Background: this work has come out of an R&D partnership with the University of Wolverhampton, looking at student success and retention and going beyond the traffic light system. Interested in how staff interpret predictive visualisations. Using historical and current data combined with an understanding of student learning to provide insight. There are different ways of using predictive analytics in learning analytics, including student success and recommendation systems.

Understanding the student factors that influence student success. Focus is to help staff support students (maintaining a human interface). Requirement to make actionable insights.

Issues with predicting
Predictions tell you what might happen, not what will happen, highlighting their fallibility and the need for careful interpretation.

Focusing on presenting data to staff. Delivering to a wide range of staff, they want to present data appropriate to the viewer's need, using login to filter, e.g. a tutor sees their students' predictions; a course leader sees students plus module-level data.


Staff can drill down into courses. Ability at a glance to see summary of module. Drilling down to student view


Giving an indication of the areas where student might be struggling


Using metrics that have been shown to have a strong influence on student success: preparation for HE, engagement (including VLE and library use) and academic integration (formative assessment results).

Design guidelines


Technology guidelines - highlight cross platform and touch friendly interfaces

Design considerations - 'traffic lights' are very emotive and have distinct colour banding (granularity)

Q: about prediction drilldown
A: very difficult to go beyond a certain level. Need to explain what '36% VLE engagement' actually means.

Em Bailey (Heriot-Watt University)
An overlooked tool? Using Excel for advanced data visualisation

All these shiny toys, and Em is here to defend Excel. Benefits: it's very powerful and flexible, and already on most people's desktops.

NSS survey dashboard presented in Excel developed by Em. No macros or VBA and using built in charts. Mix of subject and department information. Users can pick different subjects


Option to drilldown for response range and opinion strength and make sector comparisons


Looking at when things are not going so well


Used Excel to mock up what Key Information Set data might look like. Used to allow staff to validate their returns.




Sheila MacNeill (JISC CETIS)
Setting the scene: learning scenarios

[Slides are here]

Overview of learning scenarios, broad-brush: what data do we have access to? Where does it take place? Learning takes place in various places in the institution, but not forgetting informal settings.


Various places where we communicate and learn (synchronous and asynchronous), which is creating a data footprint both formal and informal. Opportunities to merge data


University of Derby learner engagement dartboard. Uses primary, secondary and tertiary indicators. Used to build fuller picture of the student and engagement with institution.


Using SNA to help understand learning. Example of using SNAPP for discussion board analysis.

Example from me of dashboarding Canvas Discussion forums in Google Spreadsheet and TAGSExplorer to 'see' conversations.



Diagram developed by Mark Stubbs for tackling the LA problem space 'analytics to develop understanding'

What actionable insights can we gain? Examples include patterns of behaviour, targeted intervention, benchmarking, retention, activity design, data driven design decisions, greater understanding of the who, why what and where of learning

Clare Llewellyn (University of Edinburgh)
Argumentation on the web - always vulgar and often convincing?

[Slides are here]

Using argumentation to analyse the social web. The session title is from an Oscar Wilde quote. Example: discussion around a typical Guardian post includes threading, quoting, etc.


Breaking down the discussion into conversations with specific topics, the main points of the discussion. The issue is data deluge: how do we find the points of most interest? Solutions: the BBC tried using editors' picks, but this creates snapshots and not the whole conversation.





Various levels of argumentation micro and macro level.



Data - using Twitter data from the London riots (7,729 tweets)

Identification - Unsupervised clustering machine learning. Select an appropriate algorithm. Strategy is try and see. Tried Unigram/bigram, incremental clustering, k-means, topic modelling. Topic modelling proved best in this example.

Guardian comments - using api to extract claim (quoted text) counter claim (comment). Using LTT - TTT2

Supervised machine learning - data from London riots which has already been marked up. Tools: TagHelper Lightside? Using claim (text) evidence (link)

Duygu Simsek (Open University)
Towards visual analytics to improve scientific reading & writing

[Slides are here]

[wordpress app decided not to save my notes :(]

Naomi Jeffery (Open University in Scotland)

Learning analytics: a whistle-stop tour


An overview of papers Naomi has recently found interesting. Marsha Lovett (LAK13) - we have no conception of what the data we are looking at actually means in the real world; remembering the person and the learning design. Tutor v student requirements.


Overlap between needs.

5 types of LA. Social network analytics - visualizing social learning ties from Network awareness tool. Platform used to explore support learner communities.

Discourse analytics - meaningful traces from speaking and listening within online discussion forums. Using 9 metrics and then supporting students in interpretation. Discovered that presenting numbers (rather than visualisations) had better results, as it helped students develop the desired behaviours.

Content analytics - understanding promotions (how students 'liking' posts can be used to promote content)

Disposition analytics - mooc disengagement identifying learner subpopulations. developed categorization of completing, auditing, disengaging and sampling


Learning pathways - five learning methods: minimal, careful, strategic explorers, explorers and haphazard. Prior knowledge can influence the model.

Context analytics
Whether online, classroom, blended. Framework paper has several examples.


The concepts used in this post are very similar to the Google Analytics superProxy (announced after my solution ;). The main difference is that this works in Google Drive, meaning data access authentication is built in.

Access control to Google Analytics data is very coarse. Via the web interface you can have full access or none at all. Given the growing interest in data driven decision making at all levels of the institution I thought it would be beneficial to demonstrate how Google Spreadsheets and Google Apps Script could be used to selectively distribute Google Analytics data. This solution was developed for my session at the Institutional Web Managers Workshop 2013 (IWMW13) which I talk about in more detail at the end of this post.

Google Analytics Query Exporter

So here is the general flow. Google Apps Script, which is part of Google Drive, is used by Google Analytics admins to create custom queries which pull back slices of data. The admins can then control how these slices are distributed, either within the security of Google Drive or published to the web. Also, because Google Apps Script features ‘script triggers’, which can be time-based, admins can preconfigure mini scripts to decide when and how the data is distributed. Examples include:

  • Add data slice to a Google Spreadsheet (Spreadsheets feature options for publishing to the web)
  • Turn data slice into a .csv file and attach to an email for the recipient
  • Turn data slice into a .csv file and copy/share via Google Drive
  • Turn data slice into a .csv file and publish on Google Drive
  • All the above with different file formats including json
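As an aside, the ‘custom queries’ behind these slices are just Google Analytics Core Reporting API requests. Here’s a minimal sketch of how a query sheet’s values might be turned into a request URL (the helper name and example values are mine, not part of the template):

```javascript
// Sketch: assemble a Google Analytics Core Reporting API request URL from
// a plain object of query parameters, much as the template's query sheets do.
// (Helper name and example profile id are illustrative only.)
function buildGaQueryUrl(params) {
  var base = "https://www.googleapis.com/analytics/v3/data/ga";
  var parts = [];
  for (var key in params) {
    if (params.hasOwnProperty(key)) {
      parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
    }
  }
  return base + "?" + parts.join("&");
}

// Example: top pages by visits for June 2013
var url = buildGaQueryUrl({
  "ids": "ga:12345678",          // profile id (made up)
  "metrics": "ga:visits",
  "dimensions": "ga:pagePath",
  "start-date": "2013-06-01",
  "end-date": "2013-06-30",
  "max-results": "10"
});
```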

Here is the template I’ve developed which you can copy and use followed by some setup instructions (also there’s a setup video):

*** Google Spreadsheet GA Query Exporter ***

  1. File > Make a copy of this spreadsheet (you need to be logged in)
  2. Open Tools > Script editor and then:
    File > Upgrade authorisation experience;
    Once it's upgraded still in the Script editor click Resources > Use Google's APIs; and
    Click the 'Google API Console' link at the bottom of the dialog window;
  3. In the Google APIs Console switch the Analytics API 'on' and read/accept the terms (you may have to accept two separate terms windows)
  4. Close the Google Apis Console window and OK the Google API Services window in the Script editor
  5. Before closing the Script editor Run > authoriseGA to authenticate script

Once authenticated there are a number of ways to run the script. You can enter a query and click 'Get data' for one off fetches or open Tools > Script editor and modify the example cron jobs to automate collection and distribution of data (the cron jobs allow functionality that includes emailing/sharing links to csv files)

To create multiple queries in the same spreadsheet duplicate this sheet and modify the query

Tip: You can use formula to build the start and end dates e.g. yesterday is =(TODAY())-1

To help admins construct these data slices the tool is designed to work in collaboration with the official Google Analytics Query Explorer 2. With this users can experiment with queries like this one that returns the top blog posts based on visits. Within the Query Explorer I can export the Query URI and import to a sheet in the template. The general workflow is demonstrated in the video below which shows how to setup an example query/export:

Using the example query from above I’ve created this template sheet. As in the video example, in cells E9 and E10 I use a formula to dynamically create a date range. Assuming I want to run the query on the 1st of each month for last month’s stats, we start with the end-date generated using =TODAY()-1. To get the start-date we use the value of this field to get the 1st of last month using =DATE(YEAR(E10),MONTH(E10),1).
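The same date logic can be sketched in script form (this simply mirrors the spreadsheet formulas; the function is mine, not part of the template):

```javascript
// Sketch: end-date = yesterday (=TODAY()-1), start-date = 1st of the
// end-date's month (=DATE(YEAR(E10),MONTH(E10),1)).
function lastMonthRange(today) {
  var end = new Date(today.getTime());
  end.setDate(end.getDate() - 1);
  var start = new Date(end.getFullYear(), end.getMonth(), 1);
  return { start: start, end: end };
}

// Run on 1 July 2013: end-date is 30 June, start-date is 1 June
var range = lastMonthRange(new Date(2013, 6, 1)); // JS months are 0-based, 6 = July
```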

I could open the spreadsheet every month and manually click the ‘Get data’ button, but instead I can set up a script trigger to run as a month timer on the 1st between 1am and 2am.

Setup script trigger/cron job

The code I run is a little scriptlet like:

function exampleCronJobWriteCSVtoParticularFolder(){
  // copy new data as csv and then email as attachment (copy also kept in Google Drive)
  var sheetO = getData("input"); // name of sheet with query to execute e.g. 'input'
  var csv = saveAsCSV(sheetO.getName(), sheetO.getName()+" "+Utilities.formatDate(new Date(), "GMT", "yyyyMMdd"));
  var folder = DriveApp.getFolderById("0B6GkLMU9sHmLbThITlNvb2dzREE"); // folder id from url (navigate to folder and extract for your own folders)
  MailApp.sendEmail("[email protected]", "New data", "A new csv for '"+sheetO.getName()+"' has been created and is in the folder "+folder.getUrl());
  // MailApp.sendEmail(recipient, subject, body, options) - More info
}

… which gets the data and then publishes it as a csv to this folder, emailing a link to the listed recipients. As the folder has been shared, we can access it via Google Drive hosting with the link.

public folder

The template includes several example scriptlets for you to play with and modify. For those who are interested, the slides for my IWMW13 session are on Slideshare. The session was designed to introduce Google Apps Script followed by the query exporter template. I only made this yesterday so it might be a bit alpha. Feel free to leave issues and suggestions in the comments. I’m sure there are a number of clarifications required.

LASI-UK: part of the Learning Analytics Summer Institute Global Network

Jisc CETIS, in partnership with the Open University in Scotland and the University of Edinburgh School of Informatics, will be hosting a one-day event on the 5th July in the Informatics Forum (Edinburgh) focusing on learning analytics.

LASI-UK is part of a global network of events in the first week of July, which are linked to the Learning Analytics Summer Institute (LASI) at Stanford University. LASI-UK will explore ways of working with data to draw out and share insights that can make a real difference to learning and teaching. This free, interactive event has a dual focus on data visualisation and working with dialogue data. You can check out the draft programme here.

Delegates are also invited to join us for an optional evening session, where we will be broadcasting the final morning plenary session live from Stanford.

Who should attend?

Anyone with an interest in using data to support students and inform curriculum design and teaching, including:

  • Data analysts, planners and management information providers
  • Curriculum designers / education developers
  • Those involved in the construction, delivery and analysis of MOOCs


Click here for more details and registration

This is a free event; however, places are limited, so interested parties are encouraged to register early.  We look forward to welcoming you to our event.

Learning analytics is increasingly an area of interest for institutions, and I'm aware of a number of staff being asked to contribute on this area in their teaching and learning strategies. I thought it would be useful to spotlight some resources produced by Cetis and Jisc in this area that might help. The list is in no way exhaustive, and if you have any other resources you think are worth highlighting, either leave a comment or get in touch and I’ll add them to the post.

**New** Ferguson, R. (2013). Learning Analytics for Open and Distance Education. In S. Mishra (Ed.), CEMCA EdTech Notes. New Delhi, India: Commonwealth Educational Media Centre for Asia (CEMCA).
**More New** Society for Learning Analytics Research (SoLAR) has a collection of useful resources including these introductory articles

SoLAR recently ran an open course, Strategy & Policy for Systemic Learning Analytics. Something worth looking at might be the recording of the session Belinda Tynan (the OU’s PVC Learning & Teaching) did on Designing Systemic Analytics at The Open University.

**Even Newer** from SoLAR, looking more at a national strategy for learning analytics: Improving the Quality and Productivity of the Higher Education Sector – Policy and Strategy for Systems-Level Deployment of Learning Analytics

Highlights from Jisc Cetis Analytics Series include

  • Analytics; what is changing and why does it matter?
    This paper provides a high level overview to the CETIS Analytics Series. The series explores a number of key issues around the potential strategic advantages and insights which the increased attention on, and use of, analytics is bringing to the education sector. It is aimed primarily at managers and early adopters in Further and Higher Education who have a strategic role in developing the use of analytics in the following areas:

    • Whole Institutional Issues,
    • Ethical and Legal Issues,
    • Learning and Teaching,
    • Research Management,
    • Technology and Infrastructure.
  • Analytics for Learning and Teaching
    A broad view is taken of analytics for Learning and Teaching applications in Higher Education. In this we discriminate between learning analytics and academic analytics: uses for learning analytics are concerned with the optimisation of learning and teaching per se, while uses of academic analytics are concerned with optimisation of activities around learning and teaching, for example, student recruitment.
  • Legal, Risk and Ethical Aspects of Analytics in Higher Education
    The collection, processing and retention of data for analytical purposes has become commonplace in modern business, and consequently the associated legal considerations and ethical implications have also grown in importance. Who really owns this information? Who is ultimately responsible for maintaining it? What are the privacy issues and obligations? What practices pose ethical challenges?
    Also of interest the LAK13 on An evaluation of policy frameworks for addressing ethical considerations in learning analytics
  • Institutional Readiness for Analytics
    This briefing paper is written for managers and early adopters in further and higher education who are thinking about how they can build capability in their institution to make better use of data that is held on their IT systems about the organisation and provision of the student experience. It will be of interest to institutions developing plans, those charged with the provision of analytical data, and administrators or academics who wish to use data to inform their decision making. The document identifies the capabilities that individuals and institutions need to initiate, execute, and act upon analytical intelligence
  • Case Study: Acting on Assessment Analytics
    Over the past five years, as part of its overall developments in teaching and learning, the University of Huddersfield has been active in developing new approaches to assessment and feedback methodologies. This has included the implementation of related technologies such as e-submission and marking tools. In this case study Dr Cath Ellis shares with us how her interest in learning analytics began and how she and colleagues are making practical use of assessment data, both for student feedback and overall course design processes.
    Aspects of this case study and other work in this area are available in this webinar recording on Learning analytics for assessment and feedback

Examples of Learning Analytic Tools

Taken from Dyckhoff, A. L., et al. "Supporting action research with learning analytics."Proceedings of the Third International Conference on Learning Analytics and Knowledge. ACM, 2013.

  • LOCO-Analyst [1, 4],
  • TADA-Ed [46],
  • Data Model to Ease Analysis and Mining [38],
  • Student Inspector [50],
  • MATEP [56–58],
  • CourseVis [43, 45],
  • GISMO [44],
  • Course Signals [3],
  • Check My Activity [25],
  • Moodog [54, 55],
  • TrAVis [41, 42],
  • Moodle Mining Tool [48],
  • EDM Vis [34],
  • AAT [29],
  • Teacher ADVisor [37],
  • E-learning Web Miner [26],
  • ARGUNAUT [30],
  • Biometricsbased Student Attendance Module [27],
  • CAMera and ZeitgeistDashboard [51, 52],
  • Student Activity Meter [28],
  • Discussion Interaction Analysis System (DIAS) [8–11],
  • CoSyLMSAnalytics [49],
  • Network Visualization Resource and SNAPP [5, 6, 17, 18],
  • i-Bee [47],
  • iHelp [12], and
  • Participation Tool [32]

References for these tools are listed here

Here is a more general set of Analytics Tools and Infrastructure from the Analytics Series

A quick reminder that the Analytics in UK Further and Higher Education Survey is still open.


The term ‘dashboard’ currently seems to have a similar effect to marmite, you either love it or hate it. Fortunately it looks like the trend is towards increased personalisation of the way data is presented for both visual and exploratory purposes (an example of this is Thomson Reuters finance system Eikon), so if you don’t like marmite you can have something else instead.

One of the reasons I like using Google Spreadsheets is that it’s a fairly easy environment to pull data into, provide specific views of, and share with others. Recent examples include the work I did providing a summary of the discussion activity within the open online course LAK13 (btw the recording of the recent ‘Geek Out’ webinar covering this is available from here). This example takes the existing Atom feed from Canvas discussion boards, which contains all forum posts, and provides a light summary of activity (shown below – click to enlarge).


When faced with a similar challenge for ocTEL it made sense to evolve this existing work rather than reinvent. As previously outlined in the ocTEL course recipe, we use the WordPress platform as a course hub, creating additional functionality using plugins. For the discussion forums we use the bbPress plugin. Like the rest of WordPress, bbPress creates a number of data feeds that we can use for forums, topics and replies. As far as I can tell though these feeds follow the WordPress global reading settings for the number of items available and are limited to RSS2.0.

One of the really nice things, however, about working with WordPress is that once you have your basic stack of plugins it’s easy to customise using your theme’s functions.php script. And that’s what we’ve done for ocTEL. Using WordPress’ extensive Codex documentation we can see it’s possible to create our own feeds. So by adding this custom action in our theme’s functions.php, and including this feed template in our theme folder, we have an Atom feed for all our forum activity. With some very minor tweaking of the Canvas example mentioned above I’ve now got an overview of ocTEL forum discussions dashboard.

[If you have your own bbPress forums you can use the code above to create your own data feed and File > Make a copy of the spreadsheet to dashboard your forums.]

Now we have some insight into the ocTEL forum activity, it raises some interesting questions. Besides turning the insight into action, the questions at the top of my mind are: how do we promote this data endpoint? What other data feeds and formats would be useful? How do we license the data? Your thoughts, as always, are very welcome.

PS My colleagues Lorna Campbell and Phil Barker have recently published an Activity Data and Paradata briefing paper


Just a note to say on 26th April at 4pm BST (in your timezone) I’ll be giving a webinar on using Google Apps Script to extract data from Canvas (LMS by Instructure). Even if you’re not a Canvas user hopefully there will be some general techniques you’ll find useful. It might also be useful for people thinking about applying basic learning analytic techniques to your courses. I’ll update this post with any resources from the session.

Martin Hawksey of Jisc CETIS started playing around with discussion usage data in Canvas when he was participating in the Learning Analytics course in Canvas Network. Using Google's cloud scripting service Google Apps Script,  free for Google Apps and Google account users, he could dig deep into the data and start doing some interesting (and valuable) analysis all from the comfort and familiarity of a spreadsheet.

Join us for this free 50-minute technical webinar on Friday, April 26th as Martin details his experience using the Canvas APIs and walks through building scripts to get meaningful Canvas data using Google Apps Script and the Canvas APIs.

Registration is free and you can book here.

Here's a recording of the session and the bundle of links (including slides)


Sankey Diagram

“Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity.” – Wikipedia

I first came across Sankey diagrams via (one of the OKFN’s latest School of Data contributors ;) Tony Hirst in d3.js Powered Sankey Diagram. Subsequently Bruce McPherson showed how to create Sankey diagrams from Excel, also using d3.js.

Having collected some survey data for Analytics and Institutional Capabilities at #cetis13 (here’s a copy of the form), we were looking for a way to take us beyond the Google Form reporting and gain extra insight. In particular I was interested in trying to see if there were any relationships between the multiple choice questions. Using a Sankey diagram seemed like a possible solution, and my colleague David Sherlock quickly came up with a modification of Mike Bostock’s Sankey example to accept a csv input (I’ll link if/when it gets written up).

Seeing this I thought it might be useful to make a wrapper to generate Sankey diagrams from data stored in a Google Sheet. The solution was relatively straightforward, using the Google Visualisation API to get the data in the right shape for d3.js. An example of the prototype is here.

There’s no interface yet for you to select a spreadsheet, sheets, columns etc., but you can take your Google Spreadsheet, ‘publish to the web’, and then add the following data to the url.

One of the issues with this solution is you might not want to make all your data available. To get around this I’ve written a Google Apps Script that lets you use a custom formula to preformat the data. To see this in action this Spreadsheet contains an example. The formula is in cell A1 and uses the format =setSankey(datarange, cols , separator)

  • datarange – sheet/cell reference for source data eg 'Form Responses - Edit'!A2:D Note must start with column A
  • cols – comma separated list of columns to use to generate a chart for eg "B,C,D"
  • separator {optional} – used to split multi value cells defaults to ", " eg " | "
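To give a feel for the reshaping involved, here’s a standalone sketch of counting answer-pair transitions into the source/target/value links a d3.js Sankey layout expects (the function and field names are mine for illustration; the actual code is in the gist):

```javascript
// Sketch: count transitions between answers of adjacent questions to make
// Sankey links. Each row is one survey response; cols picks the questions.
function buildSankeyLinks(rows, cols) {
  var counts = {};
  rows.forEach(function (row) {
    for (var i = 0; i < cols.length - 1; i++) {
      var key = row[cols[i]] + " -> " + row[cols[i + 1]];
      counts[key] = (counts[key] || 0) + 1;
    }
  });
  return Object.keys(counts).map(function (key) {
    var parts = key.split(" -> ");
    return { source: parts[0], target: parts[1], value: counts[key] };
  });
}

// Example: two questions (columns B and C), three responses
var links = buildSankeyLinks(
  [
    { B: "Yes", C: "Weekly" },
    { B: "Yes", C: "Weekly" },
    { B: "No",  C: "Never" }
  ],
  ["B", "C"]
);
// links: [{source:"Yes", target:"Weekly", value:2}, {source:"No", target:"Never", value:1}]
```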

To use this in your own spreadsheets open Tools > Script editor and copy the code from this gist. Here’s an example url using pre-processed data. The main difference is the addition of &output=1 to the querystring.

Obviously creating your own querystrings to render the data isn’t ideal, and it would be relatively straightforward to create a UI wrapper similar to the one used in EDGESExplorer, but it’s something I’m reluctant to do unless there is enough demand. The other consideration is the question: does the Sankey diagram provide useful insight for this type of data, or is it just more ‘damn lies’?

It would have of course been nice to write a Google Gadget to include this in a Spreadsheet … but Google are discontinuing those :(.


As part of LAK13 I’ve already written a series of blog posts highlighting a couple of ways to extract data from the Canvas VLE. Prompted by a question from On and my colleague Sheila MacNeill, I wanted to show you a way of getting feed data into a spreadsheet without using any code. The solution is to use Yahoo Pipes but, as this post will highlight, this isn’t entirely straightforward and you need to be aware of several tricks to get the job done. As LAK13 isn’t using Google Groups, for this post I’ll be using the Learning Analytics Google Group as a data source.

Sniffing for data

First we need to find a data source. Looking for an auto-detected RSS/Atom feed by visiting the group homepage reveals nothing. [I always forget that browsers seem to be moving away from telling you when they detect a feed. To get around this I use the Chrome RSS Subscription Extension, which indicates with the orange RSS icon when a page has a feed.]

Browser with no feed detected Browser with feed detected

Looking for an official Google Groups API as an alternative method turns up this open issue from August 2007 for a Groups API, aka there's no API :( Digging deeper, we find Groups did have data feeds in its old interface. So with a bit of url magic I can land on the old Groups interface for Learning Analytics, which gives us the orange light:

Google Groups Old Interface with Feeds

Requesting the View all available feeds page we get some additional feed options:

Google Groups View all available feeds

At this point I could grab the Atom links and with a bit of tweaking process it with my existing Google Apps Script Code, but lets look at a ‘no code’ solution.

Feeding Google Sheets with Yahoo Pipes

At this point it’s worth reminding you that you could use the importFeed formula in a Google Spreadsheet, which would import the data from a Google Group. The issue, however, is that it’s limited to the last 20 items, so we need a better way of feeding the sheet.

A great tool for manipulating rss/atom (other data) feeds is Yahoo Pipes. Pipes gives you a drag and drop programming environment where you can use blocks to perform operations and wire the outputs together. I learned most of my pipework from the Pipemaster – Tony Hirst and if you are looking for a starting point this is a good one.

Yahoo Pipes - Edit interface

Here I’ve created a pipe that takes a Google Group shortname, does some minor manipulation, which I’ll explain later, and outputs a result. When we run the pipe we get some export options:

Yahoo Pipe - Run pipe

The one I’m looking for is .csv because it'll easily import into Google Sheets, but it’s not there … Just as we had to know the old Google Group interface has RSS feeds, with Yahoo Pipes we have to know the csv trick. Here’s the url for ‘Get as JSON’:

and if we swap &_render=json for &_render=csv, by magic we have a csv version of the output (whilst we are here, also notice the group name used when the pipe is run is also in the url; this means if we know the group shortname we don’t need to enter a name and ‘Run pipe’, we can build the url to get the csv).

Now in a Google Spreadsheet, if you enter the formula =importData("") we get the group’s last 100 messages in a spreadsheet.

Extra tricks

There were a couple of extra tricks I skipped that are worth highlighting. RSS/Atom feeds permit multilevel data, so an element like ‘author’ can have sub-elements like ‘name’ and ‘email’. CSVs, on the other hand, are 2D: rows and columns.

Illustration of nested structure of atom feed

When Yahoo Pipes generates a csv file it ignores sub-elements, so in this case it’ll generate an author column but won’t include name or email. To get around this we need to pull the data we want up to the first level (doing something similar for content).

Rename block
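In code terms, the Rename block is effectively doing something like this flattening step (a sketch only; the field names follow the Atom feed, but the function is mine):

```javascript
// Sketch: flatten an Atom-style item so nested author fields survive the
// trip to CSV (which only keeps first-level values).
function flattenItem(item) {
  var flat = {};
  for (var key in item) {
    if (!item.hasOwnProperty(key)) continue;
    if (typeof item[key] === "object" && item[key] !== null) {
      for (var sub in item[key]) {
        if (item[key].hasOwnProperty(sub)) flat[key + "." + sub] = item[key][sub];
      }
    } else {
      flat[key] = item[key];
    }
  }
  return flat;
}

// Example: nested author element pulled up to the first level
var flat = flattenItem({
  title: "Re: learning analytics",
  author: { name: "A. Person", email: "a.person@example.com" }
});
// flat["author.name"] === "A. Person"
```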

The next little trick is to get the feed dates in a format Google Spreadsheets recognise as a date rather than a string. In the feed dates are in ISO 8601 format e.g. 2013-03-05T13:13:06Z. By removing the ‘T’ and ‘Z’ Google Spreadsheets will automatically parse as a date. To do this we use a Regex block to look for T or Z (T|Z) replacing with a single space (which is why the ‘with’ box looks empty).

Regex block
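In script form the same substitution is a one-liner (a sketch of the transformation, not actual Pipes code):

```javascript
// Sketch: make an ISO 8601 timestamp spreadsheet-friendly by replacing the
// 'T' and 'Z' markers with spaces (as the Regex block does) and trimming.
function spreadsheetDate(iso) {
  return iso.replace(/(T|Z)/g, " ").trim();
}

var cleaned = spreadsheetDate("2013-03-05T13:13:06Z"); // "2013-03-05 13:13:06"
```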

I’ve wrapped the data in a modified version of the dashboard used for the Canvas data feed.

*** Google Groups Activity Dashboard ***

Google Groups Activity Dashboard


A couple of big limitations to be aware of:

How long will Google Group data feeds last?

Given we’ve had to dig out the data feeds from the old Google Group interface my suspicion is once this is shut off for good the feeds will also disappear. Who knows, Google may actually have a Group API by then ;s

Limited to last 100 messages

The eagle-eyed amongst you will have spotted that I was able to increase the number of messages returned to 100 by adding num=100 to the feed query. This is the limit, though, and you can’t use paging to get older results. There are a couple of ways you could store the feed results, like using the FeedWordPress plugin. I’m experimenting with using IFTTT (‘IF I do THAT on {insert social network/rss feed/other} THEN add row to Google Spreadsheet’), but as the data isn’t stored nicely (particularly dates, which are saved like ‘February 15, 2013 at 01:48PM’ *sigh*) it makes it harder to use.
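For what it’s worth, those IFTTT dates can be knocked back into shape with a little parsing; a sketch, under the assumption the format is always ‘Month D, YYYY at HH:MMAM/PM’:

```javascript
// Sketch: convert IFTTT's "February 15, 2013 at 01:48PM" into a string a
// spreadsheet will parse as a date. Assumes the format is consistent.
function parseIftttDate(s) {
  var months = ["January","February","March","April","May","June","July",
                "August","September","October","November","December"];
  var m = s.match(/^(\w+) (\d{1,2}), (\d{4}) at (\d{2}):(\d{2})(AM|PM)$/);
  if (!m) return null; // not in the expected format
  var month = months.indexOf(m[1]) + 1;
  var hour = parseInt(m[4], 10) % 12 + (m[6] === "PM" ? 12 : 0);
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  return m[3] + "-" + pad(month) + "-" + pad(m[2]) + " " + pad(hour) + ":" + m[5];
}

var iso = parseIftttDate("February 15, 2013 at 01:48PM"); // "2013-02-15 13:48"
```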

I think that’s me in terms of ways of extracting discussion board data … for now. One other technique related to Google Spreadsheets is screen scraping using importXML. To see this in action you can read Dashboarding activity on public JISCMail lists using Google Sheets (Spreadsheets).