Registering blog addresses and generating a OPML file (Notes on FeedWordPress and MOOC-In-a-Box)

Last week I got a number of enquiries about using Google Forms. One of these was a question from Martin Weller (OU) who is in the process of setting up H817 which is an Open University open course (MOOC) on Openness in Education. He asked:

how do I go from this [Google Form] to an OPML file that feedwordpress can use (also if you have any ideas about how to do autodiscovery of feeds so if people only put in their main url it’ll find the feed, that would be great).

My response was to direct Martin to http://opml-generator.appspot.com/ which I used in Generating an OPML RSS bundle from a page of links using Google Spreadsheets. This takes a Google Spreadsheet of RSS feeds and generates an OPML bundle. Part of this solution relies on the user entering their blog RSS feed, which isn’t always known. It’s possible to use RSS auto-discovery, which at the time I didn’t have a scalable solution, but Martin appeared to have enough information to edge forward a bit more. I’ve subsequently come up with this

*** Blog Registration OPML Generator Template ***
[File > Make a copy for your own version]

With this a person can enter their blog url, some code tries to fetch the rss feed and uses the data to generate an OPML file. If you’d like to see it in action fill in this form (embedded below) … and see the generated OPML here (this should update every 5 minutes. It’s also *.tsv but the URL is enough to trick most feed aggregators).

I’ll let you discover uses for this template other than open course registrations.

Registering feeds with FeedWordPress (FWP)

Back on Martin’s Weller’s blog he’s written about is IT support woes. As part of this Alan Levine has kindly added some comments about how FeedWordPress handles blog registrations. The comment that caught my eye was:

I am not sure why you are fiddling with the OPML. You should should copy the URLs and paste them into FWP.

I’m assuming Martin had assumed that generating a OPML file accessible by url would mean FWP would keep checking for new blog registrations. As Martin has subsequently discovered this does not appear to be the case. When you import feeds from a url FWP still prompts you to select which feeds you want to include. This suggests to me that it’s not a fully automated process.

add opml

Alan suggested just copying the urls into FWP. Lets look at how that works. Clicking the ‘add multiple’ source button in the FWP admin lets you dump a blog url, one per line.

add multiple

When you click add FWP visits the blog and tries to auto-discover the feed using the first RSS feed it finds (this is the same trick used in my Google Form). So in short as FWP doesn’t appear to keep checking the OPML file and because you can just dump blog urls Martin may as well just collect these and do any cleaning up after.

So as Martin is discovering there are some very fiddle bits with using FeedWordPress as a open course aggregator and he ends his post with:

One last plea – I joked with Alan that I needed DS106 out of a box. I think I’m serious though – it would be great to have a step by step, idiots guide to installing and setting up a DS106-like environment. The rest of us don’t have Alan and Jim’s tech skills, so getting to the starting line is difficult. I know they’ll say you should invent your own way, but they done so much great work that I don’t think they realise just how much expertise they have. A simple installation that let the rest of us get started, would mean we could all go off in different directions then.

This takes me back to Jim Groom’s post on Integrating FeedWordPress with BuddyPress:

No where in the raging discussion around MOOCs is there anyone talking about sharing the infrastructural/architectural work they’ve done freely with others.  CUNY’s Commons-In-a-Box project makes the innovative development work they have been doing for years freely available for others to experiment with. That is the spirit of sharing that seems to me should characterize the groundswell for open, online education

+1 for ds106-In-a-Box

Keep your Twitter Archive fresh on Google Drive using a bit of Google Apps Script

Twitter Archive interfaceLike a growing number of other people I’ve requested and got a complete archive of my tweets from Twitter … well almost complete. The issue is that while Twitter have done a great job of packaging the archives even going as far as creating a search interface powered by HTML and JavaScript as soon as you’ve requested the data it is stale. The other issue is unless you have some webhosting where can you share your archive to give other people access.

Fortunately as Google recently announced site publishing on Google Drive by uploading your Twitter archive to a folder and then sharing the folder so that it’s ‘Public on the web’ you can let other people explore your archive (here’s mine). Note: Mark Sample (@samplereality) has discovered that if you have file conversion on during upload this will break your archive. [You can also use the Public folder in Dropbox if you don’t want to use a Google account]

The documentation wasn’t entirely clear on how to do this. Basically it seems that as long as there’s a index.html file in the folder root and links to subdirectories are relative all you need to do is open the folder in Google Drive and swap the first part of the url with https://googledrive.com/host/ e.g. https://drive.google.com/#folders/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ becomes https://googledrive.com/host/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ/

So next we need to keep the data fresh. Looking at how Twitter have put the archive together we can see tweets are stored in /data/js/tweets/ with a file for each months tweets and some metadata about the archive in /data/js/, the most important being tweet_index.js.

Fortunately not only does Google Apps Script provides an easy way to interface Drive and other Google Apps/3rd party services but the syntax is based on JavaScript making it easy to handle the existing data files. Given all of this it’s possible to read the existing data, fetch new status updates and write new data files keeping the archive fresh.

To do all of this I’ve come up with this Google Spreadsheet template:

*** Update Twitter Archive with Google Drive ***
[Once open File > Make a copy for your own copy]

Note: There is currently an open issue which is producing the error message ‘We’re sorry, a server error occurred. Please wait a bit and try again.’ Hopefully the ticket will be resolved soon

The video below hopefully explains how to setup and use:

A nice feature of this solution is that even if you don’t publically share your archive, if you are using the Google Drive app to syncs files with your computer the archive stays fresh on your local machine.

The model this solution uses is also quite interesting. There are a number of ways to create interfaces and apps using Google Apps Script. Writing data files to Google Drive and having a static html coded based interface is ideal for scenarios like this one where you don’t rely on heavy write processes or dynamic content (aware of course that there will be some sanitisation of code).

It would be easy to hook some extra code to push the refreshed files to another webserver or sync my local Google Drive with my webhost but for now I’m happy for Google to host my data ;s

#PDFTribute: MASHe as PDF and RDF/RSS 1.0

I’m embarrassed to say I didn’t know the name Aaron Swartz until a couple of days ago. I was aware of something trickling through my networks but it wasn’t until I read Dan Brickley’s Remembering Aaron Swartz that I started getting the bigger picture.

Reading Brian Kelly’s A Tribute to Aaron Swartz: Lets Make #pdftribute Trend highlighted me to PDFTribute, the campaign to show support for Aaron’s work (including the Guerilla Open Access Manifesto) by posting papers online I wanted to make my own contribution.

As I’m lacking a corpus of academic papers I thought I’d share all my blog posts. So here are 604 posts from MASHe (The MASHeBook). Because Aaron was one of the authors of RSS 1.0 specification I’ve also published MASHe as RDF/RSS 1.0 

Notes on making these

WordPress Stats in R: Generating postview sparklines using SparkTable

After posting Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet) Tony quickly came back with WordPress Stats in R, which includes a handy custom function for querying the WordPress Stats API. Something I wanted to do in my original post was to generate sparklines for post activity. I wasn’t able to find a way of doing this in Google Sheets  because of the number of data points (n. 366) and didn’t think Google Apps Script would handle the dataset because of timeouts. So given the start Tony had made with R it seemed like the ideal opportunity to refresh my R skills.

So below is the result (here’s the interactive version), which is followed by an explanation of how it was made.

SparkTable of postview 

Processing the data in R

Thanks to the code provided by Tony it’s easy to get postviews data back from WordPress using:

wp.postviews=wordpress.getstats.demo(APIKEY, BLOGURL, 'postviews', days='366', end='2012-12-31',  period='day', limit=-1)

This returns a data frame that looks like this:

Note that 478 different post_titles are returned compared to 178 post_permalinks showing some of the data is missing and in fact wrong. This isn’t important for this particular project. I’ve uploaded a sample of the returned data here. The data returned includes a day stamp, post title and number of views. Posts with no views on that day are not returned.

For the sparklines I initially headed off down a dead end (lines 99-107) making a pivot table in R. This wasn’t entirely wasted effort because the process of casting this and handling null values was used later in the project.  Having got stuck at this point searching for more information on sparklines in R turned up this question on stackoverflow and this set of slides referencing the sparkTable package. Initially I tried dropping and renaming some of the columns from the original data but on line 96 when trying the reshapeExt I got:

Error in `[<-.data.frame`(`*tmp*`, , attr(dat, "reshapeLong")[["timevar"]],  : 
  replacement has 32504 rows, data has 32538

Searching for variations around the error message didn’t turn anything up and if you know what is wrong I’d be very grateful for guidance. Instead I decided to follow a hunch and instead of using partial time series data filled in the blanks by casting and then melting i.e. from this:

data from worpress

to this (filling in missing data with ‘0’)

data from worpress casted

before going back to this

data from worpress then melted

Doing this got me past reshapeExt and outputting a sparkTable as html (the other option is LaTeX) embedding 478 sparkBars as png images. The final step was to wrap the html using jQuery DataTables (the main thing was to add <thead> and <tbody> tags). 

Was it worth it? I’ll let you decide if the product is any good, the process of doing it was definitely useful. So how would you improve it?

Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet)

Like other wordpress.com and self-hosted wordpress blogs using Jetpack I got my annual report. Some of the headlines include:

The report then lists the top posts (minus the totals, which are available from the stats panel)

These are the posts that got the most views on JISC CETIS MASHe in 2012.

  1. Export Twitter Followers and Friends using a Google Spreadsheet 77 COMMENTS March 2011
  2. Twitter Archiving Google Spreadsheet TAGS v3 17 COMMENTS January 2012
  3. The best Google Spreadsheet Event Manager (w/h Sites, Contact, Calendar integration) ever, ever, ever 4 COMMENTS November 2010
  4. Twitter: How to archive event hashtags and create an interactive visualization of the conversation 13 COMMENTS November 2011
  5. Using Google Apps Script for a event booking system (Spreadsheet to Calendar & Site | Form to Spreadsheet, Email and possible Contacts) 23 COMMENTS March 2010

These results are biased towards posts that have had the most time to collect views. What about the posts made later in the year? Or, other posts like IFTTT that attracted a lot of traffic in one day?

Remembering Tony Hirst’s Making Use of WordPress Stats which pulls data from the WordPress Stats API into a Google Sheet (Spreadsheet) I created a modification that pulls in a years worth of stats and lets me explore some of these questions. Below is a summary of WordPress Postviews Stats Analysis for MASHe (columns are sortable by clicking on the column heading).

Notes

  • AVERAGE of Views is the average post views per day. Days with no views are not counted
  • MAX of Views is the maximum post views in one day
  • SUM of Views is the total number of views a post has received
  • COUNTA of Views is the total number of days a post received at least one view

This reveals posts like Feeding Google Spreadsheets (from October 2012) and Mozilla Open Badges Issuer Gadget for Google Sites (December 2012) have good average daily views basic projections putting them in the top 10 for 2013.

There’s more digging to be done, but in the meantime if you want to have a look at your own wordpress stats grab a copy of the template below and have a go (and if you make any improvements/useful insights leave a note ;)

*** WordPress Postviews Stats Analysis Template ***
[File > Make a copy for an editable version]

Creation notes

The code comments highlight the main gotchas:

function wordpressStatPostviews(api_key, blog_uri, end, days, limit) {
  // build url for api. Found better results using json rather than csv (csv didn't get  years worth)
  var url = "http://stats.wordpress.com/csv.php?api_key="+api_key+"&blog_uri="+blog_uri+"&end="+Utilities.formatDate(end, "GMT", "yyyy-MM-dd")+"&days="+days+"&limit="+limit+"&table=postviews&format=json";
  var output = [["Link","Title","Title - Short","Date","Views"]]; // initialise return array
  try {
    var options = { "method" : "get" }; // initialise options for UrlFetchApp
    var response = UrlFetchApp.fetch(url, options); // fetch url
    if (response.getResponseCode() == 200) { // if response
      var raw = Utilities.jsonParse(response.getContentText()); // parse text respose into json
      raw.reverse(); // reorder to oldest first - no real need but helped with debugging
      for (i in raw){ // data is returned as object per day so iterate across
        var postdate = raw[i].date; // pull post date (as string) considered converting to date object but not required
        for (j in raw[i].postviews){ // in each date object there is array of views per post, interate across to push into returned 2d array 
          var apost = raw[i].postviews[j]; // abreviating
          // noticed a bug in returned data. Views for homepage have correct post_id but post_title is pulled from last interation
          var title = apost.post_id === 0 ? "Homepage" : apost.post_title; 
          // creating a short title for published table (no cell wrapping)
          var titleShort = title.length > 50 ? title.substring(0,50)+"..." : title;
          // push row to output
          output.push([blog_uri+"/?p="+apost.post_id,title, titleShort, postdate,  apost.views]);
        }
      }
      return output;
    }
  } catch(e) {
    throw e;
  }
}

What I’ve starred this month: December 28, 2012

Here's some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.

Backup Twitter Status Updates to a Google Spreadsheet

It looks like Twitter are finally rolling out the option to download all your tweets. As well as providing a nice offline search interface it appears that “the archive also includes CSV and JSON files, the latter complete with each tweet’s metadata”. I’m looking forward to see the data visualisations/mashups people come up with around their data. 

The Twitter API has long allowed you to extract a user’s status updates the limitation being you can could only get the last 3,200 tweets. This is something Sheila MacNeill discovered when she tried vizify’s ‘Year on Twitter’ tool.

the archive of meExporting tweets was something I looked at in 2011 in the cryptically titled Google Spreadsheets and floating point errors aka when is 65078736491511804 + 1 = 65078736491511808 (and automatically archiving your Twitter Status Updates). With this I’ve got a record of my tweets going back to April 2010 which is triggered to update itself every week. A reason I do this is often I need to find things I’ve previously said in ‘the archive of me’.

Here’s the template (File > Make a copy) and follow the instructions if you want to try (please be aware of the Twitter Developer Rules of the Road). I’ve updated the code to make it compatibly with version 1.1 of the Twitter API. One of the options I’ve added is a JSON dump which is saved to your Google Drive. It only took two lines of code using Google Apps Script HT +Romain Vialard 

  var blob = Utilities.newBlob(Utilities.jsonStringify(json), "application/json", filename);
  DocsList.createFile(blob);

[The JSON dump is a bit buggy – some issues with character escaping somewhere]

Enjoy!

How the Publicis Groupe ‘Maurice Lévy: more digital than ever for 2013′ YouTube video was made

The 2013 New Year’s greetings, from the French advertising and public relations company Publicis Groupe by their CEO Maurice Lévy is making multiple appearances in my circles

When I saw it my immediate thought was how did they do it? In the official press release they say:

It’s just one of the many very human interpretations of the standard YouTube functionality taken for granted today.

Looking for YouTube player functionality left me scratching my head. Looking at the source gave a big clue:

Publicis Groupe custom tab code (click to enlarge)

… basically the entire video area including title and controls is an embedded iframe. This allows Publicis Groupe to create their own Flash based player with custom functionality for pause, play etc.

To get the iframe into their YouTube channel they used YouTube’s Custom tab settings which are available on Branded Channels.

Simple but very effective.

Twitter provides proof of real-time engagement with the public: How to record it as evidence

Alistair Brown has written an interesting post on the LSE Impact of Social Sciences blog – Proving dissemination is only one half of your impact story: Twitter provides proof of real-time engagement with the public. The post highlights the case of how a journal paper was picked up by a university media office, which ended in the author being interviewed on BBC Radio 4’s Today programme. As part of this Alistair highlights that:

REF impact involves an assessment of “significance” as well as “reach,” so the mere fact that research has been disseminated to a wide audience does not constitute an impact by itself; one has also to show the effect it has on those to whom it is disseminated. For this reason, citing the fact that a researcher has appeared on a primetime radio show with several million potential listeners might be one element of an impact statement, but one needs also to evidence that the audience has actively listened to what was being put out, and that it has affected, changed or benefitted them in some way

In the age of the second screen Alistair goes on to highlight how Twitter can be used as evidence of engagement, listeners tweeting personal reflections, feedback or just disseminating the information more widely. But as Alistair points out:

When a piece of academic work receives broadcast media coverage, then, it is useful to have a strategy in place to gather emerging responses, and it is also far easier to do this as it happens rather than retrospectively.

A strategy is required because, as Alistair points, out the Twitter search is limited to the last 7 days. While there are ways to view this activity in realtime how do you capture the evidence.  Here’s my response to the problem:

Here is the spreadsheet template I mention. So have you got a strategy for recording impact evidence from Twitter?

Guest Post for Big Data Week #bdw13: Getting Creative with Big Data and Google Apps

Getting Creative with Big Data and Google Apps I was recently asked to write a guest post for Big Data Week on using Google Apps as an interface for Big Data. For the post I decided to revisit an old recipe which uses Google Sheets (Spreadsheets) and Google Apps Script to interface the Twitter and Google Analytics API. One of the results is the bubble graph shown below which shows who has been tweeting my blog posts, how many visits their tweet generated, the number of retweets and how many followers the person has (click on the image for the interactive version). You can read more about his this was done and get a copy of the template in Getting Creative with Big Data and Google Apps

Visits, rewteets bubble graph

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .