Google Apps Script

Just a quick note to highlight a guest post I’ve written for the Google Apps Developer Blog on How Apps Script Makes Classroom Observation Quicker and Easier. In this Google Apps Script example I show how a couple of lines of code can do some custom Google Form handling. In this case the scenario was:

Justin Marckel, the assistant principal at Cornatzer Elementary School in North Carolina, asked for help in modifying one of my existing Apps Script examples. Justin was recording teachers’ classroom activities using a Google Form, then manually copying and pasting data into separate spreadsheets for each teacher to review. Justin wanted to know whether there was a way for a Google Form to store the results in a master spreadsheet, then filter results to each teacher’s spreadsheet.
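The guest post walks through the actual code, but the gist of that kind of form handling is an onFormSubmit trigger along these lines (a minimal sketch, not the code from the post: the 'Teachers' lookup sheet and column positions are my own illustration):

function onFormSubmit(e) {
  // Sketch: copy each form response into the matching teacher's own spreadsheet.
  // Assumes a 'Teachers' sheet in the master spreadsheet mapping teacher name
  // (column A) to their spreadsheet key (column B), and that the teacher's name
  // is the first form question (second value after the timestamp).
  var row = e.values; // the submitted answers
  var lookup = SpreadsheetApp.getActiveSpreadsheet()
      .getSheetByName("Teachers").getDataRange().getValues();
  for (var i = 0; i < lookup.length; i++) {
    if (lookup[i][0] == row[1]) { // found the teacher
      SpreadsheetApp.openById(lookup[i][1]).getSheets()[0].appendRow(row);
      break;
    }
  }
}

Attach this as an 'On form submit' trigger in the script editor and every response is filtered out to the right teacher's spreadsheet as it arrives.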

You can read the full post here.

PS Will Welch spotted that there will be new Google Apps Script/Google Forms functionality announced at Google I/O.



One of the nice things about open courses like ocTEL is that, while having your own blogging space wasn’t a mandatory requirement, we were aware that a number of participants had set one up anyway. Conscious that activities like blogging can be very lonely, and aware that this is often a critical moment in motivating people to engage in reflective writing, we’ve been encouraging tutors to show these blogs some extra love, which seems to have gone down well.

This raises a question: given that blogging is optional and we haven’t asked people to self-declare whether they are new to blogging, how do we identify this community? With this question in mind I quickly put together a spreadsheet which took the participant blog list and got feed counts (the initial pass took me about 5 minutes, thanks in part to earlier work in this area).

post table

How it was made

If you look at Sheet 1 you’ll see there is a static list of feed urls (column B). This was obtained using the Chrome Screen Scrape extension. The reason it is static is that I wanted a column where tutors could make comments, and a dynamic import would screw this up. Looking at Sheet 2 cell A1 you can see how to do a dynamic list using the importXML function (see this post for more examples of spreadsheet import options).

To get the post counts and dates some Google Apps Script was required to write a custom function (a custom function, you ask? Where have you been? Read my blog). After that it was a bit of conditional formatting.
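The shape of that custom function is roughly this (a minimal sketch rather than the actual code: feedStats is a made-up name, and it assumes Atom feeds; RSS needs .rss.channel.item instead):

function feedStats(feedUrl) {
  // Sketch: fetch a blog feed and return post count plus most recent publish date.
  var response = UrlFetchApp.fetch(feedUrl);
  if (response.getResponseCode() != 200) return [[0, ""]];
  var entries = Xml.parse(response.getContentText()).feed.entry;
  if (!entries) return [[0, ""]]; // no posts yet
  if (!(entries instanceof Array)) entries = [entries]; // a single entry isn't returned as an array
  return [[entries.length, entries[0].published.Text]]; // one row: count, last published
}

Entered in a cell as =feedStats(B2) it spills the count and date into two columns, which the conditional formatting then picks up.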

What’s missing

It would be useful to have a dynamic list of feeds which could retain comment cells (ideas on how to do this are very welcome; my thought was to run a cron job to read the sheet, fetch any new feeds, then output the result)

Another useful metric to add would be a comment count, so that 1 post with 3 comments would be identified as a lower priority than 1 post with 0 comments.

So do you think this spreadsheet is potentially a useful tool in the open course toolbox? How would you improve it?


Just a note to say that on 26th April at 4pm BST (in your timezone) I’ll be giving a webinar on using Google Apps Script to extract data from Canvas (the LMS by Instructure). Even if you’re not a Canvas user hopefully there will be some general techniques you’ll find useful. It might also be useful for people thinking about applying basic learning analytics techniques to their courses. I’ll update this post with any resources from the session.

Martin Hawksey of Jisc CETIS started playing around with discussion usage data in Canvas when he was participating in the Learning Analytics course on Canvas Network. Using Google's cloud scripting service Google Apps Script, free for Google Apps and Google account users, he could dig deep into the data and start doing some interesting (and valuable) analysis, all from the comfort and familiarity of a spreadsheet.

Join us for this free 50-minute technical webinar on Friday, April 26th as Martin details his experience using the Canvas APIs and walks through building scripts to get meaningful Canvas data with Google Apps Script.

Registration is free and you can book here.

Here's a recording of the session and the bundle of links (including slides).


“Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity.” – Wikipedia

I first came across Sankey diagrams via Tony Hirst (one of the latest contributors to the OKFN’s School of Data ;) in d3.js Powered Sankey Diagram. Subsequently Bruce McPherson showed how to create Sankey diagrams from Excel, also using d3.js.

Having collected some survey data for Analytics and Institutional Capabilities at #cetis13 (here’s a copy of the form) we were looking for a way to take us beyond the Google Form reporting and gain extra insight. In particular I was interested in seeing whether there were any relationships between the multiple choice questions. Using a Sankey diagram seemed like a possible solution and my colleague David Sherlock quickly came up with a modification of Mike Bostock’s Sankey example to accept a CSV input (I’ll link if/when it gets written up).

Seeing this I thought it might be useful to make a wrapper to generate Sankey diagrams from data stored in a Google Sheet. The solution was relatively straightforward, using the Google Visualisation API to get the data into the right shape for d3.js. An example of the prototype is here.
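For the curious, the core of the wrapper is roughly this (a sketch, not the exact prototype code: the published-sheet query url and the source/target/value column layout are assumptions):

// Pull rows from a published Google Sheet via the Google Visualisation API
// and reshape them into the {nodes, links} structure the d3.js sankey plugin expects.
function drawSankey(spreadsheetKey) {
  var query = new google.visualization.Query(
      'https://spreadsheets.google.com/tq?key=' + spreadsheetKey);
  query.send(function (response) {
    if (response.isError()) return;
    var table = response.getDataTable();
    var nodes = [], links = [], index = {};
    function nodeId(name) { // register each unique name once
      if (!(name in index)) {
        index[name] = nodes.length;
        nodes.push({name: name});
      }
      return index[name];
    }
    for (var r = 0; r < table.getNumberOfRows(); r++) {
      links.push({
        source: nodeId(table.getValue(r, 0)), // first column: source
        target: nodeId(table.getValue(r, 1)), // second column: target
        value: Number(table.getValue(r, 2))   // third column: flow quantity
      });
    }
    // hand {nodes: nodes, links: links} to d3.sankey() from here
  });
}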

There’s no interface yet for you to select a spreadsheet, sheets, columns etc., but you can take your Google Spreadsheet, ‘publish to the web’, and then add the following data to the url.

One of the issues with this solution is you might not want to make all your data available. To get around this I’ve written a Google Apps Script that lets you use a custom formula to preformat the data. To see this in action this Spreadsheet contains an example. The formula is in cell A1 and uses the format =setSankey(datarange, cols, separator)

  • datarange – sheet/cell reference for the source data e.g. 'Form Responses - Edit'!A2:D. Note: it must start with column A
  • cols – comma separated list of columns used to generate the chart e.g. "B,C,D"
  • separator {optional} – used to split multi-value cells; defaults to ", " e.g. " | "

To use this in your own spreadsheets open Tools > Script editor and copy in the code from this gist. Here’s an example url using pre-processed data. The main difference is the addition of &output=1 to the querystring.
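For a flavour of what the gist does, here's a simplified sketch of a setSankey-style custom function (an illustration of the approach, not the gist code itself):

// Sketch: read the source range, then for each adjacent pair of the chosen
// columns count source->target transitions, returning source/target/value rows.
function setSankey(datarange, cols, separator) {
  separator = separator || ", ";
  var data = SpreadsheetApp.getActiveSpreadsheet().getRange(datarange).getValues();
  var colIdx = cols.split(",").map(function (c) {
    // single letters map straight to indexes because the data starts in column A
    return c.trim().toUpperCase().charCodeAt(0) - 65;
  });
  var counts = {};
  for (var r = 0; r < data.length; r++) {
    for (var c = 0; c < colIdx.length - 1; c++) {
      var sources = String(data[r][colIdx[c]]).split(separator);
      var targets = String(data[r][colIdx[c + 1]]).split(separator);
      for (var i = 0; i < sources.length; i++) {
        for (var j = 0; j < targets.length; j++) {
          var key = sources[i] + "\t" + targets[j];
          counts[key] = (counts[key] || 0) + 1; // tally this transition
        }
      }
    }
  }
  var output = [["source", "target", "value"]];
  for (var k in counts) {
    var pair = k.split("\t");
    output.push([pair[0], pair[1], counts[k]]);
  }
  return output;
}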

Obviously creating your own querystrings to render the data isn’t ideal, and it would be relatively straightforward to create a UI wrapper similar to the one used in EDGESExplorer, but it’s something I’m reluctant to do unless there is enough demand. The other consideration is the question: does the Sankey diagram provide useful insight for this type of data, or is it just more ‘damn lies’?

It would of course have been nice to write a Google Gadget to include this in a Spreadsheet … but Google are discontinuing those :(.


In my last post on Canvas Network Discussion Activity Data I mentioned I was a little disappointed not to be able to use social network analysis (SNA) modelling techniques on the Canvas Network discussion boards. My main barrier was accessing the data via the Canvas API using my preferred toolset. Fortunately Brian Whitmer at instructure.com (the company behind Canvas) posted a comment highlighting that as a student on the course it was easy for me to get access to this data using a token generated on my Canvas profile page. With this new information, in this post I’ll cover three main areas:

  • a very quick introduction into techniques/opportunities for analysing threaded networks using SNA;
  • how I retrieved data from the Canvas platform for the #LAK13 discussions; and finally
  • some analysis using the NodeXL add-on for Microsoft Excel (Windows).

On Friday 1st March at 4pm GMT I’ll also be having a live Hangout on Air with Marc Smith, one of the original creators and continued project coordinator of NodeXL. The live feed will be embedded here and you can ask questions via Twitter or Google+ using the combined hashtags #lak13 and #nodexl, e.g. ‘What’s the best place to find out more about NodeXL? #lak13 #nodexl’. For the hangout session we’ll look at how easy it is to use NodeXL to analyse a Twitter hashtag community in a couple of clicks. [The rest of this post introduces a more advanced use of NodeXL, so if I lose you in the rest of this post fear not as the session will be a lot easier going]

Opportunities for analysing threaded networks using SNA

Hello graph

WiredUK friend/follower graph

One of the dangers of SNA is people see images like the one to the right and miss the point of using this modelling technique (the image is from one of my first goes at using NodeXL so I’d be the first to admit there’s room for improvement). If you do, there are a couple of things to bear in mind. Often these images are exhaust, generated as a snapshot of a visual and/or exploratory analytic. A certain level of literacy is required to understand the underlying structure of the graph. Taking this last point, a threshold concept for me was understanding that these graphs are more often than not constructed from an edge list, which is essentially two columns of data, each row representing an edge between a start node and an end node. For example, all the names in columns Vertex 1 and Vertex 2 are nodes and each row represents an edge between them, so in this case Beth –> Adam generates the highlighted edge.

Basic edge list and force layout

There is a lot more to understand about these types of graphs, but this basic concept means I know that if I have any relationship data it’s easy to graph and explore.

Hello SNAPP

At this point it’s worth mentioning the browser plugin SNAPP.

The Social Networks Adapting Pedagogical Practice (SNAPP) tool performs real-time social network analysis and visualization of discussion forum activity within popular commercial and open source Learning Management Systems (LMS). SNAPP essentially serves as a diagnostic instrument, allowing teaching staff to evaluate student behavioural patterns against learning activity design objectives and intervene as required in a timely manner.

Valuable interaction data is stored within a discussion forum but from the default threaded display of messages it is difficult to determine the level and direction of activity between participants. SNAPP infers relationship ties from the post-reply data and renders a social network diagram below the forum thread. The social network visualization can be filtered based upon user activity and social network data can be exported for further analysis in NetDraw. SNAPP integrates seamlessly with a variety of Learning Management Systems (Blackboard, Moodle and Desire2Learn) and must be triggered while a forum thread is displayed in a Web browser.

The social network diagrams can be used to identify:

  1. isolated students
  2. facilitator-centric network patterns where a tutor or academic is central to the network with little interaction occurring between student participants
  3. group malfunction
  4. users that bridge smaller clustered networks and serve as information brokers

The paper referencing the SNA research supporting these areas was presented at LAK11 (if you don’t have access it’s also available in the presentation’s slidedeck). The paper Visualizing Threaded Conversation Networks: Mining Message Boards and Email Lists for Actionable Insights (Hansen, Shneiderman & Smith, 2010) also highlights simple ways to identify question people, answer people and discussion starters, which are all potentially very useful within courses for identifying network clusters individuals might want to join/follow.

Retrieving data from Canvas

Hopefully with that quick intro you can see there might be some value in using SNA for threaded discussion analysis. Reading the SNAPP overview hopefully you spotted that it currently doesn’t support extracting data from Canvas discussion boards. This is an opportunity to understand some of the analysis SNAPP is doing behind the scenes.

Hello Google Apps Script

If you have been following my posts you’ll know that I favour using Google Apps Script as a lightweight tool for extracting data. Thanks to Brian (Instructure) I’ve got a way to access the Discussion Topics API. Looking at the API documentation I decided the best way to proceed was to get all of the LAK13 discussion topics (top-level information) and use this to get the full topic data. If you speak JSON we are essentially turning this:

Data as rendered in Canvas

into this:

Data in JSON

finally getting this (web version here):

Data in spreadsheet

The code to do this is available here. I’m going to spare you the details of the code, but here are the instructions if you’d like to export data from other Canvas hosted discussion boards. If you’re not interested in that you can just jump to the next section.
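That said, for a flavour of what the script is doing, the core is roughly this (a simplified sketch, not the full script: the /view endpoint and field names are from my reading of the Canvas API docs, and the reply-walking is cut down):

// Sketch: list the course discussion topics, fetch each topic's full reply
// tree, and emit reply edges (replier -> person replied to) to the sheet.
function getCanvasDiscussionEdges() {
  var apiBase = "https://learn.canvas.net/api/v1/courses/33"; // edit for your course
  var token = ScriptProperties.getProperty("access_token");   // set in Project properties
  var topics = Utilities.jsonParse(UrlFetchApp.fetch(
      apiBase + "/discussion_topics?access_token=" + token).getContentText());
  var output = [["vert1_name", "vert2_name", "topic_id", "created_at"]];
  for (var i = 0; i < topics.length; i++) {
    var full = Utilities.jsonParse(UrlFetchApp.fetch(apiBase + "/discussion_topics/" +
        topics[i].id + "/view?access_token=" + token).getContentText());
    var names = {}; // user_id -> display name lookup
    for (var p = 0; p < full.participants.length; p++) {
      names[full.participants[p].id] = full.participants[p].display_name;
    }
    walkReplies(full.view, topics[i].id, names, output);
  }
  SpreadsheetApp.getActiveSheet()
      .getRange(1, 1, output.length, output[0].length).setValues(output);
}

function walkReplies(entries, topicId, names, output) {
  entries = entries || [];
  for (var i = 0; i < entries.length; i++) {
    var replies = entries[i].replies || [];
    for (var j = 0; j < replies.length; j++) { // each reply is an edge to its parent's author
      output.push([names[replies[j].user_id], names[entries[i].user_id],
                   topicId, replies[j].created_at]);
    }
    walkReplies(replies, topicId, names, output); // recurse into nested replies
  }
}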

Generating an edge list (extracting data) from Canvas to Google Sheets

  1. Create a new Google Spreadsheet and then in Tools > Script editor copy in the code from here
  2. If you are not pulling data from LAK13 you need to edit values in lines 2-4. If you visit your course homepage hopefully you can decode the url pattern based on the example for LAK13 https://learn.canvas.net/courses/33 (I should also point out you need to be enrolled on the course to receive data. Also read Canvas API Policy)
  3. Next you need an access token, which is generated from your Canvas Profile Settings page. Scroll down to the bottom and click New Access Token, filling in a purpose and leaving expires blank. Make a copy of the token as it’s needed for the next step (I added a copy to a .txt file just in case the next step didn’t work)
  4. Back in the Script Editor in Google Spreadsheets click File > Project Properties. In the ‘Project properties’ tab click ‘+ Add row’, replace (name) with access_token and (value) with the token you got from Canvas, then click Save
    Project Properties
  5. Make sure everything is saved in the Script Editor and then Run > getCanvasDiscussionEdges, wait for the script to finish and on Sheet1 you should have a bunch of data to play with.

Using NodeXL to analyse Canvas Discussions

There are a number of different questions we could ask of the LAK13 data. The particular one I want to look at is who are the core community members stimulating/facilitating discussion (e.g. applying a connectivist theory who are the people you might want to connect with). To do this we need to (I’m assuming you’ve already installed NodeXL):

  1. Download the data extracted to the Google Spreadsheet (File > Download as > Microsoft Excel). [If you just want the data I’ve extracted here’s the download link – the data is automatically refreshed nightly]
  2. Open the downloaded file in Excel and in the created_at column select all and Format Cells as General (I needed to do this because NodeXL was misformatting dates on import)
  3. Start a new NodeXL Template (I use the NodeXL Excel Template option from my windows Start menu)
  4. From the NodeXL ribbon you want to Import > From Open Workbook
    Import > From Open Workbook
  5. In the import dialog vert1_name and vert2_name are edges, anything else prefixed with ‘vert’ is assigned to the corresponding Vertex n property column, and everything else is a Vertex 1 property:
    import dialog
  6. Once imported you can open the Edges sheet, select the created_at column and use Format Cells to reassign a date/time format.
  7. In Prepare Data choose ‘Count and merge duplicate edges’, selecting Count and Vertex 1 and Vertex 2
    Count and merge duplicate edges
  8. In the Graph section of the NodeXL ribbon we want to make this a directed graph (replies are directed) and choose your layout algorithm (I usually go Harel-Koren Fast Multiscale)
    make this a directed graph
  9. Next we want to prepare the data we want to analyse. In Autofill Columns (within the Visual Properties portion of the ribbon) set Edge Visibility to ‘topic_id’ and in Edge Visibility Options set ‘If the source column number is: Not equal to 558’ Show, otherwise Skip (this will skip edges that are responses to the Pre-course discussion forum – I’ll let you question this decision in the comments/forum)
    Edge Visibility Options
  10. Click Ok then Autofill
  11. Next open the Vertices sheet and select all the rows (Ctrl+A) and from the Visibility option select ‘Show if in an Edge’
    Show if in an Edge
  12. Now find the row in the Vertices sheet for George Siemens and Skip it (doing this we’re creating a ‘what if George wasn’t there’ scenario)
    ‘what if George wasn’t there’ scenario
  13. Open the Graph Metrics window and add vertex in-degree, vertex out-degree, vertex betweenness and closeness centrality, and Top items (in the Top items options you’ll need to add these as the metrics you want top 10s for), finally clicking Calculate Metrics.
    Graph Metrics window

At this point you could use the calculated metrics to weight nodes in a graph, but for now I’m going to skip that. You should now have a Top Items sheet with some useful information. In the Betweenness Centrality list you should have these names:

  • Martin Hawksey
  • Simon Knight
  • Alex Perrier
  • Khaldoon Dhou
  • Rosa Estriégana Valdehita
  • Maha Al-Freih
  • Suzanne Shaffer
  • Maxim Skryabin
  • Bryan Braul
  • Peter Robertso

Excluding the pre-course discussion forum and George Siemens, the discussions these people have engaged with provide the shortest paths to other people engaging in discussions on the Canvas site. Strategically these are potentially useful people within the network that you might want to follow, question or engage with.

Getting to this point obviously hasn’t been straightforward, and had SNAPP been available in this instance it would have turned this into a far shorter post. Programmatically, using tools like R, we could have arrived at the same answer with a couple of lines of code (that might be my challenge for next week ;). What it has hopefully illustrated is that if you have data in an edge format (two column relationships) tools like NodeXL make it possible for you to use SNA modelling techniques to gain insight. (I’m sure it also illustrates that data wrangling isn’t always straightforward, but guess what, that’s life.)

NodeGL for LAK13 discussions

So at this point some of you might be wondering what the LAK13 reply network looks like. I could give you a flat image, but why don’t you use my NodeXL online graph viewer to explore it yourself or download the data from the NodeXL Graph Gallery.


In my last post I looked at the data available around a course hashtag from Twitter. For this next post I want to start looking at what’s available around the Canvas Network platform, which is being used to host Learning Analytics and Knowledge (LAK13). Sizing up what was available I did come across the Canvas LMS API documentation, which provides a similar method of accessing data as the Twitter API. I wasn’t sure if this extended to Canvas Network, but because the authentication method it uses (OAuth2) isn’t possible using my dev tools of choice (mainly Google Apps Script) I looked for something else.

Whilst browsing the discussion page for the course I noticed that my browser was auto-detecting a feed.


an excerpt of this feed is below:

<feed>
  <title>Learning Analytics and Knowledge Discussion Feed</title>
  <link href="https://learn.canvas.net/courses/33/discussion_topics"/>
  <updated>2013-02-20T12:39:02+00:00</updated>
  <entry>
    <title>Discussion: Week 8 Discussion Forum</title>
    <id>tag:canvas.instructure.com,2013-02-01:/discussion_topics/discussion_topic_580</id>
    <updated>2013-02-01T21:15:20+00:00</updated>
    <published>2013-02-01T21:15:02+00:00</published>
    <author><name>George Siemens</name></author>
  </entry>
  <entry>
    <title>Discussion: Week 3 Discussion Forum</title>
    <id>tag:canvas.instructure.com,2013-02-01:/discussion_topics/discussion_topic_575</id>
    <updated>2013-02-01T21:15:56+00:00</updated>
    <published>2013-02-01T21:13:24+00:00</published>
    <author><name>George Siemens</name></author>
  </entry>
</feed>
Looking at the raw feed I could see it wasn’t limited (often feeds only contain the last 10 entries) and contained the entire content of messages.

Update: Below I use Google Apps Script to extract the data. Since then I’ve created a ‘no code’ solution that only uses existing spreadsheet formulas. There is an accompanying presentation (the webinar should eventually appear here).

Looking for an easy way to consume this I first turned to the importFeed formula in Google Spreadsheets, but unfortunately it only returned the last 20 results. A trick I’ve used in the past is to put feeds through Yahoo Pipes to get JSON/CSV to work with, but as working with dates this way isn’t straightforward I opted for some Google Apps Script which would create a custom formula to fetch the feed from Canvas Network and enter the results into a sheet. The handful of lines of code for the main part of this are below:

function getCanvasDiscussions(url) {
  var response = UrlFetchApp.fetch(url); // fetch the discussion feed
  if (response.getResponseCode() == 200) { // only proceed if the fetch worked
    var d = Xml.parse(response.getContentText()).feed.entry; // entries from the Atom feed
    var output = [['published','updated','title','author','link','id','content']]; // header row
    for (var i in d){ // one row per feed entry
      output.push([getDateFromIso(d[i].published.Text),getDateFromIso(d[i].updated.Text),d[i].title.Text,d[i].author.name.Text,d[i].link.href,d[i].id.Text,d[i].content.Text]);
    }
    return output; // 2D array, written to the sheet by the custom formula
  }
}

getDateFromIso is a subfunction I use quite often and is available in this stackoverflow answer. Adding the above code to a Google Sheet (via Tools > Script editor…) allows me to use a custom formula to fetch the data.
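If you don't want to chase the link, a cut-down version looks like this (a sketch which ignores timezone offsets, which the full answer handles; fine for the +00:00 stamps above):

function getDateFromIso(s) {
  // turn e.g. '2013-02-01T21:15:20+00:00' into a JavaScript Date
  var d = s.match(/(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})/);
  return new Date(Date.UTC(+d[1], +d[2] - 1, +d[3], +d[4], +d[5], +d[6]));
}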

Below is a quick look at the data returned (here it is published in a table). In the columns we have publish dates, title, author, link, id and post content. As it goes this isn’t too bad. The big thing that is missing is that, while we can see which topic a message is in, the reply threading is lost.

Canvas Network Discussion Data

Even with this, like the #lak13 Twitter dashboard from last week, I can quickly add some formulas to process the data and get an overview of what is going on (for the live view visit this spreadsheet – File > Make a copy if you want to edit).

CanvasNetworkDashboard

This obviously isn’t a complicated analytic and it wouldn’t surprise me if the course tutors have something similar on the backend of Canvas Network. As a student it’s useful for me to see how I’m doing compared to others on the course and get a sense of who else is contributing. [Adam Cooper has a great post on How to do Analytics Right... with some tips he picked up from John Campbell, who is behind Purdue’s Signals Project, which fits in nicely here.]

Summary

So with a bit of ken and a couple of lines of code I can see how the #lak13 discussions are going. Again I’ve avoided any deep analytics, such as analysing what has been said, to whom, at what time, but hopefully now that I’ve highlighted and freed the data you can do something else with it. Not being able to extract the conversation thread is a little disappointing as it would have been nice to fire up SNAPP or NodeXL, but I’ll have to save those for another day ;)


For a couple of years now, to support my research in Twitter community analysis/visualisation, I’ve been developing my Twitter Archiving Google Spreadsheet (TAGS). To allow others to explore the possibilities of data generated by Twitter I’ve released copies of this template to the community.

In September 2012 Twitter announced the release of a new version of their API (the spreadsheet uses this to request data from Twitter). Around the same time Twitter also announced that the old version of their API would be switched off in March 2013. This has required some modification of TAGS to work with the new API. The biggest change for TAGS is that all requests now need authenticated access.

So here it is:

*** Twitter Archive Google Spreadsheet – TAGS v5.1 ***
[If the first link doesn't work try Opening this Spreadsheet and File > Make a copy]

Important: Uldis Bojars (@CaptSolo) has highlighted an issue with TAGS V5.0 which meant it would not page all the results. This has been fixed with the release of v5.1. If you are running version 5.0 of TAGS it's highly recommended you update the code.

Note! Search API isn't 100% reliable:
We find that the search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions - González-Bailón, Sandra, et al. "Assessing the bias in communication networks sampled from twitter." Available at SSRN 2185134 (2012).

Instructions for setting up TAGSv5

Instructions are included on the Readme/Settings sheet of the template. If you are having problems it's worth checking the guide written by Stacy Blasiola (@Blasiola) or this modified version by Karen Smith & Shanifa Nasser made for Open Data Day Toronto, available as CC-BY-SA.

What will happen to my existing TAGS sheets that aren't version 5.0?

When Twitter turn off the old API (test outages this March) all authenticated and unauthenticated search requests will stop working.

How do I upgrade existing versions of TAGS spreadsheets (v3.x to v4.0) to keep collecting beyond March 2013?

As I can’t push an update to existing copies of TAGS you’ll have to manually update by opening your spreadsheet, then opening Tools > Script editor… and replacing the section of code that starts function getTweets() { and finishes 134 lines later (possibly with the line function twDate(aDate){ ) with the code here. [And yes, I know that’s a pain in the ass, but it’s the best I could do] ... or you can just start a new archive using TAGSv5.0

More tips and info to follow when I get a chance.


Last week I got a number of enquiries about using Google Forms. One of these was a question from Martin Weller (OU), who is in the process of setting up H817, an Open University open course (MOOC) on Openness in Education. He asked:

how do I go from this [Google Form] to an OPML file that feedwordpress can use (also if you have any ideas about how to do autodiscovery of feeds so if people only put in their main url it'll find the feed, that would be great).

My response was to direct Martin to http://opml-generator.appspot.com/ which I used in Generating an OPML RSS bundle from a page of links using Google Spreadsheets. This takes a Google Spreadsheet of RSS feeds and generates an OPML bundle. Part of this solution relies on the user entering their blog RSS feed, which isn’t always known. It’s possible to use RSS auto-discovery, for which at the time I didn’t have a scalable solution, but Martin appeared to have enough information to edge forward a bit more. I’ve subsequently come up with this:

*** Blog Registration OPML Generator Template ***
[File > Make a copy for your own version]

With this a person can enter their blog url, some code tries to fetch the RSS feed, and the data is used to generate an OPML file. If you’d like to see it in action fill in this form (embedded below) … and see the generated OPML here (this should update every 5 minutes; it’s also *.tsv but the URL is enough to trick most feed aggregators).
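The auto-discovery is the interesting bit. In essence the code fetches the blog homepage and looks for the advertised feed <link> tag, something like this (a rough sketch; the function name is mine):

function discoverFeed(blogUrl) {
  // Sketch: return the href of the first RSS/Atom <link> tag on the page.
  var html = UrlFetchApp.fetch(blogUrl).getContentText();
  var tag = html.match(/<link[^>]+type=["']application\/(rss|atom)\+xml["'][^>]*>/i);
  if (!tag) return ""; // no feed advertised
  var href = tag[0].match(/href=["']([^"']+)["']/i);
  return href ? href[1] : "";
}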

I’ll let you discover uses for this template other than open course registrations.

Registering feeds with FeedWordPress (FWP)

Back on Martin Weller’s blog he’s written about his IT support woes. As part of this Alan Levine has kindly added some comments about how FeedWordPress handles blog registrations. The comment that caught my eye was:

I am not sure why you are fiddling with the OPML. You should copy the URLs and paste them into FWP.

I’m assuming Martin had assumed that generating an OPML file accessible by url would mean FWP would keep checking it for new blog registrations. As Martin has subsequently discovered this does not appear to be the case. When you import feeds from a url FWP still prompts you to select which feeds you want to include. This suggests to me that it’s not a fully automated process.

add opml

Alan suggested just copying the urls into FWP. Let’s look at how that works. Clicking the ‘add multiple’ sources button in the FWP admin lets you dump blog urls, one per line.

add multiple

When you click add, FWP visits each blog and tries to auto-discover the feed, using the first RSS feed it finds (this is the same trick used in my Google Form). So in short, as FWP doesn’t appear to keep checking the OPML file, and because you can just dump in blog urls, Martin may as well just collect these and do any cleaning up afterwards.

So as Martin is discovering there are some very fiddly bits to using FeedWordPress as an open course aggregator, and he ends his post with:

One last plea - I joked with Alan that I needed DS106 out of a box. I think I'm serious though - it would be great to have a step by step, idiots guide to installing and setting up a DS106-like environment. The rest of us don't have Alan and Jim's tech skills, so getting to the starting line is difficult. I know they'll say you should invent your own way, but they've done so much great work that I don't think they realise just how much expertise they have. A simple installation that let the rest of us get started would mean we could all go off in different directions then.

This takes me back to Jim Groom’s post on Integrating FeedWordPress with BuddyPress:

Nowhere in the raging discussion around MOOCs is there anyone talking about sharing the infrastructural/architectural work they’ve done freely with others. CUNY’s Commons-In-a-Box project makes the innovative development work they have been doing for years freely available for others to experiment with. That is the spirit of sharing that seems to me should characterize the groundswell for open, online education

+1 for ds106-In-a-Box


Twitter Archive interface

Like a growing number of other people I’ve requested and got a complete archive of my tweets from Twitter … well, almost complete. The issue is that while Twitter have done a great job of packaging the archive, even going as far as creating a search interface powered by HTML and JavaScript, as soon as you’ve requested the data it is stale. The other issue is that unless you have some webhosting, where can you share your archive to give other people access?

Fortunately, as Google recently announced site publishing on Google Drive, by uploading your Twitter archive to a folder and then sharing the folder so that it’s ‘Public on the web’ you can let other people explore your archive (here’s mine). Note: Mark Sample (@samplereality) has discovered that if you have file conversion on during upload this will break your archive. [You can also use the Public folder in Dropbox if you don’t want to use a Google account]

The documentation wasn’t entirely clear on how to do this. Basically it seems that as long as there’s an index.html file in the folder root and links to subdirectories are relative, all you need to do is open the folder in Google Drive and swap the first part of the url with https://googledrive.com/host/ e.g. https://drive.google.com/#folders/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ becomes https://googledrive.com/host/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ/
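In code terms the swap is trivial (a throwaway sketch):

function hostingUrl(driveFolderUrl) {
  // swap https://drive.google.com/#folders/<folderId> for the site-publishing url
  var folderId = driveFolderUrl.split("/").pop();
  return "https://googledrive.com/host/" + folderId + "/";
}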

So next we need to keep the data fresh. Looking at how Twitter have put the archive together we can see tweets are stored in /data/js/tweets/ with a file for each month’s tweets, and some metadata about the archive in /data/js/, the most important being tweet_index.js.

Fortunately not only does Google Apps Script provide an easy way to interface Drive and other Google Apps/3rd party services, but the syntax is based on JavaScript, making it easy to handle the existing data files. Given all of this it’s possible to read the existing data, fetch new status updates and write new data files, keeping the archive fresh.
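To make that concrete, refreshing one month's file boils down to something like this (a sketch, assuming the Grailbird-style data files described above; the month file name is a placeholder):

function refreshMonthFile(newTweets) {
  // Sketch: read e.g. data/js/tweets/2013_02.js from Drive, merge in newly
  // fetched tweets and write it back so the archive's interface stays fresh.
  var files = DriveApp.getFilesByName("2013_02.js"); // placeholder month file
  if (!files.hasNext()) return;
  var file = files.next();
  var raw = file.getBlob().getDataAsString();
  // each file starts with a JS assignment like 'Grailbird.data.tweets_2013_02 = [...]'
  var cut = raw.indexOf("=");
  var prefix = raw.substring(0, cut + 1);
  var tweets = JSON.parse(raw.substring(cut + 1)); // the array of tweet objects
  file.setContent(prefix + " " + JSON.stringify(newTweets.concat(tweets))); // newest first
}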

To do all of this I’ve come up with this Google Spreadsheet template:

*** Update Twitter Archive with Google Drive ***
[Once open File > Make a copy for your own copy]

Billy has spotted an inconsistency with the way Drive searches for the files used in the script. This has been addressed in an update to the code, but existing users will need to either set up a fresh copy or open their existing copy, then open Tools > Script editor and replace the code with the version here.
Note: There is currently an open issue which is producing the error message 'We're sorry, a server error occurred. Please wait a bit and try again.' Hopefully the ticket will be resolved soon

The video below hopefully explains how to set it up and use it:

A nice feature of this solution is that even if you don’t publically share your archive, if you are using the Google Drive app to sync files with your computer the archive stays fresh on your local machine.

The model this solution uses is also quite interesting. There are a number of ways to create interfaces and apps using Google Apps Script. Writing data files to Google Drive and having a static html-based interface is ideal for scenarios like this one where you don’t rely on heavy write processes or dynamic content (aware of course that there will be some sanitisation of code).

It would be easy to hook on some extra code to push the refreshed files to another webserver, or sync my local Google Drive with my webhost, but for now I’m happy for Google to host my data ;)


Like other wordpress.com and self-hosted wordpress blogs using Jetpack I got my annual report. Some of the headlines include:

The report then lists the top posts (minus the totals, which are available from the stats panel)

These are the posts that got the most views on JISC CETIS MASHe in 2012.

  1. Export Twitter Followers and Friends using a Google Spreadsheet 77 COMMENTS March 2011
  2. Twitter Archiving Google Spreadsheet TAGS v3 17 COMMENTS January 2012
  3. The best Google Spreadsheet Event Manager (w/h Sites, Contact, Calendar integration) ever, ever, ever 4 COMMENTS November 2010
  4. Twitter: How to archive event hashtags and create an interactive visualization of the conversation 13 COMMENTS November 2011
  5. Using Google Apps Script for a event booking system (Spreadsheet to Calendar & Site | Form to Spreadsheet, Email and possible Contacts) 23 COMMENTS March 2010

These results are biased towards posts that have had the most time to collect views. What about posts made later in the year? Or other posts, like IFTTT, that attracted a lot of traffic in one day?

Remembering Tony Hirst’s Making Use of WordPress Stats, which pulls data from the WordPress Stats API into a Google Sheet (Spreadsheet), I created a modification that pulls in a year’s worth of stats and lets me explore some of these questions. Below is a summary of WordPress Postviews Stats Analysis for MASHe (columns are sortable by clicking on the column heading).

Notes

  • AVERAGE of Views is the average post views per day. Days with no views are not counted
  • MAX of Views is the maximum post views in one day
  • SUM of Views is the total number of views a post has received
  • COUNTA of Views is the total number of days a post received at least one view

This reveals that posts like Feeding Google Spreadsheets (from October 2012) and Mozilla Open Badges Issuer Gadget for Google Sites (December 2012) have good average daily views, with basic projections putting them in the top 10 for 2013.

There’s more digging to be done, but in the meantime if you want to have a look at your own wordpress stats grab a copy of the template below and have a go (and if you make any improvements/useful insights leave a note ;)

*** WordPress Postviews Stats Analysis Template ***
[File > Make a copy for an editable version]

Creation notes

The code comments highlight the main gotchas:

function wordpressStatPostviews(api_key, blog_uri, end, days, limit) {
  // build url for api. Found better results using json rather than csv (csv didn't get a year's worth)
  var url = "http://stats.wordpress.com/csv.php?api_key="+api_key+"&blog_uri="+blog_uri+"&end="+Utilities.formatDate(end, "GMT", "yyyy-MM-dd")+"&days="+days+"&limit="+limit+"&table=postviews&format=json";
  var output = [["Link","Title","Title - Short","Date","Views"]]; // initialise return array
  try {
    var options = { "method" : "get" }; // initialise options for UrlFetchApp
    var response = UrlFetchApp.fetch(url, options); // fetch url
    if (response.getResponseCode() == 200) { // if response
      var raw = Utilities.jsonParse(response.getContentText()); // parse text response into json
      raw.reverse(); // reorder to oldest first - no real need but helped with debugging
      for (var i in raw){ // data is returned as object per day so iterate across
        var postdate = raw[i].date; // pull post date (as string) considered converting to date object but not required
        for (var j in raw[i].postviews){ // in each date object there is an array of views per post, iterate across to push into returned 2d array
          var apost = raw[i].postviews[j]; // abbreviating
          // noticed a bug in returned data. Views for homepage have correct post_id but post_title is pulled from last iteration
          var title = apost.post_id === 0 ? "Homepage" : apost.post_title; 
          // creating a short title for published table (no cell wrapping)
          var titleShort = title.length > 50 ? title.substring(0,50)+"..." : title;
          // push row to output
          output.push([blog_uri+"/?p="+apost.post_id,title, titleShort, postdate,  apost.views]);
        }
      }
      return output;
    }
  } catch(e) {
    throw e;
  }
}