Pushing Canvas LMS discussion data to Google Analytics (tips on Google Analytics API integration, batch collection and queue time)

In this post I’m going to show you how you can extract data from a Canvas LMS discussion board and push it to Google Analytics. This technique uses the Google Analytics Measurement Protocol, which means you do not need access to any existing Google Analytics tracking because you will be pushing the data to your own Google Analytics account. The post is designed to highlight the possibilities of integrating third-party APIs with Google Analytics asynchronously, batch sending tracking data whilst preserving the original activity timestamp. It was inspired by Nico Miceli’s Slackalytics – a Slack Analysis bot with Google Analytics Integration.

I’ve previously demonstrated how you can extract Canvas discussions for analysis with NodeXL, and if you are interested in Learning Analytics that post is a better starting point. As Google Analytics is an advanced bean counter there is a limit to the data you can send and analyse. An important limitation to highlight at the very beginning is that:

The Analytics terms of service, which all Analytics customers must adhere to, prohibits sending personally identifiable information (PII) to Analytics (such as names, social security numbers, email addresses, or any similar data)

Whilst personal information is prohibited, it’s worth knowing that unique identifiers such as user IDs are allowed, and these can be remapped to personal information when data is exported from Google Analytics.

In this example we are going to extract very basic measures from the discussion data, sending word and question mark counts to illustrate a general method. I make no claim to being an expert in discourse analysis and look forward to seeing if others can take this idea forward. Before going any further I should also highlight the Canvas API usage policy, particularly around privacy. My advice, if you are going to apply this technique on your own Canvas course, is to make it clear to students that data (beyond the usual Canvas reporting) will be collected, with an option to opt out (I’ve not done this in my example but feel it’s in line with the spirit of the source data and only used as proof of concept).

Getting technical

In more technical terms, what we are doing is running a script every 15 minutes which calls the Canvas Discussion Topics API, looks for new messages since it last ran and, if there are any, processes them to extract some counts before sending them to Google Analytics in a batch. To do this I’m using Google Apps Script, which is a free cloud-based coding environment that lives in Google Drive. If you haven’t come across Google Apps Script before, it uses a JavaScript syntax so it’ll look familiar to most developers and is supercharged with a long list of built-in services. If you’d like a peek at the completed project the code is here and you can File > Make a copy to set up and run the code.
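The “looks for new messages since it last ran” step could be sketched roughly as below. This is not the code from the project: findNewEntries() is a hypothetical helper, and it assumes each discussion entry carries an ISO 8601 created_at string. In Apps Script the last-run timestamp would also need to be stored between executions, for example in script properties.

```javascript
// Hypothetical helper: return the entries created after the last run, oldest first.
// Assumes every entry has an ISO 8601 `created_at` string.
function findNewEntries(entries, lastRunMs) {
  return entries
    .filter(function (entry) {
      return new Date(entry.created_at).getTime() > lastRunMs;
    })
    .sort(function (a, b) {
      return new Date(a.created_at) - new Date(b.created_at);
    });
}

// Illustrative data only, not taken from a real course
var entries = [
  { id: 3, user_id: 103, created_at: "2013-02-11T09:20:00Z", message: "<p>Agreed</p>" },
  { id: 1, user_id: 101, created_at: "2013-02-11T09:00:00Z", message: "<p>Hello</p>" },
  { id: 2, user_id: 102, created_at: "2013-02-11T09:12:00Z", message: "<p>Any ideas?</p>" }
];
var lastRun = Date.parse("2013-02-11T09:10:00Z");
var fresh = findNewEntries(entries, lastRun);
```

Sorting oldest first keeps the hits in the order the conversation actually happened when they are queued for sending.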

The Analytics Measurement Protocol is simple to use and only requires sending your required parameters as a payload to a url endpoint. Here are the required values for all hits to the measurement protocol:

Name              Parameter  Example          Description
Protocol Version  v          v=1              The protocol version. The value should be 1.
Tracking ID       tid        tid=UA-123456-1  The ID that distinguishes to which Google Analytics property to send data.
Client ID         cid        cid=xxxxx        An ID unique to a particular user.
Hit Type          t          t=pageview       The type of interaction collected for a particular user.
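As a quick sanity check before sending anything, a hit can be tested against the four required parameters listed above. The helper name missingRequiredParams() is made up for this sketch and is not part of any Google library.

```javascript
// The four parameters the Measurement Protocol requires on every hit
var REQUIRED_PARAMS = ["v", "tid", "cid", "t"];

// Hypothetical helper: list which required parameters are absent or empty
function missingRequiredParams(hit) {
  return REQUIRED_PARAMS.filter(function (name) {
    var value = hit[name];
    return value === undefined || value === null || value === "";
  });
}

var okHit = { v: 1, tid: "UA-123456-1", cid: "555", t: "pageview" };
var badHit = { v: 1, tid: "UA-123456-1" };

var okMissing = missingRequiredParams(okHit);   // nothing missing
var badMissing = missingRequiredParams(badHit); // cid and t are missing
```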

In the Measurement Protocol Parameter Reference you can find some optional extras, in particular we are going to use qt which is ‘Queue Time‘:

Used to collect offline / latent hits. The value represents the time delta (in milliseconds) between when the hit being reported occurred and the time the hit was sent. The value must be greater than or equal to 0. Values greater than four hours may lead to hits not being processed.

This means that while our script is only running every 15 minutes we can use the timestamp on the discussion thread to create an offset so the true time is recorded in Google Analytics. As we are running the script every 15 minutes and Google Apps Script has quotas for the server time a script can run (from 1 hour/day) and number of urls it can hit (from 20,000/day), it makes sense to batch multiple hits in a single request.
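To make the offset concrete, here is a minimal sketch of the queue time arithmetic. queueTime() is a hypothetical helper rather than anything from the project; it clamps negative values to 0 and flags hits older than the four hours mentioned in the parameter reference.

```javascript
var FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

// Hypothetical helper: work out the qt value for a single hit
function queueTime(postedAtMs, sentAtMs) {
  var delta = sentAtMs - postedAtMs;
  return {
    qt: Math.max(0, delta),       // qt must be greater than or equal to 0
    tooOld: delta > FOUR_HOURS_MS // hits older than four hours may not be processed
  };
}

// A message posted at 09:03 and picked up by the 09:15 run
var postedAt = Date.parse("2013-02-11T09:03:00Z");
var sentAt = Date.parse("2013-02-11T09:15:00Z");
var result = queueTime(postedAt, sentAt);
// result.qt is 720000 (12 minutes in milliseconds), result.tooOld is false
```

With a 15 minute schedule the offset should normally stay well inside the four hour window, so the tooOld flag is mainly useful if a run is missed or old data is being back-filled.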

The bulk of the code in this example handles interacting with the Canvas API. When the script finds new messages it calls buildQueryForGA() which builds our payload of parameters.  In this example the data sent to Google Analytics is structured as follows (written as an object before being converted to a querystring):

    //The Structure Data! This is where all the pretty GA data gets gathered
    //before it is sent to the GA servers for us to analyse at a later time.
    var data = {
        v:      1, // protocol version
        tid:    GA_TRACKING_ID, // <-- ADD UA NUMBER
        cid:    user.id,      // client id (user identifier) 
        ds:     "Canvas",     // data source
        cs:     "Canvas",     // campaign source
        cd1:    user.id,      // custom dimension 1 <-needs to be setup in GA
        cd2:    channel.name, // custom dimension 2 <-needs to be setup in GA
        cm1:    wordCount,    // custom metric 1 <-needs to be setup in GA
        cm2:    questionMark, // custom metric 2 <-needs to be setup in GA
        t:      "event",      // hit type
        ec:     "Canvas Disc.: "+ channel.name + "|" + channel.id, // event category
        ea:     "post by " + user.id,  // event action
        el:     topic_title,  // event label
        ev:     1             // event value
    };

From the above snippet you’ll notice it uses custom dimensions and metrics. If you haven’t set these up in Google Analytics before I’ll touch upon this later.
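The post doesn’t show how wordCount and questionMark are calculated or how the object becomes a querystring, so here is one rough way it could be done. All four helper names are hypothetical, and the sketch assumes the message arrives as HTML and that a simple tag strip is good enough for counting.

```javascript
// Crude tag strip; assumes the message body is HTML
function stripHtml(html) {
  return html.replace(/<[^>]*>/g, " ");
}

// Count whitespace-separated words
function countWords(text) {
  var words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

// Count literal question marks
function countQuestionMarks(text) {
  return (text.match(/\?/g) || []).length;
}

// Convert a flat object of parameters into a URL-encoded querystring
function toQueryString(data) {
  return Object.keys(data).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(data[key]);
  }).join("&");
}

var message = "<p>Is this thing on? I hope so.</p>"; // illustrative message
var text = stripHtml(message);
var query = toQueryString({
  v: 1,
  t: "event",
  ec: "Canvas Disc.: LAK13|33",
  cm1: countWords(text),        // 7 words
  cm2: countQuestionMarks(text) // 1 question mark
});
// query is "v=1&t=event&ec=Canvas%20Disc.%3A%20LAK13%7C33&cm1=7&cm2=1"
```

Encoding every value matters here because the event category contains spaces, a colon and a pipe character.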

Rather than sending our payload it’s stored in an array with the extracted message timestamp using addToGABatch() (detailed below). When the script has 20 queries, or when called directly, processGABatch() appends the queue time to the query and sends the data to Google Analytics.

// function that adds our GA call to a queue and sends when it hits 20
function addToGABatch(query, time){
  GA_BATCH.push({query: query, time:time});
  if (GA_BATCH.length >= 20){
    processGABatch(); 
  }
}

// append the queue time (qt) to each stored query, then send the whole batch to GA
// (GA_BATCH and time_warp are assumed to be globals defined elsewhere in the project)
function processGABatch(){
  var payload = "";
  var ga_now = new Date().getTime() - time_warp;
  for (var i=0; i < GA_BATCH.length; i++){
    payload += GA_BATCH[i].query + "&qt=" + (ga_now - GA_BATCH[i].time) + "\n";
  }
  try {
    var options = {'method' : 'POST',
                 'payload' : payload };
    UrlFetchApp.fetch('https://ssl.google-analytics.com/batch', options);
    GA_BATCH = [];
  } catch(e) {
    // log the failure; GA_BATCH is not cleared, so the hits stay queued
    Logger.log("GA batch send failed: " + e);
  }
}
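The time_warp value used in processGABatch() isn’t defined in the snippet. One reading, assuming the aim is to replay an old conversation as if it were happening now, is that it is the gap between when the replay started and when the first post was originally made. The sketch below uses made-up dates and hypothetical helper names to show the arithmetic.

```javascript
// Hypothetical: the offset between the start of the replay and the first original post
function computeTimeWarp(replayStartMs, firstPostMs) {
  return replayStartMs - firstPostMs;
}

// Shift the real clock back into the original conversation's timeline
function warpedNow(realNowMs, timeWarp) {
  return realNowMs - timeWarp;
}

// Illustrative dates only
var firstPost = Date.parse("2013-02-11T09:00:00Z");
var replayStart = Date.parse("2016-01-20T12:00:00Z");
var timeWarp = computeTimeWarp(replayStart, firstPost);

// Fifteen minutes into the replay the warped clock reads 09:15 on 11 Feb 2013
var realNow = replayStart + 15 * 60 * 1000;
var gaNow = warpedNow(realNow, timeWarp);

// A post originally made at 09:10 is therefore 5 minutes old in warped time
var postTime = Date.parse("2013-02-11T09:10:00Z");
var qt = gaNow - postTime; // 300000
```

Under this reading the queue time stays small and positive, even though the original posts are years old.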

Google Analytics Reporting

As I was using discussion data from a course in 2013, I’ve included a time warp to allow the playback of the conversation into Google Analytics. Running this on the LAK13 course (https://learn.canvas.net/courses/33), the out-of-the-box event reporting gives an overview of popular threads and the number of posts over time.

Google Analytics - Event Label reporting

And there might be something in the Event Flow worth further exploration

Google Analytics - Event Flow

Where it perhaps gets even more interesting is around Custom Reports using the custom dimensions and metrics. For example, the screenshot below incorporates counts for thread replies, words and question marks but you could use your own.

Google Analytics - Custom Dimension and Metrics

If you are not familiar with setting up custom reports, metrics and dimensions, the Google Analytics support articles on those topics are a good place to start.

There is clearly more work to be done with this concept but hopefully you can see the opportunities of using Google Analytics as an analysis engine for data from third party APIs … remembering of course to follow the usage policies of both Google Analytics and those other APIs. If you missed the link, here is all the source code for this project (open the link and File > Make a copy to tweak your own).

2 Comments


  1. Very cool, once again. I love the focus on the Canvas open API as it’s a straightforward way to pull precisely the data you want in near real-time.

    You may not have heard, but Canvas admins can also access data in bulk (star schema) through the Canvas Data service (daily flat files or direct access via Redshift).

    If you don’t have access to Canvas as an admin you can still play around with real data in a similar structure: We recently made a Canvas Network open course dataset available to researchers like yourself on Dataverse: https://dataverse.harvard.edu/dataverse/cn

    I daresay the “restricted” dataset is already deidentified sufficiently to satisfy Google’s terms of use.

    Check it out and feel free to email me if we can help you explore further.


    1. Hi Jared – thanks for the additional links. I was aware of the Canvas Network data but hadn’t looked at it yet. I’m not a Canvas admin but while I was writing the post I thought there would be something extra for them. It’ll be interesting to see if anyone develops this concept further or uses the other Canvas data that’s available :)
