88% of Universities UK members use Google Analytics: How I calculated the answer in under 10 minutes

Writing another blog post today which included reference to Google Analytics I pondered:

The response wasn’t promising:

My thought was to detect Google Analytics urchin code from website homepages. Knowing Tony Hirst had done something I asked and at 4:08pm the response was:

At 4:17pm

So how was it done?

I didn’t like the prospect of tweaking Tony’s scraperwiki code but spotted he was getting a list of institutions from Universities UK. Using the Scraper Chrome Extension I was able to export all the institution urls to a Google Spreadsheet:

Scraper Window

Having played around with Google Analytics before I knew if the site was using Google Analytics it would have a unique profile id in the source in the format UA-XXXXXX-X and found this regular expression to extract it using the following Google Apps Script:

function getUA(url) {
  var requestData = {
          method : "get",
          headers: { "User-Agent":"http://docs.google.com"}
        };
  var html = UrlFetchApp.fetch(url,requestData).getContentText();
  var urlPattern = /\bUA-\d{4,10}-\d{1,4}\b/ig;
  return html.match(urlPattern)[0];
}


I could then use a custom formula in column C to extract an urchin code from a website. This worked for most sites but I got a couple of errors for sites not using Google Analytics. Validating some of the results I noticed that it was because the UrlFetchApp wasn’t following browser redirects e.g. http://www.cardiffmet.ac.uk/ redirects to http://www3.cardiffmet.ac.uk/English/Pages/home2.aspx. This is a problem I’ve had before so recycled the code below which uses expandurl.appspot.com to follow a link to the destination.

function extractLink(text){
  // create a url pattern
  var urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  var feedproxyPattern = /(\b(http:\/\/feedproxy.google.com))/i;
  // extract link from email msg
  var url = text.match(urlPattern)[0];
  //if (feedproxyPattern.test(url)){
   // if feedproxy url see if cached (or resolve end url)
   var cache = CacheService.getPublicCache(); // using Cache service to prevent too many urlfetch
    var cached = cache.get(url);
    if (cached != null) { // if value in cache return it
      return cached;
    }
    var requestData = {
                        method : "get",
                        headers: { "User-Agent":"GmailProductivitySheet - Google Apps Script"}
                      };
    try {
      // try and get link endpoint using http://expandurl.appspot.com/
      var result = UrlFetchApp.fetch("http://expandurl.appspot.com/expand?url="+encodeURIComponent(url), requestData);
      var j = Utilities.jsonParse(result.getContentText());
      var link = (result.getResponseCode()===200)? Utilities.jsonParse(result.getContentText()).end_url:url;
    } catch(e) {
      // if http://expandurl.appspot.com/ doesn't work just return extracted url
      var link = url;
    }
    cache.put(url, link, 3600);
    return link;
  //}
  return url;
}

Using this formula in column D for the error results I got a fresh url to point the getUA function. Here’s the final spreadsheet (I’ve copied/pasted as values some of the formula results to save my quota) and the answer to my question:

134 institutional websites, 118 (88%) with Google Analytics code

But as Ranjit Sidhu reminded me

5 thoughts on “88% of Universities UK members use Google Analytics: How I calculated the answer in under 10 minutes

  1. Tony Hirst

    Hmmm, I also got different results. My uni table has 136 records, and I found 115 GA tracking codes...

  2. This demonstrates that many institutions installed the Google Analytics JavaScript call in their site's source at some point. It doesn't follow that they are using Google Analytics...

Comments are closed.