While writing another blog post today that referred to Google Analytics, I pondered:
.@briankelly how many institutions are using google analytics? #iwmw12
— Martin Hawksey (@mhawksey) June 7, 2012
The response wasn’t promising:
RT @mhawksey: "how many institutions are using google analytics? #iwmw12". Anyone know answer?Suggest add tag #ganalyticsyes
— Brian Kelly (@briankelly) June 7, 2012
My thought was to detect the Google Analytics urchin code on website homepages. Knowing Tony Hirst had done something similar, I asked, and at 4:08pm the response was:
@briankelly @mhawksey looking for urchin: no, but could extend HE homepage feed auto detect scraper wiki to do is?
— Tony Hirst (@psychemedia) June 7, 2012
@mhawksey @briankelly there’s also a complementary view that maybe needs revising… views.scraperwiki.com/run/uk_he_feed…
— Tony Hirst (@psychemedia) June 7, 2012
At 4:17pm
@psychemedia @briankelly 1st pass bit.ly/Nk9HVd
— Martin Hawksey (@mhawksey) June 7, 2012
So how was it done?
I didn’t like the prospect of tweaking Tony’s ScraperWiki code, but spotted that he was getting a list of institutions from Universities UK. Using the Scraper Chrome Extension, I was able to export all the institution URLs to a Google Spreadsheet.
Having played around with Google Analytics before, I knew that if a site was using Google Analytics it would have a unique profile id in the source in the format UA-XXXXXX-X. I found a regular expression to extract it and used it in the following Google Apps Script:
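function getUA(url) {
  // request options, including a User-Agent header for the fetch
  var requestData = {
    method : "get",
    headers: { "User-Agent":"http://docs.google.com"}
  };
  var html = UrlFetchApp.fetch(url,requestData).getContentText();
  // matches profile ids such as UA-123456-1
  var urlPattern = /\bUA-\d{4,10}-\d{1,4}\b/ig;
  // return the first match (throws an error if the page has none)
  return html.match(urlPattern)[0];
}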
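One thing to note about the pattern matching: `match` returns null when a page contains no tracking id, so indexing `[0]` raises an error for those pages. Here is a minimal sketch of a null-safe variant of just the matching step, written as plain JavaScript so it can be tried outside Apps Script. The helper name `extractUA` and the two sample HTML strings are made up for illustration and are not part of the original script.

```javascript
// Same pattern as in getUA: "UA-", 4 to 10 digits, "-", 1 to 4 digits.
const UA_PATTERN = /\bUA-\d{4,10}-\d{1,4}\b/i;

// Hypothetical helper (not in the original script): returns the first
// tracking id found in an HTML string, or null when there is none.
function extractUA(html) {
  const match = html.match(UA_PATTERN);
  return match ? match[0] : null;
}

// Made-up sample pages, for illustration only.
const withGA = "<script>_gaq.push(['_setAccount', 'UA-1234567-1']);</script>";
const withoutGA = "<html><body>No tracking here</body></html>";

console.log(extractUA(withGA));    // UA-1234567-1
console.log(extractUA(withoutGA)); // null
```

Returning null instead of throwing would leave an empty cell in the spreadsheet rather than an error, which may or may not be what you want when you are trying to spot the sites that need a second look.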
I could then use getUA as a custom formula in column C to extract the urchin code from each website. This worked for most sites, but I got a couple of errors, apparently for sites not using Google Analytics. Validating some of the results, I noticed that in some cases it was because UrlFetchApp wasn’t following browser redirects, e.g. http://www.cardiffmet.ac.uk/ redirects to http://www3.cardiffmet.ac.uk/English/Pages/home2.aspx. This is a problem I’ve had before, so I recycled the code below, which uses expandurl.appspot.com to follow a link to its destination.
function extractLink(text){
  // create a url pattern
  var urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  var feedproxyPattern = /(\b(http:\/\/feedproxy.google.com))/i;
  // extract link from email msg
  var url = text.match(urlPattern)[0];
  //if (feedproxyPattern.test(url)){ // if feedproxy url see if cached (or resolve end url)
    var cache = CacheService.getPublicCache(); // using Cache service to prevent too many urlfetch
    var cached = cache.get(url);
    if (cached != null) { // if value in cache return it
      return cached;
    }
    var requestData = {
      method : "get",
      headers: { "User-Agent":"GmailProductivitySheet - Google Apps Script"}
    };
    try {
      // try and get link endpoint using http://expandurl.appspot.com/
      var result = UrlFetchApp.fetch("http://expandurl.appspot.com/expand?url="+encodeURIComponent(url), requestData);
      var j = Utilities.jsonParse(result.getContentText());
      var link = (result.getResponseCode()===200)? Utilities.jsonParse(result.getContentText()).end_url:url;
    } catch(e) {
      // if http://expandurl.appspot.com/ doesn't work just return extracted url
      var link = url;
    }
    cache.put(url, link, 3600);
    return link;
  //}
  return url;
}
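The shape of extractLink is: check the cache, otherwise ask a resolver, fall back to the original url if the resolver fails, then cache whatever came back. A rough sketch of that pattern in plain JavaScript follows, with the network call swapped for a stand-in so it can be run anywhere. Everything named here (`makeCachedResolver`, `fakeResolve`, the example urls) is hypothetical; a `Map` stands in for CacheService and `fakeResolve` stands in for the expandurl.appspot.com lookup.

```javascript
// Hypothetical helper: wraps any url resolver with a cache and a fallback.
// `resolve` stands in for the expandurl.appspot.com lookup and `cache`
// stands in for CacheService; both are assumptions for this sketch.
function makeCachedResolver(resolve, cache = new Map()) {
  return function (url) {
    // if value in cache return it
    if (cache.has(url)) return cache.get(url);
    let link;
    try {
      link = resolve(url);
    } catch (e) {
      // if the resolver doesn't work just use the original url
      link = url;
    }
    cache.set(url, link);
    return link;
  };
}

// Made-up resolver that counts how often it is really called.
let calls = 0;
const fakeResolve = (url) => {
  calls += 1;
  if (url === "http://broken.example/") throw new Error("lookup failed");
  return url.replace("http://www.", "http://www3.");
};

const expand = makeCachedResolver(fakeResolve);
console.log(expand("http://www.example.ac.uk/")); // http://www3.example.ac.uk/
console.log(expand("http://www.example.ac.uk/")); // same result, served from the cache
console.log(expand("http://broken.example/"));    // http://broken.example/
console.log(calls);                               // 2
```

As in extractLink, the fallback value is cached too, so a url that fails to resolve is not retried until the cache entry expires (the sketch's `Map` never expires, unlike the 3600 second lifetime in the real code).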
Using extractLink as a formula in column D for the rows that returned errors, I got a fresh URL to point the getUA function at. Here’s the final spreadsheet (I’ve copied/pasted as values some of the formula results to save my quota) and the answer to my question:
134 institutional websites, 118 (88%) with Google Analytics code
But as Ranjit Sidhu reminded me:
@mhawksey @briankelly but how many are ACTUALLY using GA more then then they did logfiles ?n
— Ranjit Sidhu (@rssidhu) June 7, 2012
Tony Hirst
I added a table to https://scraperwiki.com/scrapers/uk_university_autodiscoverable_rss_feeds/ that lists the UA code for each uni. There was one site that seemed to break mechanise, somehow – chiuni.ac.uk
Tony Hirst
Hmmm, I also got different results. My uni table has 136 records, and I found 115 GA tracking codes…
Martin Hawksey
But how long did it take you? 😉 I did wonder about accuracy. When I’m on a PC next it would be worth doing a comparison
Paul Walk
This demonstrates that many institutions installed the Google Analytics JavaScript call in their site’s source at some point. It doesn’t follow that they are using Google Analytics…
Martin Hawksey
‘Using’ being the operative word. How many institutions are actually using analytics as part of their informed decision making?
Martin