Integrating Google Spreadsheet/Apps Script with Google Refine to update existing spreadsheets

Working on some OER Visualisation Project work today I found I needed to get some additional information from an external web service for a Google Spreadsheet I was working on. I could have of course just used Google Apps Script, a native feature within Google Spreadsheets, to do a UrlFetch and get the information but my source data was a bit messy and timeouts on Apps Script prevent you from getting lots of data in one hit. Instead it made more sense to download the data into Google Refine, clean it up, call external web services to get additional information, then somehow get it back into the original spreadsheet.

As I recently posted Integrating Google Spreadsheet/Apps Script with R: Enabling social network analysis in TAGS it wasn’t much of a cognitive leap to try a similar trick of publishing the spreadsheet as a service to allow a basic API write. In the R example, however, I used the POST method to pass data and whilst I’m pretty sure with the right Jython libraries you could do something similar in Refine, I wanted to keep it simple so here’s my method for using GET instead.

  1. In a Google Spreadsheet open Tools >  Script editor.. and paste the following code (there is some tweaking required to where you want the data to go and the secret passphrase – in this example I want to pass back a value named geo 
  2. Once tweaked Run > Setup (this get a copy of the spreadsheet id need to run as a service)
  3. Open Share > Publish as service..  and check ‘Allow anyone to invoke’ with ‘anonymous access’, not forgetting to ‘enable service’. You’ll need a copy of the service URL for later on. Click ‘Save’
    Publish as service
  4. Back in Google Refine select a column with some data you want to pass and choose Edit column > Add column by fetching URLs…
  5. The expression you use will be something like the one below where we have the service url from step 3, the ‘secret’ to match the one set in step 1 and what ever data you want to include 
    "https://docs.google.com/macros/exec?service=AKfycbw1VL_kNYEwhqELgccMgEvM9gJ3t1zhBr6bhLs84g&secret=lemon&idx="+cells['idx'].value+"&geo="+escape(value,"url")
    Add column dialog

Note: You’ll see from the script my import included an idx number which matched a row number, but as long as I didn’t add any additional rows or change the order I could have used row.index in the expression instead.

You also might want to play around with the throttling. I had problems with my first go because this was either set too low or my values weren’t url encoded.

Instead if anyone finds this useful or if they show how to do in Jython. The amount of Google Spreadsheet work I do I’m sure I’ll use this trick again ;)  

Last updated by at .

2 Responses to “Integrating Google Spreadsheet/Apps Script with Google Refine to update existing spreadsheets”


Comments are currently closed.

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .