Google Spreadsheet

As part of the latest release of TAGS (Twitter Archiving Google Sheet) I moved a lot of the code into a Google Apps Script Library. Libraries are a great way to bundle and release code, allowing you to write your own classes and methods. To see some of the functionality already being developed by the Apps Script community you should have a look at the Google Apps Script Samples site and the excellent work Bruce McPherson has done, which includes an EzyOauth2 library.

One of the things you can do with libraries is wrap one library into another. When rewriting TAGS it made sense to strip out a separate Twitter client library that I and others could use in different projects. Based on the work by Arun Nagarajan at Google, TwtrService provides access to Twitter's REST API. The work I’ve done is to add some UI and configuration methods to try to streamline the authentication flow. As part of this, developers can offer authentication routes using their own Twitter application or an application registered by their users. This particular angle is a result of one of the design principles for TAGS, that every copy of the template should use a Twitter application owned by the user. The reason behind this is to distribute the risk. If Twitter were to suspend my data access because a TAGS user abused their API it would suspend access for all TAGS users. By requiring TAGS users to register their own application with Twitter the responsibility to abide by Twitter’s terms of service lies with them. So in TAGS the auth flow looks like this:

The result is hopefully a flexible library that developers can integrate into their own projects, either using their own Twitter application or by getting users to register their own.

Over the next couple of weeks I'll be sharing some example applications we've developed at ALT. In the meantime this post serves as a basic introduction to TwtrService and covers:

Overview of TwtrService

The TwtrService library for Google Apps Script centrally stores your Twitter access details, allowing them to be accessed from multiple script projects without the need for re-authentication. TwtrService is designed to let you directly use Twitter’s v1.1 REST API GET and POST methods. For example, to return Twitter search results for the search ‘Google Apps Script’ you would use:

var data = TwtrService.get('https://api.twitter.com/1.1/search/tweets.json', {q: 'Google Apps Script'});

The url string can also be abbreviated to:

var data = TwtrService.get('search/tweets', {q: 'Google Apps Script'});

Additional options can be passed in the parameters object. For example, to return 100 results for the same search you would use:

var data = TwtrService.get('search/tweets', {q: 'Google Apps Script', count: 100});

The project key for this library is MarIlVOhstkJA6QjPgCWAHIq9hSqx7jwh and the TwtrService methods are documented here.

To use the Twitter REST methods TwtrService first needs authenticated access. TwtrService has some built-in methods to do this, detailed below. Once a user has authenticated, the TwtrService library stores their access details as User Properties. This means that after a user has authenticated once with TwtrService, using the library in another container-bound or standalone Apps Script project immediately gives them access to Twitter API results using the get/post methods. In terms of security, User Properties are limited to the effective user of the current script.

Usage

Quick start: Personal access

If you would just like to use TwtrService for your Apps Script projects the easiest way to get started is to register a Twitter application and enter its details on this page (if you are interested, here is the source code for the interface).

Note: If you are already a user of TAGS you’ll already be able to use TwtrService without the step above.

In the Apps Script project where you’d like to use the Twitter API, open the Script Editor and select Resources > Libraries, then add the service using the project key MarIlVOhstkJA6QjPgCWAHIq9hSqx7jwh.

In your project you can now use the TwtrService.get() and TwtrService.post() methods. The documentation for get() is detailed below (post() is the same but uses HTTP POST):

get(string url, Object parameters)

GET wrapper for request to the Twitter REST API. For full documentation of API method endpoints and parameters see https://dev.twitter.com/rest/public. For example to get last 100 tweets containing 'Google Apps Script': var data = TwtrService.get('search/tweets', {q: 'Google Apps Script', count: 100});

Arguments:

  • url (string) - Twitter REST API resource url. This can be either long form e.g. https://api.twitter.com/1.1/search/tweets.json or abbreviated e.g. search/tweets
  • parameters (Object) - additional API parameters as detailed in the Twitter REST API documentation e.g. for search results a search string and count is specified by {q: 'Google Apps Script', count: 100}.

Return Values:

  • Object - API response in JSON format.
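post() follows the same pattern but sends the request as an HTTP POST. As a quick sketch (assuming the abbreviated url form also works for POST endpoints), sending a status update might look like:

// sketch: post a tweet via Twitter's statuses/update endpoint (tweet text is illustrative)
var response = TwtrService.post('statuses/update', {status: 'Hello from Google Apps Script'});
Logger.log(response);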

Quick start: Personal access in Sheets and Documents

If you would like to replicate the TAGS authentication flow, where users enter their own Twitter application key/secret, TwtrService comes with a number of UI methods. For TAGS the following code is used:

/**
* Launches key/secret and auth flow
*/
function setup() {
  if (TwtrService.isUserConnectedToTwitter()){
    var result = Browser.msgBox("Twitter Authorisation",
                                "You appear to already be connected to Twitter.\\n\\nWould you like to run the setup again?",
                                Browser.Buttons.YES_NO);
    // Process the user's response.
    if (result == 'yes') {
      // User clicked "Yes".
      TwtrService.showTwitterKeySecret(SpreadsheetApp);
    } 
  } else {
    TwtrService.showTwitterKeySecret(SpreadsheetApp);
  }
}

/**
* Used as part of setup() to process form data
*/
function processForm(formObject) {
  TwtrService.setUserKeySecret(formObject);
  TwtrService.showTwitterLogin(SpreadsheetApp);
}

Quick Start: Shared Twitter Key/Secret

The earlier examples have assumed the user registers their own Twitter application. For scenarios where you would like users to authenticate using a dedicated Twitter API key/secret, it is possible to initialize these values. Example application code can be found here, and it is also deployed here.

As in the earlier examples, once a user authenticates access with your key/secret they will have access to the Twitter API via TwtrService in any other script project where the same values are initialized.
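As a very rough sketch only (this is not taken from the example application; the object shape passed to setUserKeySecret() is my assumption, so check the TwtrService documentation and example app for the actual property names), initializing a shared key/secret before making calls might look something like:

function exampleSharedSetup() {
  // ASSUMPTION: the property names below are illustrative; see the TwtrService docs/example app for the real object shape
  TwtrService.setUserKeySecret({consumer_key: 'YOUR_CONSUMER_KEY', consumer_secret: 'YOUR_CONSUMER_SECRET'});
  var data = TwtrService.get('search/tweets', {q: 'Google Apps Script'});
  Logger.log(data);
}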

Instructions for creating a Twitter Application to use with TwtrService

TwtrService requires you to have a registered Twitter application. If you haven’t already done this, here are the steps you need to get started:

  1. Register for an API key with Twitter at https://dev.twitter.com/apps/new (if you've already registered for a TAGS sheet you can reuse your existing API Key and Secret).
    • Name, description and website can be anything you like
    • Important: Include the Callback URL https://script.google.com/macros/
  2. Read the 'Developer Rules of the Road' before clicking 'Create your Twitter application'.

On the Twitter site your application should include a ‘Keys and Access Tokens’ tab which includes the Consumer Key (API Key) and Consumer Secret (API Secret) you’ll need.


Get TAGS

Amazing to think TAGS has been going for over 4 years now. Version 6 is a major code rewrite and the most important new feature is a better setup process. No more digging into the script editor to run functions, no more entering your Twitter API key and secret each time you create a new archive. Both of these are history thanks to some base code released by Arun Nagarajan at Google. Now you enter/register a Twitter API key and secret once and each copy of TAGS you make will remember these. This is made possible by incorporating this functionality in a custom Apps Script library I’ve called TwtrService. TwtrService makes it easy to make calls to all of Twitter’s API and I’ll be explaining how it works in another post.

Version 6 comes with some other new features. The most requested was archiving favourited tweets. There was a quick hack version of TAGS that did this but it was limited to the last 200. Now when you set up a favourite archive you can get up to the last 3,000 favourited tweets. Another option with TAGS v6.0 is to use Google’s new version of Sheets, which gives TAGS more capacity and better performance. One issue, however, is that new Sheets isn’t very stable with the Google Visualisation API, which is used in TAGSExplorer.

With TAGS v6.0 I’ve also created a dedicated support site. So if you have any questions or need help head over to http://tags.hawksey.info where you can also get the latest version of TAGS.


Thanks to a recommendation by Josef Šlerka (@josefslerka) I got the chance to speak at WebExpo 14 (#webexpo) about some of my work using Google Sheets for Twitter data mining and analysis. I always get uneasy speaking at events for web professionals/developers about spreadsheets because … it is spreadsheets.

Dangers aside, particularly ‘spreadsheet addiction’, hopefully I was able to convey the possibilities of Google Sheets turbocharged with Google Apps Script. To do this I focused on some of the work I’ve been doing with the Twitter search API. I’m sure many people are familiar with the Twitter Archiving Google Sheet (TAGS) but perhaps not aware of some of the experiments I’ve used it for. Some of these include:

(I was surprised myself to discover the whole project started in 2010)

After my talk it was great that some fellow educators, interested in using Google Apps Script for social data mining as part of their curriculum, came to speak to me. Slides for my talk and abstract are at the end of this post; the link bundle is here and the #webexpo TAGS archive is here.

Thanks also to my fellow GDE Ivan Kutil for capturing a picture of me dwarfed by one of my own sheets.


There is growing interest in the use of data to provide actionable insight. This interest goes beyond professional analysts, and just as fields such as mathematics and astronomy have benefited from the enthusiastic amateur, so does data science. Social networks are a rich playground of data, and whilst many provide access to their data via APIs, access via this route can be daunting. You can of course turn to 'analytics as a service' sites which will take your credentials and provide you with some answers, but often this can be what they want to tell you and not what you want to hear. A solution is the spreadsheet. Spreadsheets provide an interface for data exploration for those with basic skills. With Google Sheets the opportunities increase exponentially, not just in terms of collaboration, but also with the power of Google Apps Script. Apps Script provides easy integration with other Google products and services, such as Google Analytics, as well as third party APIs like Twitter. In this presentation we show how Google Sheets can become a rich playground where data from different services can be collected and analysed.


Back in 2011 I showed how you can use Google Apps Script to write POST/GET data to a Google Sheet. Over the years a couple of things have changed in Apps Script so I thought it was worth a revisit. The main changes are:

The core concept behind the script is the same. You have a Google Sheet with a set of header column names that matches the names of the data you are passing through. For example if I had a form with:

<input name="bar" type="text" value="" />

I'd need a sheet with the column name 'bar'. For this post I’m going to assume we use a container-bound Apps Script in a Google Sheet, but you could easily modify this for a standalone script. So to start, either create or open an existing Sheet, click Tools > Script editor, and enter the code below or copy this template.

Usage

There are a couple of ways you can use this script to collect data. You could use a very traditional HTML form using the web app url as the action parameter. This would send users to a very unattractive JSON response, which you could beautify using the HtmlService. A nicer solution is to use AJAX to submit the data without refreshing or moving page. Below is a simple form based on this Stackoverflow jQuery Ajax POST example which sends responses to this Google Sheet (if you are reading this via RSS/Email you need to visit this post):

The only real change to the stackoverflow example is to specify the destination web app url:

// fire off the request to /form.php
		request = $.ajax({
			url: "https://script.google.com/macros/s/AKfycbzV--xTooSkBLufMs4AnrCTdwZxVNtycTE4JNtaCze2UijXAg8/exec",
			type: "post",
			data: serializedData
		});
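For completeness, here is a minimal sketch of the page side based on that Stackoverflow pattern; the form id, field name and placeholder web app URL are illustrative, and you would substitute your own deployed URL:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<form id="foo">
	<input name="bar" type="text" value="" />
	<button type="submit">Send</button>
</form>
<script>
$("#foo").submit(function(event) {
	// stop the form from submitting the normal way and refreshing the page
	event.preventDefault();
	// serialize the form data (field names must match your sheet's column headers)
	var serializedData = $(this).serialize();
	// fire off the request to the published Apps Script web app
	$.ajax({
		url: "YOUR_WEB_APP_URL/exec",
		type: "post",
		data: serializedData
	});
});
</script>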

The example is using POST but you can also use GET. There is more you can do when handling the data at the Apps Script end other than writing to a Google Sheet. For example, if you wanted to send an email on each submission you could use the MailApp service and add something like:

MailApp.sendEmail("youremailaddress", "a subject", JSON.stringify(e.parameters));

in the try statement. If you do this there are a couple of things to remember. First, Apps Script web apps use versioning. This means changes to your script are not 'live' until you push a new version. To do this you need to save your new script and then, from the Script Editor, select File > Manage versions... and 'Save New Version' before going into Publish > Deploy as web app and updating the Project Version. Also, when you add new services to your script the authentication scope changes and you need to approve the additional services. For example, if you add the MailApp service to your code you need to give permission to send email. The easiest way to trigger this in this example is in the Script Editor via Run > setup. I'm sure there are other trip-ups but hopefully this gets you most of the way.

Google Sheet/Apps Script Code

//  1. Enter sheet name where data is to be written below
        var SHEET_NAME = "Sheet1";
        
//  2. Run > setup
//
//  3. Publish > Deploy as web app 
//    - enter Project Version name and click 'Save New Version' 
//    - set security level and enable service (most likely execute as 'me' and access 'anyone, even anonymously') 
//
//  4. Copy the 'Current web app URL' and post this in your form/script action 
//
//  5. Insert column names on your destination sheet matching the parameter names of the data you are passing in (exactly matching case)

var SCRIPT_PROP = PropertiesService.getScriptProperties(); // new property service

// If you don't want to expose either GET or POST methods you can comment out the appropriate function
function doGet(e){
  return handleResponse(e);
}

function doPost(e){
  return handleResponse(e);
}

function handleResponse(e) {
  // shortly after my original solution Google announced the LockService[1]
  // this prevents concurrent access overwriting data
  // [1] http://googleappsdeveloper.blogspot.co.uk/2011/10/concurrency-and-google-apps-script.html
  // we want a public lock, one that locks for all invocations
  var lock = LockService.getPublicLock();
  lock.waitLock(30000);  // wait 30 seconds before conceding defeat.
  
  try {
    // next set where we write the data - you could write to multiple/alternate destinations
    var doc = SpreadsheetApp.openById(SCRIPT_PROP.getProperty("key"));
    var sheet = doc.getSheetByName(SHEET_NAME);
    
    // we'll assume header is in row 1 but you can override with header_row in GET/POST data
    var headRow = e.parameter.header_row || 1;
    var headers = sheet.getRange(headRow, 1, 1, sheet.getLastColumn()).getValues()[0];
    var nextRow = sheet.getLastRow()+1; // get next row
    var row = []; 
    // loop through the header columns
    for (var i in headers){
      if (headers[i] == "Timestamp"){ // special case if you include a 'Timestamp' column
        row.push(new Date());
      } else { // else use header name to get data
        row.push(e.parameter[headers[i]]);
      }
    }
    // more efficient to set values as [][] array than individually
    sheet.getRange(nextRow, 1, 1, row.length).setValues([row]);
    // return json success results
    return ContentService
          .createTextOutput(JSON.stringify({"result":"success", "row": nextRow}))
          .setMimeType(ContentService.MimeType.JSON);
  } catch(e){
    // if error return this
    return ContentService
          .createTextOutput(JSON.stringify({"result":"error", "error": e}))
          .setMimeType(ContentService.MimeType.JSON);
  } finally { //release lock
    lock.releaseLock();
  }
}

function setup() {
    var doc = SpreadsheetApp.getActiveSpreadsheet();
    SCRIPT_PROP.setProperty("key", doc.getId());
}


import.io is a nice service I’ve been dipping into for a while. It’s one of a number of services that provide structured web data scraping. One of the nice features of import.io is that it lets you:

transform any website into a table of data or an API in minutes without even writing any code

You load a webpage with their web browser app and start highlighting the parts of the page you’d like to extract. Int3rhacktives has a nice ‘How to scrape data without coding? A step by step tutorial on import.io’ if you want to find out more.

Once you have the data you want extracted, import.io continue to keep the bar low, allowing easy data download in various formats including .csv, and if you want to use live data there are example integrations for Excel, Google Sheets and other programming languages.

Looking more closely at the Google Sheet integration, import.io document a method that uses their REST API’s HTML table output, which is then wrapped in a Google Sheets ImportHtml formula e.g.

=ImportHtml("https://query.import.io/store/connector/48fd118b-7572-44a6-816c-8f02d088fb6a/_query?_user=5895d593-9461-4b8b-8452-95bb82458bd2&_apikey=YOUR_API_KEY&format=HTML&input/webpage/url=http%3A%2F%2Fwww.scoop.it%2Ft%2Fgas", "table", 1)

import.io easy as 1, 2

I’m a big fan of the Google Sheet ‘import’ formulas and have some tutorials on these. The ‘import’ formulas are useful for quick results but not appropriate if you need to do additional manipulation or integration into other automated workflows. import.io do have a number of client libraries and code examples you can look at to address this, but the one I thought was missing was one for Google Apps Script. One of the great strengths of Apps Script is that it’s easy to create time-based routines to pull and push data around as and when needed. So, based on import.io’s PHP example, here’s roughly what it would look like in Google Apps Script.
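As a minimal sketch of one way to do this with UrlFetchApp (not the original gist): it assumes the connector query endpoint shown in the ImportHtml example above also returns JSON when format=JSON is requested, and the GUIDs/API key are placeholders you would replace with your own.

function getImportioData() {
  // placeholders: replace with your own import.io connector GUID, user GUID and API key
  var connectorGuid = 'YOUR_CONNECTOR_GUID';
  var url = 'https://query.import.io/store/connector/' + connectorGuid + '/_query' +
            '?_user=YOUR_USER_GUID' +
            '&_apikey=' + encodeURIComponent('YOUR_API_KEY') +
            '&format=JSON' +
            '&input/webpage/url=' + encodeURIComponent('http://www.scoop.it/t/gas');
  // fetch and parse the response so it can be written to a sheet or processed further
  var response = UrlFetchApp.fetch(url);
  var data = JSON.parse(response.getContentText());
  Logger.log(data);
  return data;
}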

You can read the Google Apps Script documentation to find out more about what you can do with the result. Something the guys at import.io might want to think about is creating a Google Apps Script library. Similar to their other client libraries it would again lower the bar for developers. As a starter I’ve implemented the query method here, which means anyone creating an Apps Script project and including a library using the project key M2ZyMvVZdgKdy3SaLP8gq3X797_hv7HHb could just use:

function getImportioExample(){
  // Query for tile Integrate Page Example
  // userGuid and apiKey are your import.io user GUID and API key
  var result = importio.query("caff10dc-3bf8-402e-b1b8-c799a77c3e8c", {"searchterm": "avengers 2"}, userGuid, apiKey, false);
  Logger.log(result);
}

with the benefit of also getting a code autocomplete:

autocomplete

If you've already got Google Apps Script/import.io integrations I'd love to hear about them. Hopefully I'll follow-up this post with an example automation to illustrate what is possible.

 

Sometimes it’s useful to generate a column of data based on a series repeated x number of times, e.g. the series 1,2,3 repeated 3 times would give 1, 1, 1, 2, 2, 2, 3, 3, 3 (see column A here for an example). In my particular scenario I want to repeat week numbers for a series from 0 to 6. There are a number of ways you can do this, like indexing row numbers, but here’s a little formula I quickly threw together for Google Sheets:

=TRANSPOSE(SPLIT(JOIN(",", ARRAYFORMULA(REPT(SPLIT(D2,",")&",",D3))), ","))

where

  • D2 is a comma separated series e.g. Week 0,Week 1, Week …
  • D3 is the number of times to repeat

How does it work?

Like a lot of spreadsheet formulas it starts in the middle with SPLIT(D2,","), which turns our series of values into an array. If you use this in a single cell in a Google Sheet the values Week 0, Week 1, … will be split out across the columns.

Next we want to repeat Week 0 and so on x number of times. This is done with REPT, which repeats a given string x times. If we use this by itself it will only apply to the first column of data from the SPLIT, so we wrap it in an ARRAYFORMULA like so: ARRAYFORMULA(REPT(SPLIT(D2,",")&",",D3)). This repeats each series value the number of times specified in D3. Something to note is the &"," in the REPT, which adds a comma at the end of each repeated value.

ARRAYFORMULA Enables the display of values returned from an array formula into multiple rows and/or columns and the use of non-array functions with arrays.

This now gives us our repeating text, but spread across several columns e.g. “Week 0,Week 0,Week 0”, “Week 1,Week 1,We..” etc. To get a single value in each cell we use the trick of using JOIN to turn our array of columns into a single cell value separated with commas. We then use the SPLIT formula again to turn this single cell into multiple cells.

The final part is to use TRANSPOSE to convert our columns of data into rows. Here is the finished version of the Google Sheet with the stages broken down.
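To make the stages concrete, here is a rough trace assuming D2 contains the series 1,2,3 and D3 contains 3:

SPLIT(D2,",")                              ->  1 | 2 | 3  (across three columns)
ARRAYFORMULA(REPT(SPLIT(D2,",")&",",D3))   ->  "1,1,1," | "2,2,2," | "3,3,3,"
JOIN(",", ...)                             ->  "1,1,1,,2,2,2,,3,3,3,"
SPLIT(..., ",")                            ->  1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3  (empty entries are dropped)
TRANSPOSE(...)                             ->  the same nine values as a single column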

Can you think of a better way to do this?


It’s been a while since I’ve done a review post but, as this year has been a blur, if for no one else I wanted to look back at my posts from 2013. Rather than looking at everything I wanted to extract some themes, and for this first review I wanted to look back at some of the work I’ve done around Google Apps Script and Google Spreadsheets/Sheets.


Google recently (1st October 2013) announced improved segmentation in Google Analytics using age, gender and interests.

It’s been interesting to read some of Tony Hirst’s posts on the use of Google Analytics within education. The thread goes back to 2008 with Library Analytics, but most recently Tony has been writing about this in an open course context, such as MOOC Platforms and the A/B Testing of Course Materials and MOOC Busting: Personal Googalytics…, which looks at the idea of collecting and feeding back performance data to users from across platforms.

When Daphne Koller was on the early Coursera push, one of the aspects that caught my eye was using student generated data (aka their answers to questions) in course design, in particular to identify misconceptions or incorrectly set questions. To see what I mean here’s a cued clip from a presentation Koller gave at the Centre for Distance Education back in 2012.

YouTube: The Online Revolution: Education at Scale

Merging the two lines of thought I wondered if there was a way you could use Google Analytics to create a similar feedback mechanism. My starting point was Google Analytics Event tracking. From the Event Tracking documentation:

Event Tracking is a method available in the ga.js tracking code that you can use to record user interaction with website elements, such as a Flash-driven menu system. This is accomplished by attaching the method call to the particular UI element you want to track. When used this way, all user activity on such elements is calculated and displayed as Events in the Analytics reporting interface. Additionally, pageview calculations are unaffected by user activity tracked using the Event Tracking method. Finally, Event Tracking employs an object-oriented model that you can use to collect and classify different types of interaction with your web page objects.

Examples include:

  • Any Flash-driven element, like a Flash website, or a Flash Movie player
  • Embedded AJAX page elements
  • Page gadgets
  • File downloads
  • Load times for data

Essentially anything you can trigger with a bit of JavaScript is up for grabs. Looking at setting up event tracking, each event can include:

  • category (required) - The name you supply for the group of objects you want to track.
  • action (required) - A string that is uniquely paired with each category, and commonly used to define the type of user interaction for the web object.
  • label (optional) - An optional string to provide additional dimensions to the event data.
  • value (optional) - An integer that you can use to provide numerical data about the user event.
  • non-interaction (optional) - A boolean that when set to true, indicates that the event hit will not be used in bounce-rate calculation.

So we can capture events and have some control over how they are described. What might we want to catch? Let’s start by looking at multiple choice questions (MCQs). Looking at the anatomy of an event, this is one way we might want to encode it (a code sketch follows the list):

  • category: ‘MCQ’
  • action: ‘right’ or ‘wrong’
  • label: a question identifier. This needs to be unique and might be something like coursecode_module_section_question (having a consistently structured label will help filter the data later)
  • value: this is optional but as it needs to be an integer this restricts you a bit. You may want to use time taken to respond, confidence based mark etc.
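Putting that together, a sketch of how a correct response might be recorded with the classic ga.js syntax (the label and value below are purely illustrative):

// fired from the answer button's click handler after marking the response
_gaq.push(['_trackEvent',
           'MCQ',                      // category
           'right',                    // action: 'right' or 'wrong'
           'EMD101_Mod1_1.1_Q1_A',     // label: coursecode_module_section_question (illustrative)
           12                          // value (optional integer): e.g. seconds taken to respond
]);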

I should say before you get carried away with tracking that:

The first 10 event hits sent to Google Analytics are tracked immediately, thereafter tracking is rate limited to one event hit per second.

To see how this works I’ve created this example page with a simple MCQ. This is a ‘live’ example with some crude code to push events to my Google Analytics account. You’ll notice on the page a response graph generated from the GA data. I’ll explain how that was made later.

GA Real-time events

The first thing to note is that we can now see responses in real-time via the Google Analytics admin interface. The interface is not really geared for MCQs and there is the complication of who has access to the Analytics dashboard, but given that there is a Real Time Reporting API in beta a custom slice’n’dice should be possible in the future (I’ve got beta access so this might be one I revisit if/when Events get added to the API).

Similarly the Content > Events report gives us access to historic data, but again it has accessibility issues in terms of who has access to the Google Analytics account. On the plus side, tweaking the display from the default ‘data’ view to ‘performance’ gives a basic bar chart which is more intuitive for this type of data.

Default data view for Content > Events
Performance data view for Content > Events

Segmentation and cohort analysis

There are some other built-in Google Analytics features that may also support analysis of the data, including filtering:

GA Filtering

or switching from a ‘line chart’ to a ‘motion chart’ (there are limits on what can be used for x-y values so some experimentation is required) and adding event reports to custom dashboards which may pull in other GA data.

GA Motion Chart

This is where it potentially gets even more interesting, as the new Google Analytics Advanced Segmentation* allows you to do cohort analysis. The built-in segments are perhaps not relevant for this scenario but the custom options have lots of potential. Google provide 6 segment templates for ‘Demographics’, ‘Technology’, ‘Behavior’, ‘Date of First Visit’, ‘Traffic Sources’ and ‘E-commerce’, but it is easy to add your own custom conditions and sequences for segmentation.

GA custom conditions and sequences for segmentation

*I’m not sure if Google are still rolling this out, but I noticed the new UI and segmentation options were only available in my Google Apps GA account, my standard @gmail account not having this option.

Examples of conditions/sequences you might want to explore include combining Tony’s suggestion of using Analytics A/B testing with event tracking, e.g. identifying any correlation between content and performance, or whether someone who visited page x performed better in the test. It is also worth noting that:

Previously, advanced segments were based on visits. With the new segment, a new option is provided to create user segment. In a user segment, all visits of the users who fit the segment criterias will be selected (such as specific demographics or behaviors). It will be a useful technique when you need to perform user level analysis.

This is particularly useful as “Google Analytics customers are prohibited from sending personal information to Google.” [ref]. So while named individual level analysis isn’t possible you can get down to a user level.

Distributing data

On a practical level, whilst these options potentially open some interesting avenues for exploration, Google Analytics account administration is still not easy. Whilst this area has been recently improved, the granularity of permissions is very coarse, an all or nothing approach. There is a growing list of tools/add-ins that integrate with Google Analytics which let you create custom workflows for data distribution. This is an area Google appear to be working on, recently announcing the Google Analytics superProxy, a web application that runs on Google App Engine to allow the distribution of GA data. This uses the Google Analytics Reporting APIs to define data queries and generate data files. Along similar lines (and announced before Google) I’ve published a similar solution that works in Google Drive (Using Google Spreadsheets as a Google Analytics Data Bridge). Below is an example query I used with the MCQ example at the beginning of the post. It's currently using a very specific filter to extract the data for all the event labels beginning EMD101_Mod1_1.1_Q1_, but using standardised labelling you could include results for the entire module or course. I'm also not using any segment filters; as well as standard segments you can also use custom segments.

Google Analytics Query Exporter

As I outlined in my original post, there are a number of ways that these slices of Google Analytics data can be shared or consumed by other tools. In the example above the data is written (and refreshed every hour) to the sheet below, Google Sheets providing a convenient environment for sharing and querying data with the relative familiarity of a spreadsheet interface.

At this point I’d imagine some of you are wondering why go through all of this bother when your VLE is able to do similar, if not better, levels of reporting. My eye is primarily on the open education context where an institutional VLE is usually not an option. It also potentially provides a more holistic data source where you can experiment with content and resources across your little oasis (like ocTEL).

So what do you think? Will you be event tracking your MCQs?


From the postbag Marjolein Hoekstra (CleverClogs) writes:

Short description
Can you make a Google Script for me that compares two strings character by character? If differences are found, the script should point these out. If no differences are found at all, the script should put out the text "[ id. ]" .

Detailed description
I have two columns containing lists of horizontally identical, but sometimes almost identical text strings. This is on purpose. Each row has another couple of words that need to be compared.

I'd like to compare them on a character by character basis, and then point out in the second column at which positions it differs from the first, for example like this:

A2: ABCDE

B2: ABKDE

If you compare these two, you'll see that cell B2[3] has 'K' where A2[3] reads 'C'.

My envisioned formula would then populate cell C2 with: "[ – – K – – ]"

As far as I can tell, I'd need a Google Script that parses both strings character by character and output "–" when they are identical, and output the value of the character string from B2. It should be relative simple, with a FOR loop. Thing is, I've never written a Google Script, and it's a bit daunting for me to start on my own.

Note that LEN (A) is always identical to LEN (B)

Background info
In case you're interested in the actual use case: I want to use this formula to compare strings of Chinese characters, where the first column contains the traditional writing of these characters (typically requiring more strokes) and the second column containing the simplified writing of those same characters. Sometimes the characters are different, sometimes they are not. You can see this clearly in the screenshot below.

The Google Spreadsheet is used as input for a flashcard deck I'm building, using the iPhone app Flashcard Deluxe (top-notch system, highly flexible) [also available for Android].

Screenshot:

Google Spreadsheet Example Flashcard Deluxe

There's no need to use Chinese characters to test the formula, I'm just providing this so that you know in what context the formula will be used.

The Solution

My initial thought was to use existing formulas to SPLIT the cell text into individual character values and then do a comparison, but unfortunately the SPLIT formula requires a character to split on. So instead I turned to Google Apps Script and wrote the following custom formula:

function stringComparison(s1, s2) {
  // lets test both variables are the same object type if not throw an error
  if (Object.prototype.toString.call(s1) !== Object.prototype.toString.call(s2)){
    throw("Both values need to be an array of cells or individual cells")
  }
  // if we are looking at two arrays of cells make sure the sizes match and only one column wide
  if( Object.prototype.toString.call(s1) === '[object Array]' ) {
    if (s1.length != s2.length || s1[0].length > 1 || s2[0].length > 1){
      throw("Arrays of cells need to be same size and 1 column wide");
    }
    // since we are working with an array initialise the return
    var out = [];
    for (r in s1){ // loop over the rows and find differences using diff sub function
      out.push([diff(s1[r][0], s2[r][0])]);
    }
    return out; // return response
  } else { // we are working with two cells so return diff
    return diff(s1, s2)
  }
}

function diff (s1, s2){
  var out = "[ ";
  var notid = false;
  // loop to match each character
  for (var n = 0; n < s1.length; n++){
    if (s1.charAt(n) == s2.charAt(n)){
      out += "–";
    } else {
      out += s2.charAt(n);
      notid = true;
    }
out += " ";
  }
  out += " ]"
  return (notid) ? out :  "[ id. ]"; // if notid(entical) return output or [id.]
}
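Once the script is saved, the custom formula can be used in a cell on a pair of single cells or on two equal-sized, single-column ranges, for example:

=stringComparison(A2, B2)
=stringComparison(A2:A10, B2:B10)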

One of the things to be aware of is that Google Apps Script formulas are associated with a spreadsheet; you can't globally use a custom formula unless the script is attached. Fortunately, when copying a spreadsheet you also get a copy of the script, so providing templates is a way around this.

With this limitation in mind I thought I’d have another go at cracking this with built-in formulas … and guess what, it is possible. The key to unlocking this was when playing with the REGEXREPLACE formula I accidentally turned ‘ABCDE’ into ‘,A,B,C,D,E,’ by using =REGEXREPLACE(A20,"(.*?)",","). My RegEx is terrible so I’ll let someone else explain how this works in the comments, but getting to this point meant I could use a combination of SPLIT and REGEXREPLACE to do a character by character comparison on two cells of text. The final version of the formula goes (comparing cell A14 to B14):

=IF(EXACT(A14,B14),"[ id. ]","[ "&JOIN(" ",ARRAYFORMULA(REGEXREPLACE(SPLIT(REGEXREPLACE(B14,"(.*?)",","),","),SPLIT(REGEXREPLACE(A14,"(.*?)",","),","),"–")))&" ]")
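Walking this through with the earlier example (A14 = ABCDE, B14 = ABKDE):

REGEXREPLACE(B14,"(.*?)",",")                      ->  ",A,B,K,D,E,"
SPLIT(...,",")                                     ->  A | B | K | D | E
SPLIT(REGEXREPLACE(A14,"(.*?)",","),",")           ->  A | B | C | D | E
ARRAYFORMULA(REGEXREPLACE(B chars, A chars, "–"))  ->  – | – | K | – | –
"[ "&JOIN(" ", ...)&" ]"                           ->  [ – – K – – ]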

My rough workings are embedded below. You can also make a copy of the entire project including the Apps Script solution here.

Update: Bruce McPherson has posted an alternative formula to do this which goes like:

"[ " & CONCATenate(ARRAYFORMULA(if(mid(A31, row(indirect("x1:x"&len(A31))) ,1)=mid(B31,row(indirect("x1:x"&len(A31))),1)," – "," "&mid(B31,row(indirect("x1:x"&len(A31)))&" ",1) ))) &" ]"

As you will see from the comments thread on that post, Marjolein was having problems using my version with a Chinese character set. Adding this to the example spreadsheet I'm unable to replicate the error, but have encountered the problem here. If anyone can spot the difference I'd welcome your thoughts.

Update 2: Bruce pointed out that "the likely issue is that the columns with the problem are times - the characters mean AM. The same thing would probably happen with numbers. Have you tried wrapping the cell references in concatenate() to convert to a string?"

I said: ah, I see what you mean, 时 is being interpreted as 上午12:00:00. Not sure how I'd wrap the concatenate with my regexreplace. Your solution looks better all round so rather than losing sleep I'd go with that.


This was the question that came in over the wire this morning:

My first thought was using the Google Visualisation API Query Language, which would let you do a SQL-type query and LIMIT the response to 1 record with an OFFSET to a particular row. The Google Code Playground has a nice example of this to play with to see what you can do; even better, Tony Hirst’s Guardian Data Explorer helps you generate the query for an HTML view (although Tony hasn’t implemented LIMIT and OFFSET). So below is an example spreadsheet:

Example Spreadsheet

… and using the query

https://spreadsheets.google.com/tq?tqx=out:html&tq=select * LIMIT 1 OFFSET 3&key=rYQm6lTXPH8dHA6XGhJVFsA&gid=0

We get … (the important bit is ‘OFFSET 3’ where offset 0 = row 2, offset 1 = row 3, etc.)

Google Visualization html row

So a couple of issues. First the spreadsheet needs to be File > Published to the web and the result is also read-only:

Thinking about the other views offered by Google on Spreadsheets, it occurred to me the mobile view might be a solution. The mobile view, if you don’t use a native app, is List view (here is more about List View):

List View 

… which allows you to edit a row of data

Edit row in List View

The final trick was to change the rows per page to 1 and then use the page query parameter to select the row, e.g. to open the spreadsheet at row 5 for editing we set &page=4 (the header isn't counted as a row). You can also select the sheet by changing the &gid= number. Tip: opening your spreadsheet and switching to 'List View' will help you build the URL.

https://docs.google.com/spreadsheet/lv?key={your_key}&type=view&gid=0&f=true&sortcolid=-1&sortasc=true&page=4&rowsperpage=1

Edit single row in List View

So there you go how to open a Google Spreadsheet at a specific row for editing for Marjolein to use. The perfect solution … well almost?

Update: Saqib Ali has kindly passed on this trick from one of the Google team (I don't think it would work in the original scenario, but it's still very useful to know):

Insert a comment at a specific cell and, in that comment, "plus someone" - that is, type "+" and then the person's email or name. NOTE: you will get a list of people in your domain as soon as you type the "+", filtered as you start typing their name/email. The person you "plussed" will get an email with both the contents of the cell and your comment, and the link in that email will take them DIRECTLY TO THE TARGET CELL with the comment activated.
