Automatic translation of TAGS Twitter archives using Google Apps Script ‘Language’ services

A flare went up this morning from Tony Hirst, who is at the Daten, Recherchen, Geschichten (2012) conference (#drg12) in Hamburg:

Translating tweets is something I’ve dabbled with before as part of experiments with my Twitter subtitling tool iTitle (really must revisit this). Notable examples were the Google I/O 2010 Android keynote and the presentations from #UIMPUni20 discussed in Kirsty Pitkin’s Lost In Translation blog post (because of domain shuffles most of the links are broken, but here is Professor Alejandro Piscitelli’s talk at #UIMPUni20 with tweets normalised to English).

Both these examples used the Google Translate API to convert tweets from one language to another. Back then the API was free for anyone to use, but in August 2011 Google switched it to a paid-for service … doh. All is not lost though, as Google Apps’ programming environment, Google Apps Script, still has a ‘Language Service’ <cough>Google Translate</cough>. The disadvantage you have with this service is that, unlike Translate, it doesn’t have an option to autodetect the source language. [Update: I re-read the documentation and it clearly states that it can auto-detect. It’s probably still not a bad idea to use ISO codes.] This is not a problem when dealing with Twitter, as the metadata includes ‘iso_language_code’.

So I have a solution to archive tweets in a Google Spreadsheet using Google Apps Script (must write more about v4) … ponder, ponder, and 5 minutes later, by adding:

objects[i]["text_en"] = LanguageApp.translate(objects[i]["text"], objects[i]["iso_language_code"], "en");

I’ve got a spreadsheet archiving #drg12 tweets and translating the text into English. Impressed much?
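For anyone wanting to reuse the idea outside my archive script, here is a minimal sketch of the same step wrapped in a loop. The function name addEnglishText and the injected translate parameter are my own inventions for illustration, not part of TAGS. Skipping tweets already marked ‘en’ and falling back to the original text when a translation call fails are assumptions about what you might want, not something the one-liner above does.

```javascript
// Hypothetical helper (not part of TAGS): adds a "text_en" property to each
// tweet object. `translate` is passed in so the sketch can be tried outside
// Apps Script; it is assumed to take (text, sourceLanguage, targetLanguage).
function addEnglishText(objects, translate) {
  for (var i = 0; i < objects.length; i++) {
    var tweet = objects[i];
    // An empty string asks the Language service to auto-detect the source.
    var source = tweet["iso_language_code"] || "";
    if (source === "en") {
      // Assumption: tweets already marked English are copied as-is.
      tweet["text_en"] = tweet["text"];
      continue;
    }
    try {
      tweet["text_en"] = translate(tweet["text"], source, "en");
    } catch (e) {
      // Assumption: if the call fails, keep the untranslated text.
      tweet["text_en"] = tweet["text"];
    }
  }
  return objects;
}

// Inside Apps Script the call might look like this:
// addEnglishText(objects, function (text, from, to) {
//   return LanguageApp.translate(text, from, to);
// });
```

Passing the translate function in, rather than calling LanguageApp directly, is only there so the loop can be tested with a stub before it is pointed at the real service.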


4 thoughts on “Automatic translation of TAGS Twitter archives using Google Apps Script ‘Language’ services”

  1. Tony Hirst

    That was a really handy and timely hack – thanks Martin…:-)
    PS Makes me wonder if a real time translated backchannel stream might be a handy value add for international confs? After all, it’s pointless more than one person in a conf doing the tweet translation if the result itself can be streamed?

    1. Martin Hawksey

      Post author

      ;)

      Surely someone has done this for a live stream!? I was looking at the Twitter terms and conditions to see if anything prevented this in there. This is what they say about ‘broadcasting’ tweets

      Don’t: Edit or revise Tweets except as necessary due to technical or other limitations. E.g. it is acceptable to remove links from Tweets on-air as they are inoperable in a broadcast medium.

      I’d say the viewer not being multilingual is a limitation ;)

      Martin

    1. Martin Hawksey

      Post author

      The technical ticket regarding language codes in search is still open so I guess not. I imagine the issue of how tweets are marked with a language code will always be there, as it appears to be a global setting rather than something a multilingual user tags separately on each tweet. For example, I noticed that some of the #drg12 tweets were marked ‘de’ but were in English (the user had enabled the German Twitter interface but was tweeting in English). As noted in the post correction, it is possible to autodetect the source language by passing an empty string as the source, which might get around this, using something like:
      objects[i]["text_en"] = LanguageApp.translate(objects[i]["text"], "", "en");
      Martin
