Automatic translation of TAGS Twitter archives using Google Apps Script ‘Language’ services

A flare went up this morning from Tony Hirst, who is at the Daten, Recherchen, Geschichten (2012) conference (#drg12) in Hamburg.

Translating tweets is something I’ve dabbled with before as part of experiments with my Twitter subtitling tool iTitle (really must revisit this). Notable examples were the Google I/O 2010 Android Keynote and the presentations from #UIMPUni20 discussed in Kirsty Pitkin’s Lost In Translation blog post (because of domain shuffles most of the links are broken, but here is Professor Alejandro Piscitelli’s talk at #UIMPUni20 with tweets normalised to English).

Both these examples used the Google Translate API to convert tweets from one language to another. Back then the API was free for anyone to use, but in August 2011 Google switched it to a paid-for service … doh. All is not lost though, as Google Apps’ programming environment, Google Apps Script, still has a ‘Language Service’ <cough>Google Translate</cough>. The disadvantage with this service is that, unlike Translate, it doesn’t have an option to autodetect the source language. [Update: I re-read the documentation and it clearly states it can auto-detect. It’s probably still not a bad idea to use ISO codes.] This is not a problem when dealing with Twitter as the metadata includes ‘iso_language_code’.

So I have a solution to archive tweets in a Google Spreadsheet using Google Apps Script (must write more about v4) … ponder, ponder, 5 minutes later by adding:

objects[i]["text_en"] = LanguageApp.translate(objects[i]["text"], objects[i]["iso_language_code"], "en");

I’ve got a spreadsheet archiving #drg12 tweets and translating the text into English. Impressed much?
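
For anyone wanting a bit more than the one-liner, here is a minimal sketch of how that line might sit in a loop. The function name addEnglishText and the optional translate parameter are my own illustrative assumptions, not part of the archive script; inside Apps Script the parameter can be left out and LanguageApp.translate is used. It assumes objects is an array of tweet objects with ‘text’ and ‘iso_language_code’ keys. When a tweet has no code it passes an empty source string, which the Language service treats as auto-detect, and it keeps the original text if the call throws.

```javascript
/**
 * Adds a "text_en" field to each tweet object (hypothetical helper).
 * `translate` is optional: inside Apps Script it defaults to the built-in
 * Language service; elsewhere a stand-in function can be passed for testing.
 */
function addEnglishText(objects, translate) {
  translate = translate || function (text, source, target) {
    // Only reached when no stand-in is supplied, i.e. inside Apps Script.
    return LanguageApp.translate(text, source, target);
  };
  for (var i = 0; i < objects.length; i++) {
    var tweet = objects[i];
    // An empty source code asks the service to auto-detect the language.
    var source = tweet["iso_language_code"] || "";
    try {
      tweet["text_en"] = translate(tweet["text"], source, "en");
    } catch (e) {
      // Assumption: on any failure (bad code, quota) keep the original text
      // so the row is still written to the spreadsheet.
      tweet["text_en"] = tweet["text"];
    }
  }
  return objects;
}
```

In the archive script this might be called as addEnglishText(objects) just before the rows are written to the sheet.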


4 Responses to “Automatic translation of TAGS Twitter archives using Google Apps Script ‘Language’ services”


  • That was a really handy and timely hack – thanks Martin…:-)
    PS Makes me wonder if a real time translated backchannel stream might be a handy value add for international confs? After all, it’s pointless more than one person in a conf doing the tweet translation if the result itself can be streamed?

    • ;)

      Surely someone has done this for a live stream!? I was looking at the Twitter terms and conditions to see if anything prevented this in there. This is what they say about ‘broadcasting’ tweets

      Don’t: Edit or revise Tweets except as necessary due to technical or other limitations. E.g. it is acceptable to remove links from Tweets on-air as they are inoperable in a broadcast medium.

      I’d say the viewer not being multilingual is a limitation ;)

      Martin

  • Have the problems with Twitter’s use of iso-language-code as discussed in the comments on http://www.verso.co.nz/uncategorized/691/language-codes-for-twitter/ been solved as far as you know?

    • The technical ticket regarding language codes in search is still open so I guess not. I imagine the issue of how tweets are marked with a language code will always be there, as it appears to be a global setting rather than a multilingual user separately tagging each tweet. For example, I noticed in the #drg12 tweets that some of the tweets were marked ‘de’ but were in English (the user having enabled the German Twitter interface but tweeting in English). As noted in the post correction, it is possible to autodetect the source language by passing an empty string as the source code, which might get around this using something like:
      objects[i]["text_en"] = LanguageApp.translate(objects[i]["text"], "", "en");
      Martin


About

This blog is authored by Martin Hawksey

JISC CETIS Learning Technology Advisor (OER Programme Support)

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey
