Collect/backup tweets in a Google Spreadsheet on one sheet

A quick recap: in June 2010 I hastily threw together a way to archive tweets using Google Spreadsheets. This was okay, but a bit flaky. In February 2011, having learned a lot more about Google Spreadsheets and Google Apps Script, I reworked the solution, producing version 2: Collect/backup tweets in a Google Spreadsheet.

I would like to say that this idea has gone down like a storm, but … well, according to the Docs Template Gallery it's got 17 users. 5-star rating though ;) that might be my vote :( I wonder if the number is so low because people can make a copy of the spreadsheet from the link in the original post?

Thank you to the 17 users of Twitteralytics

Ignoring this overwhelming success, there was one tweak I had been meaning to make for a while. Currently the spreadsheet dumps the results from its search into a new sheet each time it runs, which means you can end up with lots of duplicate tweets across different sheets. So when Tony Hirst recently asked:

Tony Hirst@mhawksey do you have a twitterlytics archiver that appends tweets to one sheet rather than form a new sheet for each run? 28 Apr 2011

the time felt right to do something about it. To get the ball rolling, Tony also kindly contributed a code snippet which finds new tweets in the search results. His solution is a bit greedy: it runs a search collecting up to 1500 items, then compares them with what is already there. Meanwhile, I was looking at the Twitter Search API documentation to see if I could filter the results to remove previously collected tweets (the answer was yes). The disadvantage of this approach is that the results can't be paged, so you are limited to 100 results.
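Tony's greedy comparison boils down to keeping only the results whose id is greater than the newest id already stored. A minimal sketch of that filter in plain JavaScript (the function name is mine; note that real tweet ids are larger than JavaScript can represent exactly as numbers, which is why the API also returns id_str):

```javascript
// Keep only tweets newer than the most recent stored id.
// Assumes ids are numeric and increase over time; for real tweet ids
// (which overflow JavaScript number precision) compare id_str values instead.
function filterNewTweets(results, sinceid) {
  var fresh = [];
  for (var i = 0; i < results.length; i++) {
    if (results[i].id > sinceid) {
      fresh.push(results[i]);
    }
  }
  return fresh;
}
```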

The latest version of the spreadsheet has been included in the Collect/backup tweets in a Google Spreadsheet post (it defaults to using the Twitter API to work out the last tweet, but the greedy version is included in the source code). Below is a copy of the main bits of the code. Thanks to Tony for the prod and the code suggestion!

function collectTweets() {
  // if continuous, use a single "Archive" sheet; else make a dated sheet name
  if (RESULT_TYPE == "continuous"){
    var sheetName = "Archive";
  } else {
    var sheetName = Utilities.formatDate(new Date(), "GMT", "dd-MMM-yy hh:mm"); // make a new sheet name based on today's date
  }
  // if the sheet doesn't exist, make it
  if (!ss.getSheetByName(sheetName)){
    var temp = ss.getSheetByName("TMP");
    var sheet = ss.insertSheet(sheetName, {template:temp});
  } else {
    var sheet = ss.getSheetByName(sheetName);
  }
  // if the archive already contains tweets, get the id of the most recent one
  var stored = getRowsData(sheet);
  if (stored.length >0 ){
    var sinceid = stored[0]["id"];
  } 
  // prepare the search date range
  var myUntil = new Date();
  myUntil.setDate(myUntil.getDate()-1);
  var mySince = new Date(myUntil);
  if (SEARCH_DURATION == "weekly"){
    myUntil.setDate(myUntil.getDate()+1);
    mySince.setDate(mySince.getDate()-6);
  }
  // if we have a since id, pass it so the API only returns newer tweets
  if (sinceid){
    var data = getTweets(SEARCH_TERM, mySince, myUntil, NUMBER_OF_TWEETS, sinceid); // get results from twitter sinceid
  } else {
    var data = getTweets(SEARCH_TERM, mySince, myUntil, NUMBER_OF_TWEETS); // get results from twitter
  }
  // if some data insert rows
  if (data.length > 0){
    sheet.insertRowsAfter(1, data.length);
    setRowsData(sheet, data);
  }

  /*
  // This is a very greedy grab (the since_id approach maxes out at 100 results; this method gives you up to 1500)
  var data = [];  
  if (sinceid){  
    for (var i = 0; i < objects.length; i++){
      var row = objects[i];
      if (row.id > sinceid){
        data.push(row);
      }
    }
    if (data.length > 0){
      sheet.insertRowsAfter(1, data.length);
    }
  } else {
    data = objects;
  }
  */
}

function getTweets(searchTerm, since, until, maxResults, sinceid, languageCode) {
    //Based on Mikael Thuneberg getTweets - mod by mhawksey to convert to json
    // if you include setRowsData this can be used to output chosen entries
    try {
        var pagenum = 1;
        var data = [];
        var idx = 0;
        var sinceurl = "";
        var resultsPerPage, maxPageNum; // declared locally rather than as implicit globals
        if (typeof maxResults == "undefined") {
            maxResults = 100;
        }
        if (maxResults > 1500) {
            maxResults = 1500;
        }
        if (maxResults > 100) {
            resultsPerPage = 100;
            maxPageNum = Math.ceil(maxResults / 100); // round up so a partial page is still fetched
        } else {
            resultsPerPage = maxResults;
            maxPageNum = 1;
        }
        if (typeof sinceid != "undefined") {
            if (sinceid.length > 0) {
                resultsPerPage = 100;
                maxPageNum = 1;
                sinceurl = "&since_id=" + sinceid;
            }
        }
        Logger.log(twDate(since)+" "+twDate(until));
        searchTerm = encodeURIComponent(searchTerm);
        for (pagenum = 1; pagenum <= maxPageNum; pagenum++) {
            var URL = "http://search.twitter.com/search.json";
            URL = URL + "?q=" + searchTerm;
            URL = URL + "&since=" + twDate(since);
            URL = URL + "&until=" + twDate(until);
            URL = URL + "&rpp=" + resultsPerPage;
            URL = URL + "&page=" + pagenum;
            URL = URL + "&result_type=recent";
            URL = URL + sinceurl;
            if (typeof languageCode != "undefined") {
                if (languageCode.length > 0) {
                    URL = URL + "&lang=" + languageCode;
                }
            }
            
            var response = UrlFetchApp.fetch(URL, {method:'get', headers: { "User-Agent": REFERER}});
     
            if (response.getResponseCode() == 200) {
              var objects = Utilities.jsonParse(response.getContentText()).results;
              
              for (var i in objects){ // not pretty, but I wanted to extract geo data
                if (objects[i].geo != null){
                  objects[i]["geo_coordinates"] = "loc: "+objects[i].geo.coordinates[0]+","+objects[i].geo.coordinates[1];
                }
                objects[i]["status_url"] = "http://twitter.com/"+objects[i].from_user+"/statuses/"+objects[i].id_str;
                data[idx]=objects[i];
                idx ++;
              }
            }
        }
        return data;
    } catch (e) {
        return e.message;
    }
}
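getTweets() above calls a twDate() helper that isn't reproduced in the post; in Apps Script it is probably a one-liner around Utilities.formatDate(). A plain-JavaScript sketch producing the YYYY-MM-DD string the since/until parameters expect would be:

```javascript
// Format a Date as the YYYY-MM-DD string used by the Search API's
// since/until parameters (a sketch; the original helper isn't shown).
function twDate(d) {
  var month = ("0" + (d.getMonth() + 1)).slice(-2);
  var day = ("0" + d.getDate()).slice(-2);
  return d.getFullYear() + "-" + month + "-" + day;
}
```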

13 thoughts on “Collect/backup tweets in a Google Spreadsheet on one sheet”

  1. Sarath

    I appreciate you for the wonderful work that you have done.
    I am doing research on public mood in India. For my study, I need to collect tweets. I am currently using your Google document and downloading 1500 tweets every day. However, I am looking for tweets from a specific period relevant to the Indian context. Can you please advise me on the questions below?

    1. Is it possible to increase the number of tweets?
    2. Is it possible to collect the tweets country-wise? I am more concerned with tweets posted from India.
    3. Is it possible to collect tweets posted on prior dates?

    1. Post author

      > 1. Is it possible to increase the number of tweets?
      1500 is the rate limit per search. You can run the search more often to get more tweets per day (though you might hit rate-limiting problems if you request too often).
      > 2. Is it possible to collect the tweets country-wise? I am more concerned with tweets posted from India.
      It's possible to add a geocode attribute to the search, but it's based on a point and radius, so you may miss tweets or pick up extras (i.e. you can't simply filter to India).
      > 3. Is it possible to collect tweets posted on prior dates?
      Twitter search is limited to the last 7 days, or the last 1500 tweets in a day for a search term.

      Hope this helps,
      Martin

  2. Pingback:

  3. Anthony Nirwana

    Martin, after using Twitteralytics v2.2.3, I've got some questions:

    1. I've authenticated the API access in Twitteralytics using my Twitter developer key. Now if I choose the continuous sheet, set the result to 1500 tweets, and set a trigger to collect every 1 minute, will there be any duplicated tweets?
    2. If I choose the paged sheet option and trigger it every 1 minute, will I get duplicates on the second, third, 4th, 5th, 6th... sheet?
    3. How do I filter the search terms by a specific country or language? I've used the "near" and "lang" operators, but they didn't work.

    Thank you so much Martin!

    Anthony

    1. Post author

      @Anthony Q1) there shouldn't be any duplicates. Q2) there can be if you run it more than once a day/week. Q3) you might have to hardcode it: at line 162 insert

      var languageCode = "the ISO 639-1 language code you want to use";

      and at line 216 insert

      URL = URL + "&geocode=the latitude,the longitude,the radius";

      More info on the Twitter Search API here
      Martin
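The two hardcoded lines above just append extra query-string parameters to the search URL. As a sketch (the helper name and placeholder values are mine, but the &lang= and &geocode= parameter names come from the Twitter Search API):

```javascript
// Append optional language and geocode filters to a search URL.
// geocode takes "latitude,longitude,radius", e.g. "20.59,78.96,1000km".
function addSearchFilters(url, languageCode, geocode) {
  if (languageCode) {
    url += "&lang=" + encodeURIComponent(languageCode);
  }
  if (geocode) {
    url += "&geocode=" + encodeURIComponent(geocode);
  }
  return url;
}
```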

  4. Pingback:

  5. Brandon Muramatsu

    Martin, thanks for putting this together. I'm trying to get this set up since Twapper Keeper is finally shutting down, and while I don't really want to run that many archives, I find that HootSuite appears to have a horrendously limited free version.

    I followed your instructions to go to the template gallery and use that for my spreadsheet.

    I copied the spreadsheet, but found that the Twitter menu does not appear, nor does the Load Twitter menu appear to function. I get "Oops: Script function onOpen could not be found". Perhaps the Google Docs UI update messed something up. I also went spelunking for the scripts themselves in the Tools menu but didn't find anything loaded.

    Is there any advice you can provide?

    Thanks.

    Brandon

    1. Post author

      Hi Brandon, it does occasionally happen that Google doesn't make a good copy of the spreadsheet. The best thing to do is to try making a fresh copy of the spreadsheet.
      Martin

      1. Brandon Muramatsu

        I tried again in Safari 5.1 and Chrome 16.0.912.63, both from the Template Gallery and by making a copy of the spreadsheet that was copied over from the template, multiple times in both browsers. Also, when I look for scripts in the document, I don't see any. And every time I get the same onOpen error.

        Any other suggestions?

        Thanks.

    1. Post author

      In the search cell use a term similar to from:mhawksey

      If you have multiple accounts you can join them with OR, so: from:mhawksey OR from:jisccetis

      There will be a limit to how many accounts you can do this for. You can also extract the last 3600 things a person tweeted using this template
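Building that OR-joined search term for a list of accounts can be scripted too; a small sketch (the function name is mine):

```javascript
// Turn ["mhawksey", "jisccetis"] into "from:mhawksey OR from:jisccetis"
// for use as the spreadsheet's search term.
function fromAccountsQuery(accounts) {
  var parts = [];
  for (var i = 0; i < accounts.length; i++) {
    parts.push("from:" + accounts[i]);
  }
  return parts.join(" OR ");
}
```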

Comments are closed.