Keeping your Twitter Archive fresh and freely hosted on Github Pages

tl;dr How do you keep your downloaded Twitter archive fresh on Github Pages using Google Apps Script? By running this Google Apps Script powered web app.

If you were to ask me which of my projects was my favourite you might be surprised by the answer. Regulars to this site might think of TAGS or my Twitter follower export, and longtime followers might even recall the work I did on Twitter captions with Tony Hirst. In fact, while these are the posts more likely to earn me a beer at the bar, the piece of code I’m most proud of is ‘Keeping your Twitter Archive fresh on Google Drive’. If you’re not familiar with this, in essence I’ve taken the archive of tweets, which is available on request from Twitter, hosted it on Google Drive and kept it up-to-date with a script that runs every day. There are a couple of reasons this is my favourite. Notably it earned me a genius card from Alan Levine (this is a lonely game most of the time and recognition from your peers fuels the journey). Another reason is simply that the code is poetry.

The particular flourish I’m proud of is how the script uses the Javascript written by Twitter for the display of the archive to calculate what tweets need to be fetched from the Twitter API, writing the result back into the static archive data files so that it can be rendered. The poetic part is that this is possible because Google Apps Script, which is powering the process, itself uses a Javascript syntax. Another aspect of this solution that I like is that, as Google Apps Script integrates with Google Drive, writing the new files is a line of code, and what was Google Drive web hosting meant you could share your continually updating archive with the world. This simplifies the process hugely … sometimes things are just meant to be.
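To make the read-merge-write idea concrete, here is a minimal sketch of the merge step. It is an illustration only, not the script's actual code. It assumes each monthly data file in the archive is a single Javascript assignment followed by a JSON array of tweets, and that every tweet carries an `id_str` field; the function names and the sample variable name in the comments are hypothetical.

```javascript
// Hypothetical sketch: merge freshly fetched tweets into one archive data file.
// Assumption: the file looks like `Some.variable = [ {...}, {...} ]`, i.e. one
// assignment whose right-hand side is valid JSON.

// Split a data file into its variable name and its array of tweets.
function parseArchiveFile(fileText) {
  const eq = fileText.indexOf('=');
  if (eq === -1) throw new Error('No assignment found in archive file');
  const json = fileText.slice(eq + 1).trim().replace(/;$/, '');
  return {
    prefix: fileText.slice(0, eq).trim(), // e.g. "Grailbird.data.tweets_2016_08" (illustrative)
    tweets: JSON.parse(json)
  };
}

// Tweet ids are too large for Number, so compare them as decimal strings:
// a longer string is a larger id, equal lengths compare lexicographically.
function compareIdsDesc(a, b) {
  if (a.length !== b.length) return b.length - a.length;
  return a < b ? 1 : a > b ? -1 : 0;
}

// The newest id already in the archive, usable as a "fetch since" marker.
function newestId(tweets) {
  return tweets.map(t => t.id_str).sort(compareIdsDesc)[0] || null;
}

// Add only tweets that are not already present, newest first.
function mergeTweets(existing, fetched) {
  const seen = new Set(existing.map(t => t.id_str));
  const fresh = fetched.filter(t => !seen.has(t.id_str));
  return fresh.concat(existing)
    .sort((a, b) => compareIdsDesc(a.id_str, b.id_str));
}

// Write the array back in the same "variable = JSON" shape it came in.
function serializeArchiveFile(prefix, tweets) {
  return prefix + ' = ' + JSON.stringify(tweets, null, 1);
}
```

Because the data files stay in the shape the archive viewer expects, the viewer itself needs no changes: it simply loads a longer array next time the page is opened.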

The important words in that last paragraph are ‘you could’. Google announced last year that it was ending the ability to use Google Drive as a place to host basic web content by the end of this August (2016). I wasn’t surprised by the announcement: Google Drive web hosting always felt like a feature that escaped into the wild rather than being purposefully released, never being fully integrated into the Google Drive UI, although its disappearance may be more down to misuse. I have however been able to coax my original solution into continuing to work, using Google Apps Script’s ability to publish a web app. If you are interested here is the source code of the webapp so you can see how data from Google Drive is included. This tweak is however not without its drawbacks: in particular, performance isn’t great and Google imposes a nag at the top of the webpage.

Now there are many places you can freely host web content, and as touched upon in our last ‘Totally Unscripted’ episode these can be integrated into Google Apps Script. One option I’ve been interested in playing with is Github Pages. You might have come across Github before as a code versioning repository. As I (see my git post) and many others have highlighted in the past, Github is not just limited to code and has been used as a space to host, and importantly remix/modify, a range of content from books to music. An important aspect of any resource is how it is communicated to your audience.

This is where Github Pages come in. They allow you to create a website to support your repository, including rich content. Github Pages are updated in a similar way to the content you version in a Github repository: you create new content or modify existing content, which is committed as the current version. This means you can read/write Pages using the existing Github API. This isn’t the first Google Apps Script project that integrates with Github. Bruce Mcpherson has the very useful gasGit script that lets you backup/release Google Apps Script projects to Github, and I recall seeing other pieces of code floating around.
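As a rough illustration of what writing a page through the API can look like, the sketch below builds a request for Github's ‘create or update file contents’ endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`). This is one possible route and not necessarily the one the web app takes. The function name, the option names and the sample values are hypothetical; the returned `options` object is shaped so that, inside Apps Script, it could be handed to `UrlFetchApp.fetch(url, options)`.

```javascript
// Hypothetical sketch: build a commit request for a single file.
// Base64 helper: Utilities in Apps Script, Buffer when run under Node.
const toBase64 = (typeof Utilities !== 'undefined')
  ? s => Utilities.base64Encode(s, Utilities.Charset.UTF_8)
  : s => Buffer.from(s, 'utf8').toString('base64');

function buildContentsRequest(opts) {
  // Encode each path segment but keep the slashes between them.
  const path = opts.path.split('/').map(encodeURIComponent).join('/');

  const body = {
    message: opts.message,         // the commit message
    content: toBase64(opts.text)   // file contents must be Base64 encoded
  };
  if (opts.sha) body.sha = opts.sha;          // needed when updating an existing file
  if (opts.branch) body.branch = opts.branch; // omit to use the default branch

  return {
    url: 'https://api.github.com/repos/' + opts.owner + '/' + opts.repo +
         '/contents/' + path,
    options: {
      method: 'put',
      contentType: 'application/json',
      headers: { Authorization: 'token ' + opts.token },
      payload: JSON.stringify(body),
      muteHttpExceptions: true // inspect the response code rather than throwing
    }
  };
}

// In Apps Script the request could then be sent with:
//   const req = buildContentsRequest({...});
//   const res = UrlFetchApp.fetch(req.url, req.options);
```

When updating an existing file, the `sha` of the current version has to be read first (a GET on the same URL returns it), which is what makes each write a proper new commit rather than a blind overwrite.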

I appreciate becoming a git user is somewhat daunting and I still flounder around myself. Github have however spent a lot of time and effort making versioning easier for all, particularly with their desktop clients that hide the traditional command line. I’ve tried to take a similar approach with my little Google Apps Script powered web app, and instead of my usual ‘copy this spreadsheet, set up a Twitter Dev account’ all of this is hidden behind a set of steps demonstrated in this video:

To have a go yourself visit:

*** Twitter Archive on Github Pages Setup ***

If you’d like a peek at the code you can view it here or in this Github Repo. The interface is made using Semantic-UI mainly because I was looking for a step style interface.

Big thank you to Bruce Mcpherson (@brucemcpherson), Alan Levine (@cogdog), Marjolein Hoekstra (@CleverClogs), Deborah Kay (@debbiediscovers) and Adam Croom (@acroom) for providing feedback and testing this script. I’ll be providing a follow-up post if you are interested in using Google Apps Script to commit other files to Github … stay tuned :)

Update: Here’s a post on Working with Github repository files using Google Apps Script: Examples in getting, writing and committing content

Update 2: John Johnson has kindly done a post looking at syncing Github to your local hard drive and web site via ftp

16 Comments


  1. This interface is super elegant, a new level of genius card.

    Just noticed a small quirk on display (again OS X Chrome) if you are an old twitter fogie like me- the right side display by year is fixed, and I cannot see below 2007

    1. ;) Twitter’s Archive UI is so 2013 .. they should apply for a Google Summer of Code project to get it fixed

  2. I fixed their CSS by changing the positioning in .sidebar to absolute instead of fixed. Not sure if that will break the internet.

    Also, looking at the console, they load a bunch of images over http from https generating rafts of warnings. Call those code interns.

    Also- not sure why, but I am not seeing the comments on this post.

    1. I did wonder about tweaking Twitter’s index.html and starting the instructions ‘fork this repo’ … one day.

      (need to look at my cache setting re:comments )

  3. Thanks a lot, Martin. I’m a total coding idiot, but this transition worked very smoothly. Getting my archive files from Twitter was more of a hassle than setting up that new repository/page. (Okay, I already had a GitHub account…)

    Contrary to the Google Drive hosting (which broke a bit over the years), now the profile pic shows up correctly on every tweet. And with Alan’s tip I was even able to adjust the sidebar. Thanks again, guys!

    1. Good to know you’ve been able to get this working … before desktop clients I just couldn’t get my head around github

  4. Lovely stuff, the new setup is very sweet and friendly.

    I wonder if you envisage folk pulling their archive back down from GitHub for a local back up?

    1. Funny you should mention that, @cogdog has also suggested it. I’d imagine it’s a lot easier on Unix machines. Had a quick look if there was an existing app that I could suggest to install but came up blank…

  5. Looking great, except that I wonder why I need to grant such extensive permission: “This application will be able to read and write all public and private repository data”, including basically everything. That seems neither necessary nor legitimate. All the app needs is read/write access to one repository linked to github.io. I won’t use this and won’t recommend this to anyone without more information on this. Thanks!

    1. Unfortunately that’s the nature of the scoping of the Github API – if anyone can point me to how I can limit the scope to read/write to a single named repository I’d be more than happy to implement it

  6. It looks like all the tweets in the archive before your script adds new ones are missing timestamps. Compare my current tweets https://cogdog.github.io/tweets/ – the most recent have timestamps- but any time before August 2016 lack time stamps. This makes it hard to use the search not knowing the date of the tweet.

    I figure it’s a bug/feature removal on their side, just thought I’d mention it.

    1. Hi Alan – I think there is a bug in the applications.js file used by Twitter. Here’s how I’ve tweaked mine which seems to fix this issue https://gist.github.com/mhawksey/e2e8063bec8def15ae058e03a6c42424/revisions (I keep meaning to redo some of Twitter’s interface. One thing I wanted to do was let you link to months in your archive by url … some crazy idea I had about making it possible for Google to crawl to enable a custom search engine to be used to replace the existing text match search)

  7. My archive stopped updating 6 days ago and I don’t know how or where to fix it.

    ======
    Limit Exceeded: URLFetch POST size. (line 142, file “GithubService”)
    ======

    I changed the Update time from daily to hourly. I’ve hit the “manually update now” button several times. It looks like it’s working but archive doesn’t update.

    I can tell from graph on archive my tweeting went way up so I’m thinking the issue is too many mbs or tweets.

    Can I fix this somewhere?

    Thanks!

    1. Hi – quickest fix would be to request your archive again from twitter and replace the files in your Github repo. No other changes are required as the script is able to read/write the Twitter files
