Keeping your Twitter Archive fresh and freely hosted on Github Pages


tl;dr: How do you keep your downloaded Twitter archive fresh on Github Pages using Google Apps Script? By running this Google Apps Script-powered web app.
Note: As Ken Bauer discovered, Twitter now offers two different archive options. You need to request the one from https://twitter.com/settings/account

If you were to ask me which of my projects is my favourite, you might be surprised by the answer. Regulars to this site might guess TAGS or my Twitter follower export, and longtime followers might even recall the work I did on Twitter captions with Tony Hirst. In fact, while these are the posts more likely to earn me a beer at the bar, the piece of code I’m most proud of is ‘Keeping your Twitter Archive fresh on Google Drive’. If you’re not familiar with it, in essence I’ve taken the archive of tweets, which is available on request from Twitter, hosted it on Google Drive and kept it up to date with a script that runs every day. There are a couple of reasons this is my favourite. Notably, it earned me a genius card from Alan Levine (this is a lonely game most of the time, and recognition from your peers fuels the journey). Another reason is simply that the code is poetry.
The particular flourish I’m proud of is how the script reuses the JavaScript Twitter wrote for displaying the archive to work out which tweets need to be fetched from the Twitter API, then writes the result back into the static archive data files so that they can be rendered. The poetic part is that this is possible because Google Apps Script, which powers the process, itself uses JavaScript syntax. Another aspect of this solution that I like is that, because Google Apps Script integrates with Google Drive, writing the new files takes a single line of code, and what was then Google Drive web hosting meant you could share your continually updating archive with the world. This simplifies the process hugely … sometimes things are just meant to be.
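Google Apps Script can run Twitter’s own display code directly, but the underlying logic is easy to see in isolation. The Python sketch below is a minimal illustration, not the web app’s actual code. It assumes the older archive layout, where data/js/tweet_index.js contains an assignment like var tweet_index = [...] whose entries carry year, month and file_name, and each monthly file contains an assignment like Grailbird.data.tweets_2016_08 = [...]. It finds the newest tweet already archived and uses its ID as the since_id parameter for Twitter’s v1.1 GET statuses/user_timeline endpoint. The helper names, sample data and screen name are hypothetical.

```python
import json


def parse_js_assignment(js_text):
    """Return the JSON value assigned in a 'name = value' archive file.

    Assumes files look like 'var tweet_index = [...]' or
    'Grailbird.data.tweets_2016_08 = [...]' (older archive layout).
    """
    _, sep, payload = js_text.partition("=")
    if not sep:
        raise ValueError("no assignment found")
    return json.loads(payload)


def newest_month_entry(tweet_index):
    """Pick the index entry with the latest (year, month)."""
    return max(tweet_index, key=lambda entry: (entry["year"], entry["month"]))


def timeline_params(screen_name, month_tweets):
    """Build user_timeline parameters that only request newer tweets."""
    newest_id = max(int(tweet["id_str"]) for tweet in month_tweets)
    return {
        "screen_name": screen_name,
        "since_id": str(newest_id),  # only tweets with a higher ID are returned
        "count": 200,                # maximum page size for this endpoint
        "tweet_mode": "extended",    # request full, untruncated text
    }


# Hypothetical sample files in the assumed layout.
index_js = "var tweet_index = " + json.dumps([
    {"file_name": "data/js/tweets/2016_08.js", "year": 2016, "month": 8},
    {"file_name": "data/js/tweets/2016_07.js", "year": 2016, "month": 7},
])
month_js = ('Grailbird.data.tweets_2016_08 = '
            '[{"id_str": "765000000000000002"}, {"id_str": "765000000000000001"}]')

latest = newest_month_entry(parse_js_assignment(index_js))
params = timeline_params("example_user", parse_js_assignment(month_js))
print(latest["file_name"], params["since_id"])
```

Using since_id as a cursor means each daily run only asks Twitter for tweets newer than the last one written to the archive.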
The important phrase in the previous paragraph is ‘you could’. Google announced last year that it was ending the ability to use Google Drive as a place to host basic web content by the end of this August (2016). I wasn’t surprised by the announcement: Google Drive web hosting always felt like a feature that escaped into the wild rather than being purposefully released, and it was never fully integrated into the Google Drive UI, although its disappearance may be more down to misuse. I have, however, been able to coax my original solution into continuing to work by using Google Apps Script’s ability to publish a web app. If you are interested, here is the source code of the web app so you can see how data from Google Drive is included. This tweak is, however, not without its drawbacks: in particular, performance isn’t great and Google imposes a nag at the top of the webpage.
Now there are many places you can freely host web content, and as touched upon in our last ‘Totally Unscripted’ episode these can be integrated into Google Apps Script. One option I’ve been interested in playing with is Github Pages. You might have come across Github before as a code versioning repository. As I (see my git post) and many others have highlighted in the past Github is not just limited to code, and has been used as a space to host, and importantly remix/modify, a range of content from books to music. An important aspect of any resource is how it is communicated to your audience.
This is where Github Pages come in. They allow you to create a website to support your repository, including rich content. Github Pages are updated in a similar way to the content you version in a Github repository: new or modified content is committed as the current version, which means you can read and write Pages using the existing Github API. This isn’t the first Google Apps Script project that integrates with Github. Bruce Mcpherson has the very useful gasGit script that lets you back up and release Google Apps Script projects to Github, and I recall seeing other pieces of code floating around.
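To illustrate writing to a Pages repository through the API, here is a minimal Python sketch that builds, but does not send, a request for GitHub’s ‘create or update file contents’ endpoint, PUT /repos/{owner}/{repo}/contents/{path}. It is not necessarily how the web app itself commits files. The owner, repository, path and commit message are hypothetical.

```python
import base64
import json

API_ROOT = "https://api.github.com"


def build_put_file_request(owner, repo, path, text, message, sha=None, branch=None):
    """Return (method, url, body) for GitHub's 'create or update file contents' call.

    Pass the blob SHA of the existing file as `sha` when updating a file;
    omit it when creating a new one.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}"
    body = {
        "message": message,
        # The contents API expects the file body base64-encoded.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
    }
    if sha is not None:
        body["sha"] = sha
    if branch is not None:
        body["branch"] = branch
    return "PUT", url, json.dumps(body)


method, url, body = build_put_file_request(
    "example-user",
    "example-user.github.io",
    "data/js/tweets/2016_08.js",
    "Grailbird.data.tweets_2016_08 = []",
    "Update August 2016 tweets",
)
print(method, url)
```

In Google Apps Script, the same request could be sent with UrlFetchApp.fetch(url, {method: 'put', contentType: 'application/json', payload: body, headers: {Authorization: 'token ' + accessToken}}), where accessToken is assumed to be the OAuth token obtained when the user connects their Github account.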
I appreciate that becoming a git user is somewhat daunting, and I still flounder around myself. Github have, however, spent a lot of time and effort making versioning easier for all, particularly with their desktop clients that hide the traditional command line. I’ve tried to take a similar approach with my little Google Apps Script-powered web app: instead of my usual ‘copy this spreadsheet, setup a Twitter Dev account’, all of this is hidden behind a set of steps demonstrated in this video:

To have a go yourself, visit:

*** Twitter Archive on Github Pages Setup ***

If you’d like a peek at the code, you can view it here or in this Github repo. The interface is made using Semantic-UI, mainly because I was looking for a step-style interface.
Big thank you to Bruce Mcpherson (@brucemcpherson), Alan Levine (@cogdog), Marjolein Hoekstra (@CleverClogs), Deborah Kay (@debbiediscovers) and Adam Croom (@acroom) for providing feedback and testing this script. I’ll be providing a follow-up post if you are interested in using Google Apps Script to commit other files to Github … stay tuned 🙂
Update: Here’s a post on Working with Github repository files using Google Apps Script: Examples in getting, writing and committing content
Update 2: John Johnson has kindly written a post looking at syncing Github to your local hard drive and website via FTP



67 comments
  • Alan Levine

    This interface is super elegant, a new level of genius card.
    Just noticed a small quirk on display (again OS X Chrome) if you are an old Twitter fogey like me: the year-by-year display on the right side is fixed, and I cannot see below 2007

    • Martin Hawksey

      😉 Twitter’s Archive UI is so 2013 .. they should apply for a Google Summer of Code project to get it fixed

  • Alan Levine

    I fixed their CSS by changing the positioning in .sidebar to absolute instead of fixed. Not sure if that will break the internet.
    Also, looking at the console, they load a bunch of images over http from https generating rafts of warnings. Call those code interns.
    Also- not sure why, but I am not seeing the comments on this post.

    • Martin Hawksey

      I did wonder about tweaking Twitter’s index.html and starting the instructions ‘fork this repo’ … one day.
      (Need to look at my cache settings re: comments.)

  • drikkes

    Thanks a lot, Martin. I’m a total coding idiot, but this transition worked very smoothly. Getting my archive files from Twitter was more of a hassle than setting up that new repository/page. (Okay, I already had a GitHub account…)
    Contrary to the Google Drive hosting (which broke a bit over the years), now the profile pic shows up correctly on every tweet. And with Alan’s tip I was even able to adjust the sidebar. Thanks again, guys!

    • Martin Hawksey

      Good to know you’ve been able to get this working … before desktop clients I just couldn’t get my head around github

  • John

    Lovely stuff, the new setup is very sweet and friendly.
    I wonder if you envisage folk pulling their archive back down from GitHub for a local backup?

    • Martin Hawksey

      Funny you should mention that, @cogdog has also suggested it. I’d imagine it’s a lot easier on Unix machines. I had a quick look for an existing app I could suggest installing but came up blank…

  • Christof

    Looking great, except that I wonder why I need to grant such extensive permission: “This application will be able to read and write all public and private repository data”, including basically everything. That seems neither necessary nor legitimate. All the app needs is read/write access to one repository linked to github.io. I won’t use this and won’t recommend this to anyone without more information on this. Thanks!

    • Martin Hawksey

      Unfortunately that’s the nature of the scoping of the Github API. If anyone can point me to how I can limit the scope to read/write a single named repository, I’d be more than happy to implement it

  • Alan Levine

    It looks like all the tweets in the archive before your script adds new ones are missing timestamps. Compare my current tweets https://cogdog.github.io/tweets/ – the most recent have timestamps- but any time before August 2016 lack time stamps. This makes it hard to use the search not knowing the date of the tweet.
    I figure it’s a bug/feature removal on their side, just thought I’d mention it.

    • Martin Hawksey

      Hi Alan – I think there is a bug in the applications.js file used by Twitter. Here’s how I’ve tweaked mine which seems to fix this issue https://gist.github.com/mhawksey/e2e8063bec8def15ae058e03a6c42424/revisions (I keep meaning to redo some of Twitter’s interface. One thing I wanted to do was let you link to months in your archive by url … some crazy idea I had about making it possible for Google to crawl to enable a custom search engine to be used to replace the existing text match search)

      • Frank Meeuwsen

        Thanks for this excellent script. I had it working on Google Drive for years but, to be honest, forgot about it, until this week when I found out I’ve been active on Twitter for almost 10 years. I migrated from Google Drive to Github, which just works beautifully, thanks again.
        But I have a similar issue with date and time, although it is just the other way around. I also have this issue with Timehop, an app that shows my tweet from that day x years ago. It looks like the timestamp in my js files changed somewhere over the course of Twitter’s archive facility.
        In the beginning, timestamps of the individual tweets in the year_month.js files were like “created_at” : “Sat Mar 31 21:58:55 +0000 2007”
        I had those working for years on Google Drive until 2014. That’s the last time I updated the Google Drive version of the script. I still have those js files.
        Now, when I downloaded the new archive, the timestamp looks like “created_at” : “2007-03-31 00:00:00 +0000”. Mind you, this is the timestamp of the exact same tweet. You see the time is gone. This is not cool, since this new archive will not show my tweets in the correct order. Mind you, this _will_ break the internet one day 😉
        So what I did, I downloaded the original older archive from Google Drive, put those in the data/js/tweets folder and commit & sync.
        But… Now the older tweets show a date and are in the correct order, you can see it on https://frankmeeuwsen.github.io/TwitterArchive/. But it doesn’t show the time. And newer tweets don’t show a date/time at all.
        So where to begin? I know I can make a local branch and play around with the source code but my js and bootstrap knowledge is very limited. Is there any way to solve this so that all tweets are in the correct order and show a date/time?
        I hope my rambling makes some sense…

      • dr

        You know, what would be awesome? Even more than the URLs by month, sometimes I want to link to the whole list of tweets containing a single expression – like showing my complete disgust of “wheeled suitcase”s. So it would be great, if the archive’s search results could get specific URLs, too.
        …io/?s=Rollkoffer

  • Gabrielle Campbell

    My archive stopped updating 6 days ago and I don’t know how or where to fix it.
    ======
    Limit Exceeded: URLFetch POST size. (line 142, file “GithubService”)
    ======
    I changed the Update time from daily to hourly. I’ve hit the “manually update now” button several times. It looks like it’s working but archive doesn’t update.
    I can tell from graph on archive my tweeting went way up so I’m thinking the issue is too many mbs or tweets.
    Can I fix this somewhere?
    Thanks!

    • Martin Hawksey

      Hi – quickest fix would be to request your archive again from twitter and replace the files in your Github repo. No other changes are required as the script is able to read/write the Twitter files

  • Justin

    Thank you so much for your work on this. It’s brilliant!
    Do you have or know of something similar that can archive Twitter favorites/likes? I would love to grab these, too.
    Thanks!

  • im

    Hi Martin,
    I tried all the steps to connect Twitter and Github and everything seems checked. However, when I try “Manually Update Now”, it gets stuck there forever. I actually set it up days ago but have yet to see any daily archive (“last update time is 12/30/2016”). Do you have any clue about this?

    • Martin Hawksey

      Hi – not sure what is happening here. Only thought is something happened on 12/30/2016 which prevented one of the files from updating correctly. Quickest way to see would be to request your archive again from Twitter and copy the files to github and see if it begins updating again.

  • Damian

    Hi Martin – once again, brilliant work. I appreciate the time/effort you put in to make this as idiot-proof as possible for folks like me.
    One question for you – any thoughts on how to update the information that pops up under “View Account Details”? That box brings up one of my old profiles with an outdated, no-longer-in-service website URL. Any way to force a sync with the most current profile from Twitter?

    • Martin Hawksey

      Hi Damian – glad you like 🙂 Quickest way is probably to request your archive again from twitter. You don’t need to redo any of the setup other than replacing the new files from Twitter on Github.

      • Damian

        Right, thanks; that worked. For future reference, in fiddling with this I learned this information can also be manually edited in data/js/user_details.js
        BTW, thanks @Alan as well for figuring out the sidebar issue – that’s been vexing me for as long as I’ve used this archive solution (I’m also an old-school Twitter user whose history exceeds his laptop screen height).

        • Martin Hawksey

          Thanks for leaving the tip about user_details.js

  • Susan

    I have literally no programming knowledge, but thanks to your video instructions I managed to activate this app script and it worked beautifully! …until this morning, when google sent me a summary of failures saying this script has failed to finish successfully.
    I have it on an hourly update system, and the errors are:
    (6/26/17 6:21 PM) One message saying
    ======
    Address unavailable: https://api.github.com/repos/spandam19/spandam19.github.io/git/blobs (line 142, file “GithubService”)
    ======
    (6/26/17 8:21 PM ~ 6/27/17 12:21) Followed by five hourly messages saying
    ======
    SyntaxError: Unterminated string literal (line 144, file “GithubService”)
    ======
    Now it seems to have given up(?) and when I press the ‘manually update now’ button it just goes dim and nothing seems to happen.
    I would love to fix this problem.
    Please help!

    • Martin Hawksey

      Hi Susan – might be the volume of tweets. Unfortunately Google Apps Script, which powers this, can only fetch a maximum of 10MB so when it gets towards the end of the month it can’t fetch the existing data file to update. Unfortunately currently I haven’t implemented a workaround 🙁

  • Keith Frankish

    Hi — Many thanks for this script. I’ve been running it successfully for several months. It’s worked beautifully. However, for the past week or so, my archive hasn’t been updating and I’ve been getting daily error messages from Google Apps Scripts like this:
    Start Function Error Message Trigger End
    10/4/17 12:14 AM updateArchive TypeError: Cannot read property “id_str” from undefined. (line 85, file “Code”) time-based 10/4/17 12:14 AM
    I’m a novice and don’t understand this. Can you help, please? I’m not aware of having changed anything at all at my end.
    Thanks again for a great app.

    • Martin Hawksey

      Hi Keith – can you share where your archive is on github and I’ll take a look

          • Keith Frankish

            Thanks Martin. It looks like the github connection has been lost (no green tick). The address next to ‘check your archive’ is wrong (it should be https://github.com/k0711/), but I can’t figure out how to change it. 🙁 Sorry for my ineptness.

          • Martin Hawksey

            Hi Keith – I think the problem might be if you set up your and your partner’s data collection using the same Google account. This is because the script uses Google’s internal user properties storage, and the way it’s currently coded is one user/one Twitter archive. Would this explain the problem you are having?

  • Keith Frankish

    I don’t think that’s it. I was careful to log in and out of the two Google accounts. Besides, everything worked fine till a month or so ago. I think I’ll just delete and reinstall the app on my account. That should fix it. Thanks again for creating this; it’s a little marvel!

  • Keith Frankish

    Hi Martin. Sorry to keep bugging you, but I’m still struggling. I’ve narrowed down the problem. It’s that the script is locked onto the wrong github repository. It thinks my archive is here https://keithfrankish.github.io// (an old, now deleted repository) and won’t let me select the correct one. When I try to select a repository, it says ‘no results found’. (It seems to log into github OK; I’ve disconnected and reconnected.) I was going to try a clean installation, but I can’t even work out how to uninstall the app. Do you have any suggestions, please?

    • Martin Hawksey

      The list of repos is built each time the app is used, so it should pick it up. One thing you can try is disconnecting from Github and also revoking access from your Github settings page https://github.com/settings/applications and then reconnecting

      • Keith Frankish

        Thanks for the suggestion, Martin. I tried revoking access and re-authorizing, but it’s not fixed it, alas. The script still locks onto the nonexistent https://keithfrankish.github.io// (next to ‘Check your archive’) and no results show up under ‘Select a repository’. It’s baffling.

        • Martin Hawksey

          Hi Keith – I’ve pushed an update to the web app so that step lets you see the username you are logged in to Github with. Does the username look right for your account? My only other thought is that if you are logged in to multiple Google accounts when you authenticate with Github, the data might not be stored with the correct account. Possible?

          • Keith Frankish

            Hi Martin – Thanks for your efforts. I still can’t get it to work, alas. I’m definitely logged into the right google account, and the script seems to log into github OK (it accepts my password and the app shows up on gh as authorized). However, the script is still locked onto the wrong address for my archive. It says:
            Check your archive: https://keithfrankish.github.io// Logged in as: keithfrankish
            which is wrong (clicking the hyperlink takes me to a 404 page). My gh account is in fact at https://github.com/k0711 and my archive is at https://github.com/k0711/keithfrankish-twitter
            The ‘select a repository’ doesn’t find anything either (perhaps because it’s looking in the nonexistent https://keithfrankish.github.io// rather than https://github.com/k0711 ?)
            Do you have any idea why it’s doing this, please?
            Thanks again for your help.
            Keith

          • Martin Hawksey

            So it looks like the wrong github account is being connected. When you authorise access, are you using k0711?

  • Keith Frankish

    Yes — it’s the only gh account I have. However — and this may be the nub of it — I originally named it keithfrankish. Could the script have stored the old name somewhere? (Though it worked fine with the new name for months.) I’ve tried deauthorizing and reauthorizing the script but it still thinks the account is called keithfrankish. Sorry to be such a pain!

    • Martin Hawksey

      ahh – spotted some github properties weren’t being cleared. If you disconnect and reconnect github from the web app I think it should work now

      • Keith Frankish

        Bingo! That’s cracked it. Many thanks for your help and persistence.

  • Keith

    Hi — I’m having problems getting the app to run. It just presents me with a series of spinning circles and won’t connect to Twitter to begin the set-up process. (I’m logged into my google account.) Am I doing something wrong or could there be a problem somewhere else? I’ve run the app successfully before. Do you have any suggestions, please? I’d be grateful for any help.

    • Martin Hawksey

      Hi Keith – looks like Twitter changed their authentication flow. I’ve updated the app so hopefully it works now, if not ping me 🙂

  • Stefan

    Thanks for this awesome Script! Any plans to support the new 280 Char Tweet length?

    • Martin Hawksey

      So others know – I had missed this but have updated the code so it pulls full tweet text. If you’ve already set this up, no action is required to get full text from now on. In my own archive I deleted a couple of months to get it to write the full text. If you want to do similar, you could roll back to an earlier commit or delete the month data files in /data/js/tweets and edit /data/js/tweet_index.js to remove the same months from the index. Note: as this script uses GET statuses/user_timeline, only the last 3,200 tweets can be collected, so keep this in mind before deleting all your data. Also, Twitter have stopped providing archived tweets in a download that is compatible with this solution 🙁 Old archives will still work
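      A minimal Python sketch of that index edit, assuming each entry in tweet_index.js carries year, month, file_name and tweet_count fields and that the months being dropped are the most recent ones. The helper names and sample data are hypothetical.

```python
import json


def drop_months(tweet_index, months):
    """Split the index into entries to keep and entries to drop.

    Returns the trimmed index, the month data files to delete and the
    number of tweets the script would have to re-fetch.
    """
    keep, drop = [], []
    for entry in tweet_index:
        if (entry["year"], entry["month"]) in months:
            drop.append(entry)
        else:
            keep.append(entry)
    refetch = sum(entry["tweet_count"] for entry in drop)
    return keep, [entry["file_name"] for entry in drop], refetch


def to_index_js(tweet_index):
    """Serialise the index back into the 'var tweet_index = [...]' form."""
    return "var tweet_index = " + json.dumps(tweet_index, indent=2)


# Hypothetical index, newest month first.
index = [
    {"file_name": "data/js/tweets/2017_11.js", "year": 2017, "month": 11, "tweet_count": 180},
    {"file_name": "data/js/tweets/2017_10.js", "year": 2017, "month": 10, "tweet_count": 150},
    {"file_name": "data/js/tweets/2017_09.js", "year": 2017, "month": 9, "tweet_count": 120},
]
kept, to_delete, refetch = drop_months(index, {(2017, 11), (2017, 10)})

# user_timeline only reaches back about 3,200 tweets.
if refetch > 3200:
    raise SystemExit("Too many tweets to re-fetch; keep more months.")
print(to_delete, refetch)
```

      The check compares the tweets to be re-fetched with the 3,200-tweet reach of user_timeline; tweets posted since the last update would count towards that limit too.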

  • drikkes

    Hey Martin,
    it’s not the first time that the newest tweets in my archive get displayed repeatedly about 20 times. See? https://drikkes.github.io/
    Is this a hiccup on Github’s side? I got an error mail from them a few weeks ago, but since I didn’t see any problem then, I didn’t read it closely, so I can’t remember what it said.
    Strangely, older repetitions seem to disappear again…

    • drikkes

      Ah, and as I see now, the last three years in the sidebar get repeated, too.

        • drikkes

          Actually, I have no idea what that means or how to do it. Do I get a hint if I rewatch the installation video? Or would it be better for a non-coding guy to (ha!) simply rebuild a new archive from scratch?

        • Martin Hawksey

          I was hoping you were a Github ninja, as I’m not even sure how you roll back to a previous commit, but there is also an easier way.
          If you go to https://github.com/drikkes/drikkes.github.io/tree/5cc9da9bbc905629366f5cc1ba7c2d03d8b04763 and click the ‘Clone or download’ button and download as a .zip, it will have all the files for that commit. You can use these to replace the current version of your archive (a similar process to when you first created the archive on github)

          • drikkes

            I’m sorry, but I went to this link, downloaded the files, wasn’t able to replace folders, was too lazy to replace every single file – I just opened a new repository with the downloaded data.
            Of course it lacked the tweets from the last two weeks, but the newly published site also contained the duplicates, mostly of retweets.
            I’m not a pro, for sure, but this is wrong, I guess.
            And I discovered a new bug: When I do a search in my archive, some tweets also show up repeatedly – some not. I guess which one do, is due to the repeated year statistics in my sidebar, but that’s just an educated guess.
            I have no idea, what to do. I think I have to reinstall everything…

          • Martin Hawksey

            Hi – I’ve got another report of a similar problem so definitely a bug of some sort. I’m busy for the next couple of days but will try and find some time to see if I can debug.

  • Chris Jobling

    Hi Martin
    I realise that it’s been remiss of me not to thank you for this tool that I’ve been using since I first came across this post! It’s been ticking over without any intervention for more than two years to produce https://cpjobling.github.io/cpjobling-tweets/#.
    Thanks and Happy new year!
    See you at #SocMedHE18!

    • mhawksey

      Hi Chris – nice to know your archive still lives. I was pleased to see my own one happily continue in 2019 🙂

  • Martin Hawksey

    For info: from 11 October 2018, a code update introduced a bug where the script would add duplicate tweets to your archive. I’ve hopefully resolved this issue and pushed an update. There’s no need to do anything as the script will use the new code, but if your archive has become a mess you could download an earlier commit of your archive and roll back to that version (as long as you’ve made fewer than 3,200 tweets since the rolled-back version, the script will be able to collect all the data).
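    One way to guard against duplicates when merging, sketched in Python under the assumption that each tweet is a dict with an id_str field and that month files list tweets newest first. This is not the script’s actual fix, and the sample IDs are hypothetical.

```python
def merge_tweets(existing, fetched):
    """Merge newly fetched tweets into a month's list without duplicates.

    Tweets are keyed on their own id_str (for a retweet this is the ID of
    the retweet, not of the original status) and returned newest first.
    """
    by_id = {tweet["id_str"]: tweet for tweet in existing}
    for tweet in fetched:
        by_id.setdefault(tweet["id_str"], tweet)  # keep the copy already archived
    return sorted(by_id.values(), key=lambda t: int(t["id_str"]), reverse=True)


existing = [{"id_str": "1050000000000000002"}, {"id_str": "1050000000000000001"}]
fetched = [{"id_str": "1050000000000000003"}, {"id_str": "1050000000000000002"}]
merged = merge_tweets(existing, fetched)
print([t["id_str"] for t in merged])
```

    Running the merge again with the same fetched tweets leaves the list unchanged, so a repeated update cannot add the same tweet twice.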

    • Ben Wilkoff

      I’m still experiencing the duplication issue for retweets. I’ve been able to modify and get rid of many of the duplicates, but it does seem to keep happening with every new refresh if the most recent tweet is a Retweet. I know you aren’t responsible for every change Twitter makes to their archive API, but I am hoping to continue using this archive in the future. Thanks for any help you might be able to provide. You are amazing!

  • Ken Bauer

    So, I messed up and tried to do this with “your Twitter data” (which includes everything about you, DMs and stuff that shouldn’t really go in a publicly visible archive). Oops.
    Link about the data format change: https://kyleconroy.com/your-twitter-data
    You can get your Tweet Archive (what you need for this project) here: https://twitter.com/settings/account
    The one that gives you *everything* (be careful where you put that) is https://twitter.com/settings/your_twitter_data
    Thanks, this should work now.
    – Ken

    • Martin Hawksey

      Thank you so much Ken for looking into this – I thought Twitter had killed this by no longer providing the archive in a format that can be used. Updated the instructions to let people know.

  • Bitz

    Is this possible with Facebook public page posts?

    • Martin Hawksey

      Not with this solution but someone else may have created something for that

  • Keith Frankish

    Hi Martin — The dating on my archive seems to have gone awry. In the side panel on the right, tweets from February 2019 are listed under April 2018, in a second row for that year. The problem seems to have righted itself for the new month, but there are now double rows in the panel for 2018 and 2019. See https://k0711.github.io/keithfrankish-twitter/#
    Do you have any idea what has happened, please, and how to fix it? I’m reluctant to recreate the archive from scratch since I gather from one of your earlier replies in the thread that this may result in my losing data.

    • Martin Hawksey

      Hi Keith – this is an odd one. It appears that when the Feb 2019 index entry was created (and, looking at it, a couple of other months) the wrong month and year numbers were added https://github.com/k0711/keithfrankish-twitter/commit/248ea6138b0bf7760778bd2913f5ceb160e6f133#diff-60302947636109b0df62592939d91843
      It appears all the tweet data is there correctly in your archive and all you need to do is manually edit the following file https://github.com/k0711/keithfrankish-twitter/blob/master/data/js/tweet_index.js (when logged in to github you can edit the file online – there are a couple of bad years and months so you might want to check the entire file)
      Looking at why this happened for February 2019 it was because the first thing you tweeted that month was a RT from April 2018 https://twitter.com/amiguello1/status/987076270352752646.
      I’ve pushed an update to the code that should prevent this happening in the future
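      A minimal Python sketch of bucketing a tweet by its own timestamp, assuming the bad months came from a retweet’s retweeted_status date being used instead of the retweet’s own created_at. This is not the actual patch, and the sample tweet is hypothetical. It also accepts both created_at styles quoted earlier in this thread.

```python
from datetime import datetime

# Both created_at styles quoted in this thread: the Twitter API style and
# the date-only style seen in later archive downloads.
CREATED_AT_FORMATS = ("%a %b %d %H:%M:%S %z %Y", "%Y-%m-%d %H:%M:%S %z")


def parse_created_at(value):
    """Parse a created_at string in either of the known formats."""
    for fmt in CREATED_AT_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised created_at: {value!r}")


def index_year_month(tweet):
    """Return (year, month) for the archive index from the tweet itself.

    A retweet also carries retweeted_status.created_at (when the original
    was posted), which is deliberately ignored here.
    """
    created = parse_created_at(tweet["created_at"])
    return created.year, created.month


retweet = {
    "created_at": "Fri Feb 01 09:00:00 +0000 2019",
    "retweeted_status": {"created_at": "Thu Apr 19 20:00:00 +0000 2018"},
}
print(index_year_month(retweet))  # (2019, 2)
```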

      • Keith Frankish

        Many thanks for the rapid response, Martin. That worked perfectly!
        May I ask your advice on another little problem, please? I have created archives for three Twitter accounts, each set to sync daily using your app. The archives are here:
        https://k0711.github.io/keithfrankish-twitter/
        https://k0711.github.io/philhellenes-twitter/
        https://k0711.github.io/mariakasmirli-twitter/
        They seem to be working OK. (There’s the same dating problem with the last one, but I know how to fix that now.) The problem is that roughly every other day I get an auto error message from Google Scripts. Here’s a sample:
        ************
        Your script, Twitter Archive Update to Github, has recently failed to finish successfully. A summary of the failure(s) is shown below. To configure the triggers for this script, or change your setting for receiving future failure notifications, click here.
        Summary:
        Error Message Count
        Error: {“message”:”Bad credentials”,”documentation_url”:”https://developer.github.com/v3″} (line 146, file “GithubService”) 2
        Start Function Error Message Trigger End
        3/1/19 5:21 PM updateArchive Error: {“message”:”Bad credentials”,”documentation_url”:”https://developer.github.com/v3″} (line 146, file “GithubService”) time-based 3/1/19 5:21 PM
        3/2/19 5:21 PM updateArchive Error: {“message”:”Bad credentials”,”documentation_url”:”https://developer.github.com/v3″} (line 146, file “GithubService”) time-based 3/2/19 5:21 PM
        **************
        Can you help me diagnose the problem, please? I’m not even sure which of the three accounts is causing it. The error doesn’t seem to be generated on every sync.
        Best, Keith

Comments are closed.
