Working with Github repository files using Google Apps Script: Examples of getting, writing and committing content

In a recent post I demonstrated how you can use Google Apps Script to add new tweets to a downloaded Twitter archive hosted on Github Pages. That project used the Github API to commit files to the repository. Given the demise of Google Drive web hosting, and the potential usefulness of this approach, I’ve extracted the parts of the project that handle getting and committing files in the hope that they help anyone else who needs to do something similar.

This isn’t the first Google Apps Script project that integrates with Github. Bruce McPherson has the very useful GasGit script that lets you back up/release Google Apps Script projects to Github, and I recall seeing other pieces of code floating around. Bruce’s code deals specifically with Google Apps Script project files, whereas my example can be used to commit other types of content. To handle calls to the Github API I’ve ported parts of Michael Aufreiter’s open source GitHub.js wrapper into the demonstration code.

Quotas …

Whether you are getting or committing files, be aware that you are restricted by Google Apps Script quotas. The two you are most likely to run into are runtime and UrlFetchApp. The maximum runtime for a script is 6 minutes, so if you are processing lots of files you may have to run in batches. More likely you’ll discover the limits of UrlFetchApp, the HTTP/HTTPS service built into Google Apps Script and used to make the API calls to Github. In particular, be aware of the 10MB maximum payload size per call: if you are planning on having big files you are going to need a different solution.
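One way to live within these limits is to check file sizes up front and to process work in resumable batches. The sketch below is hypothetical (fitsInOneRequest, processInBatches and the injectable clock are my own names, not part of the demo project); it assumes file content is sent base64-encoded, which grows it by about a third, and uses the 6 minute and 10MB figures quoted above.

```javascript
// Limits as quoted in this post; check the current Apps Script quotas before relying on them.
var MAX_RUNTIME_MS = 6 * 60 * 1000;        // 6 minute script runtime
var MAX_PAYLOAD_BYTES = 10 * 1024 * 1024;  // 10MB UrlFetchApp payload per call

/**
 * Base64 encoding turns every 3 bytes into 4 characters, so a file grows
 * by roughly a third when its content is sent to the Github API.
 */
function fitsInOneRequest(byteLength) {
  return 4 * Math.ceil(byteLength / 3) <= MAX_PAYLOAD_BYTES;
}

/**
 * Hypothetical helper: call handler(item) for items from startIndex onwards
 * until the list is finished or the time budget is used up. Returns the index
 * of the next unprocessed item (items.length when everything is done) so a
 * later run can resume from there.
 * `now` is injectable for testing and defaults to Date.now.
 */
function processInBatches(items, startIndex, handler, budgetMs, now) {
  now = now || Date.now;
  var deadline = now() + budgetMs;
  var i = startIndex;
  while (i < items.length && now() < deadline) {
    handler(items[i]);
    i++;
  }
  return i;
}
```

In Apps Script you might call processInBatches() with a budget comfortably under MAX_RUNTIME_MS, save the returned index with PropertiesService, and pick up from there on the next run (for example from a time-driven trigger).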

Setup

The Github API uses OAuth2 authentication. There isn’t a native handler for this in Google Apps Script, but fortunately various libraries have been created to handle it. In this example I’ll be using the OAuth2 library created by Eric Koleda, but you could easily swap it out for another, like Bruce McPherson’s Goa. I’ve created a demonstration project you can copy (File > Make a copy), or you can get the code on Github.

To use the Github API you first need to register a developer application, which will give you a client ID and secret for your code. When setting up your application remember to include the callback URL, which needs to be set to https://script.google.com/macros/d/{SCRIPT ID}/usercallback, where {SCRIPT ID} is the ID of the script project using this library.
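If you would rather generate the callback URL than type it by hand, a small helper is enough. buildGithubCallbackUrl() is a hypothetical name, not part of the demo project; in Apps Script you could pass it the value of ScriptApp.getScriptId().

```javascript
/**
 * Hypothetical helper: build the OAuth callback URL to paste into the
 * Github application settings. scriptId is your script project's ID.
 */
function buildGithubCallbackUrl(scriptId) {
  if (!scriptId) {
    throw new Error('A script ID is required');
  }
  return 'https://script.google.com/macros/d/' +
      encodeURIComponent(scriptId) + '/usercallback';
}
```

For example, Logger.log(buildGithubCallbackUrl(ScriptApp.getScriptId())); prints the URL for the current project.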

In my example I’ve stored the client ID and secret from Github as Script Properties in the Properties Service (if you are using this in your own project, these can be added in the Script Editor via File > Project properties). Something to watch out for is the scope required in your project. In this example I’ve used .setScope('repo'). Without the correct scope Github will return a generic documentation URL in its response, and you may, like me, end up pulling your hair out trying to find what’s wrong (more about Github Scopes).
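The examples below all call getGithubService_(), which isn’t listed in this post. Here is a sketch of what it might look like with Eric Koleda’s OAuth2 library, assuming the client ID and secret are stored in Script Properties under the placeholder names GITHUB_CLIENT_ID and GITHUB_CLIENT_SECRET, and that the token is kept in User Properties. The authorize and token URLs are Github’s standard OAuth endpoints.

```javascript
/**
 * Sketch of getGithubService_() using the OAuth2 library.
 * 'GITHUB_CLIENT_ID' and 'GITHUB_CLIENT_SECRET' are placeholder property names.
 */
function getGithubService_() {
  var scriptProps = PropertiesService.getScriptProperties();
  return OAuth2.createService('github')
      // Github's OAuth endpoints
      .setAuthorizationBaseUrl('https://github.com/login/oauth/authorize')
      .setTokenUrl('https://github.com/login/oauth/access_token')
      // client ID and secret from your Github application, kept in Script Properties
      .setClientId(scriptProps.getProperty('GITHUB_CLIENT_ID'))
      .setClientSecret(scriptProps.getProperty('GITHUB_CLIENT_SECRET'))
      // name of the function that handles the redirect back from Github
      .setCallbackFunction('authCallbackGit')
      // keep each user's token in User Properties
      .setPropertyStore(PropertiesService.getUserProperties())
      // the scope needed to read and commit repository contents
      .setScope('repo');
}
```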

With your Github application set up, you then need to authenticate as a user. The code includes a getGithubAuthURL() function you can run to get a URL to visit from the Logger service, or you can use the returned HTML button in your own UI. The authCallbackGit(e) function in this example currently only renders a simple success message. Once authenticated, I’ve used User Properties to store the token, so all of this initial setup should only be required once. There are a couple of other bits you need to set up in the script, most importantly which repository you want to update, using Github.setRepo('YOUR_USERNAME', 'YOUR_REPO').
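Here is a sketch of the two authentication functions, assuming a getGithubService_() like the one in the demo project. hasAccess(), getAuthorizationUrl() and handleCallback() are methods of the OAuth2 library’s service object; the link text and messages are placeholders.

```javascript
/**
 * Logs the authorization URL and returns a simple HTML link you could show
 * in your own UI. Returns null if the user has already authorized the script.
 */
function getGithubAuthURL() {
  var service = getGithubService_();
  if (service.hasAccess()) {
    Logger.log('Already authorized');
    return null;
  }
  var authorizationUrl = service.getAuthorizationUrl();
  Logger.log(authorizationUrl); // visit this URL to authorize the script
  return HtmlService.createHtmlOutput(
      '<a href="' + authorizationUrl + '" target="_blank">Connect to Github</a>');
}

// Must match the name passed to setCallbackFunction() in getGithubService_()
function authCallbackGit(e) {
  var isAuthorized = getGithubService_().handleCallback(e);
  return HtmlService.createHtmlOutput(isAuthorized ?
      'Success! You can close this tab.' :
      'Access denied. You can close this tab.');
}
```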

With your script set up, let’s look at some common functions:

Getting a small file (<1MB)

The Github API includes a get contents call. This can be used to get a single file or the listing of a directory (an array describing each entry it contains). As you will see later, this call can also be used as part of the process for getting larger files. Some example calls for small files are given below:

/**
 * Getting a file less than 1MB. 
 * See https://developer.github.com/v3/repos/contents/#get-contents
 */
function getSmallFileFromGithub(){
  // set token service
  Github.setTokenService(function(){ return getGithubService_().getAccessToken();});
  // set the repository wrapper
  // Github.setRepo('YOUR_USERNAME', 'YOUR_REPO');
  Github.setRepo('mhawksey', 'mhawksey.github.io'); // replace with your own username and repository
  var branch = 'heads/master'; // you can switch to a different branch
  
  // getting a single file object
  var git_file_obj = Github.Repository.getContents({ref: branch}, 'tweets/data/js/payload_details.js');
  var git_file = Utilities.newBlob(Utilities.base64Decode(git_file_obj.content)).getDataAsString();
  
  var git_dir = Github.Repository.getContents({ref: branch}, 'tweets/data/js/');
  // In my project I included a getContentsByUrl which uses a git url which is useful if working within the tree
  var git_file_by_url = Github.Repository.getContentsByUrl(git_dir[0].git_url);
  
  Logger.log(git_dir);
}
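Under the hood, each get contents call is an HTTP GET to /repos/{owner}/{repo}/contents/{path} on the Github REST API, with an optional ref query parameter for the branch. The hypothetical buildContentsUrl() below shows how that URL is put together, which can help when debugging what the Github.Repository wrapper is requesting; it is not part of the ported GitHub.js code.

```javascript
/**
 * Hypothetical helper: build the URL for a get contents request.
 * Path segments are encoded individually so the '/' separators survive.
 */
function buildContentsUrl(owner, repo, path, ref) {
  var encodedPath = path.split('/').map(encodeURIComponent).join('/');
  var url = 'https://api.github.com/repos/' + owner + '/' + repo +
      '/contents/' + encodedPath;
  if (ref) {
    url += '?ref=' + encodeURIComponent(ref);
  }
  return url;
}
```

You could pass the result to UrlFetchApp.fetch() with an Authorization header carrying the access token from getGithubService_().getAccessToken().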

Getting larger files

If files are greater than 1MB you need to get the data as a Blob. Claudio Cicali has shared some useful advice on this process:

The error that you get back instructs you to use another set of API, not the “simple” /contents for files that are too big. Instead, you have to use the /data one. You basically need to ask Github for a blob, which is your file, but you can refer to it only by its SHA signature, not by its file name.

So your next problem is: How do I get the SHA of my file? As it turned out (and I’d like to be wrong here), it’s impossible. There is no way to ask Github for just the meta-information of a file: if you ask for a file resource (the aforementioned JSON, which happens to also contains the SHA), Github will also try to give you the content of it. And if the file is too big… you ain’t get nothing.

The solution I found is that you could still use the /contents API but to ask the content of the directory containing you fatty, stubborn file. You’ll then get — as a JSON — the array containing the data for each of the files which resides inside the directory. And guess what? Each array element contains the SHA of the file.
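Claudio’s workaround boils down to a lookup in the directory listing. findShaInListing() below is a hypothetical helper, not part of the ported wrapper; unlike a bare filter()[0], it fails with a clear message when the file is missing and skips sub-directories with the same name. The type, name and sha fields are those returned for each entry of a directory listing.

```javascript
/**
 * Hypothetical helper: given the array returned by a get contents call on a
 * directory, return the SHA of the named file, or throw if it isn't there.
 */
function findShaInListing(listing, fileName) {
  if (!Array.isArray(listing)) {
    throw new Error('Expected a directory listing (an array)');
  }
  var matches = listing.filter(function(entry) {
    // only match files, not directories or submodules with the same name
    return entry.type === 'file' && entry.name === fileName;
  });
  if (matches.length === 0) {
    throw new Error(fileName + ' not found in directory listing');
  }
  return matches[0].sha;
}
```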

Here is an example of how you can get files over 1MB:

/**
 * For files over 1MB you get to fetch as a blob using sha reference. 
 * See https://developer.github.com/v3/git/blobs/#get-a-blob
 */
function getLargeFileFromGithub(){
  // set token service
  Github.setTokenService(function(){ return getGithubService_().getAccessToken();});
  // set the repository wrapper
  // Github.setRepo('YOUR_USERNAME', 'YOUR_REPO');
  Github.setRepo('mhawksey', 'mhawksey.github.io'); // replace with your own username and repository
  var branch = 'master'; // you can switch to a different branch
  
  // first get the listing of the directory containing the file, e.g. here the tweets sub-directory
  var tweet_dir = Github.Repository.getContents({ref: branch}, 'tweets');
  // filter the listing for the file name we are looking for
  var git_file = tweet_dir.filter(function(el){ return el.name === 'tweets.csv'; });
  if (git_file.length === 0) {
    throw new Error('tweets.csv not found in the tweets directory');
  }
  // fetch the file as a Git blob using its SHA, then decode the base64 content
  var git_blob = Utilities.newBlob(Utilities.base64Decode(Github.Repository.getBlob(git_file[0].sha).content)).getDataAsString();
  
  Logger.log(git_blob);
}

Creating/Updating single files

As well as getting contents from a repo you can also create files. If you try to create a file with a path/filename that already exists you’ll get an error response, and instead you have to use the update method. To update a file you are required to include the SHA of the existing file, so you may find yourself making the getContents() call to get it. Here is an example of creating or updating a single file:

/**
 * Committing (creating/updating) a single file to github. 
 * See https://developer.github.com/v3/repos/contents/#create-a-file
 * and https://developer.github.com/v3/repos/contents/#update-a-file
 */
function commitSingleFileToGithub() {
  // set token service
  Github.setTokenService(function(){ return getGithubService_().getAccessToken();});
  // set the repository wrapper
  // Github.setRepo('YOUR_USERNAME', 'YOUR_REPO'); // e.g. Github.setRepo('mhawksey', 'mhawksey.github.io');
  Github.setRepo('mhawksey', 'mhawksey.github.io'); 
  var branch = 'master'; // you can switch to a different branch
  
  // Sending string content
  var test_json = {foo:'bar'};
  try { // if the file already exists, get its SHA and update it instead
    var resp = Github.Repository.createFile(branch, 'test2.json', JSON.stringify(test_json), "YOUR FILE COMMIT MESSAGE HERE");
  } catch(e) {
    test_json.newbit = "some more data";
    var git_file_obj = Github.Repository.getContents({ref: branch}, 'test2.json');
    Github.Repository.updateFile(branch, 'test2.json', JSON.stringify(test_json), git_file_obj.sha, "YOUR UPDATED FILE COMMIT MESSAGE HERE");
  }
}
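The example above treats any createFile() error as “the file already exists”, which can hide other failures such as a missing scope. A hypothetical alternative is to look the file up first and branch on the result. upsertFile() assumes an object exposing the getContents(), createFile() and updateFile() methods used above, such as Github.Repository, and still assumes that a failed lookup means the file is new.

```javascript
/**
 * Hypothetical upsert helper: update the file if it exists, otherwise create it.
 * `repo` must expose getContents, createFile and updateFile with the same
 * arguments as in commitSingleFileToGithub() (e.g. Github.Repository).
 */
function upsertFile(repo, branch, path, content, message) {
  var existing = null;
  try {
    existing = repo.getContents({ref: branch}, path);
  } catch (e) {
    // assumption: a failed lookup means the file doesn't exist yet
    existing = null;
  }
  if (existing && existing.sha) {
    // updating requires the SHA of the current version of the file
    return repo.updateFile(branch, path, content, existing.sha, message);
  }
  return repo.createFile(branch, path, content, message);
}
```

For example: upsertFile(Github.Repository, 'master', 'test2.json', JSON.stringify({foo: 'bar'}), 'Update test2.json');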

Committing multiple files

The process for committing files to Github may appear daunting, but once you get your head around it, it’s very straightforward. Patrick McKinley has an excellent walkthrough you can read, but in summary you push files to Github (Github.Repository.createBlob()), then send a tree describing where those files go based on the old tree, and finally create a commit and point the branch at it. The advantage of this approach over committing single files is that you can push new files or update existing files using the same create blob call (Github.Repository.createBlob()), and all the creating or updating is handled in a single commit. The flow I used in my project looks like this:

/**
 * Adding multiple files to Github as a single commit.
 */
function commitMultipleFilesToGithub() {
  // set token service
  Github.setTokenService(function(){ return getGithubService_().getAccessToken();});
  // set the repository wrapper
  //Github.setRepo('YOUR_USERNAME', 'YOUR_REPO'); // e.g. Github.setRepo('mhawksey', 'mhawksey.github.io');
  Github.setRepo('mhawksey', 'mhawksey.github.io');
  var branch = 'heads/master'; // you can switch to a different branch
  
  var newTree = []; // new tree to commit
  
  // Sending string content
  var test_json = {foo:'bar2'};
  // building new tree, by pushing content to Github, here pushing test_json
  newTree.push({"path": 'test2.json', // path includes path and filename - here adding test2.json to repo root
                "mode": "100644",
                "type": "blob",
                "sha" : Github.Repository.createBlob(JSON.stringify(test_json)).sha});
   
  // Sending a blob - grabbing an example file to push
  var resp = UrlFetchApp.fetch("https://www.gstatic.com/images/icons/material/product/2x/apps_script_64dp.png");
  
  newTree.push({"path": 'apps_script_64dp.png', // path includes path and filename - here adding apps_script_64dp.png to repo root
                "mode": "100644",
                "type": "blob",
                "sha" : Github.Repository.createBlob(resp.getBlob().getBytes()).sha});          
                
  /* using http://patrick-mckinley.com/tech/github-api-commit.html as ref for this process */
  
  // 1. Get the SHA of the latest commit on the branch
  var initialCommitSha = Github.Repository.getRef(branch).object.sha;
  
  // 2. Get the tree information for that commit
  var initialTreeSha = Github.Repository.getCommit(initialCommitSha).tree.sha;
  
  // 3. Create a new tree for your commit
  var newTreeSha = Github.Repository.createTree(newTree, initialTreeSha).sha; 
  
  // 4. Create the commit
  var newCommitSha = Github.Repository.commit(initialCommitSha, newTreeSha, "YOUR COMMIT MESSAGE HERE").sha;                
  
  // 5. Link commit to the reference
  var commitResponse = Github.Repository.updateHead(branch, newCommitSha, false);
}
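The five numbered steps can be wrapped into a reusable function. commitBlobs() and buildTree() are hypothetical names; they assume a repository object with the same getRef(), getCommit(), createTree(), commit() and updateHead() methods used in commitMultipleFilesToGithub(), and a map from repository paths to the SHAs returned by createBlob().

```javascript
/**
 * Hypothetical helper: turn {path: blobSha} into tree entries.
 */
function buildTree(blobShas) {
  return Object.keys(blobShas).map(function(path) {
    // mode 100644 = a regular (non-executable) file
    return {path: path, mode: '100644', type: 'blob', sha: blobShas[path]};
  });
}

/**
 * Hypothetical wrapper around steps 1-5 above. Returns the new commit SHA.
 */
function commitBlobs(repo, branch, blobShas, message) {
  var parentSha = repo.getRef(branch).object.sha;                      // 1. latest commit on the branch
  var baseTreeSha = repo.getCommit(parentSha).tree.sha;                // 2. that commit's tree
  var treeSha = repo.createTree(buildTree(blobShas), baseTreeSha).sha; // 3. new tree on top of it
  var commitSha = repo.commit(parentSha, treeSha, message).sha;        // 4. the commit itself
  repo.updateHead(branch, commitSha, false);                           // 5. move the branch (no force)
  return commitSha;
}
```

For example, after var blobSha = Github.Repository.createBlob(JSON.stringify(test_json)).sha; you could call commitBlobs(Github.Repository, 'heads/master', {'test2.json': blobSha}, 'YOUR COMMIT MESSAGE HERE');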

Summary

Hopefully you’ve found this post useful. Remember that to use these examples you need to include the OAuth2 library in your project, as well as the partial port of Michael Aufreiter’s Github.js included in this demonstration project. It would be easy to extend this code into a full Github service library for Google Apps Script, but it’s not something I’m personally planning to do. As well as the complete Apps Script project, I’ve also put all the code in these examples in this gist. Enjoy!
