Google Apps


Here is some text I prepared for a possible Google Apps Developer Blog guest post. It doesn’t look like it’s going to get published, so rather than letting it go to waste I thought I’d publish it here:


Martin Hawksey is a Learning Technology Advisor for the JISC funded Centre for Educational Technology and Interoperability Standards (JISC CETIS) based in the UK. Prior to joining JISC CETIS, and in his spare time, Martin has been exploring the use of Google Apps and Apps Script for education. In this post Martin highlights some features of a Google Apps Script solution which combines Google Spreadsheet and Google Documents to speed up and standardise personal feedback returned to students at Loughborough College.

One of the things that drew me to Apps Script over two years ago was the ease with which you could interact with other Google services. As a ‘hobbyist’ programmer I also found the combination of Google Spreadsheets and a coding syntax I recognised ideal.

Late last year, when I was approached by Loughborough College to take part in their ‘Fast Tracking Feedback’ project, I saw it as an ideal opportunity to get staff using Apps Script and to showcase its possibilities to the Google Apps for Education community.

The goal of the project was to produce a mechanism that allows tutors to input assignment grades using a custom UI that mirrors the final feedback sheet, or to enter details directly into a Google Spreadsheet. These details are then pushed out as individually personalised Google Documents shared with the student. This sounds relatively simple, but the complication is that each assignment needs to map to a predefined set of rubrics which vary between units. For example, in one course alone there are over 40 units, and every unit can be assessed using multiple assignments with any combination of predefined pass, merit and distinction criteria.

Below is an example student feedback form highlighting the regions that are different for each assignment.

Example student feedback form highlighting the regions that are different for each assignment

The video below demonstrates how the current version of the ‘Fast Tracking Feedback’ system is set up and used:

Solution highlights

A number of Apps Script Services have been used as part of this project. Let’s look at how some of these have been implemented.

DocList Service – The self-filing Google Spreadsheet

The eventual plan is to rollout the Fast Tracking Feedback system to teaching teams across the College. To make the life of support staff easier it was decided to use a common filing structure. Using a standardised structure will help tutors stay organised and aid creation of support documentation.

When a tutor runs the setup function on a new feedback spreadsheet it checks that the correct folder structure exists (creating it if not) and moves the current spreadsheet into the pre-defined collection.


Self-generating folder structure and organization

The code that does this is:

// code to generate folder structure and move spreadsheet into right location 
// + ROOT_FOLDER 
// |- SPREADSHEET_FOLDER 
// |- DRAFT_FOLDER 
// |- RELEASED_FOLDER 
var rootFolder = folderMakeReturn(ROOT_FOLDER); // get the system root folder (if it doesn't exist, make it) 
// create/get draft and release folders 
var draftFolder = folderMakeReturn(DRAFT_FOLDER,rootFolder, ROOT_FOLDER+"/"+DRAFT_FOLDER); 
var releaseFolder = folderMakeReturn(RELEASED_FOLDER,rootFolder, ROOT_FOLDER+"/"+RELEASED_FOLDER); 
var spreadsheetFolder = folderMakeReturn(SPREADSHEET_FOLDER, rootFolder, ROOT_FOLDER+"/"+SPREADSHEET_FOLDER); 

// move spreadsheet to spreadsheet folder 
var file = DocsList.getFileById(SpreadsheetApp.getActiveSpreadsheet().getId()); 
file.addToFolder(spreadsheetFolder); 

// function to see if folder exists in DocList and returns it 
// (optional - if it doesn't exist then makes it) 
function folderMakeReturn(folderName, optFolder, optFolderPath){ 
  try { 
    if (optFolderPath != undefined){ 
      var folder = DocsList.getFolder(optFolderPath); 
    } else { 
      var folder = DocsList.getFolder(folderName); 
    } 
    return folder; 
  } catch(e) { 
    if (optFolder == undefined) { 
      var folder = DocsList.createFolder(folderName); 
    } else { 
      var folder = optFolder.createFolder(folderName); 
    } 
    return folder; 
  } 
}

UI Service – Hybrid approach

A central design consideration was to make the Fast Tracking Feedback system easy for College staff to support and change. Consequently the Apps Script GUI Builder was used to create as much of the user interface as possible. Because of the dynamic nature of the assessment rubrics, part of the form is built by selecting an element holder and adding labels, select lists and textareas. Other parts of the form, like the student information at the top, can be added and populated with data by using the GUI Builder to insert textfields which are named using normalized names matching the spreadsheet column headers. The snippet of code that does this is:

app.getElementById(NORMHEADER[i]).setText(row[NORMHEADER[i]]);

Where NORMHEADER is an array of the normalized spreadsheet column names and row is a JavaScript Object of the row data generated based on the Reading Spreadsheet data Apps Script Tutorial.
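As a rough illustration (a simplified sketch rather than the project’s actual code), the row objects can be built by normalizing the header row and keying each row’s values by those normalized names, along the lines of the Reading Spreadsheet data tutorial:

// Simplified sketch: build one object per data row, keyed by normalized
// header names so they match the GUI Builder widget names
// (e.g. "Student Name" -> "studentName").
function getRowObjects(sheet) {
  var data = sheet.getDataRange().getValues();
  var normHeaders = data[0].map(function(header) {
    return String(header).toLowerCase().replace(/\s+(\w)/g, function(m, c) { return c.toUpperCase(); });
  });
  var rows = [];
  for (var i = 1; i < data.length; i++) {
    var row = {};
    for (var j = 0; j < normHeaders.length; j++) {
      row[normHeaders[j]] = data[i][j];
    }
    rows.push(row);
  }
  return rows;
}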

Hybrid UI construction using GUI Builder and coding

Document Services – Master and custom templates

The process for filling in personalized feedback forms has three main steps. First, a duplicate of the Master Template is made and given a temporary name (DocList Service). Next, the required assessment criteria are added to the form using the Document Service, mainly via the TableCell Class. Parts of the document that are going to be filled with data from the spreadsheet are identified using a similar technique to the Apps Script Simple Mail Merge Tutorial. Finally, for each student the assignment-specific template is duplicated and filled with their personalised feedback.

if (email && (row.feedbackLink =="" || row.feedbackLink == undefined)){
  // Get document template, copy it as a new temp doc, and save the Doc's id
  var copyId   = DocsList.getFileById(newTemplateId)
                         .makeCopy(file_prefix+" - "+email)
                         .getId();
  var copyDoc  = DocumentApp.openById(copyId);
  // move doc to tutors folder
  var file = DocsList.getFileById(copyId);
  var folder = DocsList.getFolder(ROOT_FOLDER+"/"+DRAFT_FOLDER);
  file.addToFolder(folder);

  // select the document body
  var copyBody = copyDoc.getActiveSection();

  // find edittable parts of the document
  var keys = createKeys(copyDoc);

  // loop through elements replacing text with values from spreadsheet
  for (var j in keys) {
    var text = keys[j].text;
    var replacementText = ""; // set the default replacement text to blank
    if (row[keys[j].id] != undefined){ // if column value is defined get text
      replacementText = row[keys[j].id];
    }
    copyBody.replaceText('{%'+keys[j].text+'%}', replacementText); // replace text
  }
  copyDoc.saveAndClose();

  // create a link to the document in the spreadsheet
  FEEDSHEET.getRange("B"+(parseInt(i)+startRow)).setFormula('=HYPERLINK("'+file.getUrl()+'", "'+copyId+'")');
  FEEDSHEET.getRange("C"+(parseInt(i)+startRow)).setValue("Draft");
  // you can do other things here like email a link to the document to the student
}
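The createKeys() helper used above isn’t shown in the snippet; a speculative sketch (not the project’s code), assuming the template marks fillable parts with {%columnName%} placeholders, might look like this:

// Speculative sketch: scan the document body for {%...%} placeholders and
// return their names so they can be matched against the row object.
function createKeys(doc) {
  var keys = [];
  var bodyText = doc.getActiveSection().getText();
  var re = /\{%([^%]+)%\}/g;
  var match;
  while ((match = re.exec(bodyText)) !== null) {
    // the real code may normalize the id differently; here text and id are the same
    keys.push({text: match[1], id: match[1]});
  }
  return keys;
}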

Currently the system is configured to place generated feedback forms into a draft folder. Once the tutor is happy for the feedback to be released, either individual or whole-class feedback forms are distributed to students from a menu option in the assignment’s feedback spreadsheet, with a record kept of the status and location of each document.

Easy record keeping
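The release step itself isn’t reproduced here; as a purely speculative sketch (folder constants reused from the setup code above, with the row/column layout assumed from the earlier snippet), releasing a single row’s feedback might look something like:

// Speculative sketch: move a draft feedback document to the released folder
// and update its status in the spreadsheet. Sheet layout is an assumption.
function releaseFeedback(rowNumber) {
  var sheet = SpreadsheetApp.getActiveSheet();
  var docId = sheet.getRange("B" + rowNumber).getValue(); // column B shows the doc id (see above)
  var file = DocsList.getFileById(docId);
  file.addToFolder(DocsList.getFolder(ROOT_FOLDER + "/" + RELEASED_FOLDER));
  file.removeFromFolder(DocsList.getFolder(ROOT_FOLDER + "/" + DRAFT_FOLDER));
  sheet.getRange("C" + rowNumber).setValue("Released"); // status column from the snippet above
}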

Next steps/Get the code

The Fast Tracking Feedback System is currently being piloted with a small group of staff at Loughborough College. Comments from staff will be used to refine the system over the next couple of months. The current source code and associated template files are available from here.

The Fast Tracking Feedback project is funded by the UK’s LSIS Leadership in Technology (LIT) grant scheme.


Another post related to my ‘Hacking stuff together with Google Spreadsheets’ session at  Dev8eD (A free event for building, sharing and learning cool stuff in educational technology for learning and teaching!!) next week. In this example Google Apps Script is used to create a custom user interface that can be used in Google Spreadsheets, allowing tutors to enter feedback and grades based on individual assessment criteria pulled from a central data source (another Google Spreadsheet). The system then generates personalised feedback forms (Google Documents) based on the data and distributes them to students.


As part of my work outwith JISC CETIS I have been helping staff at Loughborough College with their LSIS funded Fast Tracking Feedback project, creating a system that standardises and speeds up the return of assignment feedback to students. The project has generated a number of outputs for the wider community including training material and some code snippets (sending free SMS | generating Google Documents from Spreadsheets).

As my official involvement in the project comes to a close there is another chunk of code and resources to push out into the wild. It’s a complete copy of the beta system currently being piloted with staff at Loughborough College. If you want to get an idea of how it works, here’s a short video demonstrating the system that I did as a lightning talk at GEUG12.

The code

If you want to pick over the code I’ve dumped a copy on GitHub. This is more for reference as the code makes use of the Apps Script GUI Builder for parts of the interface, which can’t be extracted from the Spreadsheet. For a functional version you’ll need to make a copy of the four documents linked to below (this is followed by some instructions on setup and usage). I should also point out that this system has been built around the British BTEC qualifications. An example of the assessment and grading criteria is on page 3 of this document. Hopefully there is enough reusable code for other qualification systems.

Files you’ll need

  • Master Spreadsheet – this is the main document with all the Apps Script in it.
  • Master Template (you’ll need to File > Make a copy) – this is a Google Document used as a template for the form
  • Criteria Sheet – spreadsheet of units and courses with the related assessment criteria/rubric
  • Student Lookup – spreadsheet of student full names and related Google Apps IDs (used because the Apps Script Group Services can only return IDs and email addresses)

The basic setup

  1. Place all four copied files into a folder in Google Docs (you can name the folder anything you like).
  2. Change the share setting on the folder so that either ‘Anyone with the link’ or ‘People at your Google Apps domain with the link’ can view.
  3. Open your copy of the Master Spreadsheet and open Tools > Script Editor…
  4. On lines 17-19 copy and paste the document id/keys for Master Template, Criteria Sheet and Student Lookup. You can get these by opening the documents and looking at the browser url for the highlighted bits.
    Where to find spreadsheet/document keys
  5. From the Script Editor you can also open File > Build a user interface…, open the importStudentList component and click on the ‘Enter the group …’ label to edit its text in the property pane on the right-hand side (and do the same for the textfield beneath it). There’s also the option to customise/add logos to the entryForm GUI
  6. In the Criteria Sheet create your list of units/courses and associated assessment criteria
  7. In the Student Lookup sheet import a list of Google Apps Ids and names

There are additional options in the code to change folder names and the pass/merit/distinction colouring.
That’s it. Enjoy, and any thoughts are welcomed in the comments (I can’t guarantee to respond to all).

The Fast Tracking Feedback project is funded by the UK’s LSIS Leadership in Technology (LIT) grant scheme.


In the original JISC OER Rapid Innovation call, one of the stipulations, due to the size and duration of the grants, is that the main reporting process is blog-based. Amber Thomas, who is the JISC Programme Manager for this strand and a keen blogger herself, has long been a supporter of projects adopting open practices, blogging progress as they go. Brian Kelly (UKOLN) also has an interest in this area with some posts including Beyond Blogging as an Open Practice and What About Associated Open Usage Data?

For the OERRI projects the proposal discussed at the start-up meeting was that projects adopt a taxonomy of tags to indicate key posts (e.g. project plan, aims, outputs, nutshell etc.). For the final report projects would then compile all posts with specific tags and submit them as an MS Word or PDF document.

There are a number of advantages to this approach; one of them, for people like me anyway, is that it exposes machine-readable data that can be used in a number of ways. In this post I’ll show how I’ve created a quick dashboard in Google Spreadsheets which takes a list of blog RSS feeds and filters for specific tags/categories. Whilst I’ve demonstrated this with the OERRI projects, the same technique could be used in other scenarios, such as tracking student blogs. As part of this solution I’ll highlight some of the issues/affordances of different blogging platforms and introduce some future work to combine post content using a template structure.

OERRI Project Post Directory
Screenshot of OERRI post dashboard

The OERRI Project Post Directory

If you are not interested in how this spreadsheet was made and just want to grab a copy to use with your own set of project/class blogs then just:

*** open the OERRI Project Post Directory ***
File > Make a copy if you want your own editable version

The link to the document above is the one I’ll be developing throughout the programme so feel free to bookmark the link to keep track of what the projects are doing.

The spreadsheet is structured so that the tags/categories the script uses to filter posts are in cells D2:L2, and urls are constructed from the values in columns O-Q. The basic technique being used here is building urls that look for specific posts and returning links (made pretty with some conditional formatting).

Blogging platforms used in OERRI

So how do we build a url to look for specific posts? With this technique it comes down to whether the blogging platform supports tag/category filtering so lets first look at the platforms being used in OERRI projects.

This chart (right) breaks down the blogging platforms. You’ll see that most (12 of 15) are using WordPress in two flavours: ‘shared’, indicating that the blog is also a personal or team blog containing other posts not related to OERRI, and ‘dedicated’, set up entirely for the project.

The 3 other platforms are 2 MEDEV blogs and the OU’s project on Cloudworks. I’m not familiar with the MEDEV platform and only know a bit about Cloudworks, so for now I’m going to ignore these and concentrate on the WordPress blogs.

WordPress and Tag/Category Filtering

One of the benefits of WordPress is that you can get an RSS feed for almost everything by adding /feed/ or ?feed=rss2 to urls (other platforms also support this; I have a vague recollection of doing something similar in Blogger(?)). For example, if you want a feed of all my Google Apps posts you can use https://mashe.hawksey.info/category/google-apps/feed/.

Even better, you can combine tags/categories with a ‘+’ operator, so if you want a feed of all my Google Apps posts that are also categorised with Twitter you can use https://mashe.hawksey.info/category/google-apps+twitter/feed/.

So to get the Bebop ‘nutshell’ categorised post as an RSS item we can use: http://bebop.blogs.lincoln.ac.uk/category/nutshell/feed/

Looking at one of the shared WordPress blogs, to get the ‘nutshell’ from RedFeather you can use: http://blogs.ecs.soton.ac.uk/oneshare/tag/redfeather+nutshell/feed/

Using Google Spreadsheet importFeed formula to get a post url

The ‘import’ functions in Google Spreadsheets must be my favourites, and I know lots of social media professionals who use them to pull data into a spreadsheet and produce reports for clients. With importFeed we can check whether a blog post under a certain category exists and then return something back, in this case the post link. For my first iteration of this spreadsheet I used the formula below:

importFeed formula
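(The formula itself appeared as an image in the original post; a rough reconstruction, assuming the blog’s base URL sits in column O and the tag/category in row 2, would be something along these lines:)

=IFERROR(INDEX(IMPORTFEED($O3&"category/"&D$2&"/feed/","items url",FALSE,1),1,1),"")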

This works well but one of the drawbacks of importFeed is we can only have a maximum of 50 of them in one spreadsheet. With 15 projects and 9 tags/categories the maths doesn’t add up.

To get around this I switched to Google Apps Script (the macros for Google Spreadsheets I write a lot about). Apps Script doesn’t have an importFeed function built in, but I can do a UrlFetch and parse the XML. Here’s the code which does this (included in the template):
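(The embedded code doesn’t survive in this copy of the post; below is a rough sketch of the approach rather than the template’s actual code. It uses the current XmlService, and the cache key and duration are assumptions.)

// Sketch: fetch an RSS feed, return the first item's link, and cache the result
// so repeated recalculations don't burn through the UrlFetch quota.
function fetchUrlfromRSS(aUrl) {
  if (!aUrl) return "";
  var cache = CacheService.getScriptCache();
  var cached = cache.get(aUrl);
  if (cached != null) return cached;
  try {
    var response = UrlFetchApp.fetch(aUrl, {muteHttpExceptions: true});
    if (response.getResponseCode() != 200) return "";
    var channel = XmlService.parse(response.getContentText()).getRootElement().getChild("channel");
    var item = channel ? channel.getChild("item") : null;
    var link = item ? item.getChild("link").getText() : "";
    cache.put(aUrl, link, 3600); // cache for an hour
    return link;
  } catch (e) {
    return "";
  }
}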

Note this code also uses the Cache Service to improve performance and make sure I don’t go over my UrlFetch quota.

We can call this function like any other spreadsheet formula using ‘=fetchUrlfromRSS(aUrl)’.

Trouble at the tagging mill

So we have a problem getting data from non-WordPress blogs, which I’m quietly ignoring for now; the next problem is people not tagging/categorising posts correctly. For example, I can see Access to Math have 10 posts including a ‘nutshell’, but none of these are tagged. From the machine side there’s not much I can do about this, but at least from the dashboard I can spot something isn’t right.

Tags for a template

I’m sure once projects are politely reminded to tag posts they’ll oblige. One incentive might be to say that if posts are tagged correctly then the code above could easily be extended to pull not just post links but the full post text, which could then be used to generate the project’s final submission.

Summary

So stay tuned to the OERRI Project Post Directory spreadsheet to see if I can incorporate MEDEV and Cloudworks feeds, and also if I can create a template for final posts. Given Brian’s post on usage data mentioned at the beginning, should I also be tracking post activity data on social networks, or is that a false metric?

I’m sure there was something else but it has entirely slipped my mind …

BTW here’s the OPML file for the RSS feeds of the blogs that are live (also visible here as a Google Reader bundle)


Lou McGill from the JISC/HEA OER Programme Synthesis and Evaluation team recently contacted me as part of the OER Review asking if there was a way to analyse and visualise the Twitter followers of @SCOREProject and @ukoer. Having recently extracted data for the @jisccetis network of accounts I knew it was easy to get the information, but making it meaningful was another question.

There are a growing number of sites like twiangulate.com and visual.ly that make it easy to generate numbers and graphics. One of the limitations I find with these tools is they produce flat images, and all opportunities for ‘visual analytics’ are lost.

Click to see twiangulate comparison of SCOREProject and UKOER
Twiangulate data
Click to see visual.ly comparison of SCOREProject and UKOER

So here’s my take on the problem: a template constructed with free and open source tools that lets you visually explore the @SCOREProject and @ukoer Twitter following.

Comparison of @SCOREProject and @ukoer

In this post I’ll give my narrative on the SCOREProject/UKOER Twitter followership and give you the basic recipe for creating your own comparisons (I should say that the solution isn’t production quality, but I need to move on to other things so someone else can tidy up).

Let’s start with the output. Here’s a page comparing the Twitter following of SCOREProject and UKOER. At the top each bubble represents someone who follows SCOREProject or UKOER (hovering over a bubble shows who they are, and clicking filters the summary table at the bottom).

Bubble size matters

There are three options to change how the bubbles are sized:

  • Betweenness Centrality (a measure of the community bridging capacity); (see Sheila’s post on this)
  • In-Degree (how many other people who follow SCOREProject or ukoer also follow the person represented by the bubble); and
  • Followers count (how many people follow the person represented by the node).

Clicking on the ‘Grouped’ button lets you see whether bubbles/people follow SCOREProject, UKOER or both. By switching between betweenness, degree and followers we can visually spot a couple of things:

  • Betweenness Centrality: SCOREProject has 3 well connected intercommunity bubbles: @GdnHigherEd, @gconole and @A_L_T. UKOER has the SCOREProject following them, which unsurprisingly makes them a great bridge to the SCOREProject community (if you are wondering where UKOER is, as they don’t follow SCOREProject they don’t appear).
  • In-Degree: Switching to In-Degree we can visually see that the overall volume of the UKOER group grows more, despite the SCOREProject bubble in this group decreasing substantially. This suggests to me that the UKOER following is more interconnected.
  • Followers count: Here we see SCOREProject is the biggest winner thanks to being followed by @douglasi, who has over 300,000 followers. So whilst SCOREProject is followed by fewer people than UKOER, it has a potentially greater reach if @douglasi ever retweeted a message.

Colourful combination

Sticking with the grouped bubble view we can see different colour groupings within the clusters for SCOREProject, UKOER and both. The most noticeable is the light green used to identify Group 4, which has 115 people following SCOREProject compared to 59 following UKOER. The groupings are created using the community structure detection algorithm proposed by Joerg Reichardt and Stefan Bornholdt. To give a sense of who these sub-groups might represent, individual wordclouds have been generated based on the individual Twitter profile descriptions. Clicking on a word within these clouds filters the table, so for example you can explore who has used the term manager in their twitter profile (I have to say the update isn’t instant but it’ll get there).

wordclouds

Behind the scenes

The bubble chart is coded in d3.js and based on Animated Bubble Chart by Jim Vallandingham. The modifications I made were to allow bubble resizing (lines 37-44). This also required handling the bubble charge slightly differently (line 118). I got the idea of using the bubble chart for comparison from a Twitter Abused post Rape Culture and Twitter Abuse. It also made sense to reuse Jim’s template which uses the Twitter Bootstrap. The wordclouds are also rendered using d3.js by using the d3.wordcloud extension by Jason Davies. Finally the table at the bottom is rendered using the Google Visualisation API/Google Chart Tools.

All the components play nicely together although the performance isn’t great. If I have more time I might play with the load sequencing, but it could be I’m just asking too much of things like the Google Table chart rendering 600 rows. 

How to make your own

I should say that this recipe probably won’t work for accounts with over 5,000 followers. It also involves using R (in my case RStudio), which is used to do the network analysis/community detection side. You can download a copy of the script here. There’s probably an easier recipe that skips this part; worth revisiting.

  1. We start with taking a copy of Export Twitter Friends and Followers v2.1.2 [Network Mod] (as featured in Notes on extracting the JISC CETIS twitter follower network).
  2. Authenticate the spreadsheet with Twitter (instructions in the spreadsheet) and then get the followers if the accounts you are interested in using the Twitter > Get followers menu option 
  3. Once you’ve got the followers run Twitter > Combine follower sheets Method II
  4. Move to the Vertices sheet and sort the data on the friends_count column
  5. In batches of around 250 rows select values from the id_str column and run TAGS Advanced > Get friend IDs – this will start populating the friends_ids column with data. For users with over 5,000 friends reselect their id_str and rerun the menu option until the ‘next_cursor’ equals 0 
    next cursor position
  6. Next open the Script editor and open the TAGS4 file and then Run > setup.
  7. Next select Publish > Publish as a service… and allow anyone to invoke the service anonymously. Copy the service URL and paste it into the R script downloaded earlier (also add the spreadsheet key to the R script, and within your spreadsheet File > Publish to the web) 
    publish as service window
  8. Run the R script! ...  and fingers crossed everything works.

The files used in the SCOREProject/UKOER comparison can be downloaded from here. Changes you’ll need to make are adding the output csv files to the data folder, changing references in js/gtable.js and js/wordcloud.js, and the labels used in coffee/coffee.vis.

So there you go. I’ve spent way too much of my own time on this and haven’t really explained what is going on. Hopefully the various comments in the source code remove some of the magic (I might revisit the R code as in some ways I think it deserves a post of its own). If you have any questions or feedback leave them in the comments ;)


This morning I finished listening to Episode 5 of Data Stories: How To Learn Data Visualization. Data Stories is a bi-weekly podcast on data visualisation produced by Enrico Bertini and Moritz Stefaner, episode 5 also featuring Andy Kirk. For anyone interested in Data Visualisation I’d highly recommend you give it a listen.

Like many others I’m at the beginning of my data visualisation journey, one of the things this episode highlighted was there is a whole world of data visualisation experts out there that I’ve yet to start stealing learning from. Fortunately today another Visualisation expert, Nathan Yau (FlowingData), posted his list of Data and visualization blogs worth following. Perfect!

I could’ve gone through the list and individually subscribed to each of the blogs’ feeds, but I’m lazy (so lazy that a 15 minute hack has turned into a 3 hour write-up <sigh>) and just wanted to dump them into my Google Reader. This is a problem Tony Hirst has encountered in Feed-detection From Blog URL Lists, with OPML Output. One thing that is not clear is how Tony got his two column CSV of source urls. There are various tools Tony could have used to do this. Here’s my take on converting a page of blog urls into an OPML bundle.

Step 1 Extracting blogs urls: Method 1 using Scraper Chrome Extension

“Scraper is a Google Chrome extension for getting data out of web pages and into spreadsheets.”

Chrome users can grab a copy of Scraper here. Once installed, if you go to Nathan Yau's Data and visualization blogs worth following, right-click on the first link in the list and select ‘Scrape similar’.

scraper menu

In the window that opens you should get something similar to the one below. Scraper has two options for identifying the parts of the page you want to extract: XPath or jQuery selectors. Both of these have similar coding structures, but for this example I’m going to stick with XPath. XPath basically provides a way to identify parts of the XML/HTML structure and extract content (if you are not familiar with XPath, w3schools is a great starting point).

Scraper dialog

In this example Scraper should default to ‘//div[1]/div[2]/ul[1]/li/a’. Here’s a quick explanation of how I read this query. Because it starts with // it will select “nodes in the document from the current node that match the selection no matter where they are”; for me this is the trigger to read the query from right to left as we are matching an endpoint pattern. So:

match all <a> in all <li> in first <ul> of second <div> (<div class="entry-content">) of first <div> (<div class="entry">)

this gives us the links from the first block of bullet points. We want the links from all of the bullet point lists so the pattern we want is

match first <a> in all <li> in all <ul> of second <div> of first <div>

So basically we need to switch a to a[1] and ul[1] to ul, e.g. ‘//div[1]/div[2]/ul/li/a[1]’. Edit the XPath query and in the columns section beneath change the order by clicking and dragging so that @href/URL comes first. Click on the ‘Scrape’ button to get a new preview, which should now contain a list of 37 urls. Click on Export to Google Docs… You are now ready to move to Step 2 Auto-discovering feed urls below.

Step 1 Extracting blogs urls: Method 2 using Google Spreadsheet importXML function

Another way to get this data is to scrape it directly in a Google Spreadsheet using the importXML function. This function also uses XPath to extract parts of a webpage, so we can reuse the query used in Method 1 but get the data straight into a spreadsheet (it’s also a live link, so if Nathan adds a new link the spreadsheet will automatically update to include it). Let’s give it a go.

Create a new spreadsheet and in cells A1 to C1 enter the column headings Link, Title and Url. Next in cell A2 enter:

=ImportXML("http://flowingdata.com/2012/04/27/data-and-visualization-blogs-worth-following/","//div[1]/div[2]/ul/li/a[1]/@href")

Note the addition of @href. This is included to extract the href attribute of the <a>. You should now have a similar list of 37 urls from Nathan’s post. To get titles we could enter another importXML function in cell B2 using the XPath ‘//div[1]/div[2]/ul/li/a[1]’ which will extract the text between <a></a>. Another way is to actually scrape the data from the target url. So in cell B2 enter:

=ImportXML(A2,"//title")

So this will go to the url in A2 (http://infosthetics.com/) and extract anything wrapped in <title>.

Now select cell B2 and fill the column down to get titles for all the urls. Finally we need to select the entire B column and Copy/Paste values only. The reason we do this is that Google Spreadsheets only allows 50 importXML functions per spreadsheet and we’ll need 37 more to get the RSS feeds for these sites.

Copy Paste as Values

Step 2 Auto-discovering feed urls

Initially I tried using Feed Autodiscovery With YQL with importXML using an XPath of "//link/@href" but I was not getting any results. So instead I decided to auto-detect the feed directly using importXML. In cell C2 enter:

=ImportXML(A2,"/html/head/link[@rel='alternate'][1]/@href")

This time the XPath starts at the XML tree root (<html>) and looks in the <head> for the first link with the attribute rel='alternate'. From Tony’s post:

Remember, feed autodiscovery relies on web page containing the following construction in the HTML <head>element:
<link rel=”alternate” type=”application/rss+xml” href=”FEED_URL” title=”FEED_NAME” />

[I tried using //link[@rel="alternate" and @type="application/rss+xml"] but Google Spreadsheet didn’t like it, instead grabbing the first rel=alternate link]

Fill cell C2 down the rest of the column to get RSS feeds for the other urls. You’ll notice that there’s a #N/A for http://neoformix.com/; this is because their feed isn’t auto-discoverable. Visiting their site there is an XML link (http://neoformix.com/index.xml) that we can just paste into our spreadsheet (tidying data is a usual process in data visualisation).

Step 3 Generating an OPML bundle

Spreadsheet -> OPML Generator

You should now have a spreadsheet like this one with 3 columns of data (if you used the Scraper extension in Step 1/Method 1 you’ll need to make sure your columns are headed Link, Title and Url for the next step). Next, to turn our spreadsheet of feeds into an OPML bundle. Fortunately this step has been made super easy by the Spreadsheet -> OPML Generator. Just follow the instructions on this site and seconds later you’ve got:

OPML File of Nathan Yau’s recommended Data and Visualisation Blogs

And because I’ve imported these into Google Reader here’s an aggregated page of their posts.

Update:


Feed autodiscovery scraper - based on http://bit.ly/IDuJLu CC-BY psychemedia


A retweet yesterday by Amber Thomas (@ambrouk) of Anna Armstrong (@frenchdisko) alerted me to a feature of Google Docs I wasn’t aware of, that the Insert Image Search automatically filters for Creative Commons released pictures:

Insert image in Google Docs

Fantastic, I thought. A way for staff to create open resources with millions of pictures to choose from and reuse with no more effort than if they were inserting any other image. Such a feature obviously doesn’t come without its health warnings. Clicking on ‘Learn more’ we can see:


Before reusing content that you've found, you should verify that its licence is legitimate and check the exact terms of reuse stated in the licence.
For example, most licences require that you give credit to the image creator when reusing an image. Google has no way of knowing whether the licence is legitimate, so we aren't making any representation that the content is actually lawfully licensed.

I can appreciate that Google’s search technology isn’t going to be 100% reliable in detecting which licence is being used in an existing work, but wouldn’t it be great if, when you inserted the image, Google also gave their ‘best guess’ of the licence for you to check and edit if necessary?

A better way for Google Docs to embed?
[This graphic includes Japanese light bulb - CC-BY Elliot Burke]

Or am I just being naïve about this whole thing?

PS I don’t know if something has gone horribly wrong with Google image indexing, but when in Insert Image Search I enter ‘lightbulb site:.flickr.com’ the thumbnails don’t always match the actual image.

thumbwhat


Update: Here's the updated Topsy Media Timeline Google Spreadsheet Template v2.1, which pulls data straight from Topsy. Follow the instructions in this template for your own Twitter/Topsy media timeline

Recently I posted an Experiment to dynamically timeline media posted on Twitter using Topsy and Timeline (my contribution to @Arras95) #arras95, which uses a Yahoo Pipe to extract tweets with images and videos via the Topsy Otter API; the data is then pulled into a Google Spreadsheet before being rendered in a Timeline tool developed by Vérité.

This recipe appears to be working, that is, the timeline is automatically updating with new media. There’s a separate question about the practicality of the timeline and its navigation which I’m quietly ignoring; instead I want to highlight some technical hits/misses and present a revised version.

Technical misses

Topsy includes tweets pointing to image sites like Twitpic and yfrog in the search results; because these redirect to those sites rather than providing an image source url, they appear in frames (up until recently Timeline used the snapito.com API to convert these into thumbnails, but this was removed because the free service was stopped).


To get around this I’ve modified the source Yahoo Pipe to only let image urls through (new source Pipe here). This limits results to those uploaded via the official Twitter interfaces (e.g. Web/new TweetDeck). Update: I've now coded the data collection from Topsy directly in the Google Spreadsheet using Google Apps Script. The new version is available via the link at the end of the post.
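I haven’t reproduced the template’s collection code here, but as a rough sketch of the idea (the Otter API endpoint, parameters and field names below are from memory, so treat them as assumptions), pulling image tweets into a sheet with Apps Script looks something like this:

// Rough sketch: query the Topsy Otter API for image tweets matching a term
// and write date, author, tweet text and media url into the active sheet.
function getTopsyImages(searchTerm) {
  var url = "http://otter.topsy.com/search.json?type=image&perpage=50&q=" +
            encodeURIComponent(searchTerm);
  var data = JSON.parse(UrlFetchApp.fetch(url, {muteHttpExceptions: true}).getContentText());
  var items = (data.response && data.response.list) || [];
  var rows = [];
  for (var i = 0; i < items.length; i++) {
    rows.push([items[i].trackback_date, items[i].trackback_author_nick,
               items[i].content, items[i].url]);
  }
  if (rows.length) {
    SpreadsheetApp.getActiveSheet().getRange(2, 1, rows.length, 4).setValues(rows);
  }
}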

If you wanted to replicate the original experiment another drawback was that you would have to host the Timeline code somewhere. As not everyone has easy access to a web host I’ve published an interface which lets you include the published Google Spreadsheet key in the URL. Here’s an example for #cam12

Here’s a new version of:

*** Topsy Media Timeline Google Spreadsheet Template v2.1 ***

Enjoy!

[PS I’m looking forward to seeing what Sam comes up with using Timeline ;)]


Update: New version of this spreadsheet template here

There’s a new kid on the block if you are considering an open source timeline tool. For a long time the Simile Exhibit Timeline has been the tool of choice, appearing in places like Freebase.com (click on Timeline in this page to see a history of internet search engines).

A nice feature of ‘Timeline’ is its focus on making it easy to embed content from other sites including individual tweets, videos from YouTube and Vimeo, images hosted on Flickr or with a direct url, and audio from SoundCloud. Here’s an out-of-the-box example (I tried to use the embed code in this post but it seems to conflict with some of the blog theme code (a jQuery problem)).

I wanted to try out ‘Timeline’ to see how it performed under different use cases. The two I had in mind were: Timeline/Google Spreadsheet as a simple OER creation tool (in part influenced by Pat Lockley’s post on using Google Maps as an OER authoring tool); and using Google Spreadsheet’s built-in functions to scrape and automagically publish information into a dynamic timeline.

The first case is fairly easy to do using the template and instructions on the Timeline site (although more complicated than Pat’s example). A couple of ‘gotchas’ for you. When I changed the spreadsheet setting to United Kingdom formats it messed up the dates on the timeline. I also had problems using Google Maps with external KML files (I’ve opened an issue). On to the fun bit though, gluing webservices together to generate dynamic timelines.

The glue – Google Spreadsheet

Because Google Spreadsheets sit in the cloud and have a number of ways to get live data feeds in, they are great for gluing data streams together and republishing them in different formats. Also, as Timeline likes Google Spreadsheets, all we need to do is get some data in a format Timeline likes and it should happily start updating itself … in theory anyway.

The data side left me scratching my head a bit. There’s lots of data out there, it's just a case of finding some with readable timestamps. I had thought about pulling information from Wikipedia but found tables of dates not particularly machine readable. Then I started reading about the @Arras95 event which is happening as part of the JISC funded WW1C project run by the University of Oxford.

Between the 9th April and 16th May 2012 an experiment in social media will take place. We will tweet the events of the Battle of Arras in realtime, from the perspective of a neutral reporter on the field. What makes this Twitter event different from other realtime tweeting initiatives (and there are some great ones out there!) is that @Arras95 will engage online communities, crowdsourcing facts about Arras and the individuals who played a part, asking for reappraisals and additions to the action as it happens.

You can read more about how to get involved in the Contribute. Collaborate. Commemorate. I could just scrape the @Arras95 tweets and put them in Timeline, but where would the fun be in that ;) Instead I want to capture some of the visual richness. Whilst I could start to unpick media links to videos and images from the official Twitter stream, there’s no need as the social web search site Topsy already does this and the data is accessible via the Topsy Otter API.

More glue – Yahoo Pipes

As Arras95 hasn’t started yet, here’s an example call to #ukoer looking for video. The result is in JSON, which is usually great for other mashups but unfortunately a format Google Spreadsheets doesn’t like (although you can handle it with Google Apps Script, on this occasion I was trying to avoid that route). Instead I turned to Yahoo Pipes, which hopefully won’t disappear just yet despite Yahoo laying off 2,000 of its staff this week.

Yahoo Pipe pulling Topsy data

Pipes is right at home with JSON, and what's more (despite hiding the option) you can output the data as .csv, which Google Spreadsheets does like. Here’s a Pipe which builds a search query for images and videos posted on Twitter for a user-entered search term. I’ve also prepared a slightly different Pipe which has the search hard-coded as well as pulling tweets from the @Arras95 twitter account (in both of these you can edit/clone the source).

Piecing it together – importing Yahoo Pipes into Google Spreadsheets

From the Timeline site there is a Google Spreadsheet Template. This gives us the format we need to get the data in. For now let's keep working with #ukoer as this gives us some data to play with. Here’s a copy of the template with an extra sheet called data. In cell B1 of the data sheet is the formula:

=ImportData("http://pipes.yahoo.com/pipes/pipe.run?_id=61ef2f4123254106f5acdea3db47d092&_render=csv&q="&urlencode(Readme!B7))

 

This comes from running the Pipe with a search term and copying the ‘Get as RSS’ link, which is:

http://pipes.yahoo.com/pipes/pipe.run?_id=61ef2f4123254106f5acdea3db47d092&_render=rss&q=%23ukoer

getting the data feed

You’ll see I’ve highlighted two parts of this url: at _render I’ve changed rss to csv, and in the formula the search term is replaced by a cell value (the latter so I could share/reuse the template). I should say urlencode is a custom formula I wrote using Apps Script to encode the search term. It’s a nice little ditty that goes like this:

function urlencode(text) {   
 return encodeURIComponent(text)
}

Down column A of the data sheet there is another custom function to convert entity numbers into characters, e.g. turn &#39; into apostrophes. That particular ditty goes:

function entitydecode(text){   
 // http://stackoverflow.com/a/6155669   
 return text.replace(/&#(\d+);/g, function(match, number){ return String.fromCharCode(number); });
}

Back in the spreadsheet on the ‘od1’ sheet we start pulling in the bits of data we need for the timeline. This mainly uses ArrayFormulas in row 3 to populate all the data without having to manually fill in the column. For example in D3 we have:

=ARRAYFORMULA(IF(ISBLANK(data!E2:E),"",(data!E2:E/ 86400) + 25569))

which reads as ‘if the cell in column E of data is blank do nothing, otherwise divide by 86400 and add 25569’ (this converts the Unix epoch times used in the Topsy API into human/spreadsheet readable dates).

Slapping it into a Timeline

All that’s left to do is in the spreadsheet File > Publish to the web… and then find somewhere to host your timeline page. So that you can see what it looks like here’s one for #ukoer.

#ukoer media timeline

@Arras95 Living Timeline

Here is the @Arras95 timeline and the source spreadsheet.

@Arras95 Dynamic Timeline

Nothing much to see right now apart from a test tweet. The theory is that this will self-populate over time as data filters into Topsy. It’ll be interesting to see if it actually works or if I need to set up an Apps Script trigger to force a refresh.
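If a trigger does turn out to be needed, a hypothetical sketch (the cell reference and schedule here are placeholders) would be a time-driven trigger that nudges the import formula by rewriting the search-term cell:

// Hypothetical: time-driven trigger that re-writes the search term cell,
// a commonly suggested nudge to get import formulas to recalculate.
function createRefreshTrigger() {
  ScriptApp.newTrigger("refreshImports").timeBased().everyHours(1).create();
}

function refreshImports() {
  var cell = SpreadsheetApp.getActiveSpreadsheet().getRange("Readme!B7"); // placeholder cell
  cell.setValue(cell.getValue());
}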

If you would like to make your own dynamic timeline from tweeted media here’s:

*** The Topsy Timeline Template ***
[File > Make a copy to use]


As recently mentioned on Sheila's work blog, the way the @jisccetis twitter account is used is evolving. Up until recently this account was used as a broadcast channel, pushing out the latest news to followers and not following back. This was balanced by members of staff having personal twitter accounts, engaging with the community. As with any community there’s going to be overlap with common friendships, and Phil Barker (@philbarker) suggested it would be good to see the extended JISC CETIS twitter follower network.

In this post I’ll introduce some sketches* with results to explore and show you how the data was extracted.

*this is a term I’ve picked up from Tony Hirst, along with explanatory and exploratory visualisations, both presented in More Thoughts on a Content Strategy for Data. The other thing I have sitting heavily in my thoughts is Eric Berlow’s TEDTalk where he shows complex doesn’t always mean complicated (H/T @PaulHollins). My fear is I’m going to dump you with complicated exploratory sketches, when I should be giving you simple explanatory answers.

Dump #1 Blooming great

Blooming great

For this first dump I’ve deliberately left it at low resolution as I only want to give you an overview and not analyse each node. In the graph you’ll spot dense patches of purple [A]; these are made up of the individual twitter screen names of people following one of the CETIS twitter accounts. So at the very top of the image there is a cluster of people following just me [B]. Other dense patches represent other groups of people following other CETIS Twitter accounts. In the centre of the main group [C] are Twitter users who follow 2 or more CETIS accounts. In Gephi, by rolling over nodes it’s easy to explore who people follow. To the right of the graph [D] is the @ArchimateTool account; this cluster has fewer connections to the main CETIS following. Finally, around the centre of the graph are loose groups [E] of users who follow 2 CETIS staff.

Update: Some other stats. The average out-degree in the network is 1.424 and 81% of the people in the graph only follow one of the CETIS accounts. It would be interesting to see how this compares with other organisations. It's important to also remember it's not just about twitter (email probably still has the best reach and conversion).

[If you are desperate to explore an interactive version of this I’ve put a copy on my install of the Gexf-JS viewer.]

Dump #2 Many Eyes

Overall there are over 3,500 unique Twitter accounts that follow one or more CETIS staff accounts. That's 3,500 pairs of eyes looking at what CETIS or staff members are doing, with the potential to spread our message even further through their own networks. Here’s what a lot of those eyes look like (click for larger version on zoom.it):

Many eyes (click to see on zoom.it)

I suppose the next question is: do we have the right Twitter audience watching us? A quick wordcloud of the profile descriptions of the people following us:

CETIS Follower Description Wordle

Getting the data

My regular top traffic generating blog post is Export Twitter Followers and Friends using a Google Spreadsheet, which allows users to easily grab details of up to 5,000 (more if you don’t mind some code tinkering) Twitter account friends/followers. I don’t know how widely known it is, but Twitter doesn’t just let you get your own friends/followers; you can get the data for any public Twitter account. So that’s what I did: snaffled details of who was following @jisccetis and JISC CETIS staff with public twitter accounts.

The way the spreadsheet is set up it generates a separate sheet for each person's follower details. To make it easy to import into Gephi/NodeXL I wrote this short script:
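The script itself isn’t embedded in this copy of the post; a minimal sketch of the idea (the sheet names and skip list are assumptions), combining the per-account follower sheets into a single edge list, could look like:

// Sketch: walk every follower sheet and build a 'combined' edge list of
// screen_name -> followed account, ready for Gephi/NodeXL.
function combineFollowerSheets() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var combined = ss.getSheetByName("combined") || ss.insertSheet("combined");
  combined.clear();
  var skip = {"Readme": true, "Vertices": true, "combined": true}; // assumed non-data sheets
  var rows = [["screen_name", "target"]];
  var sheets = ss.getSheets();
  for (var i = 0; i < sheets.length; i++) {
    var name = sheets[i].getName();
    if (skip[name]) continue;
    var data = sheets[i].getDataRange().getValues();
    var col = data[0].indexOf("screen_name");
    if (col === -1) continue; // not a follower sheet
    for (var j = 1; j < data.length; j++) {
      if (data[j][col]) rows.push([data[j][col], name]);
    }
  }
  combined.getRange(1, 1, rows.length, 2).setValues(rows);
}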

Here’s a copy of the modded spreadsheet. To use it, File > Make a copy, run through the authentication instructions, grab some follower details from different accounts, then run Twitter > Combine follower sheets. If you’re going to be using Gephi, the last thing you should do before downloading as csv is change the column heading on the ‘combined’ sheet from screen_name to source.

Using Gephi

The best way I’ve found to get the data into Gephi is to start a new project and then use the Import Spreadsheet option in the Data Laboratory, pointing it to the csv file downloaded from Google Spreadsheet. I’ll let you play with manipulating the data. If you come up with any nice recipes please share ;)

Using NodeXL

Open a blank NodeXL template, open the downloaded csv in Excel as a new workbook, then from the NodeXL ribbon Import > Open workbook. It's worth ticking the extra columns as vertex 1 properties. Again I’ll let you play; any recipes please share (the many eyes image was generated by switching the nodes to image, using the profile_image_url extracted via the Google Spreadsheet, and using a grid layout. If anyone has worked out how to use images as nodes in Gephi I’d be very interested to hear).

So what

I avoided going into any deep analysis with this as there are probably internal discussions to be had, such as: should we be targeting college staff more? What I hope this post illustrates is that it’s relatively easy to extract this type of data and start to get the very beginnings of some answers (e.g. how many unique followers do we have). There's still a lot to unpick in this area so I’m sure I’ll be revisiting it. My question to you is: if you were doing this type of study, what answers would you be looking for?


In Using Google Reader to create a searchable archive of Twitter mentions Alan Cann commented:

Subscribing to RSS feeds in Google Reader is my bog standard way of archiving Twitter feeds. Now to figure out how to get an RSS feed from a Google+ hashtag…

Let's look at how it might be possible. There’s no visible RSS feed from the Google+ Search page, but looking at the API documentation there is documentation on Activities: search. So we could have a query like:
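(The example link doesn’t survive in this copy; roughly speaking, it would be a call to the Activities: search endpoint with a placeholder API key, something like:)

https://www.googleapis.com/plus/v1/activities?query=%23ukoer&key=YOUR-API-KEY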

but there are a couple of problems. Data is returned in JSON and would need remapping to RSS. The real deal breaker, which is highlighted if you click on the link above, is you need to register for an API key from Google’s API Console to get the data. So at this point I could set up a service to convert Google+ searches into RSS feeds (and someone may have already done this), show you how to do it via the Console, or show you some other way. For now I’m opting for ‘another way’.

Publishing any XML format using Google Spreadsheets

Using the same trick in Tracking activity: Diigo site bookmark activity data in a Google Spreadsheet (who is saving your stuff) we can extract some information from a Google+ Search page like this one into a Google Spreadsheet using the importXML function and XPath queries to pull out part of the page (here are parts of the same search pulled into a Google Spreadsheet). There is an option to publish a Google Spreadsheet as RSS or ATOM but it’s not structured in the same way as for a blog feed (title is a cell reference etc. like this).

What we need is a way to trick Google into formatting the data in a different way. As part of the Google Earth Outreach project a Spreadsheet Mapper was developed. This spreadsheet template allows you to enter geographic data which is automatically formatted as KML (KML is another XML language, used for presenting geo data). This is achieved by creating a KML template within the spreadsheet and using the plain text output as KML.
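As a rough illustration of the same trick applied to RSS (hypothetical cell contents, not the template’s actual ones), a cell in the ‘template’ area can simply concatenate scraped values into item markup, with the sheet then published as plain text so the joined-up cells read as a feed:

="<item><title>"&B2&"</title><link>"&C2&"</link></item>"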

So using the same trick here’s a:

*** Google Spreadsheet Template to Turn Google+ Search into an RSS Feed ***

Google+ Search in Google Reader

Entering a search term and publishing the spreadsheet gives me a custom RSS feed of activity data. This feed works in Google Reader (haven’t tested any others), and with Reader we have the benefit of the results being cached (still not sure what the limitations are).

Important: Some things to be aware of. Because the data for this is extracted using XPath, when Google change the page styling this solution probably won’t work anymore. Also, the RSS feed being produced is for the last 10 search items; if you’ve got an active term then data might get lost.

So yet more resource based activity/paradata for you to digest!