API


I was building an analytics dashboard today that collected data from various services including Google Analytics and YouTube. Apps Script makes this very easy, as highlighted in my previous post. An issue I encountered when I tried to access our YouTube channel reports was that even though my account is attached as a manager I was getting a ‘Forbidden’ error. Turning to the Channel Reports documentation I discovered:

channel==CHANNEL_ID – Set CHANNEL_ID to the unique channel ID of the channel for which you are retrieving data. The user authorizing the request must be the owner of the channel.

As our YouTube channel is associated with our Google+ page, you can’t log in to Google Drive with that account. I did notice, however, that when I added YouTube Analytics as an Advanced Apps Script service the authentication prompt gave the option of authenticating using our Google+ page.

auth window 

The issue then is that if you authenticate as the Google+ page you can’t get access to other services like Google Analytics. I thought of a couple of ways I might tackle this, such as writing a separate Apps Script project that just got the YouTube Analytics data and wrote it to the spreadsheet I was working on, but I’m not entirely sure how the permissions would work out on that. Instead my solution was to expose YouTubeAnalytics.Reports.query in a separate Apps Script published as a web app. Setting this to run for ‘anyone, even anonymously’, I could then use UrlFetchApp to get the data in my main script.

Here’s how I set it up. Below (or in this gist) the 'main' script handles all the data read/write to the sheet and a separate 'proxy' Apps Script project runs the YouTube Analytics data collection.
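
As a rough sketch of the pattern (not the actual gist code): the channel ID, dates, metrics and web app URL below are placeholders, and the exact YouTubeAnalytics.Reports.query arguments depend on the version of the advanced service you enable.

```javascript
// --- 'proxy' project: published as a web app, run as me,
// --- accessible to 'anyone, even anonymously'
function doGet(e) {
  // Query the channel report (ids/dates/metrics here are placeholders)
  var result = YouTubeAnalytics.Reports.query(
      'channel==CHANNEL_ID',        // channel authorised via the Google+ page
      '2014-01-01', '2014-06-30',   // start and end dates
      'views,estimatedMinutesWatched',
      {dimensions: 'day'});
  // Return the report as JSON so the main script can parse it
  return ContentService.createTextOutput(JSON.stringify(result))
      .setMimeType(ContentService.MimeType.JSON);
}

// --- 'main' project: fetches the report via the proxy web app
function getChannelReport() {
  var url = 'https://script.google.com/macros/s/PROXY_WEB_APP_ID/exec'; // published proxy URL
  var response = UrlFetchApp.fetch(url);
  var report = JSON.parse(response.getContentText());
  return report; // rows/columnHeaders can then be written to the sheet
}
```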

Note: this technique exposes our YouTube channel report to the world (barring security by obscurity). The method being exposed is read-only, so we don’t have to worry about injection.

Feels a bit hacky but can you see a better way of doing this?

Update 22/07/2014: Matias Molinas had the great suggestion of writing the results to a sheet, which would avoid exposing your data. Jarom McDonald has also suggested that using Google App Engine would give security and scalability à la superProxy.

Jorum has a Dashboard Beta (for exposing usage and other stats about OER in Jorum) up for the community to have a play with: we would like to get your feedback!

For more information see the blog post here: http://www.jorum.ac.uk/blog/post/38/collecting-statistics-just-got-a-whole-lot-sweeter

Pertinent info: the Dashboard has live Jorum stats behind it, but the stats have some irregularities, so the stats themselves come with a health warning. We’re moving from quite an old version of DSpace to the most recent version over the summer, at which point we will have more reliable stats.

We also have a special project going over the summer to enhance our statistics and other paradata provision, so we’d love to get as much community feedback as possible to feed into that work. We’ll be doing a specific blog post about that as soon as we have contractors finalised!

Feedback by any of the mechanisms suggested in the blog post, or via discussion here on the list, all welcome.

The above message came from Sarah Currier on the [email protected] list. This was my response:

It always warms my heart to see a little more data being made openly available :)

I imagine (and I might be wrong) that the main users of this data might be repository managers wanting to analyse how their institutional resources are doing. So being able to filter uploads/downloads/views for their resources and compare with overall figures would be useful.

Another (perhaps equally important) use case would be individuals wanting to know how their resources are doing, so a personal dashboard of resources uploaded, downloads and views would also be useful. This is an area Lincoln's Bebop project was interested in, so it might be an idea to work with them to find out what data would be useful to them and in what format (although, saying that, I think I only found one #ukoer record for Lincoln). Hmm, I wonder if anyone else would find it useful if you pushed data to Google Spreadsheets à la Guardian datastore (here's some I captured as part of the OER Visualisation Project).

I'm interested to hear what the list thinks about these two points.

You might also want to consider how the data is licensed on the developer page. Back to my favourite example: Gent use the Open Data Commons licence http://opendatacommons.org/licenses/odbl/summary/

So what do you think of the beta dashboard? Do you think the two use cases I outline are valid or is there a more pertinent one? (If you want to leave a comment here I’ll make sure they are passed on to the Jorum team, or you can use other means).

[I’d also like to add a personal note that I’ve been impressed with the recent developments from Jorum/Mimas. There was a rocky period when I was at the JISC RSC when Jorum didn’t look aligned to what was going on in the wider world, but since then they’ve managed to turn it around and developments like this demonstrate a commitment to a better service]

Update: Bruce Mcpherson has been working some Excel/Google Spreadsheet magic and has links to examples in this comment thread



My pseudo PhD supervisor for my mocorate degree, Dr Tony Hirst (Open University – and now Visiting Senior Fellow in Networked Teaching and Learning/Senior Fellow of the University of Lincoln – congrats) recently, amongst many other things, started Tinkering with the Guardian Platform API – Tag Signals and Visualising New York Times Article API Tag Graphs Using d3.js.

Like any fake PhD candidate it’s important to follow the work of your supervisor, after all they will be marking your imaginary neverending thesis. So after much toil and many pointers from Tony here’s what I’ve come up with - a collision of the Guardian Platform API and visualisation with the d3.js library – GuardianTagExplorer.

In this post I’ll highlight a couple of features of the interface and then try to recall many of the lessons learned. Below is a short clip to show how it’s supposed to work or you can have a play yourself via the link above (because it uses SVG the 9% of you who use Internet Explorer 8 or less won’t see anything):

What does it do

When you enter a search term it asks the Guardian Open Platform if there are any articles associated with that term. Each of these articles has some metadata attached, including a list of tags used to categorise the piece. Using a ported version of Tony’s Python code these tags are collected and the number of other articles from the search result with the same tag is counted. The page then renders this information as a force layout diagram using the d3.js visualisation library (tags and links = nodes and edges) and as a histogram by putting the same data into the Google Visualization API.
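
As a rough sketch of the tag-counting step (the shape of the parsed search result assumed here, an array of articles each carrying a tags array, is an illustration rather than the exact Open Platform response):

```javascript
// Tally how many articles in the search result carry each tag.
// `articles` is assumed to be the parsed search result: an array of
// article objects, each with a `tags` array of {id, webTitle} objects
// (returned when the search asks for tags to be included).
function countTags(articles) {
  var counts = {};
  articles.forEach(function (article) {
    (article.tags || []).forEach(function (tag) {
      counts[tag.webTitle] = (counts[tag.webTitle] || 0) + 1;
    });
  });
  return counts; // e.g. {"Education": 12, "Higher education": 7, ...}
}
```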

I didn’t show it in the video but you can create predefined searches for linking and embedding. For example, here’s one for the term ‘JISC’ and if your RSS reader hasn’t stripped out the iframe the same page is embedded below:

How it was made/What I learned

I mentioned to my unsupervisor that I was thinking of doing something with a live version of the Guardian Open Platform with d3 based on his Friendviz example, and he immediately spotted a couple of problems, the biggest being that the Guardian prefer it if you keep your API key a secret.

Yahoo Pipes as a proxy service

Fortunately Tony also had the answer: using Yahoo Pipes with a private string block as a proxy service (I’m not sure there is much benefit to doing this, as while the API key is still hidden anyone can access the pipe). The API is rate limited anyway and I hope the Guardian people see I’m keeping to the spirit of the terms and conditions.

So data source, check. Porting Python to JavaScript? Relatively straightforward apart from there being no combinations mapping function, but having sketched out what was going on I think I’ve got an equivalent.
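
For illustration (this isn't the ported code itself), a JavaScript stand-in for Python's itertools.combinations over a list of tags might look like this:

```javascript
// Generate all unordered pairs from a list of tags, roughly equivalent to
// Python's itertools.combinations(tags, 2). Each pair can then become an
// edge between two tag nodes in the force layout.
function tagPairs(tags) {
  var pairs = [];
  for (var i = 0; i < tags.length; i++) {
    for (var j = i + 1; j < tags.length; j++) {
      pairs.push([tags[i], tags[j]]);
    }
  }
  return pairs;
}

// Example: tagPairs(["Education", "Technology", "JISC"]) returns
// [["Education","Technology"], ["Education","JISC"], ["Technology","JISC"]]
```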

d3.js

Big headache! Even though I’ve churned out a fair bit of code I’m not, nor have I ever been, a professional programmer, so getting my head around d3.js has been a big challenge. There were a couple of examples I spent a lot of time picking over trying to understand what was going on. The main ones were:

[These examples are renderings of GitHub Gists using the bl.ocks.org service created by … mbostock, which is a great way to publish little snippets of stuff]

I also got a peek at the generated code for Tony’s Visualising New York Times Article API Tag Graphs Using d3.js. You’ll notice that my offering is similar in appearance and functionality to yohman’s example (I’m quietly ignoring his copyright mark – fair use etc. :-s).

It’s hard to convey exactly what I learned from the last couple of days of pushing pixels. The big difference between d3 and the similar protovis library I used here is that there is a lot more setting up to do in the code. The payoff is that you have far more control over the end result. Having spent days trying to understand d3, it contrasted with the minutes needed to create the tag histogram using the Google Visualization API.

One thing I never got working is a zoom/pan effect. I’ve seen tiny snippets of code that do this for charts. Unfortunately the API reference for this behaviour is still to be written.
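
The sort of snippet I've seen looks something like this: a minimal sketch using the d3 v2/v3 behaviour API, assuming the force layout is drawn inside a group selection named vis (the variable names are illustrative):

```javascript
// Attach a zoom behaviour to the svg element and apply the resulting
// translate/scale to a <g> wrapping the force layout (the selection `vis`).
svg.call(d3.behavior.zoom().on("zoom", function () {
  vis.attr("transform",
      "translate(" + d3.event.translate + ")scale(" + d3.event.scale + ")");
}));
```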

Where next

Now that I’ve got a basic framework for visualising tag/category information I’m interested in refining this by trying out some other examples. So if you have an API you want me to play with, drop me a line ;)

PS my code is here and you can see how it renders in blocks here

PPS I really must set up a labs page outlining my various experiments

PPPS I forgot to say that during a d3 low I threw together a prototype of the tag explorer using protovis in a couple of minutes


Did you tune into Donald Clark’s “Don’t Lecture Me!” keynote at ALT-C 2010, or were you at Joe Dale’s FOTE10 presentation? These presentations have two things in common: both Donald and Joe posted their reflections on a ‘hostile’ Twitter backchannel (see Tweckled at ALT! and Facing the backchannel at FOTE10), and I provided a way for them to endlessly relive the experience with a Twitter subtitle mashup, wasn’t that nice of me ;) (see iTitle: Full circle with Twitter subtitle playback in YouTube (ALT-C 2010 Keynotes) and Making ripples in a big pond: Optimising videos with an iTitle Twitter track).

Something I’ve been meaning to do for a while is find a way to quickly (and freely) analyse a Twitter stream to identify the audience sentiment. You can pay big bucks for this type of solution, but fortunately viralheat have a sentiment API which gives developers 5,000 free calls per day. To use this you push some text to their API and you get back the text’s mood and the probability that the sentiment detected is correct (more info here).

So here is a second-by-second sentiment analysis of Donald and Joe’s presentations, followed by how it was done (including some caveats about the data).

Donald Clark – Don’t Lecture Me! from ALT-C 2010


Open Donald Clark's ALTC2010 Sentiment Analysis in new window

Joe Dale - Building on firm foundations and keeping you connected in the 21st century. This time it’s personal! FOTE10


Open Joe Dale's Sentiment Analysis in new window

How it was done

Step 1: formatting the source data

This first part is very specific to the data source I had available. If you already have a spreadsheet of tweets (or other text) you can skip to the next part.

All viralheat needs is chunks of text to analyse. When I originally added the Twitter subtitles to the videos I pulled the data from the Twitter archive service Twapper Keeper, but since March this year the export function has been removed (BTW Twapper Keeper was also recently sold to Hootsuite, so I’m sure more changes are on the horizon). I also didn’t get a decent version of my Google Spreadsheet/Twitter archiver working until February, so I had to find an alternative data source (to do: integrate sentiment analysis into the Twitter spreadsheet archiver ;).

So instead I went back to the subtitle files I generated in TT-XML format. Here’s an example line:

<p style="s1" begin="00:00:28" id="p3" end="00:01:01" title="Tweeted on 01 Oct 2010 14:45:28">HeyWayne: A fantastic talk by ??? @mattlingard #FOTE10 [14:45GMT]</p>

The format is some metadata (display times, date), then who tweeted the message, what they said and a GMT timestamp. The bits I’m interested in are the message and the date metadata, but in case I needed it later I also extracted who tweeted the message. Putting each of the <p> tags into a spreadsheet cell, it’s easy to extract the parts I want using these formulas:

Screen name (in column B)

  • =MID(A2,FIND(">",A2)+1,(SEARCH(":",A2,FIND(">",A2))-FIND(">",A2)-1))

Tweet (in column C)

  • =MID(A2,FIND(">",A2)+3+LEN(B2),LEN(A2)-FIND(">",A2)-LEN(B2)-16)

Date/time (in column D)

  • =VALUE(MID(A2,FIND("Tweeted on",A2)+11,20))

You can find more information about the cell formulas used elsewhere but briefly:

  • MID extracts some subtext based on a start point and length;
  • FIND finds the first occurrence of some text within text;
  • SEARCH is used to find the first occurrence of some text after a specific point (in this case I knew ‘:’ marked the end of who tweeted the message, but if I used FIND it would have returned the position of the ‘:’ in begin=);
  • LEN is the number of characters in a cell;
  • VALUE is used to convert a text string into another format like a number or, in this case, a date/time.

This gives me a spreadsheet which looks like this:

Extracting old tweets

Step 2: Using Google Apps Script to get the sentiment from viralheat

Google Apps Script is great for automating repetitive tasks. So in Tools > Script editor… you can drop in the following snippet of code, which loops through the text cells and gets a mood and probability from viralheat:
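
A minimal sketch of that kind of loop, assuming tweets sit in column C and the results go to columns E and F as described below; the viralheat URL, parameter names and response fields shown are assumptions rather than the documented API, and YOUR_API_KEY is a placeholder:

```javascript
// Loop over the tweet text in column C and write the returned mood and
// probability into columns E and F.
// The viralheat endpoint, parameter names and response fields below are
// assumptions; check the sentiment API docs for the real ones.
function getSentiment() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var lastRow = sheet.getLastRow();
  var tweets = sheet.getRange(2, 3, lastRow - 1, 1).getValues(); // column C
  var output = [];
  for (var i = 0; i < tweets.length; i++) {
    var url = "https://app.viralheat.com/social/api/sentiment/review.json" +
        "?api_key=YOUR_API_KEY&text=" + encodeURIComponent(tweets[i][0]);
    var data = JSON.parse(UrlFetchApp.fetch(url).getContentText());
    output.push([data.mood, data.prob]);
    Utilities.sleep(500); // stay well inside the daily rate limit
  }
  sheet.getRange(2, 5, output.length, 2).setValues(output); // columns E & F
}
```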

Step 3: Making the data more readable

If the script worked you should have some extra columns with the returned mood (positive or negative) and a probability factor. To make the data more readable I did a couple of things. For the mood I used conditional formatting to turn the cell green for positive and red for negative. To do this select the column with the mood values and then open Format > Conditional formatting and add these rules:

Conditional formatting

In the examples above you’ll see I graphed a sentiment rating over time. To do this I converted ‘negative’ and ‘positive’ into the values –1 and 1 using the following formula, where the returned mood is in column E:

  • =IF(E2="positive",1,-1)

I also wanted to factor in the probability by multiplying this value by the probability:

  • =IF(E2="positive",1,-1)*F2 (where the probability factor is in column F)

These values were then accumulated over time.
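
For example, a running total can be produced with an anchored SUM filled down the column (assuming, for illustration, the weighted sentiment values are in column G):

  • =SUM($G$2:G2)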

Using the time and accumulated sentiment columns you can then Insert > Chart, and if not already suggested, use the Trend > Time line chart.

Insert time line chart

One last trick I did, which I’ll let you explore in the published spreadsheets I’ll link to shortly, is extract and display certain tweets which are above a threshold probability.

As promised, here are the spreadsheets (you can reuse them via File > Make a copy):

Some quick notes about the data

A couple of things to bear in mind when looking at this data:

Noise – all of the analysed tweets don’t necessarily relate to the talk, so if someone didn’t like the coffee at the break, or was tweeting about the previous talk they liked, that will affect the results.

Quoting the presenter – Donald’s talk was designed to get the audience to question the value of lectures, so if he made negative statements about that particular format that were then quoted by someone in the audience, it would be recorded as negative sentiment.

Sometimes it’s just wrong – and let’s not forget there may be times when viralheat just gets it wrong (there is a training API ;)

There is probably more to say, like whether there is a way to link the playback with a portion of the sentiment chart, or whether I should explore a way to use Google Apps Script and viralheat to automatically notify conference organisers of the good and the bad. But that’s why I have a comment box ;)


What is an API, some of you may be asking. API stands for application programming interface and is a set of functions or commands used to control a computer programme. Control can be within the existing programme where the API has been created, but importantly an API can be used by external programmes to allow them to communicate with each other (Wikipedia has a more technical explanation of an API).

Having a public and freely available API is becoming a must-have for new and existing web services. Technology Magazine has a list of almost 700 Web 2.0 APIs, which includes offerings from Facebook, Google and even the BBC. The list is long but not complete, and there are new APIs coming out on a weekly basis.

Until recently my knowledge of APIs was concept only with no practical experience. This all changed over the festive break when I decided to roll up my shirt sleeves and push some code. I was spurred on by the discovery of a great web service developed by Hewlett-Packard called Tabbloid.

Tabbloid allows you to submit your favourite RSS feeds; it then pulls all the stories together and formats them in a ‘tabloid’ format. The resulting PDF is then conveniently emailed to your inbox either daily or weekly. I was interested in this service because we were looking for a way of automatically creating an attractive PDF version of our fortnightly RSC NewsFeed. [I can also see educational uses for this service. For example, if you have a group of students generating assessed blog posts, having a PDF version allows you to automatically create an irrefutable snapshot of the posts.]

While exploring this service I noticed they had a developers page, which within a couple of clicks gave me access to the Tabbloid API. The API allows you to control the RSS feeds you want to include and to make a Tabbloid PDF on demand.

My first experiments with the API were with a standalone application to make a NewsFeed tabloid. It worked well and I could have continued down this line, but I thought it would be more ‘fun’ to integrate it into the WordPress blog we use for NewsFeed. This required more shirt rolling as it would mean coding a new plugin for WordPress using their API. Not satisfied with just trying to get my programme to talk to two APIs, I added one more into the mix with integration to Viewer (http://view.samurajdata.se), a web application which generates images from PDF documents.

A couple of late nights later ‘Make Tabbloid’ was born. As a courtesy I emailed the developers of Tabbloid and Viewer, just to make sure I wasn’t doing anything naughty. To my surprise the project manager for Tabbloid got back to me asking to chat. They were very appreciative of my endeavours and were interested in any feedback I had on their API. In the course of the discussion I mentioned I had a problem removing feeds. This turned out to be a bug in their code, which they were quickly able to fix.

So what can we learn from this and what are the implications for higher education? The Internet continues to become increasingly mashable. Openness is enabling huge creativity, allowing developers to pull and push together lots of different web services into custom applications. This flexibility is making it possible for educators to develop learning environments which are no longer inward looking but instead integrate themselves with the wider web (e.g. SocialLearn).

This model isn’t without its risks. Only today Google announced that it is axing several of its applications including Notebook and Video (full story on Google’s axed services here). With no service level agreement there is also no guarantee that a third-party service will be available when you need it. While it’s hard to mitigate such circumstances, I think the risk of not engaging in this area has greater implications.

I’ve now turned my theoretical understanding of APIs into practical application, and I have to say it’s quite addictive. Since publishing my plug-in I’ve monitored downloads (182 since 06-Jan-2009) and I’m embarrassed to say I’ve even emailed fellow WordPress bloggers who have previously highlighted the Tabbloid service.

But what makes a good API? This is my blatant opportunity to plug the JISC funded Good APIs project. This project “aims to provide JISC and the sector with information and advice on the factors that encourage use of machine interfaces, based on existing practice”.  As part of this they are looking for respondents to a research survey. More information is here on their blog.

It's tough to make predictions, especially about the future, but I reckon 2009 is set to be a big year for educational uses of APIs ;-)
