
A little snippet of code I’m using to get share counts for a URL from a number of services in one call. The idea is I pass a URL and get some JSON back with counts from Facebook, Twitter, Delicious, Pinterest and Google +1 (if it’s a bad URL, or nothing is returned from a service, the value is null).

json returned 

<?php
// Grab the URL to look up and encode it once for safe reuse in query strings
$url = isset($_GET['url']) ? $_GET['url'] : '';
$encUrl = urlencode($url);

// Fetch counts from each service (warnings suppressed so a failed call just ends up as null)
$finfo = json_decode(@file_get_contents('http://api.ak.facebook.com/restserver.php?v=1.0&method=links.getStats&urls='.$encUrl.'&format=json'));
$tinfo = json_decode(@file_get_contents('http://urls.api.twitter.com/1/urls/count.json?url='.$encUrl));
$dinfo = json_decode(@file_get_contents('http://feeds.delicious.com/v2/json/urlinfo/data?url='.$encUrl));
// Pinterest only offers JSONP, so strip the receiveCount(...) wrapper before decoding
$pinfo = json_decode(preg_replace('/^receiveCount\((.*)\)$/', "\\1", @file_get_contents('http://api.pinterest.com/v1/urls/count.json?callback=receiveCount&url='.$encUrl)));
$gplus = gplus_shares($url);


// Google +1 count via the unofficial clients6 JSON-RPC endpoint
// http://papermashup.com/google-plus-php-function/
function gplus_shares($url){
    // G+ DATA
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://clients6.google.com/rpc?key=AIzaSyCKSbrvQasunBoV16zDH9R33D88CeLr9gQ");
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_POSTFIELDS, '[{"method":"pos.plusones.get","id":"p",
"params":{"nolog":true,"id":"' . $url . '","source":"widget","userId":"@viewer","groupId":"@self"},
"jsonrpc":"2.0","key":"p","apiVersion":"v1"}]');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-type: application/json'));
    $result = curl_exec($ch);
    curl_close($ch);
    return json_decode($result, true);
}

// Build the response, falling back to NULL where a service returned nothing useful
$output = array(
    'facebook'   => isset($finfo[0]->total_count) ? $finfo[0]->total_count : NULL,
    'twitter'    => isset($tinfo->count) ? $tinfo->count : NULL,
    'delicious'  => isset($dinfo[0]->total_posts) ? $dinfo[0]->total_posts : NULL,
    'pinterest'  => isset($pinfo->count) ? $pinfo->count : NULL,
    'googlePlus' => isset($gplus[0]['result']['metadata']['globalCounts']['count']) ? $gplus[0]['result']['metadata']['globalCounts']['count'] : NULL
);

header('Content-Type: text/javascript');

echo "[".json_encode($output)."]";
?>
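If it helps, here’s a rough sketch of calling the script from client-side JavaScript once it’s deployed. The sharecounts.php filename, the localhost address and the example URL are placeholders for wherever you host it, not anything from the code above:

// Hypothetical usage: the PHP above saved as sharecounts.php on a local XAMPP install.
// The script responds with a one-element array wrapping the counts object.
fetch('http://localhost/sharecounts.php?url=' + encodeURIComponent('http://example.com/some-post'))
  .then(function (response) { return response.json(); })
  .then(function (data) {
    var counts = data[0]; // {facebook: …, twitter: …, delicious: …, pinterest: …, googlePlus: …}
    console.log('Twitter count: ' + counts.twitter);
  });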

Usually I use sharedcount.com for this, but was worried that, given over 15k URLs to get data on, I might trip their server (although I was given assurances that it was fine). I’ve deployed this code on a local webserver (XAMPP Lite) and it’s working well in parallel with Google Refine.

Here are some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.

Yesterday I got stuck into the first week of the Coursera course on Computing for Data Analysis. The course is about:

learning the fundamental computing skills necessary for effective data analysis. You will learn to program in R and to use R for reading data, writing functions, making informative graphs, and applying modern statistical methods.

You might be asking, given that I’ve already dabbled in R, why am I taking an introductory course? As I sat watching the lectures on my own (if anyone wants to do a Google Hangout and watch next week’s lectures together let me know) I reminisced about how I learned to swim. The basic story: a six-year-old boy staying at a posh hotel for the first time nags his parents to take him to the swimming pool, and when they get there he gets changed, runs off and jumps in at the deep end. When I eventually came back to the surface I assumed the doggy paddle and was ‘swimming’ … well, ‘swimming’ in the sense that I wasn’t drowning.

The method of ‘throwing myself in’ is replicated throughout my life, particularly when it comes to learning. So whilst I’ve already thrown myself into R, I can survive, but only just, and what I’ve produced is mainly the result of trying not to drown. This revelation was particularly clear when learning about subsetting (reshaping data).

I’ve got an example later in this post where I’ve been practising my subsetting skills with NodeXL data, but first some quick reflections about my experience on the course so far.

MOOCing about in Coursera

Hopefully you’ve already got the picture that I’m a fairly independent learner, so I haven’t bothered with the built-in discussion boards, instead opting to view the lectures (I’m finding x1.5 speed suits me) and take this week’s quiz. The assignment due for week 2 has already been announced and people are racing ahead to get it done (which appears to have forced the early release of next week’s content).

Something apparent to me on the Coursera site is the lack of motivational cues. I’ve got no idea how I’m doing relative to my 40,000 fellow students, either in terms of watching the lectures or in this week’s quiz. Trying to get my bearings using the #compdata Twitter hashtag hasn’t been that successful because in the last 7 days there have only been 65 people using or mentioned with the tag (and of the 64 tweets, 29 were ‘I just signed up for Computing for Data Analysis #compdata …’).

Things are looking up on the Twitter front though as some recent flares have gone up:

and also @hywelm has made himself known ;)

Will there be much community building in the remaining 3 weeks?

Mucking about with NodeXL and R

In the section above I’ve mentioned various Twitter stats. To practice this week’s main compdata topics of reading data and subsetting I thought I’d have a go at getting the answers from a dataset generated in NodeXL (I could have got them straight from NodeXL but where is the fun in that ;).

Step 1 was to fire up NodeXL and import a Twitter Search for #compdata with all of the boxes ticked except ‘Limit to…’.

Twitter Search Import from NodeXL

As a small aside I grabbed the NodeXL Options Used to Create the Graph used in this MOOC search by Marc Smith, hit the automate button and came up with the graph shown below (look at those isolates <sigh>):

The #compdata graph

To let other people play along I then uploaded the NodeXL spreadsheet file (.xlsx) to Google Docs, making sure ‘Convert documents …’ was checked, and here it is as a Google Spreadsheet. By using File > Publish to the web… I can get links to .csv versions of the sheets.

In R I wrote the following script:

If you run the script you should see various answers pop out. As I’m learning this, if anyone would like to suggest improvements please do. My plan is to keep adding to the data and extending the script as the weeks go by to practise my skills and see what other answers I can find.


Via Doug Holton I spotted that there is a new YouTube beta feature to add multiple choice questions (MCQs) to your YouTube videos. Those who have already taken a course on Coursera (surely everyone has by now ;)) will know that video with MCQs is one of the staples. From the ‘Video Questions Editor Beta’ page it says:

Description

This is an opt-in beta for a simple Video Questions Editor on YouTube. Through this editor you can setup multiple questions to be displayed on top of your video during playback that a viewer can answer.

How can I use this feature?

The editor itself can be found on the video edit page, on the edit bar. After you have added several questions and the users have viewed them, you can see a summary of the interaction your users had with them through the analytics page, within the Annotations section.

Disclaimer

The feature represents work in progress, there is no plan for long-term support of the feature and may be removed at any time without prior notification. Your comments will help us improve and perfect the mixtures we're working on. So jump in, play around and send your feedback directly to the brains behind the scenes.

The page also includes a link to opt-in to the beta.

If you do, then when you go to edit one of your videos you’ll get a ‘Questions’ button:

Add question

Clicking on this lets you enter your question:

Question entry

Unfortunately when I click on save it hangs on ‘Your changes are being saved...’.

Bigger picture

It’s a shame that this feature doesn’t work yet. It’s interesting to put this development into the context of Google’s recent open source release of Course Builder, which was used as the technology behind their Power Searching with Google online course (xMOOC). So is Google trying to rock the boat in the mass online education market?


As part of the JISC OER Rapid Innovation Programme we’ve been experimenting with monitoring project blogs by gluing together some scripts in Google Spreadsheets. First there was Using Google Spreadsheets to dashboard project/course blog feeds #oerri which was extended to include social activity around blog posts.

As the programme comes to a close projects will soon be thinking about submitting their final reports. As part of this, projects agreed to submit a selection of their posts, with a pre-identified set of tags shown below, as an MS Word document.

Tag | Structure
projectplan | detailed project plan, either in the post or as an attachment
aims | reminder of the objectives, benefits and deliverables of your project
usecase | link to / reproduce the use case you provided in your bid
nutshell | 1-2 paragraph description in accessible language, an image, a 140 character description [1 post per project]
outputs | update posts on outputs as they emerge, with full links/details so that people can access them
outputslist | end of project: complete list of outputs, refer back to #projectplan and note any changes [1 post per project]
lessonslearnt | towards the end of the project, a list of lessons that someone like you would find useful
impact | end of project: evidence of benefits and impact of your project and any news on next steps
grandfinale | the follow-up to the nutshell post: a description in accessible language, and a 2 minute video [1 post per project]

OERRI Dashboard

When this was announced at the programme start-up, concerns were raised about the effort of extracting some posts into a document rather than just providing links. As part of the original experimental dashboard, one thing I had in mind was to automatically detect the tag-specific posts and highlight which had been completed. Having got the individual post URLs, it hasn’t been too hard to throw a little more Google Apps Script at extracting the content and wrapping it in an MS Word document (well, almost – if you have some HTML and switch the file extension to .doc it’ll open in MS Word). Here’s the code and template to do it:
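The full code and template are linked above; stripped right down, the idea looks something like the sketch below. The per-tag WordPress feed URL pattern and the file naming are my own simplifications for illustration – the actual script works from the post URLs already collected in the dashboard spreadsheet:

// Rough sketch of the report builder: collect post HTML for the agreed tags
// and save it as an HTML file with a .doc extension so Word will open it.
// The feed URL pattern and file naming are placeholders, not the programme's settings.
var TAGS = ['projectplan', 'aims', 'usecase', 'nutshell', 'outputs',
            'outputslist', 'lessonslearnt', 'impact', 'grandfinale'];

function buildReport(blogUrl, projectName) {
  var html = '<html><body><h1>' + projectName + ' final report posts</h1>';
  TAGS.forEach(function(tag) {
    // WordPress exposes per-tag feeds at /tag/<tag>/feed/
    var feed = UrlFetchApp.fetch(blogUrl + '/tag/' + tag + '/feed/',
                                 {muteHttpExceptions: true});
    if (feed.getResponseCode() !== 200) return;        // tag missing - skip it
    var doc = XmlService.parse(feed.getContentText());
    var items = doc.getRootElement().getChild('channel').getChildren('item');
    items.forEach(function(item) {
      html += '<h2>' + item.getChildText('title') + ' (#' + tag + ')</h2>';
      html += item.getChildText('description');        // full post content, if a full feed
    });
  });
  html += '</body></html>';
  // Saving HTML with a .doc filename is the trick mentioned above
  DriveApp.createFile(projectName + ' OERRI report.doc', html, MimeType.HTML);
}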

And here are the auto-generated reports for each project:

Project | Posts (est.) | PROD URL | Generated report | Comments
Attribute images | 2 | http://prod.cetis.ac.uk/projects/attribute-image | – | No tagged posts
bebop | 14 | http://prod.cetis.ac.uk/projects/bebop | Report link | –
Breaking Down Barriers | 10 | http://prod.cetis.ac.uk/projects/geoknowledge | Report link | –
CAMILOE | 1 | http://prod.cetis.ac.uk/projects/camiloe | – | No tagged posts
Improving Accessibility to Mathematics | 15 | http://prod.cetis.ac.uk/projects/math-access | Report link | –
Linked data approaches to OERs | 15 | http://prod.cetis.ac.uk/projects/linked-data-for-oers | Report link | Partial RSS feed
Portfolio Commons | 10 | http://prod.cetis.ac.uk/projects/portfolio-commons | Report link | –
RedFeather | 18 | http://prod.cetis.ac.uk/projects/redfeather | Report link | –
RIDLR | 7 | http://prod.cetis.ac.uk/projects/ridlr | Report link | Not WP
sharing paradata across widget stores | 10 | http://prod.cetis.ac.uk/projects/spaws | Report link | –
SPINDLE | 17 | http://prod.cetis.ac.uk/projects/spindle | Report link | –
SupOERGlue | 6 | http://prod.cetis.ac.uk/projects/supoerglue | Report link | Not WP
synote mobile | 16 | http://prod.cetis.ac.uk/projects/synote-mobile | Report link | –
TRACK OER | 12 | http://prod.cetis.ac.uk/projects/track-oer | Report link | Not WP
Xenith | 4 | http://prod.cetis.ac.uk/projects/xenith | Report link | –
Total posts: 157

Issues

I should say that these are not issues I have with the OERRI projects, but my own issues I need to solve to make this solution work in a variety of contexts.

  • Missing tags/categories – you’ll see the dashboard has a number of blanks. In some cases it’s not the project’s fault (as the majority of projects used WordPress installs it was easier to focus on these), but in other cases projects mix tags/categories or just forget to include them
  • Non-WordPress – 3 of the projects don’t use WordPress, so other ways to grab the content are required
  • RSS summary instead of full feed – ‘Linked data approaches to OERs’ uses a summary in their RSS feed rather than full text. As this script relies on a full-text feed it can’t complete the report (one of my pet hates is RSS summary feeds – come on people, you’re supposed to be getting the word out, not putting up barriers)

Hopefully it’s not a bad start and if nothing else maybe it’ll encourage projects to sort out their tagging. So what have I missed … questions welcomed.

I should say this post contains a lot of technical information, doesn’t give much background and is mainly for my hard-core followers.

This is a very loose sketch of an experiment I might refine which uses Jason Davies’ wordcloud script (an add-on for d3.js) as a way to filter data hosted in a Google Spreadsheet. I was essentially interested in the Twitter account descriptions of the community using the Social Media Week – Glasgow hashtag, but a minor detour has reminded me you can:

  • get JSON data straight from a Google Spreadsheet
  • build dynamic queries to get just what you want

So I fired up NodeXL this morning and got this pretty graph of how people using the #smwgla hashtag at least twice follow each other.

people using the #smwgla hashtag at least twice follow each other

One of the features of NodeXL is to add stats columns to your data which includes friend/follower counts, location and profile descriptions.

NodeXL Stats

Uploading the data from NodeXL (Excel) to Google Spreadsheets allows me to render an interactive version of the community graph using my NodeXL Google Spreadsheet Graph Viewer.

interactive version of the #smwgla community graph

All this is doing is grabbing data from Google Spreadsheets using their Visualization API and rendering it visually using JavaScript/HTML5 canvas. You can use the query language part of this API to get very specific data back (if you want a play, try Tony Hirst’s Guardian Datastore Explorer). Using Tony’s tool I got this query built. One thing you might notice is I’m selecting a column of Twitter descriptions WHERE it contains(‘’) <- a blank – if it’s a blank, why did I bother with the WHERE statement?

Switching to Jason Davies’ wordcloud demo, we can play with custom data sources if you have some JSON. In Tony’s tool you have options to get the data in HTML (tqx=out:html) and CSV (tqx=out:csv). There is a third, undocumented, option for JSON: tqx=out:json. Using this we can get a URL for the wordcloud generator: https://spreadsheets.google.com/tq?tqx=out:json&tq=select%20AH%20where%20AH%20contains%28%2727%29&key=0Ak-iDQSojJ9adGNUUXZnU2k3V1FRTjR3eFp0RmRNZWc&gid=118

To make the wordcloud interactive, so that when you click on a term it filters the data on that term, we can use the option to include {word} in our source URL, e.g. https://spreadsheets.google.com/tq?tqx=out:json&tq=select%20AH%20where%20AH%20contains%28%27{word}%27%29&key=0Ak-iDQSojJ9adGNUUXZnU2k3V1FRTjR3eFp0RmRNZWc&gid=118
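To make the moving parts clearer, here’s a small JavaScript sketch of how that kind of query URL can be assembled. The buildQueryUrl helper is just for illustration; the key, gid and column AH are the ones from the spreadsheet above:

// Build a Google Visualization API query URL for a published spreadsheet.
// The key/gid/column below are from this post's spreadsheet; swap in your own.
function buildQueryUrl(word) {
  var base = 'https://spreadsheets.google.com/tq';
  var query = "select AH where AH contains '" + word + "'"; // AH = Twitter description column
  var params = [
    'tqx=out:json',                      // the undocumented JSON output option
    'tq=' + encodeURIComponent(query),   // the Visualization API query
    'key=0Ak-iDQSojJ9adGNUUXZnU2k3V1FRTjR3eFp0RmRNZWc',
    'gid=118'
  ];
  return base + '?' + params.join('&');
}

// An empty string matches every row; a clicked word narrows the selection.
console.log(buildQueryUrl(''));
console.log(buildQueryUrl('data'));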

And here is the final result, an interactive wordcloud of #smwgla Twitter account descriptions [Note: you need to hit the Go button when you click-through]:

interactive wordcloud of #smwgla Twitter account descriptions

Is the end result useful? Not sure, but how the data is extracted is (to me anyway).


importHTML is a fantastic formula you can use in Google Spreadsheets. Here’s Google’s support documentation for importHTML:

importHtml

Syntax: ImportHtml(URL, query, index)

URL is the URL of the HTML page. Either "list" or "table" indicates what type of structure to pull in from the webpage. If it's "list," the function looks for the contents of <UL>, <OL>, or <DL> tags; if it's "table," it just looks for <TABLE> tags. Index is the 1-based index of the table or the list on the source web page. The indices are maintained separately so there might be both a list #1 and a table #1.

Example: =ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4). This function returns demographic information for the population of India.

Note: The limit on the number of ImportHtml functions per spreadsheet is 50.

What’s even better is you can wrap this formula in other formulas to get the data in the shape you want. A case in point: I was recently asked if an importHTML result could be transposed.

Using TRANSPOSE

The answer is yes, you can TRANSPOSE an importHTML. Let’s use the Demographics of India table from the support documentation as an example. To switch columns into rows we can use =TRANSPOSE(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4))

This lets us change the way the data is imported from this:

"=ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4)"

to this:

"=TRANSPOSE(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4))"

Using QUERY

Let’s now say we are only interested in the population figures for 1991 and 2001. You could always just import all the data then pull it out using a cell reference. Another way of doing this is to wrap our data in a QUERY formula.

The QUERY function is a built-in function that allows you to perform a query over an array of values using the Google Visualization API Query Language.

Anyone used to tinkering with databases will recognise the query language, which uses clauses like SELECT, WHERE, GROUP BY etc.

There are a couple of ways to query our data for the population of India in 1991 and 2001.

Using LIMIT and OFFSET

  • Limit - Limits the number of returned rows.
  • Offset - Skips a given number of first rows.

Using these we could use the query "SELECT * LIMIT 2 OFFSET 4". This selects all the columns (using *), limits the output to 2 results and skips the first 4 rows. The order of limit/offset is important; writing them the other way around won’t return any results.

"=QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT * LIMIT 2 OFFSET 4 ")"

SELECT columns

  • Select - Selects which columns to return, and in what order. If omitted, all of the table's columns are returned, in their default order.

Because we are using importHTML as our data source, when selecting columns we need to use the syntax Col1, Col2, Col3 …. So if you just want the year and population, our query could be "SELECT Col1, Col2 LIMIT 2 OFFSET 4".

"=QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT Col1, Col2 LIMIT 2 OFFSET 4 ")"

WHERE rows

  • Where - Returns only rows that match a condition. If omitted, all rows are returned.

One issue with using limit/offset is that if more data is inserted into the source table it might push your results out of range. A way around this is to include a WHERE clause to only include data that meets certain conditions. WHERE allows various comparison operators like <=, =, >, multiple conditions (‘and’, ‘or’ and ‘not’) and more complex string comparisons like ‘contains’. More information on WHERE conditions here. So if we only want the population where the year is 1991 or 2001 we can use the query "SELECT Col1, Col2 WHERE Col1='*1991*' or Col1='*2001*'"

For this last example let’s also TRANSPOSE the result and remove the table header:

"=TRANSPOSE(QUERY(ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India"; "table";4),"SELECT Col1, Col2 WHERE Col1='*1991*' or Col1='*2001*'",0))"

So there you go: using the QUERY formula to be more selective about your HTML import into Google Spreadsheets. Here is a copy of the spreadsheet with all the examples I’ve used in this post. Any questions/clarifications, leave a comment.

PS Tony Hirst has also written about Using Google Spreadsheets Like a Database – The QUERY Formula, which is a good place to go if you want some more query examples.

PPS I’m on leave now which is why this post has very little to do with CETIS or OER.


I came, I saw, I failed. This was a potentially promising hack that didn’t work out. Hopefully you’ll get as much benefit from failure as from success.

Today I came across oomfo (from the same makers as FusionCharts):

oomfo is a plug-in for Microsoft PowerPoint that brings all the awesomeness of FusionCharts Suite XT to PowerPoint. Its wizard-based interface helps you create great-looking animated and interactive charts in minutes.

Using oomfo, you can create specialized charts like Waterfall, Pareto, Marimekko, Funnel and Pyramid, which PowerPoint forgot to add. Additionally, you can connect to live data sources like Excel, SalesForce, Google Docs and your own back-end systems

I was interested in the Google Docs integration, but so far I can only find a Google Analytics connector. It was disappointing to discover that this relies on the user hosting a PHP file on their own webserver. Disappointment turned into shock when I then discovered that to get even this to work the user has to pass their Google username and password unencrypted, in plaintext!

WTF unencrypted passwords

All the connector file is doing is formatting data from the Google Analytics API in an oomfo/FusionChart XML format. Below is an example for a single series bar chart:

oomfo xml

My thought was that if I wrap data from a Google Spreadsheet in the Google Apps Script ContentService I could generate the required XML for oomfo to build the chart in PowerPoint – no hosting of files, no passing of passwords.

Using my simple electronic voting system hack as a data source, I was able to reuse this example on Stack Overflow on how to create an RSS feed using the ContentService class to create a template and code, shown here. Deploying this code as a service/web app gives me a URL I can query to get oomfo-formatted XML. So if I want responses tagged ‘dev1’ I use:

https://script.google.com/macros/s/AKfycbw79D4L2nZ2chj9Q4bZxQPkd-nLNr1PFjyzdNHgSj_HSFGTkCc/exec?id=dev1 
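As a rough illustration of the pattern (the sheet ID, sheet name and column layout below are placeholders rather than the actual voting spreadsheet), the ContentService part looks something like this:

// Sketch of a ContentService web app returning chart XML for oomfo.
// SHEET_ID, the sheet name and the column layout are placeholders.
function doGet(e) {
  var tag = e.parameter.id || '';                      // e.g. ?id=dev1
  var sheet = SpreadsheetApp.openById('SHEET_ID').getSheetByName('Responses');
  var rows = sheet.getDataRange().getValues();

  var xml = '<chart caption="Responses for ' + tag + '">';
  for (var i = 1; i < rows.length; i++) {              // skip the header row
    if (rows[i][0] === tag) {                          // column A holds the tag
      xml += '<set label="' + rows[i][1] + '" value="' + rows[i][2] + '" />';
    }
  }
  xml += '</chart>';

  return ContentService.createTextOutput(xml)
      .setMimeType(ContentService.MimeType.XML);
}

The /exec URL above is what you get after deploying the script as a web app; Apps Script serves it through a redirect, which is one of the suspects in the failure described next.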

Unfortunately when I try to use this as an external data source for oomfo I get ‘Unable to retrieve data from the specified URL’:


To check it’s not malformed XML I’ve downloaded the generated markup and uploaded it to Dropbox, which does work. So I’m not sure if oomfo is unable to follow the query redirection or if Apps Script is preventing the data from being used by oomfo (if anyone has any suggestions, that would be great).

There you go. How you can’t embed live data from Google Spreadsheet with Apps Script ContentService in PowerPoint using oomfo.


Poh, Swenson & Picard (2010) Brain Activity

The graph on the right is taken from a paper mentioned in Eric Mazur's keynote from the first day of the ALT* Conference 2012. The paper, A Wearable Sensor for Unobtrusive, Long-Term Assessment of Electrodermal Activity (Poh, Swenson & Picard, 2010), reports the study of an experimental wristband which can record brain activity. Mazur used the paper to highlight that often being in class generates less brain activity than when asleep and similar levels to when watching TV. Mazur went on to describe the theory and techniques, including Peer Instruction, for moving lectures away from a broadcast mode into a richer interactive experience.

Unfortunately Mazur was unable to incorporate some of these techniques into his keynote and so, enjoyable as it was, I found myself on the verge of a TV-watching state. As is becoming increasingly common when watching the box, this was augmented by the ‘second screen’, in this case the #altc2012 Twitter stream.

Had I been hooked up to a brain monitor I’m sure it would have recorded frantic activity trying to report some of the c.170 spam tweets (over 20%) pushed into a UK trending stream. But did I learn anything from the remaining c.600 legitimate tweets? On reflection I don’t recall ‘learning’ anything from the backchannel. One theory is that the backchannel is just an amplifier or repeater. As I recently noted in Notes from the Twitter backchannel at eAssessment Scotland 2012 #eas12, the audience is largely in a rebroadcast or note collection mode, which is evident in the lack of @replies. So there is less peer dialogue, but this doesn’t mean other processes aren’t at work. For example, there may be some level of cognition in forming a 140 character tweet which provides the opportunity for internal self-dialogue.

So I think I’m adjusting my expectation of the backchannel and taking a leaf out of Sheila MacNeill’s Confessions of a selfish conference tweeter. I still think there are opportunities to unpick the discourse from Twitter communities, but just when people are in a different mode like in #moocmooc.

[I really need to blog about how I calculated the number of spam tweets. In the meantime here is a graph of Twitter activity during Eric’s keynote]

Tweets, Replies and Spam for #altc2012 during Eric Mazur's Keynote
Interactive version of Tweets, Replies and Spam for #altc2012 during Eric Mazur's Keynote

*ALT is the Association for Learning Technology


New item starred in Google Reader then add to Diigo

IFTTT is a web service that lets you do some basic plumbing between web services. The idea is that if something happens on one service (referred to by IFTTT as ‘channels’) then you do something else on another (options also exist for other triggers including time, SMS and external sensors). To illustrate this I use IFTTT to bookmark anything I star in Google Reader in my Diigo account. Setting up these recipes takes no more than selecting a trigger and assigning an action, so no coding required.

There are currently 41 channels in IFTTT (including Twitter, Facebook and RSS feeds) that users can link together in this way. One of the most recent additions is Google Drive. With Google Drive integration you can store files or, the one that interests me more, insert a new row in a Google Spreadsheet.

A nice feature of IFTTT is the ability to share and reuse recipes. Here’s a selection of recipes that use Google Spreadsheets. You’ll see there is already a range of quick hacks people have shared including things like:

and my contribution:

Backup RSS Feed to Google Spreadsheet

Examples of RSS backup

RSS is great for accessing machine-readable data, but one of the limitations, particularly with blogs, is that results are limited to the last 10 or so items. This has created problems in some of my projects, like the OERRI Projects Dashboard, where I need all the posts. The solution to date has been to rely on third-party services like the Google Feed API.

Blog posts

By using IFTTT I’m able to easily capture blog posts as they are written. Having the data stored in a Google Spreadsheet makes it easy for me to query and export to other services (here’s an example from my blog) [this might be something JISC might want to do to capture and mine funded project feeds – other solutions/techniques already exist]

Scoop.it

Scoop.it is a service which lets users collect posts around topics and display them in an online magazine format. I use Scoop.it to collect examples of how people are using Google Apps Script. Since June last year I’ve collected over 200 examples. I mainly do this for my own benefit, in part as a directed task to make sure I’m up-to-date with Apps Script developments, but also as a reference source of Apps Script examples. I can query these examples using a Google search, but it concerns me that I’ve got no way to bulk export the data. Recently Scoop.it added an RSS option, so I use this recipe to Backup Scoop.it to Google Spreadsheet.

Summary

There’s lots more you could do with the data now it’s in a spreadsheet, and once I’ve built up a corpus I’ll have a play. One thing to note, which might put some people off, is that to allow IFTTT to add data for you, you need to give it authenticated access to your Google Drive. I can live with that risk, but you might not.