
This post is a bit messy. I got caught up trying out too many ideas at once, but hopefully you'll still find it useful.

Sheila recently posted Analytics and #moocmooc, in which she collects some thoughts on the role of analytics in courses and how some of the templates I’ve developed can give you an overview of what is going on. As I commented on the post, I still think there is more work to be done to make archives of event hashtags more useful, even if it's just surfacing the tweets that got the most ‘reaction’.

There are three main reactions that are relatively easy to extract from Twitter: retweets, favouriting and replies. There are issues with what these actions actually indicate as well as with the reliability of the data. For example, users ‘favourite’ tweets in different ways, and not everyone uses a Twitter client that supports replies properly (if you start a message with @reply without clicking a reply button, Twitter loses the thread).

But let's ignore these issues for now and start with the hypothesis that a reaction to a tweet is worth further study. Let's also, for now, narrow down on threaded discussions. How might we do this? As mentioned in Sheila's post, we’ve been archiving #moocmooc tweets using the Twitter Archiving Google Spreadsheet (TAGS v3). As well as the tweet text, other metadata is recorded, including a unique tweet identifier and, where available, the id of the tweet it is replying to.

Google Spreadsheet columns

We could just filter the spreadsheet for rows with reply ids, but let's take a visual approach. Downloading the data as an Excel file, we can open it using the free add-in NodeXL.
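(For the record, the filter route is a one-liner: with the in-reply-to id in, say, column N of the archive sheet, something like =FILTER(Archive!A:N, Archive!N:N<>"") pulls out only the threaded tweets – the sheet name and column letter here are illustrative and depend on your TAGS version.)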

NodeXL allows us to graph connections, in this case conversation threads. It also lets us do other useful things like grouping conversations together to make further analysis easier. Skipping over the detail, here’s what you get if you condense 6,500 #moocmooc tweets into grouped conversations.

moocmooc grouped conversations

This is more than just a pretty picture. In NodeXL I’ve configured it so that when I hover over each dot, which represents an individual tweet, I get a summary of what was said, by whom and when (shown below).

NodeXL being used to examine nodes

It’s probably not too surprising to see strings of conversations, but by graphing what was an archive of over 6,500 tweets we can start focusing on what might be interesting subsets and conversation shapes. Some interesting patterns emerge:

conversation group 1, conversation group 2, conversation group 3

Within NodeXL I can extract these for further analysis. So the middle image can be viewed as:

Examination of conversation group 2

There’s a lot more you can do with this type of data: start looking at how many people are involved in conversations, the number of questions per conversation and lots more. I should also say, before I forget, that NodeXL can collect Twitter search results with its built-in Twitter search tool. It can also be configured to do the collection on a regular basis (hmm, I should really have a go at doing that myself). So potentially you’ve got a nice little tool to analyse Twitter conversations in real time …

If you’d like to explore the data more it’s available from the NodeXLGraphGallery. I’m going off to play some more ;)



Templated Export allows users to reshape and selectively publish data from Google Spreadsheets in custom formats. The tool is powered by Google Apps Script, which allows integration with Google Spreadsheets using existing authenticated access. Example uses include turning data from a spreadsheet into an RSS feed and publishing data from a spreadsheet in JSON format to be used by other tools/services. Because it's a framework for creating a custom wrapper around your data, there are many more opportunities.
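To make the publish-as-a-service idea concrete, here's a minimal sketch (not the actual Templated Export code; the sheet name and field handling are illustrative) of how Apps Script's ContentService can serve a sheet as JSON from a doGet handler:

function doGet(e) {
  // read a sheet and return its rows as JSON (sheet name is hypothetical)
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Data");
  var values = sheet.getDataRange().getValues();
  var headings = values.shift(); // first row used as field names
  var items = [];
  for (var i = 0; i < values.length; i++) {
    var item = {};
    for (var j = 0; j < headings.length; j++) {
      item[headings[j]] = values[i][j];
    }
    items.push(item);
  }
  return ContentService.createTextOutput(Utilities.jsonStringify(items))
                       .setMimeType(ContentService.MimeType.JSON);
}

Templated Export goes further by letting you define the output template itself, but the underlying mechanism is essentially this.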

Templated Export

Main features

  • Cloud based
  • Selective publication of data (down to individual cells)
  • Publication of data in different file types (text, atom, rss, xml, json)
  • Complete customisation of output file structure

Setup

  1. Open Templated Export for Google Spreadsheets
  2. File > Make a copy of the script
  3. File > Manage Versions and enter a description and Save New Version
  4. Run > onInstall twice (first time to authorise, second time to run the function)
  5. Publish > Deploy as web app... then click Update and open the 'Current web app URL', or click on the 'latest code' link if available. Bookmark this link (there are many tutorials for turning Chrome bookmarks into applications)

Example usage

Generating a custom RSS Feed of data from a Google Spreadsheet

Publishing data from other sources like Guardian Datastore

Selectively publishing cell ranges as JSON

Some example templates

If you download this JSON file it contains the templates used in the demonstrations above. Use the Local Template: Backup/Restore button in Templated Export to install them.

Comments/suggestions welcome

I make the usual disclaimers: I'm not a professional programmer and I'm already aware of some niggly bugs, but comments and suggestions are always welcome. If there is enough interest in this tool I'm happy to put it on a community coding site or donate it to someone who actually knows what they are doing ;)


Warning: very techie post. The noob version is here: Google Spreadsheet Template for getting social activity around RSS feeds (HT @dajbelshaw).

In my last post announcing a Google Spreadsheet Template for getting social activity around RSS feeds I mentioned it was built from spare parts. It came out of ongoing work at CETIS exploring activity data around CETIS blogs (including this one) and JISC-funded projects. In this post I’ll highlight some of the current solutions we are using and how they were developed.

To put this work into context, it aligns with my dabblings as a data scientist.

A data scientist … is someone who: wants to know what the question should be; embodies a combination of curiosity, data gathering skills, statistical and modelling expertise and strong communication skills. … The working environment for a data scientist should allow them to self-provision data, rather than having to rely on what is formally supported in the organisation, to enable them to be inquisitive and creative. [See this post by Adam Cooper for the origins of this definition]

It’s perhaps not surprising that the main tool I use for these explorations with data is Google Spreadsheets/Google Apps Script. The main affordances are a collaborative workspace which can store and manipulate tabular data using built-in and custom functionality. To illustrate this, let me show how easy it is for me to go from A to B.

Example 1: OER Rapid Innovation Projects – Blog post social activity

I’ve already posted how we are Using Google Spreadsheets to dashboard project/course blog feeds #oerri. For this we have a matrix of project blog feeds and a set of predefined categories/tags. As projects make different posts, a link to them appears on the dashboard.

OERRI Project Post Directory

As part of programme completion, JISC are piloting a new project survey which will include reporting the ‘reach of the papers/articles/newsitems/blogposts out of the project’. Out of curiosity I wanted to see what social share and other activity data could be automatically collected from project blogs. The result is the page shown below, which is also now part of the revised OERRI Project Post Directory Spreadsheet. The important bit is the columns on the right with social counts from 10 different social network services. So we can see a post by Xenith got 29 tweets, a post by bebop got 5 comments, etc.

Caveat: these counts might not include modified urls with campaign tracking etc …

OERRI Post Social Activity

How it works

The three key aspects are:

  • Getting a single sheet of blog posts for a list of RSS feeds.
  • Getting social counts (Likes, Tweets, Saves) for a range of services.
  • Allowing column sorting on arrays of results returned by custom formulas.

Getting a single sheet of blog posts
You can use built-in formulas like importFeed to do this, but you are limited to a maximum of 20 items per feed and you need to do some juggling to get all the results from 15 projects on one page. An alternative, which I quickly ruled out, was fetching the RSS XML using Google Apps Script and parsing the feed. The issue with this is that RSS feeds are usually limited to the last 10 posts and I wanted them all.

The solution was to use the Google Feed API (I can’t remember if this is on Google’s cull list). The two big pluses are that it can pull historic results (up to 100 items), and that results can be returned in JSON, which is easy to work with in Apps Script.

Here’s the function, which is called as a custom formula in cell Data!A4 (the formula used looks like =returnPostLinks(Dashboard!A3:B17,"postData")):

function returnPostLinks(anArray,cacheName){
  var output = [];
  var cache = CacheService.getPublicCache(); // using Cache service to prevent too many UrlFetch calls
  var cached = cache.get(cacheName);
  if (cached != null) { // if value in cache return it
    output = Utilities.jsonParse(cached);
    for (i in output){
      output[i][3] = new Date(output[i][3]); // need to recast the date
    }
    return output; // if cached values just return them
  }
  // else make a fetch
  var options = {"method" : "get"};
  try { 
    for (i in anArray){ // for each feed url
      var project = anArray[i][0]; // extract the project name from 1st col
      if (project != ""){ // if it's not blank
        var url = "https://ajax.googleapis.com/ajax/services/feed/load?v=1.0&num=100&q="+anArray[i][1];
        var response = UrlFetchApp.fetch(url , options);
        var item = Utilities.jsonParse(response.getContentText()).responseData.feed.entries; // navigate to returned items
        for (j in item){
          // for each item insert project name and pull item title, link, date, author and categories (other available are content snippet and content)
          output.push([project,item[j].title,item[j].link,new Date(item[j].publishedDate),item[j].author,item[j].categories.join(", ")]);
        }
      }
    }
    // cache the result to prevent fetch for 1 day
    cache.put(cacheName, Utilities.jsonStringify(output), 86400); // cache set for 1 day
    return output; // return result array
  } catch(e) {
    //output.push(e);
    return output;
  }
}


Getting social share counts

Collecting social share counts is something I’ve written about a couple of times so I won’t go into too much detail. The challenge in this project was to get results back for over 100 urls. I’m still not entirely sure if I’ve cracked it and I’m hoping more result caching helps. The issue is that because the custom formula appears in each row of column H on the Data sheet, when the spreadsheet opens it tries to do 100 UrlFetches simultaneously, which Apps Script doesn’t like (Err: Service has been invoked too many times). {I’m wondering if a way around this would be to set a random sleep interval (Utilities.sleep(Math.floor(Math.random()*10000))) … just tried it and it seems to work.}
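For what it's worth, here's a rough sketch of that stagger-and-cache idea (the function name, SharedCount endpoint and response field are illustrative rather than the exact code in the spreadsheet):

function getTweetCountCached(url) {
  var cache = CacheService.getPublicCache();
  var cached = cache.get("count-" + url);
  if (cached != null) return parseInt(cached, 10); // reuse a cached count if we have one
  // random 0-10s sleep so 100 simultaneous custom formulas don't all hit UrlFetch at once
  Utilities.sleep(Math.floor(Math.random() * 10000));
  var response = UrlFetchApp.fetch("http://api.sharedcount.com/?url=" + encodeURIComponent(url));
  var data = Utilities.jsonParse(response.getContentText());
  var count = data.Twitter || 0; // just the Twitter count for this sketch
  cache.put("count-" + url, String(count), 86400); // cache for a day
  return count;
}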

Sorting columns when arrays of results are returned by custom formula

Sort order selection

When you use spreadsheet formulas that write results to multiple cells, sorting becomes a problem because the extra cells are generated using the CONTINUE formula, which evaluates a result from the initiating cell. When you sort, this reference is broken. The solution I use is to pull the data into a separate sheet (in this case Data) and then use the SORT formula in the ‘Post SN Count’ sheet to let the user choose different sort orders. The formula used in ‘Post SN Count’ to do this is in cell A4: =SORT(Data!A4:Q,D1,IF(D2="Ascending",TRUE,FALSE)). To try and prevent this being broken, an ‘Item from list’ data validation is used with ‘Allow invalid data’ unchecked.

Time to hack together: functional in 2hrs, tweaks/cache fixes 4hrs

Example 2: JISC CETIS Blog Post Dashboard

Having got social share stats for other people's feeds, it made sense to do the same for the JISC CETIS blogs. At the same time, because I keep my blog on my own domain and Scott (Wilson) is on wordpress.com, it was also an opportunity to streamline our monthly reporting and put all our page views in one place.

Below is a screenshot of a version of the CETIS dashboard with the page views removed (the code is all there, I just haven't authenticated it with our various Google Analytics accounts). With this version the user can enter a date range and get a filtered view of the posts published in that period. Sorting and social counts make a reappearance, with the addition of an ‘Examine Post Activity’ button. With a row highlighted, clicking this gives individual post activity (shown in the second screenshot). The code for this part got recycled into my last post, Google Spreadsheet Template for getting social activity around RSS feeds.

CETIS Post Dashboard

Individual post activity

Getting to B

To get to this end point there are some steps to pull the data together. First, the ‘Sources’ sheet pulls all our blog urls from the CETIS contacts page using the importHtml formula. Next, on the ‘FeedExtract’ sheet, the array of blog urls is turned into a sheet of posts using the same code as in example 1. Social counts are then collected on the ‘Data’ sheet, which is read by the Dashboard sheet.

A detour via page views

Having some existing code to get stats from Google Analytics made it easy to pass a url to one of two GA accounts and get stats back. Because Scott is on wordpress.com, another way was required. Fortunately Tony Hirst has already demonstrated a method to pull data from the WordPress Stats API into Google Spreadsheets. Following this method, stats are imported to a ‘Scott’sStats’ sheet and tallied up with a code loop (WordPress stats come in per post, per day).

WordPress Stats API import

Here’s the custom formula code to return page views for a url:

function getCETISPageViews(url, startDate, endDate){
  // format dates
  var startDate = Utilities.formatDate(startDate, "BST", "yyyy-MM-dd");
  var endDate = Utilities.formatDate(endDate, "BST", "yyyy-MM-dd") ;
  // extract domain and path
  var matches = url.match(/^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/i);
  var domain = matches && matches[0];
  var path = url.replace(domain,"/");
  // switch for google analytic accounts
  var ids = false;
  if (domain == "https://mashe.hawksey.info/"){
    ids = ""//ga-profile-id  
  } else if (domain == "http://blogs.cetis.ac.uk/"){
    ids = ""//ga-profile-id;
  } else if (domain == "http://scottbw.wordpress.com/"){
    // code to compile stats imported to the sheet
    // get all the values
    var doc = SpreadsheetApp.getActiveSpreadsheet();
    var sheet = doc.getSheetByName("Scott'sStats")
    var data = sheet.getRange(1, 4, sheet.getLastRow(), 2).getValues();
    var count = 0;
    // count on url match
    for (i in data){
      if (data[i][0] == url) count += data[i][1];
    }
    return parseInt(count);
  }
  if (ids){
    // GA get data using sub function
    return parseInt(getGAPageViews(startDate, endDate, path, 1, ids));
  }
}

Note: I use a custom Google Analytics function available in the spreadsheet script editor. There’s a new Analytics Service in Apps Script as an alternative method to connect to Google Analytics.
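As a rough sketch of that alternative (assuming the Analytics service has been enabled for the script project, and with the profile id, page path and 'yyyy-MM-dd' dates supplied by the caller), the equivalent pageview lookup looks something like:

function getPageViewsViaService(profileId, path, startDate, endDate) {
  // query the Core Reporting API for pageviews on a single page path
  var result = Analytics.Data.Ga.get('ga:' + profileId, startDate, endDate, 'ga:pageviews', {
    filters: 'ga:pagePath==' + path
  });
  // totalsForAllResults holds the summed metric as a string
  return parseInt(result.totalsForAllResults['ga:pageviews'], 10);
}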

Time to hack together: functional in 3hrs, tweaks fixes 1hr

Summary

So there you go: two examples of how you can quickly pull together different data sources to help record and report the social reach of blog posts. You’ll notice I’ve conveniently ignored whether social metrics are important, the dangers of measurement leading to gaming, and the low recorded social activity around OERRI projects. I look forward to your comments ;)


The 'Deprecated' notice is because access to the Topsy API (which is used to extract Twitter activity) is now behind a developer key.

A post from Alan Levine (@cogdog) on Syndication Machines (plus syndication of syndication), which details how feeds for the cMOOC course ds106 are aggregated and resyndicated, got me thinking about whether there was additional data around student blog posts that could be usefully captured. Previously for something like that I would have turned to PostRank Analytics, which allowed you to specify any RSS feed and would aggregate social activity like tweets and bookmarks from a wide range of services and let you see it all in one place. Unfortunately PostRank were bought by Google, and while some of this data is now accessible from Google Analytics it’s restricted to your own account and to data from social networks Google isn’t directly competing with.

So I thought it would be useful/interesting to start looking at what social activity you could pull together without authenticated access or API keys. So far I’ve identified a couple and, given some overlap with other work (and some late-night play time), I’ve come up with a Google Spreadsheet template which pulls data together from comment feeds, Twitter and Delicious (with social counts for these plus Facebook, LinkedIn and Google+). You can give the spreadsheet a try with the link below:

*** Blog Activity Data Feed Template ***

Blog Activity Data Feed Template Overview

Features

  • Collects last 10 posts from a RSS feed
  • Uses sharedcount.com to get overall post share counts from Facebook, Twitter, Google+, LinkedIn and Delicious
  • For supported blogs (mainly vanilla WordPress) extracts comment snippets
  • Uses Topsy to collect tweets mentioning the post
  • Fetches all the Delicious bookmarks for the post url
  • Summarises activity from comments, tweets and bookmarks on the dashboard

You can make as many copies of the template as you like to track other RSS feeds.

If you prefer a slightly different overview, the RSS Feed Social Share Counting Google Spreadsheet gives a table of share counts for a feed.

*** RSS Feed Social Share Counter ***

Social share count matrix

Technical highlights

There are a couple of cool things under the hood of the Activity Data template worth noting.

Managed Library = Easy to add more sources

So far I’ve limited the template to sources that have easy access. Previously, if an API disappeared or changed, or I discovered another service, I would have to modify the template and users would have to make a fresh copy. With Managed Libraries the Google Spreadsheet template now only needs 11 lines of code (shown below) for 3 custom formulas (see cells Dashboard!C8, #n!G2 and #n!B8).

function getSharedCountActivity(url){
  return socialActivity.getSharedCount(url);
}

function getItemFeed(url, num){
  return socialActivity.getItemRSSFeed(url, num);
}

function getActivityFeed(url){
  return socialActivity.getActivityFeed(url);
}

Because the template uses my own socialActivity library, and because I’ve left the template in Development Mode, when I update the code the changes should (I hope) filter through to all the existing templates in use. This means that if I can add more services with open APIs, details should start appearing on the post activity pages (denoted with a # at the beginning).

Fun with sparklines

Activity Sparklines

If you were wondering how I did the activity sparklines, here’s how. Sparklines are a built-in Google Spreadsheet formula (here’s the reference). To get them to work you need some data and, optionally, some display options. For the data all you need is an array of values and SPARKLINE will plot them equally spaced.

In this use case I’ve got comments, tweets, and bookmarks with date/times. To plot the sparkline I need to know how many of these were made between specific time periods. Another function we can use to get this data is FREQUENCY. This lets us specify some intervals and source data and returns an array of the frequency of data points.

One issue is that for each post the activity range varies, so one post might only have activity in July, and another in August. There are other formulas (MIN/MAX) to get the range values. Here’s how the final solution was pieced together.

On each post sheet (prefixed with ‘#’), cells A1:A20 contain some formulas to calculate date intervals based on the minimum and maximum date ranges. The frequency data used for each sparkline is then calculated inside the sparkline formula by indirectly looking at the data in each post sheet. The ‘indirectly’ part is handled by the INDIRECT formula, which evaluates a cell reference from other cell values. So on row 9 of Dashboard, INDIRECT(B9&"!D9:D") evaluates to ‘#1!D9:D’, row 10 is ‘#2!D9:D’ and so on. I could have hardcoded these cell references but I prefer the ‘clever’ way. Here’s part of the final formula with a breakdown of what it does:

SPARKLINE(FREQUENCY(INDIRECT(B9&"!D9:D"),INDIRECT(B9&"!A1:A20")),{"charttype","line";"ymin",-1;"ymax",MAX(FREQUENCY(INDIRECT(B9&"!D9:D"),INDIRECT(B9&"!A1:A20")))+1} )

  • INDIRECT(B9&"!D9:D") – Frequency data source built using INDIRECT, which evaluates a cell reference from other cell values. On row 9 of Dashboard this evaluates to ‘#1!D9:D’, row 10 is ‘#2!D9:D’
  • INDIRECT(B9&"!A1:A20") – similar trick to get an interval array for the FREQUENCY formula
  • FREQUENCY(INDIRECT(B9&"!D9:D"),INDIRECT(B9&"!A1:A20")) – gives our sparkline data array
  • MAX(FREQUENCY(… – get the maximum value from the frequency data just so we can add 1 to it to give a margin on the sparkline

Summary

Hopefully you’ll enjoy it, and if you can think of any improvements or other services to tap into, just get in touch.


For those not familiar with MOOCs, Dave Cormier & co. have a nice video explaining them here. Dave should know, as the term:

MOOC was coined in 2008 by Dave Cormier, Manager of Web Communication and Innovations at the University of Prince Edward Island, and Senior Research Fellow Bryan Alexander of the National Institute for Technology in Liberal Education in response to an open online course designed and lead by George Siemens, associate director, Technology Enhanced Knowledge Research Institute at Athabasca University and Stephen Downes, Senior Researcher at The National Research Council (Canada). The course was called "Connectivism and Connective Knowledge" and was presented to 25 tuition-paying students at the University of Manitoba in addition to 2,300 other students from the general public who took the online class free of charge – from Wikipedia

If MOOCs aren’t already on the Gartner ‘peak of inflated expectations’ I’m sure by the end of this year they will be, as various people are gambling (Where there’s MOOC, there’s brass?). Recently projects like Coursera, Udacity and edX have attracted over $100 million in investment to deliver MOOCs. It’s worth reiterating that not all MOOCs are the same and, as George Siemens recently wrote:

Largely lost in the conversation around MOOCs is the different ideology that drives what are currently two broad MOOC offerings: the connectivist MOOCs (cMOOCs?) that I [George Siemens] have been involved with since 2008 (with people like Stephen Downes, Jim Groom, Dave Cormier, Alan Levine, Wendy Drexler, Inge de Waard, Ray Schroeder, David Wiley, Alec Couros, and others) and the well-financed MOOCs by Coursera and edX (xMOOCS?).

George’s post was entitled ‘MOOCs are really a platform’, which got me thinking that, given the recent announcement that Coursera is offering its platform to other partners, including the University of Edinburgh, this falls into the ‘as a Service’ model used in cloud computing. So Coursera are offering chosen partners (for a fee) ‘MOOCs as a Service’ (MaaS), or, using the distinction above, ‘xMaaS’.

One other consideration is whether xMOOCs are really MOOCs. Martin Weller recently questioned if the recent offerings from Coursera et al. are truly open. So ignoring the xMOOCs for now, I thought it would be useful to survey some ‘real’ MOOCs and identify if there were any technological trends and possible future directions. This process has left me questioning whether there is a need for more shared guidance and support on aggregation architectures. [This post is mainly for me to record thoughts and resources, but it might be something we (JISC CETIS) take forward. If this is something you’d like, please comment or share this post to let us gauge the level of interest.]

How the data was collected

A list of MOOCs was taken from the ‘(Very) Partial History of MOOCs’ section of mooc.ca (maintained by Stephen Downes). This was added to with UK-based examples funded by JISC/HEA. All 14 courses were then visited and examined for technologies used. The criterion for inclusion was that the course had to have finished or already started.

Because of the nature of MOOCs there is a grey line between the technology chosen by the course team, recommended to students, and used by students either as part of their personal learning environment or to facilitate smaller study groups. An attempt was made to only include technologies chosen/used by the course team.

Data collection was also not entirely conclusive due to some of the early MOOCs (CCK08 and CCK09) no longer having a web presence, and others like MobiMOOC reusing their existing web presence (the 2011 version of the course has been edited to become the version for the 2012 session).

A Google Spreadsheet with the data and annotations is here. Please contact me for edits/additions.

Results

Frequency of MOOC tech

MOOC tech treemap

Above is the obligatory word cloud of the frequency with which different technologies were used in the MOOCs surveyed (if you are feeling brave you can explore the data as a treemap on ManyEyes).

Two things are apparent in this data. Firstly, email is often a hidden technology. Services like Google Groups, Twitter and Facebook all allow (and in some cases rely on) email notifications. Secondly, it’s of limited use to know what technologies are being used; the important question is how they are being used.

We can get some insight into this from the diagram below, taken from George Siemens’ “What is the Theory that Underpins Our MOOCs?”

Siemens: MOOC Architecture

gRSShopper is an application that allows you to define your own community of RSS feeds, aggregates content from those feeds and organizes it, and helps you integrate that content into your own posts, articles and other content [source]

Because of the connectivist theory underpinning cMOOCs, gRSShopper is a key component in aggregating distributed activity. It’s worth noting that only 5 of the 14 MOOCs surveyed used gRSShopper, but there is evidence that other mechanisms are in place to perform similar functions. For example, in Digital Storytelling (DS106) this functionality is handled by the FeedWordPress plugin, which allows administrators to specify feeder RSS feeds and selectively repost content to a self-hosted WordPress blog. In PHONAR and PICBOD, which are photography-based courses, students were directed to publish work to Flickr using a common tag to allow aggregation of work.

The general sense of it all – distributed, chaotic, emergent 

The ‘distributed, chaotic, emergent’ comes from a recent presentation by George Siemens for EDUCAUSE about cMOOCs. It’s apparent from the survey of MOOC technology that course teams are taking a loosely joined set of tools that they are comfortable with to facilitate a shared experience with the learner. As Downes commented when writing about gRSShopper, “the users are assumed to be outside the system for the most part, inhabiting their own spaces, and not mine”. It’s also apparent that people are taking discipline-based approaches, using tools aligned to study areas, as previously described with PHONAR/PICBOD.

Even with the bespoke nature of MOOCs there are still opportunities to start collectively raiding the parts bin. Given the widespread use of Twitter in MOOCs, are there tools/techniques required to aggregate and disseminate the course discussions? Given the wide use of WordPress within education, are there opportunities for MOOC-specific themes or plugins? With the ability to freely record and stream video from a Google Hangout, do we need a wrapper to allow comment collection and annotation?

Aggregation!

It’s not just the technology that sits on top. It’s been fascinating to read the blog posts from Jim Groom as he and his colleagues come to grips with the infrastructural issues of running ds106.us. As commented by Stephen Downes, and as Jim is finding, “aggregation kills websites”. So if it’s the aggregation of content that’s the glue in cMOOCs, perhaps this becomes the area of priority? Perhaps this is the area where JISC CETIS can most usefully provide or facilitate guidance? As Jim recently commented:

No where in the raging discussion around MOOCs is there anyone talking about sharing the infrastructural/architectural work they’ve done freely with others – Jim Groom in Integrating FeedWordPress with BuddyPress

So having loosely started with the pedagogy and looked at some of the technology, I’m beginning to think aggregation architecture/infrastructure might be the best place where an organisation like JISC CETIS could help. Do you agree?


Today Tony Hirst posted Data Shaping in Google Refine – Generating New Rows from Multiple Values in a Single Column. Coincidentally I faced a similar problem today working with a Google Spreadsheet, so thought I’d share my solution.

Here’s a Google Spreadsheet where I’ve been capturing some MOOC infrastructure (contact me if you’d like to contribute/edit). For the technology column I’ve just been dumping in comma-separated terms.

Source spreadsheet

Having entered some data I started thinking about how it could be presented. One idea was to use the SIMILE timeline widget, which I’ve experimented with before here. To get the data in the right shape I needed to split the technologies into one per row. To do this I wrote the custom function below in Google Apps Script (I’m undecided now if I’ll still use this for a timeline – it’s another case of it taking longer to write this up than to actually do it).

Reshaped spreadsheet

// splitColumnAndRepeatRows iterates row by row in the input range and returns an array splitting
// by a comma separated column
// Arguments:
//   - anArray: values[][]
//   - splitColumnIndex: specifies the column number where the cell values need to be comma split 
//                       0 indexed
// Returns an [][] Array of values.
function splitColumnAndRepeatRows(anArray, splitColumnIndex) {
  var output = [];
  for (i in anArray){ // for each row
    var splitArray = anArray[i][splitColumnIndex].split(","); // split values in specified column
    for (j in splitArray){ // for each split cell value
      var row = anArray[i].slice(0); // take a copy of source row 
      row[splitColumnIndex] = alltrim(splitArray[j]); // replace comma separated value with current split value
      output.push(row); // push new row to output
    }
  }
  return output;
}

function alltrim(str) {
  return str.replace(/^\s+|\s+$/g, '');
}

Update: added an alltrim function to remove whitespace.
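For what it's worth, usage as a custom formula would look something like =splitColumnAndRepeatRows(A2:D20, 3) – the range and the 0-indexed column number here are illustrative and depend on where your technology column sits.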

Here are some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.


What is the difference between a map and a compass? he [Ade Oshineye] asked. His answer: people have too much faith in maps, which are, in the end, just someone else's view of the world. Whereas a compass only gives you a rough direction and we don't expect more of it than that.

The point? It's the same with finding a successful model for ... the future: getting the direction right is better than trying to find the perfect strategy with the false certainty of a map.

... Even bookshelves, he said, have evolved over years: boring technology may have the answer to your problem.

Google+ Hangouts, he said, is an example of boring technology - video chat - being used in a new way. ...

Google+ has taught him that if you apply boring technology to interesting people you can get something new: "The hard part is coming up with the right metrics to see if you're getting there." - Charles Miller, Future media challenges are about finding compasses not maps (Emphasis mine)



I recently had a chance to spend some time with Marc Smith, co-founder of the Social Media Research Foundation, which is behind the Microsoft Excel network add-in NodeXL. I’ve done a couple of blog posts now with NodeXL and, after a prompt from Marc, I thought it was worth a revisit. So in this post I’ll highlight some of the new features of NodeXL’s Twitter Search import that make it a useful tool for community/resource detection and analysis.

Importing a Twitter Search – expanding all the urls

Before going too far I should also point out there has been a separate social network importer for Facebook Fan pages for a while now. On the new Twitter Search import there is now an option to ‘Expand URLs in tweets’. This is useful because Twitter now wraps all links in its own shortening service, t.co. The shortened urls are also unique for each tweet*, even if the target url is the same. Having a feature that expands these is useful for seeing what people are linking to (it makes it easier to see if people are sharing the same resources or resources from the same domain). And, as you’ll see later, it makes the data easier to use in mashups.

*except new style RTs which use the same t.co link

Expand URLs options

Did you know you can use urls and website domains in your search? This is a trick I’ve been using for a long time and I’m not sure how widely known it is. For example, here is everyone who has been sharing the new Creative Commons License chooser at http://creativecommons.org/choose/, or just everyone sharing anything that links to the Creative Commons website domain. In TweetDeck I use a search column with ‘hawksey.info OR cetis.ac.uk’ to pick up any chatter around these sites.

Replicable research and the (almost) one button expert

NodeXL has been a great tool for me to start learning about network analysis, but as I play with various settings I’m conscious that I’m missing some of the basic tricks to get the data into a meaningful shape. For a while now people have been able to upload and share their network analysis in the NodeXLGraphGallery. This includes downloading the NodeXL data as an Excel Workbook or GraphML (this is a nice way to allow replicable research).

An even newer feature is the ability to download the NodeXL Options the graph author used. This means a relative amateur like myself, with no sociological background and, unlike Marc, unaware of what the increasing popularity of zombie films might be saying about our society (although they can be used to explain betweenness centrality), can tap into that expertise and format a graph in a meaningful way with a couple of clicks. There’s still the danger that you don’t understand the graph, but it can still be a useful jumpstart.

Import NodeXL Options

Twitter Search Top Items Summary

The next new thing is a Twitter Search Network Top Items page. I did a search for ‘#oer OR #ukoer’ to pull the last 7 days’ tweets. By importing the options from this NodeXL Graph Gallery example and running ‘Automate’ you can reuse my settings on your own search results. By running Graph Metrics > Twitter search network top items (part of my Automate options) I get this sheet, which I’ve uploaded to a Google Spreadsheet.

Twitter Search Network Top Items page

This sheet lets you quickly see, overall and at group level:

  • Top Replied-To in Entire Graph
  • Top Mentioned in Entire Graph
  • Top URLs in Tweet in Entire Graph
  • Top Hashtags in Tweet in Entire Graph
  • Top Tweeters in Entire Graph

Detection

These are useful summaries for looking at who is most active in the community, which urls are being shared most, and the overlapping tag communities. I admit that it can look like a scary dashboard of stuff which not all of you will like, but NodeXL is a network graphing tool, so it’s easy to visually explore the data.

So looking at the macro level we can quickly graph the ripples typical within a Twitter community, mainly showing the effects of retweets (this view was extracted from my online NodeXL Google Spreadsheet Graph Viewer). This can help you quickly see the smaller clusters within the community who are generating retweets and conversations.

Retweet ripples

Community (group) in a box

Because my data was also on the NodeXL Graph Gallery, Marc kindly created this view which groups sub-communities using an algorithm and overlays the hashtags most used by each sub-community (Marc’s version on NodeXL Graph Gallery). The group hashtag labels, which are in frequency order, are very useful in this situation because the search term I used was pulling in overlapping hashtag communities (#oer and #ukoer). So boxes where ‘ukoer’ is near the beginning indicate they are from the UK crowd.

oer/ukoer graph by Marc Smith

Getting more from the data

Earlier I mentioned that having expanded urls is useful for further analysis. Something I quickly played with, though I’m not entirely sure how to get the most out of it (if anything), is reusing my RSS Feed Social Share Counting Google Spreadsheet code to get social share data for the most tweeted links. Here’s the result (embedded below). Let me know if you have any thoughts on how it can be used:


This is my last post of notes from Google I/O 2012, so I won’t get to mention that Google Docs has offline editing, that Google Drive can not only OCR your PDFs to make them searchable but can also recognise what’s in the images, Google Compute Engine, or various Chrome improvements. Instead I’m going to focus on the sessions around Google Apps Script (which I occasionally write about anyway ;s).

One of the main questions I get from people interested in using Apps Script is how stable/sustainable the platform is. Given there were three separate sessions on Apps Script at #io12, announcing a number of new features including the launch of script.google.com, I'd say for now things are looking very healthy.

There were (by my count) four new features of Apps Script announced, outlined below; I’m also throwing in one slightly older announcement, and I’ll share my notes on all of them in this post:

  • Recently announced Managed Libraries and Versioning – this lets you maintain a master script so that bug fixes and new features can be picked up by anyone using the library (before, if you created a script in a spreadsheet and someone made a copy, the script became orphaned, so the only way to receive updates was to make a fresh copy or manually replace the existing code).
  • ContentService – lets you publish content generated by Apps Script which will be interpreted by other web services/tools as Atom, RSS, JSON, text or XML
  • HTMLService – lets you publish content generated by Apps Script which will be interpreted by other web services/tools as HTML, which can include custom CSS and client-side JavaScript (including importing existing libraries/code like jQuery)
  • ScriptDb – a NoSQL-style database that lets you store and query data that doesn’t have a fixed schema (it doesn’t need a fixed number of values and may have nested data)
  • Publish to Chrome Web Store – you can publish Apps Script based applications directly to the Chrome Web Store so that other people can easily install and use them.

Managed Libraries/Versioning

Existing services and the ability to write your own custom functions have made it easy to extend the functionality of Google Apps. Using things like UrlFetch you can make calls to 3rd party APIs, or maybe there is something you want to do with your Gmail account to make it better.

Previously there were a couple of ways you could share your custom scripts. More often than not I embed them into my Google Spreadsheet templates, but you could also add them to the Script Gallery or maybe drop in some code you’ve found elsewhere, like GitHub.

The problem with all these methods is that once the script is installed it becomes orphaned from the original; the only way to update is for the user to take a fresh copy or overwrite the code. Managed Libraries let you write a script which other people can then import into their project to immediately extend the methods they can use.

There’s no official searchable store of released scripts but here are some Notable Script Libraries to give you an idea of what’s possible. Here is more information on Managing Libraries.

One of my first concerns was that, because of versioning, if you share a scripted template which uses a Managed Library the user would be stuck on the release version (this makes sense if you are a coder and want to use a stable version). So if you release an update, users can go into their copy and select the next stable release. It also looks like you can distribute a template spreadsheet with the library in Development Mode, which means it’ll always use the latest version of the code.

ContentService

Getting formatted XML from a Google Spreadsheet

There have been several times when I’ve wanted to digest/process some data in a Google Spreadsheet and then republish it in a different format. A typical example would be importing some timestamped data and generating a custom RSS feed. A trick I picked up from the Google Earth Outreach Spreadsheet Mapper is to format the data in a sheet within Google Spreadsheets and publish it as text. So here is a spreadsheet of data and here is the data reformatted as KML.

This solution works well but can be fiddly, and the output might be interpreted as text and not XML (because of the MIME type). The other issue is you are stuck with one view, so if I wanted to create a different RSS feed using the same data but a different view it would need to be published as a separate sheet. Using ContentService means data views can be published that are generated according to values passed in the querystring (GET) or payload (POST).
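Here's a minimal sketch of that pattern (not the Google Sites example described next; the parameter name, sheet layout and feed fields are illustrative): a doGet that picks the view from the querystring and returns RSS via ContentService.

function doGet(e) {
  // e.g. published URL + ?sheet=Data – the sheet to publish is chosen by the querystring
  var sheetName = e.parameter.sheet || "Data";
  var rows = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName).getDataRange().getValues();
  var xml = '<?xml version="1.0"?><rss version="2.0"><channel><title>' + sheetName + '</title>';
  for (var i = 1; i < rows.length; i++) { // assuming column A holds a title and column B a link
    xml += '<item><title>' + rows[i][0] + '</title><link>' + rows[i][1] + '</link></item>';
  }
  xml += '</channel></rss>';
  return ContentService.createTextOutput(xml).setMimeType(ContentService.MimeType.RSS);
}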

Turn Google Sites pages into RSS code

Here’s some example code I’ve written that allows you to generate an RSS feed from Google Sites pages. When this script is published as a service, doGet catches a user-defined Google Site url and page stub (checking if it has already cached the result), then creates a page with the RSS.

sites2RSS url

I would show you an example but I’d need to publish it with anonymous access and then anyone could use my quota. But here is an example of a feed passed through Yahoo Pipes to obscure the service key, which is using ContentService to generate an RSS feed of this page on QMU’s Audio Feedback Site. Most of the time the generated feed seems to be alright for Google Reader and for ingesting back into a Spreadsheet using importFeed(), but timeouts in services are a problem if you are pulling back a lot of pages.

HTMLService

HTMLService is similar to ContentService but has some extra neat little tricks. Rather than go into detail about what it can do, I’ll direct you to the HTMLService documentation and the video of the session given by Corey Goldfeder. To get a quick idea of what is possible, keisuke oohashi has already ported the Twitter Bootstrap template to HTMLService.
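To give a flavour (a minimal sketch of my own, not taken from the session), HTMLService serves real HTML that can pull in client-side JavaScript such as jQuery:

function doGet() {
  // build a small page that loads jQuery from a CDN and runs some client-side script
  var html = '<html><head>' +
    '<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>' +
    '</head><body><div id="msg">Loading…</div>' +
    '<script>$(function(){ $("#msg").text("Hello from HtmlService"); });</script>' +
    '</body></html>';
  return HtmlService.createHtmlOutput(html); // serve it as a (sanitised) HTML page
}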

Some of my notes on HTMLService:

ScriptDb

Again the documentation for ScriptDb is very good so I’ll let you explore it (worth noting the section it sits in, which covers various ways of Storing Data in Apps Script). The basics are:

  • A NoSQL-style database which lets you store, update and search data as JavaScript objects (JSON)
  • Each Apps Script project gets its own local database. The only way I can find of accessing the data across projects is if the database is part of a Managed Library. Note of caution: my understanding is that if you let other people use your managed library which has ScriptDb in it, everyone using the library has access to the data, and if they can write to it they are using up your quota (more details on ScriptDb and Libraries).
  • The quota is a total of 50MB for regular Google accounts and 200MB per user for Google Apps for Education. (A concern I have is that if you start a regular scrape collecting data in ScriptDb in one script and forget about it, it would be a nightmare to find where your quota had gone. Best practice might be to only use your ScriptDbs through a personal script library; because ScriptDbs are local instances, using one in a managed library will give you access to the data across scripts. Bruce Mcpherson has already written a library you could use to silo your data.)
  • Querying is ‘by example’ – if you use type: ‘employee’ in your query it pulls all of the objects that include this somewhere in their record (a minimal sketch follows this list).
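Here's a minimal sketch of the store-and-query-by-example pattern (the record shape is made up for illustration):

function scriptDbExample() {
  var db = ScriptDb.getMyDb(); // each project gets its own database
  db.save({type: 'employee', name: 'Alice', dept: 'web'}); // store a JavaScript object
  var results = db.query({type: 'employee'}); // query by example: match anything with type 'employee'
  while (results.hasNext()) {
    var item = results.next();
    Logger.log(item.name + ' - ' + item.dept);
  }
}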

Here is the video from the Google I/O session which covers Storing Data in Google Apps Script.

Publish to Chrome Web Store

So your institution is running Google Apps for Education and you are doing interesting stuff within the Google Apps ecosystem. Maybe, like Loughborough College, you are experimenting with how Apps Script can be used for assessment and feedback. Perhaps you see a way of making your students’ lives a little easier, like providing a filtered view of all the feedback docs they’ve received or a way for them to book a perch at a PC. Well, now you can brand and publish the application to the Chrome Web Store.

I haven’t had a huge chance to play with this feature, but here is the Publish to Chrome Web Store documentation. One thing I do know is you need a web store developer account, which has a one-off fee ($5). Some examples already in the store:

  • Gmail Meter - runs on the first day of every month and sends you an email with all the desired statistics on your own email behavior.
  • Gmail Print All for Chrome - helps you quickly export messages right to a Google Doc that you can print, save as a PDF, share with others, or store in a way that works for you
  • DriveEye - lets you keep an eye on your Google Drive. DriveEye will send you email alerts when a collaborator adds files to a shared folder.

[You can actually search for more because the store listing’s ‘from’ field references where the script is published]

It seems fitting to finish on the I/O session recording Putting Together the Pieces: Building Apps with Google Apps Script (this actually ran first during the event, but pulls in all the parts I’ve mentioned above):

I think that’s enough Google for now. Next up: writing about the new version of NodeXL, Voyant Tools or Datameer … or something completely different instead.