OER Visualisation Project: Timelines, timelines, timelines [day 30] #ukoer

Following on Day 20’s Maps, Maps, Maps, Maps the last couple of days I’ve been playing around with timelines. This was instigated by a CETIS PROD ‘sprint day’ on Friday where Sheila MacNeill, Wilbert Kraan, David Kernohan and I put our thinking caps on, cracked the knuckles and look at what we could do with the CETIS PROD data.

Creating a timeline of JISC projects from PROD is not new, Wilbert already posting a recipe for using a Google Gadgetized version of MIT’s SIMILE timeline widget to create a timeline of  JISC e-Learning projects. I wanted to do something different, trying to extract project events and also have more timeline functionality than offered by the SIMILE gadget. My decision to go down this particular route was also inspired by seeing Derek Bruff’s Timeline CV which renders Google Spreadsheet data in a full feature version of the SIMILE timeline widget (an idea that Derek had got from Brian Croxall).   

PROD Project Directory pageHaving already dabbled with the PROD data I knew CETIS staff have annotated JISC projects with almost 3,000 individual comments. These can be categorised as general and related projects comments and comments associated with technology and standards choices (you can see this rendered in a typical project page and highlighted in the graphic).

Timeline 1 – All project comments

Discovery number one was that all the comments don’t have timestamp  information in the linked data (it turns out only general comments have these). Ignoring these for now and seeing what happens if we create a SPARQL query, import the data into a Google Spreadsheet, format and then wrap in a HTML page we get this JISC CETIS PROD Comment Timeline:

JISC CETIS PROD Comment Timeline

The good – search and filtering strands; information popups render well; user can resize window; easy export options (activated when mouseover timeline via orange scissors)
The bad – too many comments to render in the timeline; the key under the timeline can’t render all the strands

Timeline 2: Technology timeline with comments

One of the suggestions at ‘show and tell’ was to focus on a particular area like technology to see if there were any trends in uptake and use. As mentioned earlier there are currently no timestamps associated with technology comments and it was suggested that project start and end dates could be used as an indication. So using the same recipe of SPARQL query, formatting data in a Google Spreadsheet to a HTML page we get the CETIS PROD Tech Timeline.  

CETIS PROD Tech Timeline

Again the default view presents information overload which is slightly alleviated by filtering. I still don’t get any sense of wave of technologies coming and going, partly because the project start/end dates and maybe it very rare for a technology to die.    

Timeline 3 – Programme level

Trying to take a more focused view it was suggested I look at a programme level timeline of general comments (being general comments means they are individually timestamped). Using the recipe one more time of SPARQL query, formatting data in a Google Spreadsheet to a HTML page we get the CETIS PROD OER Phase 1 & 2 timeline.

CETIS PROD OER Phase 1 & 2 timeline

Still there is a problem navigating the data because of clustering of comments (shown by the string of blue dots in the bottom timebar). So what’s going on here? Looking at the data it’s possible to see that despite the fact that general comments could have been made at any point in the two years of the programme 912 comments were made on only 73 different days (which partly makes sense – ‘going to sit down and do some admin, I’ll review and comment on project progress’).    

So timelines are maybe not best for this type of data. At least there’s a recipe/template used here which might help people uncover useful information. There is an evolution of this combining timelines and maps that I’m working on so stay tuned.

OER Visualisation Project: How is OER being socially shared – postscript [day 30] #ukoer

A quick postscript to day 24 of the OER Visualisation project where I looked at how individual Jorum UKOER resources were being, or as was the case, not being shared on social networking sites like Twitter, Facebook et al. I did a similar analysis on HumBox records and using my method it was a similar story, almost undetectable social sharing of individual resources.

To try an see if this was because of bad data on my behalf I posed the question to the #ukoer twitter community and the OER-DISCUSS mailing list.  On the OER-DISCUSS list Peter Robinson highlighted that one mention of one of University of Oxford’s resources on StumbleUpon resulted in a 20,000 views spike. On Twitter Catherine Grant (@filmstudiesff) responded:

Facebook
Likes: 13
Shares: 24
Comments: 11
Total: 48

Twitter
Tweets: 81

Google +1
+1s: 0

Diggs
Diggs: 0

LinkedIn
Shares: 2

Google Buzz
Buzzes: 0

Delicious
Bookmarks: 8

StumbleUpon
Stumbles: 1

Putting Catherine’s first link into sharedcount.com gives the following –> 

So one page of curated resources with almost 50 Facebook reactions, over 80 tweets can have as many social shares an entire repository.

This issue is a well known one within the OER community and with almost eerie timing the following day after the ‘day 24’ post Pat Lockley posted The OERscars – and the winner is  in which he looks at some of the activity stream around ‘Dynamic Collections’ created as part of the OER Phase 2 Triton Project. From Pat’s post:

Dynamic Collections function as a WordPress plug in, bring in RSS Feeds from OER sites and blogs, and then search these feeds for particular words before moving these items into collections. These collections can be created as simply as a WordPress post, and so gives almost everyone the scope to start building OER collections straight away. Once a collection has some content, it can be displayed to visitors to the site (normally as a “wider reading” style link at the end of a post on a particular topic) and we made sure to track how these resources are used.  As well as showing as a WordPress page, the collections can also be seen as an RSS Feed (add ?rss_feed_collection=true to the end of a page), An Activity Stream – which will be handy for the Learning Registry (?activity_stream=true), or embedded into another page (?dc_embed=true) via some javascript.

I’m not entirely sure what my point is but thought worth sharing the information and links.

Sending free SMS messages to students by mashing up Google Calendar notifications

Finding free ways to send and receive SMS messages is something I’ve covered before (Twitter Ye! Twitter Ye! Keep your students informed with free SMS text message broadcasts! and Free SMS voting using intelliSoftware SMS Gateway service). Browsing the Google Apps Script site I saw there is a new tutorial by Romain Vialard which lets you Link a Gmail Filter to Google Calendar SMS Notifications in which it “shows how to create a specific filter in Gmail and be notified by SMS when you receive an important email”. This got me thinking if there was a way of using this feature to send free SMS updates to students, say for example, when they they received some Fast-tracking feedback.

This solution needs the student to configure their calendar for SMS notifications and not all mobile operators are supported, but I will show how to make this work and you can decide if its worth the effort.

Student side setup

To enable SMS updates all the students need a Google Calendar and to have enabled SMS notifications for event invites (you may want to tell students about this anyway just so they can get SMS notifications for event reminders). This is done by:

  1. Google Calendar Settingsopening Google Calendar going into ‘Calendar settings’ from the cog icon top-right
  2. on the Mobile Setup tab the student needs to verify their mobile number (the supported UK providers are: 3; O2; Orange; T-Mobile; Virgin Mobile; and Vodafone – full international list here), then click ‘Send Verification Code
  3. once a code has been received enter it in the ‘Verification code’ field and click ‘Finish setup
  4. this then takes you to the ‘Notifications’ settings for your default calendar. In the New events row tick the SMS checkbox (you might want enable other SMS services), then click ‘Save
    Google Calendar Notification Settings

Sending SMS notifications manually

To send a message to students open your Google Calendar and create an event (you might want to create a new calendar so you don’t mess up your own schedules – creating a new calendar and creating events using your current date/time also means you have a record of when a message was sent – SMS notifications appeared to only be sent for events in the future). The fields included in the SMS are the event title, location and date/times. Here’s an example message recieved:

 Example SMS notification

In the ‘Add guest’ field paste in a comma separated list of student email addresses (these addresses need to be associated with the student’s Google Calendar so if you are not using Google Apps for Education some prep is required to get this list) and click ‘Add’. I would also recommend making sure the guest options to modify, invite and see a guest list are unticked.

When you save the event shortly afterwards students should receive the SMS message and an email similar to the one shown below:  

Example email notification

You might want to agree a system with your students where they have to confirm they have seen it by clicking ‘Yes’ in the ‘Going’ option. This will give you a summary of who has seen it in the event details. You may want to utilise other Calendar features like notes.

 guest list

Using Google Apps Script to automatically send SMS notifications

This whole idea came from a Google Apps Script tutorial so it only seems fitting to show how this could modified to automate the guest list creation, which in turn would trigger the SMS notification. I’m not going to go into any detail about what Apps Script is but basically as one of its main uses in to automate tasks it’s fairly easy to write a script that will take a list of email addresses from a Google Spreadsheet, create a new event in your calendar adding those addresses as guests. And here it is:

For this to work all you need is a column of email addresses associated with personal Google Calendars and within seconds your sending  free SMS messages to people.

So what do you think, a viable solution or is student side setup/concerns over network coverage going to prevent you using it? 

A way to archive and display Twitter hashtag chats

It’s interesting to watch the popularity of Twitter hashtag chat communities (like #uklibchat) grow. It’s also interesting to the number of different ways these chats are recorded from dumping/exporting a Twitter Search into a word or pdf doc to using tools like Storify to capture the highlights. If you organise or are thinking of organising a #chat here’s one way you might want to keep a complete record of what was said and a couple of ways you can use this data.

Archiving the conversation

Perhaps not surprisingly it starts with my Google Spreadsheet Template for capturing Twitter searches. I realise that this solution isn’t as straight forward as old services like Twapper Keeper and it can look a bit daunting but trust me it’s not that bad and even with my dodgy typing it can be setup in under 3 minutes.

Publish archive as a spreadsheet

Here’s an example from the last #uklibchat (for demo purposes only, the archive isn’t been updated). What you can do is select the rows that cover your chat period and paste them to a new sheet (as I have done here). Once you’ve copied you might want to sort on the Time column to get the tweets back in chronological order. At this point you have a couple of options. You can File > Publish to the web generating a link for the sheet (or as I’ve done just File > Share then entire spreadsheet).

Embed tweets in your blog

By adding a column with the formula ="[tweet "&M2&"]" and filling it down the entire column of your chat sheet when you paste the values generated into certain blogging tools like WordPress.com the twitter status urls are automatically embedded in the page like the one shown below (you might want to be selective about the tweets you include as too many will kill your page load time):


Experimental – Visualise, interact and replay the conversation

TAGSExplorer #uklibchatWith you #chat in a Google Spreadsheet another thing you can do is use my TAGSExplorer to let other people see and interact with the conversation.  Here’s #uklibchat for the 12th January:

Some things worth noting: click on a node lets you see their tweets, replies and mentions; and you can replay the conversations with that person 

What do you think?

OER Visualisation Project: How is OER being shared – Quick look at Jorum and #ukoer Twitter archive [day 24]

This might be slightly off-topic for the OER Visualisation project, but I followed an idea, did – what I think are – some interesting things with an archive of tweets and thought I would share. This line of thought was triggered by a tweet from Terry McAndrew in which he asked:

@mhawksey Have you a visualisation planned of JORUM ‘customers’ of OER (and the rest of it for that matter).

Tracking who views or uses OER material can be tricky but not impossible the main issue comes when someone else like me comes along and wants to see who else has viewed, used, remixed the resource. For example, with the Jorum ukoer resources the only usage data I could get was each resource page view and even getting this required scraping over 8,000 pages. This is mentioned in day 18, but I realise I haven’t gone into any detail about how Google Refine was used to get this data – if someone wants me to I will reveal all.

A recent development in this area is the US Learning Registry project which is looking to maximise the use of activity data to support resource discovery and reuse. On Lorna Campbell’s CETIS blog there is a very useful introduction to the Learning Registry announcing JISCs involvement in a UK node. The post includes the following use case which helps illustrate what the project is about:

“Let’s assume you found several animations on orbital mechanics. Can you tell which of these are right for your students (without having to preview each)? Is there any information about who else has used them and how effective they were? How can you provide your feedback about the resources you used, both to other teachers and to the organizations that published or curated them? Is there any way to aggregate this feedback to improve discoverability?

The Learning Registry is defining and building an infrastructure to help answer these questions. It provides a means for anyone to ‘publish’ information about learning resources. Beyond metadata and descriptions, this information includes usage data, feedback, rankings, likes, etc.; we call this ‘paradata’”

Lorna’s post was made in November 2011 and since then the Mimas have started The JLeRN Experiment (and if you are a developer you might want to attend the CETIS contributors event on the 23rd Jan).

Plan A – Inside-out: Repository resource sharing data

All of this is a bit late for me but I thought I’d see what ‘paradata’ I could build around ukoer and see if there were any interesting stories to be told as part of the visualisation project. This is an area I’ve visited before with a series of ‘And the most socially engaging is …’ posts which started with And the most engaging JISC Project is… For this I used Google Spreadsheet to get social share counts (Facebook, Twitter and more) for a list of urls. One of the issues with this technique is Google Spreadsheets timeout after 5 minutes so there is a limit to the number of links you can get through -this is however not a problem for Google Refine.

Taking 380 of the most viewed Jorum ukoer tagged resources (approximately 5% of the data) I used Google Refine to

  1. ‘Add column by fetching URL’ passing the resource link url into Yahel Carmon’s Shared Count API -  using the expression "http://api.sharedcount.com/?url="+escape(value,"url") 
  2. Add column based on the results column parsing each of the counts – e.g. using expressions similar to parseJson(value)['Twitter']

At this point I stopped parsing columns because it was clear that there was very little if any social sharing of Jorum ukoer resources (here is a spreadsheet of the data collected).  In fact the most tweeted resource which Twitter records as having 7 tweets gets most of these following my tweet as it being the most viewed resource.

So what might be going on here. Is just a issue for national repositories? Are people consuming ukoer resources from other repositories? Are Jorum resources not ranking well in search engines? Are resources not being marketed enough?

I don’t have answers to any of those questions, and maybe this is just an isolated case, but the ‘marketing’ aspect interests me. When I publish a blog post I’ll push it into a number of communication streams including RSS, a couple of tweets, the occasional Google+. For posts I think are really worthwhile I’ll setup a twitter search column on Tweetdeck with related keywords and proactively push posts to people I think might be interested (I picked up this idea from Tony Hirst’s Invisible Frictionless Tech Support. Are we doing something similar with our repositories?  

Plan B – Outside-in: Community resource sharing data

There’s probably a lot more to be said about repositories or agencies promoting resources. To try and find some more answers, instead of looking from the repository perspective of what is being shared, I thought it’d be useful to look at what people are sharing. There are a number of ways you could do this like selecting and monitoring a group of staff. I don’t have time for that, so decided to use data from an existing community  who are likely to be using or sharing resources aka the #ukoer Twitter hashtag community. [There are obvious issues with this approach, but I think it’s a useful starter for ten]

Having grabbed a copy of the #ukoer Twapper Keeper archive using Google Refine before it disappeared I’ve got over 8,000 tweets from 8th March 2010 to 3rd January 2012.  My plan was to extract all the links mentioned in these tweets, identify any patterns or particularly popular tweets.

Extracting links and expand shortened urls

As most tweets now get links replaced with t.co shortened urls and the general use of url shortening the first step was to extract and expand all the links mentioned in tweets. Link expansion was achieved using the longurl.org API, which has the added bonus of returning meta description and keywords for target pages. Here’s a summary of the Google Refine actions I did (taken from the undo/redo history):

  1. Create new column links based on column item – title by filling 8197 rows with grel:filter(split(value, " "),v,startsWith(v,"http")).join("||")
  2. Split multi-valued cells in column links
  3. Create column long_url at index 4 by fetching URLs based on column links using expression grel:"http://api.longurl.org/v2/expand?url="+escape(value,"url")+"&format=json&user-agent=Google%20Refine&title=1&meta-keywords=1"
  4. Create new column url based on column long_url by filling 5463 rows with grel:parseJson(value)['long-url']

Searchable table of #ukoer linksWith long urls extracted the data was exported and uploaded to Google Spreadsheet so that a list of unique urls and their frequencies could be calculated. Here is the Spreadsheet of data.  From the refined data there are 2,482 different links which appear 6,220 times in the #ukoer archive. Here is the searchable table of extracted links with frequencies (sorry for the small font – can’t seem to find a way to control column width using Google Visualisation API … anyone?).

Not surprisingly a number of domain root urls appear at the top of the list. More work needs to be done to match resource sharing to different OER sources, but you can start doing simple filtering to find out what’s there. Something for the initial analysis I find interesting is that the top common link in the archive is to Oxfords Open Advent Calendar, which was a daily posting highlighting some of their OER resources. This could be interpreted as underlying the need for OER resources to be more effectively marketed. I’ll let you decide.

PS out of interest I put the list of 2,500 odd links back into Google Refine and extracted social share counts. I haven’t had a chance to look at the data closely but if you want to play a copy of it and a meta-data enhanced version of the #ukoer archive is here. Please share any findings ;)

Fast-tracking feedback using Google Apps Script [Release 1]

In October last year the Sport Learning Technologists at Loughborough College successfully won funding from the LSIS Leadership in Technology (LIT) grant scheme for the Fast-tracking feedback using Google Scripts project. Here’s an extract from the project summary:

This project will effectively combine Google Apps for Education and Google Apps Script in order to create a tool which allows tutors to enter grades and feedback in a single spreadsheet which then automatically populates individual feedback proforma, simultaneously sharing these results with students, progress tutors, and administrators as appropriate.

The benefit will be an increase in the efficiency with which assessment feedback can be shared, improving the speed and quality of paper-less student feedback. A successful conclusion to this project will be demonstrated by reduced submission turnaround times and a reduction in the errors brought about by inconsistencies in data entry.

Project funding is not just for deploying technology but also increases the capacity within the organisation at the operational level. With this in mind I have been working with Loughborough, helping them in the technical aspects of developing the Fast-Tracking Feedback System and also learn about Google Apps Script via a series of workshops. Friday was the first of these and I thought I’d share the story so far.  

The Loughborough group had already got of to a flying start successfully modifying My #eas11 world premier: Creating personalised Google Documents from Form submissions. 5 months is a long time in Google Apps Script and since then not only is there some new functionality in Apps Script, but I’ve also picked up some new tips. My own understanding has come on along way thanks to receiving a preview copy of Google Script: Enterprise Application Essentials by James Ferreira [due out 27th January]. I’ve been a regular visitor to James simpleappssolutions.com site and tutorials so wasn’t sure if his book would teach me much more, but how I was wrong. Part of the reason I believe for this is the book is geared towards ‘enterprise applications’ so concentrates on documents and workflows, just as assessment in education (for better or worse) is concentrated on documents and workflows.   

So below are two links of the current version of the Google Apps Script Spreadsheet and example Document template followed by a quick video to show how it is used. Obviously these are still work in progress as there is still 6 months to run on the project but there’s already enough there for others to benefit from and perhaps feedback on design.

Stay tuned for more developments

Twitter Archiving Google Spreadsheet TAGS v3

Important: Changes to Twitter API

Because Twitter are making changes to their API this version of TAGS will stop working in March 2013. TAGSv5.0 is compatible with the new API and you should get it here! 

For existing TAGS users:

What will happen to my existing TAGS sheets?

When Twitter turn off the old API (test outages this March) all authenticated and unauthenticated search requests will stop working.

How do I upgrade my existing version of TAGS?

As I can’t push an update to existing copies of TAGS you’ll have to manually update by opening your spreadsheet, then opening Tools > Script editor… and replacing the section of code that starts function getTweets() { and finishes 134 lines later (possiblly with the line function twDate(aDate){ ) with the code here. [And yes I know that’s a pain in the ass but best I could do] … or you can just start a new archive using TAGSv5.0

To support my research in Twitter community visualisation I’ve updated my Twitter Archiving Google Spreadsheet (TAGS) [formerly called twitteralytics – I pre-emptively changed the name to play nice with Twitter ToS].

This new version has some coding improvements and new features including a dashboard summary and advanced tools for getting user profile information and  friend/follower relationships for social network analysis.

You can get a copy by selecting one of the links below [make sure you are looked in to your Google account first]:

*** Twitter Archive Google Spreadsheet – TAGS v3.0 ***
*** Twitter Archive Google Spreadsheet – TAGS v5.0 ***
[Please rate it in the Template Gallery]
[If the first link doesn't work try making a copy from the Template Gallery
or Opening this Spreadsheet and File > Make a copy]

Below are the setup instructions for TAGS v3.x and may not work with TAGS v5.0

Basic setup/use

  1. Open the TAGS Google Spreadsheet and copy
  2. On the Readme/Settings sheetenter the following settings (starting in cell B9):
    • Who are you = any web address that identifies you or your event
    • Search term = what you are looking for eg #cetis12
    • Period = default
    • No. results = 1500 (this is the maximum twitter allows but without authenticated access you might get less. See the Advanced setup for info on configuration)
    • Continuous/paged = continuous
  3. To configure the spreadsheet to automatically update select Tools > Script Editor … and then in the Script Editor window select Triggers > Current script’s triggers… and Add a new trigger. Select to run ‘collectTweets’ as a ‘Time-driven’ choosing a time period that suits your search (For unauthenticated access I collect 1500 tweets every hour). Click ‘Save’
  4. The collection can manually be trigger by TAGS > Run Now! (Results appear on the ‘Archive’ Sheet).

Advanced setup/use

  1. Open the TAGS Google Spreadsheet and make a copy
  2. Register for an API key with Twitter at http://dev.twitter.com/apps/new. In the form these are the important bits:
    • Application Website = anything you like
    • Application Type = Browser
    • Callback URL = https://spreadsheets.google.com/macros
    • Default Access type = Read-only
  3. Once finished filling in the form and accepting Twitter’s terms and conditions you’ll see a summary page which includes a Consumer Key and Consumer Secret
  4. Back in the Google Spreadsheet select Twitter > API Authentication (you’ll need to select this option twice, the first time to authorise read/write access to the spreadsheet). Paste in your Consumer Key and Secret from the previous step and click ‘Save’ (if the Twitter menu is not visible click on the blue button to show it)
  5. From the spreadsheet select Tools > Script Editor … and then Run > authenticate and Authorize the script with Twitter using your Twitter account
  6. While still in the Script Editor window select Triggers > Current script’s triggers… and Add a new trigger. Select to run ‘collectTweets’ as a ‘Time-driven’ choosing a time period that suits your search (I usually collect 1500 tweets once a day, but increase to hourly during busy periods eg during a conference). Click ‘Save’
  7. Now close the Script Editor window. Back in the main spreadsheet on the Readme/Settings sheetenter the following settings (starting in cell B9):
    • Who are you = any web address that identifies you or your event
    • Search term = what you are looking for eg #cetis12
    • Period = default
    • No. results = 1500 (this is the maximum twitter allows)
    • Continuous/paged = continuous
  8. Click TAGS > Run Now! to check you are collecting results into a ‘Archive’ sheet
  9. To allow the results to be visualised from the spreadsheet select File > Publish to the web…You can choose to Publish All sheets or just the  Archive sheet. Make sure Automatically republish when changes are made is ticked and click Start publishing

Creating a public interactive visualisation of the archived conversation

  1. Copy the url of the published spreadsheet
  2. Visit http://hawksey.info/tagsexplorer and paste your spreadsheet url in the box, then click‘get sheet names’
  3. When it loads the sheet names leave it on the default ‘Archive’ and click ‘go’
  4. You now have a visualisation of your spreadsheet archive (click on nodes to delve deeper)
  5. To share the visualisation at the top right-click ‘link for this’ which is a permanent link (as your archive grows and the spreadsheet is republished this visualisation will automatically grow)

Quick way to display archive community (Links) data

    1. Run TAGS Advanced menu options (1-3)
    2. Copy the url of the published spreadsheet
    3. Visit http://hawksey.info/edgesexplorer and paste your spreadsheet url in the box, then click‘get sheet names’
    4. When it loads the sheet names leave it on the default ‘Links’ and click ‘go’

 

OER Visualisation Project: Maps, Maps, Maps, Maps [day 20]

Dear Diary, it is now day 20 of the OER Visualisation Project … One of the suggested outputs of this project was “collections mapped by geographical location of the host institution” and over the last couple of days I’ve experimented with different map outputs using different techniques. Its not the first time I’ve looked at maps and as early as day 2 used SPARQL to generate a project locations map. At the time I got some feedback questioning the usefulness of this type of data, but was still interested in pursuing the idea as a way to provide an interface for users to navigate some of the OER Phase 1 & 2 information. This obsession shaped the way I approached refining the data, trying to present project/institution/location relationships, which in retrospect was a red herring. Fortunately the refined data I produced has helped generate a map which might be interesting (thought there would be more from London), but I thought it would also be useful to document some of what I’m sure will end up on the cutting room floor.

Filling in the holes

One of the things the day 2 experiment showed was it was difficult to use existing data sources (PROD and location data from the JISC Monitoring Unit) to resolve all host institution names. The main issue was HEA Subject Centres and partnered professional organisations. I’m sure there are other linked data sources I could have tapped into (maybe inst. > postcode > geo), but opted for the quick and dirty route by:

  1. Creating a sheet in the PROD Linked Spreadsheet of all projects and partners currently filtered for Phase 1 and 2 projects. I did try to also pull location data using this query but it was missing data so instead created a separate location lookup sheet using the queries here. As this produced 130 institutions without geo-data (Column M) I cheated and created a list of unmated OER institutions (Column Q) [File > Make a copy of the spreadsheet to see the formula used which includes SQL type QUERY].
  2. Resolving geo data for the 57 unresolved Phase 1 & 2 projects was a 3 stage process:
    1. Use the Google Maps hack recently rediscovered by Tony Hirst to get co-ordinates from a search. You can see the remnants of this here in cell U56 (Google Spreadsheets only allow 50 importDatas per spreadsheet so it is necessary to Copy > Paste Special > As values only).
    2. For unmatched locations ‘Google’ Subject Centres to find their host institution and insert the name in the appropriate row in Column W – existing project locations are then used to get coordinates.  
    3. For other institutions ‘google’ them in Google Maps (if that didn’t return anything conclusive then a web search for a postcode was used). To get the co-ordinate pasted in Column W I centred their location on Google Maps then used the modified bookmarklet javascript:void(prompt('',gApplication.getMap().getCenter().toString().replace(',','').replace(')','').replace('(',''))); to get the data.
  3. The co-ordinates in Column S and T are generated using a conditional lookup of existing project leads (Column B) OR IF NOT partners (Column F) OR IF NOT entered/searched co-ordinates.

Satisfied that I had enough lookup data I created a sheet of OER Phase 1 & 2 project leads/partners (filtering Project_and_Partners and pasting the values in a new sheet). Locations are then resolved by looking up data from the InstLocationLookup sheet.

Map 1 – Using NodeXL to plot projects and partners

Exporting the OER Edge List as a csv allows it to be imported to the Social Network Analysis add-on for Excel (NodeXL). Using the geo-coordinates as layout coordinates gives:

OERPhase1&2 Edge 

The international partners mess with the scale. Here’s the data displayed in my online NodeXL viewer. I’m not sure much can be taken from this.

Map 2 – Generating a KML file using a Google Spreadsheet template for Google Maps

KML is an XML based format for geodata originally designed for Google Earth, but now used in Google Maps and other tools. Without a templating tool like Yahoo Pipes which was used in day 2, generating KML can be very laborious. Fortunately the clever folks at Google have come up with a Google Spreadsheet template – Spreadsheet Mapper 2.0. The great thing about this template is you can download the generate KML file or host it in the cloud as part of the Google Spreadsheet.

The instructions for using the spreadsheet are very clear so I won’t go into details, you might however want to make a copy of the KML Spreadsheet for OER Phase 1 & 2 to see how data is being pulled from the PROD Spreadsheet. The results can be viewed in Google Maps (shown below), or viewed in Google Earth.

OER Phase 1 & 2 in Google Maps

Map 3 – Customising the rendering KML data using Google Maps API

Map 3 ExampleWhilst digging around the Google Maps API for inspiration I came across this KML with features example (in the KML and GeoRSS Layers Section. Out of interest I thought I’d use the KML link from Map 2 as the source which gives this OER Phase 1 & 2 map. [If you are wondering about the map styling I recently came across the Google Maps API Styled Map Wizard which lets you customise the appearance of Google Maps, creating a snippet of code you can use in Google Maps API Styling.

Map 4 – Rendering co-ordinate data from a Google Spreadsheet

Another example I game across was using a Google Spreadsheet source which gives this other version of OER Phase 1 & 2 Lead Institutions rendered from this sheet.

I haven’t even begun on Google Map Gadgets, so it looks like there are 101 ways to display geo data from a Google Spreadsheet. Although all of this data bashing was rewarding I didn’t feel I was getting any closer to something useful. At this point in a moment of clarity I realised I was chasing the wrong idea, that I’d made that schoolboy error of not reading the question properly.

Map 5 – Jorum UKOER records rendered as a heatmap in Google Fusion Tables

Having already extracted ‘ukoer’ records from Jorum and reconciling them against institution names in day 11 it didn’t take much to geo-encode the 9,000 records to resolve them to an institutional location (I basically imported a location lookup from the PROD Spreadsheet, did a VLOOKUP, then copy/pasted the values. The result is in this sheet)

For a quick plot of the data I thought I’d upload to Google Fusion Tables and render as a heatmap but all I got was a tiny green dot over Stoke-on-Trent for the 4000 records from Staffordshire University. Far from satisfying.

Map 6 – Jorum UKOER records rendered in Google Maps API with Marker Clusters

imageThe final roll of the dice … for now anyway. MarkerClusterer is an open source library for Google Maps API which groups large numbers of closely located markers for speed and usability gains. I’d never used this library before but the Speed Test Example looked easy to modify. This has resulted in the example linked at the very beginning of this post mapping Jorum ukoer records

This is still a prototype version and lots of tweaking/optimisation required and the data file, which is a csv to json dump has a lot of extra information that’s not required (hence the slow load speed), but is probably the beginnings of the best solution for visualising this aspect of the OER programme.

So there you go. Two sets of data, 6 ways to turn it into a map and hopefully some hopefully useful methods for mashing data in between.

I don’t have a ending to this post, so this is it.

Free (and rebuild) the tweets! Export TwapperKeeper archives using Google Refine

Last month I posted Free the tweets! Export TwapperKeeper archives using Google Spreadsheet, which was a response to the announcement that TwapperKeeper would be removing public access to archives on the 6th January. This solution was limited to archives smaller than 15,000 tweets (although minor tweaking could probably get more). Since then Tony Hirst has come up with a couple of other solutions:

One of the limits these solutions have is they only collect the data stored on TwapperKeeper missing lots of other juicy data like in_reply_to, location, retweet_count (here’s what a tweet used to look like, now there is more data). Whilst this data is probably of little interest to most people for people like me it opens the opportunity to do other interesting stuff. So here’s a way you can make a copy of a Twapper Keeper archive and rebuild the data using Google Refine.

  1. You’re going to need a copy of Google Refine and install/run it
  2. Visit the Twapper Keeper page of the archive you want. On the page copy the RSS Permalink into the URL box in Google Refine adding &l=50000 to the end e.g. the ukoer archive is http://twapperkeeper.com/rss.php?type=hashtag&name=ukoer&l=50000 and click Next.
  3. In the preview window that appears switch ‘Parse data as’ to XML files. Scroll the top pane down to hover over the ‘item’ tag and click
    Refined parse xml 
  4. You should now have a preview with the data split in columns. Enter a Project name and click ‘Create Project’
    Refined Columns
  5. From the ‘item – title’ column dropdown select Edit column > Add column based on this column…
  6. In the dialog that opens enter a New column name ‘id_str’ and the Expression smartSplit(value," ")[-1] (this splits the cell and returns the last group of text)
  7. From the new id_str column dropdown select Edit column > Add column by fetching URLs… and enter a name ‘meta’, change throttle delay to 500 and enter the expression "https://api.twitter.com/1/statuses/show.json?id="+value+"&include_entities=true" (that’s with quotes), then click OK.  [What we’ve done is extract a tweet id and then asked refine to fetch details about it from the Twitter API]
  8. Wait a bit (it took about 4 hours to get data from 8,000 tweets)

Once you have the raw data you can use Refine to make new columns using the expression parseJson(value) then navigate the object namespace. You might find it useful to paste a cell value into http://jsonviewer.stack.hu/ to see the structure. So to make a column which extracts the tweets full name you’d use parseJson(value).user.name

So don’t delay ‘free the tweets’

OER Visualisation Project: What I know about #UKOER records on Jorum and OER Phase 1 & 2 [day 18]

*viewed doesn’t necessarily viewed by a human. There are multitude of bots and automated code that might visit the jorum site which could increase a records view count. For example the process of me getting this data scraping each record page generated almost 10,000 ‘views’.

Most of the numbers above come from two spreadsheets: CETIS PROD Spreadsheet; and jorumUKOERReconciled – Issue 2.  I’ve mentioned both of these spreasheets before (day 8 | day 16), but you might like to File > Make a copy of these to play with the data yourself and see some of the formulas used. An additional note on the resource view counts. These were collected by fetching the each resource page on Jorum using Google Refine and scraping the data (more details on using Refine to extract data from day 11.

[The additional processes were to extract a Jorum url by using the expression filter(value.split("|"),v,contains(v,"jorum.ac.uk"))[0]) on the identifier column, fetching a new column based on the new url and then extracting a count using toNumber(replace(filter(cells["jorumpage"].value.parseHtml().select("p.ds-paragraph"),v,contains(v.htmlText(),"viewed"))[0].htmlText(),/\D/, ''))]

So I now have a decent amount of data (some of which might be dropped), next to communicate …

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .