By my calculation it’s day 16 of the OER Visualisation Project. Since day 11 and the Jorum UKOER ‘snowflake’ visualisation, I’ve been going back to the refined data to validate the dataset and better understand the information it contains.
One of the things I did was upload the data from Google Refine to Google Spreadsheet (exporting from Refine as .xls and uploading it to Docs). Here is a copy of the spreadsheet. Using the UNIQUE and COUNTIF formulas, it’s very easy to build a summary of the top Jorum UKOER contributors and subject categorisations.
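For anyone who would rather script this step, here is a minimal Python sketch of the same UNIQUE-then-COUNTIF summary. It assumes the exported column (contributor or subject) has already been read into a plain list of strings; the function names are hypothetical and simply mirror the spreadsheet functions.

```python
def unique(values):
    """Distinct values in order of first appearance (mirrors the UNIQUE step)."""
    seen = set()
    distinct = []
    for value in values:
        if value not in seen:
            seen.add(value)
            distinct.append(value)
    return distinct


def countif(values, target):
    """How many entries equal target (mirrors COUNTIF(range, target)).

    Note: this is an exact, case-sensitive match, which may differ from
    how the spreadsheet function matches text.
    """
    return sum(1 for value in values if value == target)


def summarise(values):
    """One (value, count) row per distinct value, largest count first."""
    rows = [(value, countif(values, value)) for value in unique(values)]
    # sorted() is stable, so values with equal counts keep first-appearance order
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample column, not real Jorum data
contributors = ["Uni A", "Uni B", "Uni A", "Uni C", "Uni A", "Uni B"]
print(summarise(contributors))  # [('Uni A', 3), ('Uni B', 2), ('Uni C', 1)]
```

Calling countif once per distinct value rescans the whole list each time, much like a column of COUNTIF formulas; for larger exports, `collections.Counter(values).most_common()` gives the same counts in a single pass.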
In the original OER funding call, paragraph 19 states: “depositing resources funded through this call into JorumOpen will be mandatory”, so in theory all 51 Phase 1 and 2 OER projects should have records in Jorum. We can use this assumption to validate the refined dataset.
Using day 8’s CETIS PROD to Google Spreadsheet work, it’s easy for me to create a list of Phase 1 and 2 lead institutions (41 in total, as some institutions from Phase 1 were re-funded). Using this list, I was able to query the spreadsheet data and produce the table embedded below, which counts Jorum ‘ukoer’ records for each institution:
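The validation here is a simple cross-check: start from the list of lead institutions, count matching records for each, and keep institutions with no matches visible as zeros rather than letting them drop out of the summary. A minimal Python sketch, assuming each record has already been reduced to a single institution name; the names and data are placeholders.

```python
def records_per_institution(lead_institutions, record_institutions):
    """Count records for each lead institution, reporting 0 where none match.

    Building the dict from the lead list also collapses duplicates, e.g. an
    institution that led projects in more than one phase appears only once.
    """
    counts = {name: 0 for name in lead_institutions}
    for name in record_institutions:
        # Records from institutions outside the lead list are ignored here
        if name in counts:
            counts[name] += 1
    return counts


def zero_count_institutions(counts):
    """Lead institutions with no records: the candidates for further checking."""
    return sorted(name for name, n in counts.items() if n == 0)


# Placeholder data: "Uni A" led two projects, so it is listed twice
leads = ["Uni A", "Uni B", "Uni A", "Uni C"]
records = ["Uni A", "Uni A", "Uni C", "Some Other Body"]

counts = records_per_institution(leads, records)
print(counts)                           # {'Uni A': 2, 'Uni B': 0, 'Uni C': 1}
print(zero_count_institutions(counts))  # ['Uni B']
```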
You can see that a number of institutions have a record count of zero. These are mainly the HEA Subject Centre projects, which were not detected by the original extraction and reconciliation method; as also noted, a number of these records had been reconciled against other university names. Using this data, the original extracted dataset was further refined and an additional 705 ukoer records were reconciled against institution names. A revised version and summary of ukoer records is available here.
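Reconciliation of this kind amounts to mapping name variants onto a canonical institution name. The sketch below is not the Google Refine workflow used here; it is a hypothetical Python illustration assuming a hand-built alias table with lower-cased keys, and it reports how many records were reassigned so the effect of each refinement pass can be tracked.

```python
def reconcile(record_names, aliases):
    """Replace known name variants with a canonical institution name.

    aliases maps a lower-cased, whitespace-trimmed variant to the canonical
    name. Names with no alias entry are kept unchanged. Returns the
    reconciled list and the number of records that were reassigned.
    """
    reconciled = []
    reassigned = 0
    for raw in record_names:
        canonical = aliases.get(raw.strip().lower(), raw)
        if canonical != raw:
            reassigned += 1
        reconciled.append(canonical)
    return reconciled, reassigned


# Hypothetical alias table: variants on the left, canonical names on the right
aliases = {
    "uni a business school": "Uni A",
    "uni b dept of history": "Uni B",
}
names = ["Uni A Business School", "Uni C", " Uni B Dept of History "]
print(reconcile(names, aliases))  # (['Uni A', 'Uni C', 'Uni B'], 2)
```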
Data Driven Journalism
Most people are probably unfamiliar with the term ‘data driven journalism’ but would have seen some of the products of the process like The Guardian’s Interactive guide to every IED attack (Wikipedia has a useful definition and overview of data-driven journalism).
It’s been useful for me to treat the OER visualisation project like a data journalism assignment, using Paul Bradshaw’s The inverted pyramid of data journalism as a basic process for approaching the Jorum data. For example, remembering the ‘context’ in which the Jorum data was collected (a mandatory task, which wasn’t always fully automated) is a reminder that even after multiple refinements the data is still not 100% complete and in parts may be unrepresentative.
Looking at a table of top Jorum UKOER contributors, for example, Staffordshire University accounts for almost 50% of the deposits, almost all going into the HE – Creative Arts and Design subject area, while University College Falmouth has one Jorum entry for its entire ukoer work.
|Top UKOER Depositors||Records|
|University of Cambridge||855|
|Subject Centre for Information and Computer Sciences||669|
|Leeds Metropolitan University||383|

|Top Subject Categories||Records|
|HE – Creative Arts and Design||4068|
|HE – Engineering||1227|
|HE – Veterinary Sciences, Agriculture and related subjects||874|
|HE – Mathematical and Computer Sciences||767|
|HE – Physical Sciences||454|
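A figure like “almost 50% of the deposits” is a share of the overall record count, which is worth recomputing whenever the dataset is refined again. A minimal Python sketch, using made-up counts rather than the figures in the tables:

```python
def percentage_shares(counts):
    """Each contributor's share of all records, as a percentage (1 d.p.)."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Made-up counts for illustration only
example = {"Contributor A": 480, "Contributor B": 320, "Contributor C": 200}
print(percentage_shares(example))
# {'Contributor A': 48.0, 'Contributor B': 32.0, 'Contributor C': 20.0}
```

The denominator matters: a share of all ukoer records will differ from a share of only those records reconciled against an institution name.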
Data journalism processes should also help when considering how to communicate the data, using techniques like interactive slideshows and brief narration.
With this in mind, it was useful to revisit Geoff McGhee’s Journalism in the Age of Data (embedded below).
A lot more to do in 2012