Jorum


If you haven’t already, you should check out Jorum's 2012 Summer of Enhancements, and you’ll see it’s a lot more than a spring clean. In summary, there are 4 major projects going on:

  • JDEP - Improving discoverability through semantic technology
  • JEAP - Expanding Jorum’s collection through aggregation projects
  • JPEP - Exposing activity data and paradata
  • JUEP - Improving the front-end UI and user experience (UI/UX)
SEO the Game by Subtle Network Design - The Apprentice Card
Image Copyright subtlenetwork.com

As I was tasked with writing the chapter on OER Search Engine Optimisation (SEO) and Discoverability as part of our recent OER Booksprint, I thought I’d share some personal reflections on the JDEP - Improving discoverability through semantic technology project (touching upon JEAP - Expanding Jorum’s collection through aggregation projects).

Looking through JDEP, the focus appears to be mainly on improving internal discoverability within Jorum through better indexing. There are some very interesting developments in this area, most of which are beyond my realm of expertise.

Autonomy IDOL

The first aspect is deploying Autonomy IDOL, which uses “meaning-based search to unlock significant research material”. Autonomy is an HP-owned company, and IDOL (Intelligent Data Operating Layer) was recently used in a project by Mimas, JISC Collections and the British Library to unlock hidden collections. Using Autonomy IDOL means that:

rather than searching simply by a specific keyword or phrase that could have a number of definitions or interpretations, our interface aims to understand relationships between documents and information and recognize the meaning behind the search query.

This is achieved by:

  • clustering search results around related conceptual themes
  • full-text indexing of documents and associated materials
  • text-mining of full-text documents
  • dynamic clustering and serendipitous browsing
  • visualisation approaches to search results
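IDOL's conceptual clustering is proprietary, so I can't show how it actually works, but as a toy illustration of the general idea of grouping results around shared themes, here is a Python sketch that links results whose word sets overlap (Jaccard similarity) above a threshold. The 0.3 threshold, the naive whitespace tokenising and the sample results are all assumptions for illustration; real systems use far richer text analysis.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words two results have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_results(results: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Group result titles whose word sets overlap by at least `threshold`.

    Single-linkage clustering via a small union-find: if A is similar to B
    and B is similar to C, all three end up in the same cluster.
    """
    words = {title: set(text.lower().split()) for title, text in results.items()}
    parent = {title: title for title in results}

    def find(x: str) -> str:
        # Follow parent links to the cluster's representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(results, 2):
        if jaccard(words[a], words[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for title in results:
        clusters.setdefault(find(title), []).append(title)
    return list(clusters.values())


# Hypothetical search results: title -> indexed text
results = {
    "Cell biology intro": "cells membrane nucleus biology",
    "Cell structure": "cells nucleus membrane structure",
    "Intro to statistics": "mean median variance statistics",
}
print(cluster_results(results))
```

Single-linkage keeps the sketch short; it can chain loosely related results together, which is one reason production systems use more sophisticated clustering.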

An aspect of Autonomy IDOL that caught my eye was:

 conceptual clustering capability of text, video and speech

Will Jorum be able to index resources using Autonomy's Speech Analytics solution?

If so, that would be very useful, although the issue may be how Jorum resources are packaged and where they are hosted. If you would like to see Autonomy IDOL in action, you can try the Institutional Repository Search, which searches across 160 UK repositories.

Will Jorum be implementing an Amazon style recommendation system?

One thing it’ll be interesting to see (and this is perhaps more of a future aspiration) is the integration of an Amazon-style recommendation system. The CORE project has already published a similar documents plugin, but given that Jorum already has single sign-on, I wonder how easy it would be to integrate a solution that makes resource recommendations based on usage data (here’s a paper on A Recommender System for the DSpace Open Repository Platform).
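As a rough sketch of the "people who downloaded this also downloaded" idea, assuming Jorum could export per-user download histories (the data shape and resource IDs below are hypothetical), item-to-item co-occurrence counting is one simple approach. It is not necessarily what the DSpace recommender paper describes.

```python
from collections import Counter, defaultdict


def build_cooccurrence(histories: dict[str, set[str]]) -> dict[str, Counter]:
    """Count how often each pair of resources was downloaded by the same user."""
    co: dict[str, Counter] = defaultdict(Counter)
    for resources in histories.values():
        for a in resources:
            for b in resources:
                if a != b:
                    co[a][b] += 1
    return co


def recommend(co: dict[str, Counter], resource: str, n: int = 3) -> list[str]:
    """Return up to n resources most often co-downloaded with `resource`.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(co.get(resource, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Hypothetical usage data: user -> set of downloaded resource IDs
histories = {
    "user1": {"r1", "r2", "r3"},
    "user2": {"r1", "r2"},
    "user3": {"r1", "r4"},
}
co = build_cooccurrence(histories)
print(recommend(co, "r1"))  # ['r2', 'r3', 'r4']
```

Single sign-on matters here because it is what would let downloads be tied to a (pseudonymous) user in the first place.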

Elasticsearch

This is a term I’ve heard of but don’t know enough about to comment on. I’m mentioning it here mainly to highlight the report Cottage Labs prepared, Investigating the suitability of Apache Solr and Elasticsearch for Mimas Jorum / Dashboard, which outlines the problem and solution for indexing and statistical querying.
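For a flavour of the "statistical querying" side, Solr can return counts per field value through faceting instead of returning documents. The sketch below only builds such a query URL; the host, the core name (`statistics`) and the field name (`subject`) are hypothetical placeholders, not Jorum's actual configuration.

```python
from urllib.parse import urlencode


def facet_query_url(base: str, core: str, field: str, query: str = "*:*", limit: int = 10) -> str:
    """Build a Solr select URL that returns only facet counts for `field`."""
    params = {
        "q": query,            # which documents to count over (*:* = all)
        "rows": 0,             # don't return documents, only the counts
        "facet": "true",       # switch faceting on
        "facet.field": field,  # field whose values are counted
        "facet.limit": limit,  # only the top N values
        "wt": "json",          # response format
    }
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


print(facet_query_url("http://localhost:8983/solr", "statistics", "subject"))
```

Setting `rows` to 0 is the key design choice: the index does the counting, which is what makes Solr-style engines attractive for dashboards.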

External discoverability and SEO

Will Jorum be improving search engine optimisation?

From the forthcoming chapter on OER SEO and Discoverability:

Why SEO and discoverability are important

In common with other types of web resources, the majority of people will use a search engine to find open educational resources, so it is important to ensure that OERs feature prominently in search engine results. In addition to ensuring that resources can be found by general search engines, it is also important to make sure they are easily discoverable on content- or type-specific sites, e.g. iTunes, YouTube and Flickr.

Although search engine optimisation can be complex, particularly given that search engines may change their algorithms with little or no prior warning or documentation, there is growing awareness that if institutions, projects or individuals wish to have a visible web presence and to disseminate their resources efficiently and effectively, search engine optimisation and ranking cannot be ignored.¹

The statistics are compelling:

  • Over 80% of web searches are performed using Google [Ref 1]
  • Traffic from Google searches varies from repository to repository, but figures of 50-80% are not uncommon [Ref 2]
  • As an indication, 83% of college students begin their information search with a search engine [Ref 3]

Given the current dominance of Google as the preferred search engine, it is important to understand how to optimise open educational resources so that they are discovered via Google Search. However, SEO techniques are not specific to Google and can also improve resource discovery through other search engines.

By all accounts the only way for Jorum is up: it was recently reported on the JISCMail REPOSITORIES-LIST that “just over 5% of Jorum traffic comes directly from Google referrals”. So what is going wrong?

I’m not an SEO expert, but a quick search for site:dspace.jorum.ac.uk returns 135,000 results, so content is being indexed (Jorum should have access to Google Webmaster Tools to get detailed indexing and ranking data). Resource pages include metadata such as DC.creator, DC.subject and more. One thing I noticed was missing from Jorum resource pages was <meta name="description" content="A description of the page" />. Why might this be important? Google will ignore meta tags it doesn't know, and the description tag, unlike the DC.* tags, is one it does know (here is the list of meta tags Google knows).
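A quick way to spot the missing tag across many pages is to parse each resource page and list the <meta name="..."> tags it declares. This minimal sketch uses Python's standard html.parser and works on an HTML string, so fetching the pages (e.g. with urllib.request) is left out; the sample markup is invented for illustration, not copied from Jorum.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name -> content for every <meta name="..."> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = attributes.get("content") or ""


def missing_description(html: str) -> bool:
    """True if the page has no non-empty <meta name="description">."""
    collector = MetaTagCollector()
    collector.feed(html)
    return not collector.meta.get("description", "").strip()


# Hypothetical resource page with Dublin Core tags but no description
page = (
    '<html><head><meta name="DC.creator" content="Jane Doe" />'
    "<title>Cells</title></head><body></body></html>"
)
print(missing_description(page))  # True
```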

Another factor might be that Google apparently (I can’t find a reference) trusts metadata that is made human readable using RDFa markup. So instead of hiding meta tags in the <head> of a page, Google might weight the data more heavily if it were inline markup:

Current Jorum resource html source

With example of RDFa markup
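As the screenshots aren't reproduced here, this sketch shows the general idea: rendering Dublin Core values as visible HTML annotated with RDFa 1.1 attributes (prefix, about, property) rather than hidden <meta> tags. The function name, HTML layout, resource URL and metadata values are hypothetical, and this is not Jorum's actual template.

```python
from html import escape

DCTERMS = "http://purl.org/dc/terms/"


def rdfa_block(resource_url: str, metadata: dict[str, str]) -> str:
    """Render Dublin Core metadata as visible HTML with RDFa 1.1 attributes.

    Each key (e.g. "title", "creator") becomes a dc:<key> property on a
    visible element, so people and parsers see the same values.
    """
    lines = [f'<div prefix="dc: {DCTERMS}" about="{escape(resource_url)}">']
    for term, value in metadata.items():
        label = escape(term.capitalize())
        lines.append(
            f'  <p>{label}: <span property="dc:{escape(term)}">{escape(value)}</span></p>'
        )
    lines.append("</div>")
    return "\n".join(lines)


print(rdfa_block(
    "http://example.org/resource/123",
    {"title": "Introduction to Cell Biology", "creator": "Jane Doe", "subject": "Biology"},
))
```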

[Taking this one step further, Jorum might want to use schema.org to improve how resources are displayed in search results]
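One way to express schema.org metadata is as a JSON-LD script block (microdata and RDFa are alternatives). This sketch describes a resource as a schema.org CreativeWork; the input field names and sample values are hypothetical, and which properties Google actually uses for display is something to check against its own documentation.

```python
import json


def schema_org_jsonld(resource: dict[str, str]) -> str:
    """Build a schema.org CreativeWork description as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": resource["title"],
        "description": resource["description"],
        "author": {"@type": "Person", "name": resource["creator"]},
        "keywords": resource["subject"],
        "license": resource["license"],
        "learningResourceType": resource["type"],
        "url": resource["url"],
    }
    # Escape "</" so a value containing "</script>" can't close the tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(schema_org_jsonld({
    "title": "Introduction to Cell Biology",
    "description": "A short open educational resource on cell structure.",
    "creator": "Jane Doe",
    "subject": "Biology",
    "license": "https://creativecommons.org/licenses/by/3.0/",
    "type": "Presentation",
    "url": "http://example.org/resource/123",
}))
```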

It will be interesting to see whether JEAP - Expanding Jorum’s collection through aggregation projects improves SEO through backlink love.

Looking further ahead

Will there be an LTI interface to allow institutions to integrate Jorum into their VLE?

Final thought. It's been interesting to see Blackboard enter the repository marketplace with xpLor (see Michael Feldstein’s Blackboard’s New Platform Strategy for details). A feature of this cloud service that particularly caught my eye was the use of IMS Learning Tools Interoperability (LTI) to allow institutions to integrate a repository within their existing VLE (see the CETIS IMS Learning Tools Interoperability Briefing paper). As I understand it, this would let institutions seamlessly deposit and search for resources. I wonder: is this type of solution on the Jorum roadmap, or do you feel there would be a lack of appetite within the sector for such a solution?
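For context on how an LTI 1.x "basic launch" works: the VLE (the tool consumer) POSTs a form of launch parameters to the tool provider, which here would hypothetically be a repository, signed with OAuth 1.0 HMAC-SHA1 using a shared consumer key and secret. This sketch computes that signature with only the standard library; the launch URL, key, secret and link ID are placeholders, and a real integration would be better served by a maintained OAuth/LTI library.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional
from urllib.parse import quote


def pct(value: str) -> str:
    """Percent-encode per RFC 3986, as OAuth 1.0 requires.

    Only unreserved characters (A-Z a-z 0-9 - . _ ~) are left as-is.
    """
    return quote(str(value), safe="~")


def sign_lti_launch(
    url: str,
    params: dict[str, str],
    consumer_key: str,
    consumer_secret: str,
    nonce: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> dict[str, str]:
    """Add OAuth 1.0 fields and an HMAC-SHA1 signature to LTI launch params.

    Assumes `url` is already normalised (lower-case scheme/host, no default
    port, no query string), so it can go straight into the base string.
    """
    signed = dict(params)
    signed.update({
        "oauth_consumer_key": consumer_key,
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": timestamp or str(int(time.time())),
        "oauth_version": "1.0",
    })
    # Normalise parameters: encode each name/value, sort, join as name=value with '&'.
    pairs = sorted((pct(k), pct(v)) for k, v in signed.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # Signature base string: HTTP method, encoded URL, encoded parameter string.
    base_string = "&".join(["POST", pct(url), pct(param_string)])
    # Signing key: encoded consumer secret + '&' + token secret (empty in LTI 1.x).
    key = f"{pct(consumer_secret)}&".encode()
    digest = hmac.new(key, base_string.encode(), hashlib.sha1).digest()
    signed["oauth_signature"] = base64.b64encode(digest).decode()
    return signed


launch = sign_lti_launch(
    "https://repository.example.org/lti/launch",  # hypothetical tool URL
    {
        "lti_message_type": "basic-lti-launch-request",
        "lti_version": "LTI-1p0",
        "resource_link_id": "course-42-link-1",  # hypothetical link ID
    },
    consumer_key="vle-key",        # hypothetical shared key
    consumer_secret="vle-secret",  # hypothetical shared secret
    nonce="abc123",
    timestamp="1340000000",
)
print(launch["oauth_signature"])
```

The repository would recompute the same signature from the received form fields and its copy of the secret, and accept the launch only if the two match.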

Fin

Those are my thoughts anyway. I know Jorum would welcome additional feedback on their Summer of Enhancements. I also welcome any thoughts on my thoughts ;)

BTW, here's a nice presentation on Improving Institutional Repository Search Engine Visibility in Google and Google Scholar.

Jorum has a Dashboard Beta (for exposing usage and other stats about OER in Jorum) up for the community to have a play with: we would like to get your feedback!

For more information see the blog post here: http://www.jorum.ac.uk/blog/post/38/collecting-statistics-just-got-a-whole-lot-sweeter

Pertinent info: the Dashboard has live Jorum stats behind it, but the stats have some irregularities, so the stats themselves come with a health warning. We’re moving from quite an old version of DSpace to the most recent version over the summer, at which point we will have more reliable stats.

We also have a special project going over the summer to enhance our statistics and other paradata provision, so we’d love to get as much community feedback as possible to feed into that work. We’ll be doing a specific blog post about that as soon as we have contractors finalised!

Feedback by any of the mechanisms suggested in the blog post, or via discussion here on the list, all welcome.

The above message came from Sarah Currier on the mailing list. This was my response:

It always warms my heart to see a little more data being made openly available :)

I imagine (and I might be wrong) that the main users of this data might be repository managers wanting to analyse how their institution's resources are doing. So being able to filter uploads/downloads/views for their resources and compare them with overall figures would be useful.
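To make this first use case a little more concrete, here is a sketch of filtering usage events to one institution and comparing them with the overall totals. The UsageEvent record layout and sample data are hypothetical; the Dashboard's actual data format may well differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class UsageEvent:
    resource_id: str
    institution: str
    event: str  # "upload", "download" or "view"


def institution_share(events: list[UsageEvent], institution: str) -> dict[str, tuple[int, int, float]]:
    """For each event type: (institution count, overall count, institution share in %)."""
    overall = Counter(e.event for e in events)
    mine = Counter(e.event for e in events if e.institution == institution)
    return {
        kind: (mine[kind], total, round(100 * mine[kind] / total, 1))
        for kind, total in sorted(overall.items())
    }


# Hypothetical events for two institutions
events = [
    UsageEvent("r1", "Inst A", "view"),
    UsageEvent("r1", "Inst A", "download"),
    UsageEvent("r2", "Inst B", "view"),
    UsageEvent("r3", "Inst B", "view"),
    UsageEvent("r2", "Inst B", "upload"),
]
print(institution_share(events, "Inst A"))
```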

Another (perhaps equally important) use case would be individuals wanting to know how their resources are doing, so a personal dashboard of resources uploaded, downloads and views would also be useful. This is an area Lincoln's Bebop project were interested in, so it might be an idea to work with them to find out what data would be useful to them and in what format (although, having said that, I think I only found one #ukoer record for Lincoln). Hmm, I also wonder if anyone else would find it useful if you pushed data to Google Spreadsheets à la the Guardian Datastore (here's some I captured as part of the OER Visualisation Project).
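Similarly, a personal dashboard mostly comes down to grouping usage per resource for one depositor. This sketch writes the summary as CSV, a format Google Spreadsheets can import; the depositor field, the row layout and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def personal_summary_csv(rows: list[dict[str, str]], depositor: str) -> str:
    """Summarise downloads and views per resource for one depositor as CSV text."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: {"download": 0, "view": 0})
    for row in rows:
        if row["depositor"] == depositor and row["event"] in ("download", "view"):
            counts[row["resource_id"]][row["event"]] += 1

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["resource_id", "downloads", "views"])
    for resource_id in sorted(counts):
        writer.writerow([resource_id, counts[resource_id]["download"], counts[resource_id]["view"]])
    return out.getvalue()


# Hypothetical usage rows
rows = [
    {"resource_id": "r1", "depositor": "alice", "event": "view"},
    {"resource_id": "r1", "depositor": "alice", "event": "download"},
    {"resource_id": "r1", "depositor": "alice", "event": "view"},
    {"resource_id": "r2", "depositor": "alice", "event": "view"},
    {"resource_id": "r3", "depositor": "bob", "event": "view"},
]
print(personal_summary_csv(rows, "alice"))
```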

I'd be interested to hear what the list thinks about these two points.

You might also want to consider stating how the data is licensed on the developer page. Going back to my favourite example, Gent use the Open Data Commons Open Database Licence (ODbL): http://opendatacommons.org/licenses/odbl/summary/

So what do you think of the beta dashboard? Do you think the two use cases I outline are valid, or is there a more pertinent one? (If you want to leave a comment here, I’ll make sure it is passed on to the Jorum team, or you can use other means.)

[I’d also like to add a personal note that I’ve been impressed with the recent developments from Jorum/Mimas. During my time at the JISC RSC there was a rocky period when Jorum didn’t look aligned with what was going on in the wider world, but since then they’ve managed to turn it around, and developments like this demonstrate a commitment to a better service]

Update: Bruce McPherson has been working some Excel/Google Spreadsheet magic and has posted links to examples in this comment thread.

Posted in API, Data, Jorum, OER.