Notes on Jorum’s 2012 Summer of Enhancements: SEO and OER #ukoer

If you haven’t already, you should check out Jorum’s 2012 Summer of Enhancements; you’ll see it’s a lot more than a spring clean. In summary, there are four major projects going on:

JDEP – Improving discoverability through semantic technology
JEAP – Expanding Jorum’s collection through aggregation projects
JPEP – Exposing activity data and paradata
JUEP – Improving the front-end UI and user experience (UI/UX)

SEO the Game by Subtle Network Design - The Apprentice Card

Image Copyright subtlenetwork.com

As I was tasked to write the chapter on OER Search Engine Optimisation (SEO) and Discoverability as part of our recent OER Booksprint I thought I’d share some personal reflections on the JDEP – Improving discoverability through semantic technology project (touching upon JEAP – Expanding Jorum’s collection through aggregation projects).
Looking through JDEP, the focus appears to be mainly on improving internal discoverability within Jorum through better indexing. There are some very interesting developments in this area, most of which are beyond my realm of expertise.

Autonomy IDOL

The first aspect is deploying Autonomy IDOL, which uses “meaning-based search to unlock significant research material”. Autonomy is an HP-owned company, and IDOL (Intelligent Data Operating Layer) was recently used in a project by Mimas, JISC Collections and the British Library to unlock hidden collections. With Autonomy IDOL it means that:

rather than searching simply by a specific keyword or phrase that could have a number of definitions or interpretations, our interface aims to understand relationships between documents and information and recognize the meaning behind the search query.

This is achieved by:

cluster search results around related conceptual themes
full-text indexing of documents and associated materials
text-mining of full-text documents
dynamic clustering and serendipitous browsing
visualisation approaches to search results

An aspect of Autonomy IDOL that caught my eye was:

conceptual clustering capability of text, video and speech

Will Jorum be able to index resources using Autonomy’s Speech Analytics solution?

If so, that would be very useful; the issue may be how Jorum resources are packaged and where they are hosted. If you would like to see Autonomy IDOL in action, you can try the Institutional Repository Search, which searches across 160 UK repositories.

Will Jorum be implementing an Amazon style recommendation system?

One thing it’ll be interesting to see (and this is perhaps more of a future aspiration) is the integration of an Amazon-style recommendation system. The CORE project has already published a similar documents plugin, but given Jorum already has single sign-on I wonder how easy it would be to integrate a solution that makes resource recommendations based on usage data (here’s a paper on A Recommender System for the DSpace Open Repository Platform).
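To make the idea of usage-based recommendations concrete, here is a minimal sketch of one common approach: "people who used this also used that" co-occurrence counting. It assumes usage data is available as (user, resource) pairs; the function names and sample identifiers are hypothetical, and this is not how the CORE plugin or the DSpace recommender paper actually work.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(usage):
    """Count how often two resources were used by the same user.

    `usage` is an iterable of (user_id, resource_id) pairs (an assumed
    log format, not Jorum's real one).
    """
    by_user = defaultdict(set)
    for user_id, resource_id in usage:
        by_user[user_id].add(resource_id)

    counts = defaultdict(Counter)
    for resources in by_user.values():
        # Every pair of resources used by one person counts once.
        for a, b in combinations(sorted(resources), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, resource_id, n=3):
    """Return up to n resources most often used alongside resource_id."""
    related = counts.get(resource_id, Counter())
    return [other for other, _ in related.most_common(n)]


# Hypothetical usage log: three signed-in users and three resources.
usage_log = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "B"), ("u3", "C"),
]
cooccurrence = build_cooccurrence(usage_log)
```

Calling recommend(cooccurrence, "A") would then rank the resources most often used alongside "A". A real system would also need to deal with sparse data and popularity bias, which this sketch ignores.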

Elasticsearch

This is a term I’ve heard of but don’t really know enough to comment on. I’m mentioning it here mainly to highlight the report Cottage Labs prepared Investigating the suitability of Apache Solr and Elasticsearch for Mimas Jorum / Dashboard, which outlines the problem and solution for indexing and statistical querying.

External discoverability and SEO

Will Jorum be improving search engine optimisation?

From the forthcoming chapter on OER SEO and Discoverability:

Why SEO and discoverability are important

In common with other types of web resources, the majority of people will use a search engine to find open educational resources; therefore it is important to ensure that OERs feature prominently in search engine results. In addition to ensuring that resources can be found by general search engines, it is also important to make sure they are easily discoverable in sites that are content or type specific, e.g. iTunes, YouTube, Flickr.
Although search engine optimisation can be complex, particularly given that search engines may change their algorithms with little or no prior warning or documentation, there is growing awareness that if institutions, projects or individuals wish to have a visible web presence and to disseminate their resources efficiently and effectively, search engine optimisation and ranking cannot be ignored¹.
The statistics are compelling:

Over 80% of web searches are performed using Google [Ref 1]
Traffic from Google searches varies from repository to repository, but figures of 50-80% are not uncommon [Ref 2]
As an indication 83% of college students begin their information search in a search engine [Ref 3]

Given the current dominance of Google as the preferred search engine, it is important to understand how to optimise open educational resources to be discovered via Google Search. However, SEO techniques are not specific to Google and are applicable to optimising resource discovery by other search engines.
By all accounts the only way for Jorum is up, as it was recently reported in the JISCMail REPOSITORIES-LIST that “just over 5% of Jorum traffic comes directly from Google referrals”. So what is going wrong?
I’m not an SEO expert, but a quick search for site:dspace.jorum.ac.uk returns 135,000 results, so content is being indexed (Jorum should have access to Google Webmaster Tools to get detailed index and ranking data). Resource pages carry metadata including DC.creator, DC.subject and more. One thing I noticed was missing from Jorum resource pages was <meta name="description" content="A description of the page" />. Why might this be important? Google will ignore meta tags it doesn’t know (and here is the list of metatags Google knows).
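As a rough illustration of the check described above, the sketch below uses Python’s standard html.parser module to collect the named meta tags on a page and see whether a description is among them. The sample page is made up for the example (it is not real Jorum source), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Lower-case the name so "DC.creator" and "dc.creator" match.
            self.meta[name.lower()] = attributes.get("content") or ""


def meta_tags(page_source):
    """Return a dict of meta tag names (lower-cased) to their content."""
    parser = MetaCollector()
    parser.feed(page_source)
    return parser.meta


# Made-up page with Dublin Core tags but no description tag.
sample_page = """<html><head>
<meta name="DC.creator" content="A. N. Author" />
<meta name="DC.subject" content="Chemistry" />
</head><body><p>Resource page</p></body></html>"""

found = meta_tags(sample_page)
has_description = "description" in found
```

Running the same function over a saved copy of a resource page would show at a glance which meta tags are present and whether the description tag is missing.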
Another factor might be that Google apparently (I can’t find a reference) places more trust in metadata that is human readable, for example through RDFa markup. So instead of hiding meta tags in the <head> of a page, Google might weight the data better if it were inline markup:

Current Jorum resource html source

With example of RDFa markup

[Taking this one step further, Jorum might want to use schema.org to improve how resources are displayed in search results]
It will be interesting to see if JEAP – Expanding Jorum’s collection through aggregation projects will improve SEO because of backlink love.
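For readers who can’t see the screenshots, here is a small sketch of the inline-markup idea: Dublin Core fields rendered as visible HTML and annotated with RDFa attributes, rather than tucked away as meta tags. The prefix, about and property attributes are standard RDFa 1.1; the function name, URL and field values are hypothetical, and this is not Jorum’s actual markup.

```python
from html import escape

# Dublin Core element set namespace, declared as an RDFa 1.1 prefix.
DC_PREFIX = "dc: http://purl.org/dc/elements/1.1/"


def rdfa_snippet(resource_url, fields):
    """Render Dublin Core fields as visible, RDFa-annotated HTML.

    `fields` maps Dublin Core element names (e.g. "creator") to values.
    """
    safe_url = escape(resource_url, quote=True)
    lines = [f'<div prefix="{DC_PREFIX}" about="{safe_url}">']
    for name, value in fields.items():
        safe_name = escape(name, quote=True)
        # The value is visible page text and machine-readable metadata.
        lines.append(f'  <span property="dc:{safe_name}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Hypothetical resource record.
markup = rdfa_snippet(
    "http://example.org/resource/1",
    {"title": "Intro to <Cells>", "creator": "A. N. Author"},
)
print(markup)
```

The same pattern would apply to schema.org terms by swapping the prefix and property names, though which vocabulary search engines reward is something I can’t confirm.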

Looking further ahead

Will there be an LTI interface to allow institutions to integrate Jorum into their VLE?

Final thought. It’s been interesting to see Blackboard enter the repository marketplace with xpLor (see Michael Feldstein’s Blackboard’s New Platform Strategy for details). A feature of this cloud service that particularly caught my eye was the use of IMS Learning Tools Interoperability (LTI) to allow institutions to integrate a repository within their existing VLE (CETIS IMS Learning Tools Interoperability Briefing paper). As I understand it, this would let institutions seamlessly deposit and search for resources. I wonder: is this type of solution on the Jorum roadmap, or do you feel there would be a lack of appetite within the sector for such a solution?

Fin

Those are my thoughts anyway. I know Jorum would welcome additional feedback on their Summer of Enhancements. I also welcome any thoughts on my thoughts 😉
BTW, here’s a nice presentation on Improving Institutional Repository Search Engine Visibility in Google and Google Scholar

Join the conversation

3 comments

Ben Ryan
September 7, 2012 at 11:32 am
Martin,
The work with Autonomy IDOL is extending the work done with Jorum resources that are accessible through IRS by indexing and analysing not just the metadata records but also the content.
We have an external company working on the export of this information, in an XML format, that will then be processed into IDOL. This work is progressing in stages and once the initial processing has been done we will be investigating what can be retrieved through the IDOL API and how that would integrate into the Jorum Web Application (a Ruby on Rails application currently being developed under JPEP, JDEP and JUEP).
We have thought about a recommendation system, but one of the key barriers is the open nature of Jorum for resource discovery. We can track IP address (and from this a rough location including city, country, latitude and longitude) but cannot tie this directly to a user as no login has taken place.
There are other ways to address this problem, and once the major work is done (porting Jorum to the latest version of DSpace, finishing the four other projects, embedding new personnel and stabilising the new infrastructure and platform, including integrating Jorum modifications into the main DSpace GitHub system so that others can track/follow/improve/extend the modifications) we will be moving into a new phase of development.
Currently the list of major new developments is under review and will include all comments/suggestions/feedback that we receive (so keep posting ideas/comments/questions) with a view to having a definitive (as possible) work plan ready by the end of the year.
Regards,
Ben
- Martin Hawksey
  September 10, 2012 at 12:15 pm
  Hi Ben – thanks for your response. I look forward to following developments. Any comment on SEO? Reported referrals appear to be low. Any plans to do a quick fix to include the meta description tag?
  Martin
Ben Ryan
September 16, 2012 at 12:23 pm
Martin,
This has been fixed in the 1.8 port that is planned to go live in the next few weeks.
Regards,
Ben
