Notes on generating live wordclouds from Yahoo Pipes using D3.js

The JISC OER Rapid Innovation projects are all quickly finding their feet, and most are already fully embracing the open innovation model and blogging their progress. Having attended the programme start-up meeting on the 26th March 2012 and spoken to most of the projects, I have rich pickings to blog about over the next couple of months.

In our role (JISC CETIS) supporting this programme we’ve already dusted the programme with some of our wizardry. Phil Barker has aggregated all of the registered project RSS feeds into a single stream using Yahoo Pipes, and I’ve bundled an OPML file of registered feeds (if you are a Google Reader user you can subscribe directly here). Note: not all the projects have provided feeds yet. I’ve also started an archive of the #oerri tweets, which is looking sparse now but will grow over time.

Wordle: OERRI Feed

Something I was interested in trying out was whether there was a way to dynamically create a word cloud from an RSS feed. Wordle.net does have an option to generate a cloud from a blog feed (shown here), but it looks like it produces a static image, i.e. it won’t update as new project blog posts are created.

So I turned my attention to Jason Davies and his Cloud extension to the D3 JavaScript library. Jason has a demonstration site which lets you experiment with wordcloud outputs using data from Twitter and Wikipedia. Here’s an example for the Twitter search term jisccetis (clicking on a word starts a new search for that term).

OER RI posts straight from Yahoo Pipe

There is also an option on Jason’s site to use a ‘custom’ URL. This seems to accept a range of sources: HTML pages, RSS feeds and JSON. You can just use the RSS output from Phil’s pipe to get this. This, however, looks a bit suspect to me. For example, the word ‘rapid’ appears in the cloud, but ‘innovation’ does not, even though it occurs just as often in the source text. What I think is happening is that the script is picking up the first 250 words and then counting the occurrences of those words. I haven’t had time to test that theory, but if anyone else does, leave a comment and I’ll update the post.
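To make that theory a bit more concrete, here is a minimal sketch of how counting only the first N words of a text could under-count a word that appears later on. This is not the cloud generator’s actual code (which I haven’t inspected); the function name `countWords`, the `limit` parameter and the sample text are all made up for illustration.

```javascript
// Hypothetical helper: count word occurrences in `text`,
// optionally looking only at the first `limit` words.
function countWords(text, limit = Infinity) {
  // Very rough tokeniser: lowercase runs of letters and apostrophes.
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  const counts = new Map();
  for (const token of tokens.slice(0, limit)) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Made-up sample text in which both words occur twice.
const sample = "rapid innovation rapid projects innovation";

// Counting the whole text treats both words equally.
const full = countWords(sample);

// Counting only the first three words misses the second 'innovation'.
const truncated = countWords(sample, 3);

console.log(full.get("rapid"), full.get("innovation"));
console.log(truncated.get("rapid"), truncated.get("innovation"));
```

If the real script behaves anything like this, two words with the same frequency in the full text could end up with quite different weights in the cloud.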

Instead I tried a workaround using Yahoo Pipes Term Extract. With this Pipe I take Phil’s Pipe as a source and extract terms for each blog post. I can then output this as JSON and use it as a data source for Jason’s cloud generator, creating a wordcloud that will update as more posts are published (although I’ve got no way of embedding it yet):

Dynamic cloud of OER-RI Posts using term extract
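As a rough sketch of what happens to the term-extract output, the snippet below turns a JSON structure of posts and extracted terms into a list of `{text, size}` objects that could then be handed to a cloud layout. The JSON shape is an assumption: the `value.items` wrapper and in particular the `terms` field name are hypothetical, as are the function name `termsToWords` and the font size range.

```javascript
// Assumed shape of the pipe's JSON output; the `terms` field name is hypothetical.
const pipeOutput = {
  value: {
    items: [
      { title: "Post A", terms: ["open education", "rapid innovation"] },
      { title: "Post B", terms: ["rapid innovation", "rss feeds"] },
    ],
  },
};

// Count term occurrences across all posts and scale the counts to font sizes.
function termsToWords(output, minSize = 12, maxSize = 60) {
  const counts = new Map();
  for (const item of output.value.items) {
    for (const term of item.terms || []) {
      const key = term.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  // The most frequent term gets maxSize; the rest scale linearly with count.
  const max = Math.max(...counts.values(), 1);
  return [...counts.entries()]
    .map(([text, count]) => ({
      text,
      count,
      size: minSize + ((maxSize - minSize) * count) / max,
    }))
    .sort((a, b) => b.count - a.count);
}

const words = termsToWords(pipeOutput);
console.log(words);
```

Because the terms are extracted per post before anything is counted, a term’s weight here reflects how many times it turns up across all posts rather than where it happens to sit in the feed.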

Visual inspection would suggest that this version is more reliable. There are however some things to remember:

About

This blog is authored by Martin Hawksey

JISC CETIS Learning Technology Advisor (OER Programme Support)
Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey
