Yahoo Pipes


As part of LAK13 I’ve already written a series of blog posts highlighting a couple of ways to extract data from the Canvas VLE. Prompted by a question by On and my colleague Sheila MacNeill, I wanted to show you a way of getting feed data into a spreadsheet without using any code. The solution is to use Yahoo Pipes, but as this post will highlight, this isn’t entirely straightforward and you need to be aware of several tricks to get the job done. As LAK13 isn’t using Google Groups, for this post I’ll be using the Learning Analytics Google Group as a data source.

Sniffing for data

First we need to find a data source. Looking for an auto-detected RSS/Atom feed by visiting the group homepage reveals nothing. [I always forget that browsers are moving away from telling you when they detect a feed. To get around this I use the Chrome RSS Subscription Extension, which indicates with the orange RSS icon when a page has a feed.]

Browser with no feed detected Browser with feed detected

Looking for an official Google Groups API as an alternative method turns up this open issue from August 2007 for a Groups API, aka there's no API :( Digging deeper we find Groups did have data feeds in their old interface. So with a bit of URL magic I can land on the old Groups interface for Learning Analytics, which gives us the orange light.

Google Groups Old Interface with Feeds

Requesting the View all available feeds page we get some additional feed options:

Google Groups View all available feeds

At this point I could grab the Atom links and, with a bit of tweaking, process them with my existing Google Apps Script code, but let’s look at a ‘no code’ solution.

Feeding Google Sheets with Yahoo Pipes

At this point it’s worth reminding you that you could use the importFeed formula in a Google Spreadsheet to import the data from a Google Group. The issue, however, is that it’s limited to the last 20 items, so we need a better way of feeding the sheet.

A great tool for manipulating RSS/Atom (and other data) feeds is Yahoo Pipes. Pipes gives you a drag-and-drop programming environment where you can use blocks to perform operations and wire the outputs together. I learned most of my pipework from the Pipemaster, Tony Hirst, and if you are looking for a starting point this is a good one.

Yahoo Pipes - Edit interface

Here I’ve created a pipe that takes a Google Group shortname, does some minor manipulation (which I’ll explain later) and outputs a result. When we run the pipe we get some export options:

Yahoo Pipe - Run pipe

The one I’m looking for is .csv because it'll easily import into Google Sheets, but it’s not there … Just as we had to know that the old Google Groups interface has RSS feeds, with Yahoo Pipes we have to know the csv trick. Here’s the URL for ‘Get as JSON’:

and if we swap &_render=json for &_render=csv, by magic we have a csv version of the output. (Whilst we are here, also notice that the group name used when the pipe is run is also in the URL. This means that if we know the group shortname we don’t need to enter a name and ‘Run pipe’; we can build the URL to get the csv.)
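For anyone who does want to script the URL building, the `_render` swap can be sketched in JavaScript with the standard `URL` class (available in browsers and Node). This is a hedged sketch: the pipe ID `EXAMPLE_PIPE_ID` and the `group` input name are hypothetical placeholders, not the real values from this pipe.

```javascript
// Build a CSV export URL for a Yahoo Pipe run.
// ASSUMPTION: the pipe ID and the "group" input name are hypothetical placeholders.
function pipeCsvUrl(pipeRunUrl, groupShortname) {
  const url = new URL(pipeRunUrl);
  // Swap the render format (json or rss) for csv
  url.searchParams.set("_render", "csv");
  // Pipe inputs are ordinary query-string parameters, so we can set them directly
  url.searchParams.set("group", groupShortname);
  return url.toString();
}

const jsonUrl =
  "http://pipes.yahoo.com/pipes/pipe.run?_id=EXAMPLE_PIPE_ID&_render=json&group=old-group";
console.log(pipeCsvUrl(jsonUrl, "example-group"));
```

`searchParams.set` replaces an existing parameter in place, so the rest of the query string is left in its original order.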

Now in a Google Spreadsheet, if you enter the formula =importData("") we get the group's last 100 messages in a spreadsheet.

Extra tricks

There were a couple of extra tricks I skipped that are worth highlighting. RSS/Atom feeds permit multilevel data, so an element like ‘author’ can have sub-elements like ‘name’ and ‘email’. CSVs, on the other hand, are 2D: rows and columns.

Illustration of nested structure of atom feed

When Yahoo Pipes generates a csv file it ignores sub-elements, so in this case it’ll generate an author column but won’t include the name or email. To get around this we need to pull the data we want up to the first level (doing something similar for content).
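In code terms, the Rename block does roughly the following: copy the nested values up into top-level fields that a flat CSV row can hold. A minimal sketch, assuming an item shape loosely modelled on a parsed Atom entry (the `author_name`/`author_email` column names are illustrative, not what Pipes produces).

```javascript
// Flatten nested feed fields into top-level columns so a CSV export keeps them.
// ASSUMPTION: the item shape (author.name / author.email) and the output
// column names are illustrative.
function flattenItem(item) {
  const { author, ...rest } = item;
  const a = author || {}; // tolerate items with no author element
  return {
    ...rest,
    author_name: a.name || "",
    author_email: a.email || "",
  };
}

console.log(flattenItem({ title: "Hello", author: { name: "Ann", email: "ann@example.com" } }));
```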

Rename block

The next little trick is to get the feed dates into a format Google Spreadsheets recognises as a date rather than a string. In the feed, dates are in ISO 8601 format, e.g. 2013-03-05T13:13:06Z. By removing the ‘T’ and ‘Z’, Google Spreadsheets will automatically parse it as a date. To do this we use a Regex block to look for T or Z (T|Z), replacing each with a single space (which is why the ‘with’ box looks empty).
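The same substitution written as a JavaScript regex, as a sketch of what the Regex block is doing. It should only be applied to the date field, since `T|Z` would happily match letters elsewhere.

```javascript
// Mirror of the Pipes Regex block: replace every "T" or "Z" with a single space
// so an ISO 8601 timestamp reads as a plain date-time string.
function pipesStyleDate(iso) {
  return iso.replace(/T|Z/g, " ");
}

console.log(JSON.stringify(pipesStyleDate("2013-03-05T13:13:06Z")));
// "2013-03-05 13:13:06 " (the Z leaves a trailing space)
```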

Regex block

I’ve wrapped the data in a modified version of the dashboard used for the Canvas data feed.

*** Google Groups Activity Dashboard ***

Google Groups Activity Dashboard


A couple of big limitations to be aware of:

How long will Google Group data feeds last?

Given we’ve had to dig out the data feeds from the old Google Group interface, my suspicion is that once this is shut off for good the feeds will also disappear. Who knows, Google may actually have a Group API by then ;s

Limited to last 100 messages

The eagle-eyed amongst you will have spotted that I was able to increase the number of messages returned to 100 by adding num=100 to the feed query. This is the limit though, and you can’t use paging to get older results. There are a couple of ways you could store the feed results, like using the FeedWordPress plugin. I’m experimenting with using IF I do THAT on {insert social network/rss feed/other} THEN add row to Google Spreadsheet, but as the data isn’t stored nicely (particularly dates, which are saved like ‘February 15, 2013 at 01:48PM’ *sigh*) it makes it harder to use.
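If you do go down that route, the awkward date strings can be rewritten into something a spreadsheet parses. A hedged sketch, assuming the dates always follow the single ‘Month D, YYYY at hh:mmAM/PM’ pattern shown above; anything else returns `null`.

```javascript
// Convert "February 15, 2013 at 01:48PM" into "2013-02-15 13:48".
// ASSUMPTION: every date follows the "Month D, YYYY at hh:mmAM/PM" pattern.
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

function iftttToSheetDate(text) {
  const m = /^(\w+) (\d{1,2}), (\d{4}) at (\d{1,2}):(\d{2})(AM|PM)$/.exec(text.trim());
  if (!m) return null; // not in the expected format
  const [, monthName, day, year, hh, mm, ampm] = m;
  const month = MONTHS.indexOf(monthName) + 1;
  if (month === 0) return null; // unknown month name
  let hour = Number(hh) % 12; // "12" becomes 0, so 12AM -> 00 and 12PM -> 12 below
  if (ampm === "PM") hour += 12;
  const pad = (n) => String(n).padStart(2, "0");
  return `${year}-${pad(month)}-${pad(day)} ${pad(hour)}:${mm}`;
}

console.log(iftttToSheetDate("February 15, 2013 at 01:48PM")); // 2013-02-15 13:48
```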

I think that’s me in terms of ways of extracting discussion board data … for now. One other technique related to Google Spreadsheets is screen scraping using importXML. To see this in action you can read Dashboarding activity on public JISCMail lists using Google Sheets (Spreadsheets).

The University of Phoenix have been in the news recently having been awarded US Patent US8341148 B1 for the ‘Academic activity stream’:

A method and computer-readable medium for generating an activity stream is provided. The activity stream includes a ranked set of objects that are presented to one or more users. The ranking of objects is updated to reflect events associated with objects.

I was alerted to this news in this post by Phil Hill on e-Literate in which he highlighted:

“The patent lists several “embodiments” of the concept – examples of approaches that could be pursued to implement the activity stream. These embodiments include re-ranking of a book chapter based on recent student comments or preferences and notifications when 75% of students have completed an assigned reading.”

As pointed out by Scott Wilson in the comments, it looks hard for this patent to stick given the amount of prior art knocking about (especially as I used to use PostRank and still use Google Reader’s ‘sort by magic’ to make sure the important information stays at the top of my reader). The Talis Aspire Reading List Dashboard may also address the last case, although it's unclear if this is prior art.

Another example of ranking based on associated events I stumbled on today was in the bookmarking site Diigo. As part of Diigo’s service you can create groups to collaboratively collect and curate resources. As well as a chronological view there is an option to sort by ‘popular’. Here’s an example of a Diigo Group for #etmooc.

I’m not entirely sure what the ranking is based on, but it doesn’t appear to be based solely on page views. I’d imagine the number of bookmarks by other users is another factor, or the number of times someone hits the ‘Like’ button. Another example of prior art?

An aside: Using Yahoo Pipes to add social share counts to a feed

I had a quick go at pulling resources from the Diigo ‘popular’ page and using social share counts as another factor in ranking results. I got as far as this pipe, which scrapes the top 20 popular items (no API/feed available) and then loops through the results using this sub-pipe, which uses the API to take a URL and see how many times it’s been shared across social networks (here’s a more general pipe that adds social counts to any RSS feed).

pipes feed with social counts

On one hand, because a lot of the bookmarked resources aren’t brand new, the counts might be a weak indicator of something (you may want to normalise the weighting based on the age of the resource), but overall this feels like a rabbit hole (not least because I couldn’t use the social share data to rank the results), so I’m tagging this as a failed idea.
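To make "normalise the weighting based on age" concrete, here is one hypothetical way it could look: divide each share count by the resource's age in days (plus one, to avoid dividing by zero). The `{shares, published}` item shape and the formula itself are illustrative assumptions, not what the pipe does.

```javascript
// Hypothetical ranking: share counts damped by resource age, so older items
// don't win just by having been around longer.
// ASSUMPTION: items carry a numeric share count and a publication date.
const DAY_MS = 24 * 60 * 60 * 1000;

function rankByShares(items, now = new Date()) {
  return items
    .map((item) => {
      const ageDays = Math.max(0, (now - new Date(item.published)) / DAY_MS);
      return { ...item, score: item.shares / (ageDays + 1) };
    })
    .sort((a, b) => b.score - a.score); // highest score first
}
```

With this weighting a day-old link shared 10 times (score 5) outranks a year-old link shared 100 times (score about 0.27).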

It did get me wondering if, within an open course (cMOOC) context, pulling bookmarked resources into your course hub using FeedWordPress and then getting participants to rate them might be … umm, interesting (the GD Star Ratings plugin looks good for this, although I did spot that on the etmooc hub they are using WP PostRatings).


Focusing on some of the data behind Current/Future State of Higher Education (CFHE12) has been very stimulating and has got me thinking about the generation and flow of data on many levels. Having recently produced some data visualisations of ds106 for OpenEd12, it was reassuring that one of the first questions was “is the #ds106 data openly licensed?”. Reassuring because it is good to see that people are thinking beyond open content and open learning experiences and thinking about open data as well. So what data is available around CFHE12? In this post I look at data feeds available from CFHE12, see what we can get and suggest some alternative ways of getting the data and pulling it into other services for analysis. Finally I highlight the issues with collecting participant feeds filtered by tag/category/label.

edfuture homepage sidebar

Working down some of the menu options on the CFHE12 home page, let’s see what we’ve got and how easy it is to digest.

Newsletter Archives

This page contains a link to each ‘Daily Newsletter’ sent out by gRSShopper (Stephen Downes’ RSS/content collection, remix and distribution system). I’m not familiar with how/if the Daily data can be exported via an official API, but with tools like Google Spreadsheets, Yahoo Pipes and others it's possible to extract a link to each edition of the Daily using some basic screen scraping techniques like XPath. So in this sheet, in cell A2, I can get a list of archive pages using =ImportXML("","//a/@href"). Using ImportXML on the resulting list of pages, it’s possible to get a comma-separated list of all the posts in each Daily (column B).

The formula in column B includes a QUERY statement and is perhaps worthy of a blog post in its own right. Here it is: =IF(ISBLANK(A2),"",JOIN(",",QUERY(ImportXML(A2,"//a[.='Link']/@href"),"Select * WHERE not(Col1 contains(''))"))). In brief the pseudocode is: if the cell is blank return nothing, otherwise comma-join the array of results from importXML for links which use the text ‘Link’ where it doesn’t contain a link to Note: there is a limitation of 50 importXML formulas per spreadsheet, so I’ll either have to flatten the data or switch to Yahoo Pipes.
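The same pseudocode written out in JavaScript, as a sketch for readers who find the nested spreadsheet formula hard to follow. It assumes the links have already been scraped into `{text, href}` objects, and `excludeFragment` stands in for the URL the original formula filters out.

```javascript
// Blank page cell -> "", otherwise comma-join the hrefs of links whose text
// is exactly "Link", skipping any that contain an excluded URL fragment.
// ASSUMPTION: links are pre-parsed {text, href}; excludeFragment is a placeholder.
function joinPostLinks(pageUrl, links, excludeFragment) {
  if (!pageUrl) return ""; // mirrors IF(ISBLANK(A2),"",...)
  return links
    .filter((a) => a.text === "Link")                     // //a[.='Link']/@href
    .map((a) => a.href)
    .filter((href) => !href.includes(excludeFragment))   // not(Col1 contains(...))
    .join(",");                                           // JOIN(",", ...)
}
```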

The resulting data is of limited use but it’s useful to see how many posts have been published via gRSShopper and by who:

CFHE12 posts per day CFHE12 posts per domain

It’s at this point that we start to get a sign that the data isn’t entirely clean. For example, in cell A153 I’ve queried the Daily Newsletter posts from this blog and I get four results shown below:

You can see there is a double post and, at the time of writing, I've actually only made two posts tagged with cfhe12. Moving on, but coming back to this point later, let’s now look at the feed options.


The course has feed options for: Announcements RSS; Blog Posts RSS; and OPML List of Feeds. I didn’t have much luck with any of these. The RSS feeds have data but aren’t valid RSS, and the OPML (a file format which can be used for bundling lots of blog feeds together) only had 16 items and was also not valid (I should really have a peek at the source for gRSShopper and make suggestions for fixes, but time and lack of Perl knowledge have prevented that so far). I did attempt some custom Apps Script to capture the Blog Posts RSS to this sheet, but I’m not convinced it’s working properly, in part because the source feed is not valid. There are other feeds not listed on the home page I might dig into, like the Diigo CFHE12 Group, which I’m collecting data from in a Google Spreadsheet using this IFTTT recipe.

Generating an OPML and Blog Post RSS from ‘View List of Blogs’ data


All is not lost. gRSShopper also generates a page of blogs it’s aggregating. With a bit more XPath magic (=ImportXML("","//a[.='XML']/@href")) I can scrape the XML links for each registered blog into this sheet. Using the Spreadsheet -> OPML Generator I get this OPML file of CFHE12 blogs (because the spreadsheet and OPML generator sit in the cloud, this link will automatically update as blogs are added to or removed from the Participant Feeds page). For more details on this recipe see Generating an OPML RSS bundle from a page of links using Google Spreadsheets.
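For the curious, the OPML side of that recipe is a small XML wrapper around the list of feed URLs. Below is a minimal sketch of an OPML 2.0 subscription list built from scraped links; it is not the Spreadsheet -> OPML Generator's code, and the `{title, xmlUrl}` input shape is an assumption.

```javascript
// Minimal OPML 2.0 subscription list built from a list of feeds.
// ASSUMPTION: feeds are {title, xmlUrl} objects scraped from the blog list page.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildOpml(title, feeds) {
  const outlines = feeds
    .map((f) => `    <outline type="rss" text="${escapeXml(f.title)}" xmlUrl="${escapeXml(f.xmlUrl)}"/>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<opml version="2.0">',
    `  <head><title>${escapeXml(title)}</title></head>`,
    "  <body>",
    outlines,
    "  </body>",
    "</opml>",
  ].join("\n");
}

console.log(buildOpml("CFHE12 blogs", [{ title: "Blog A", xmlUrl: "http://a.example/feed/" }]));
```

Escaping matters here because feed URLs often carry query strings, and a raw `&` would make the OPML invalid (one of the problems noted with the course's own OPML).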

Blog posts RSS

Earlier I highlighted a possible issue with posts being included in the Daily Newsletter. This is because it can be very tricky to get an RSS feed for a particular tag/category/label from someone's blog. You only need to look at Jim Groom’s post on Tag Feeds for a variety of Blogging Platforms to get an idea of the variation between platforms. It’s worth underlining the issue here: each blogging platform has a different way of getting a filtered RSS feed for a specific tag/category/label, and in certain cases it’s not possible to get a filtered RSS feed at all. When a student registers a feed for an online course it can be difficult for them to identify their own blog’s RSS feed, let alone a filtered feed.
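To illustrate how much the URL patterns differ, here is a sketch covering just two well-known platforms: WordPress tag feeds (`/tag/{tag}/feed/`) and Blogger label feeds (`/feeds/posts/default/-/{label}`). Everything else returns `null`, which is roughly the position a course aggregator is in for platforms it doesn't know.

```javascript
// How tag/label feed URLs differ by platform. Only two patterns are covered;
// other platforms differ and some offer no filtered feed at all.
function tagFeedUrl(platform, blogUrl, tag) {
  const base = blogUrl.replace(/\/+$/, ""); // drop trailing slashes
  const t = encodeURIComponent(tag);
  switch (platform) {
    case "wordpress":
      return `${base}/tag/${t}/feed/`;
    case "blogger":
      return `${base}/feeds/posts/default/-/${t}`;
    default:
      return null; // unknown platform: no pattern we can safely guess
  }
}

console.log(tagFeedUrl("wordpress", "http://example.wordpress.com/", "cfhe12"));
```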

As an experiment in reusing the Participant Feeds page as a data source, I’ve come up with this Yahoo Pipe which fetches all the feeds and tries to filter the results. It’s slightly crude in the way it filters posts, looking for a course tag as a category/tag (if exposed in the source feed), in the post title or in the post content. Using this pipe it’s currently returning 48 items (although it says 36 when outputting in a different file format) compared to the 77 from the Daily Newsletter Archives. The nice thing about Pipes is I can get the data in different formats (e.g. RSS, JSON, CSV), so more mashing up is possible.
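The filtering logic can be sketched in a few lines of JavaScript. This assumes posts arrive as `{title, content, categories}` objects; like the pipe, it is deliberately crude (a substring match, so a tag like "cfhe123" would also pass).

```javascript
// Keep a post if the course tag appears as a category, in the title,
// or in the content (case-insensitive).
// ASSUMPTION: posts are {title, content, categories} objects.
function hasCourseTag(post, tag) {
  const needle = tag.toLowerCase();
  const inCategories = (post.categories || []).some((c) => c.toLowerCase() === needle);
  const inTitle = (post.title || "").toLowerCase().includes(needle);
  const inContent = (post.content || "").toLowerCase().includes(needle);
  return inCategories || inTitle || inContent;
}

function filterByCourseTag(posts, tag) {
  return posts.filter((p) => hasCourseTag(p, tag));
}
```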

Before you go off thinking Yahoo Pipes is the answer for all your open course RSS aggregation, there are some big questions over reliability and how this solution would scale. It’s also interesting to note all the error messages caused by bad source feeds:

This Pipe ran successfully but encountered some problems:

warning Error fetching Response: Not Found (404)

warning Error fetching Response: OK (200). Error: Invalid XML document. Root cause: org.xml.sax.SAXParseException; lineNumber: 1483; columnNumber: 5; The element type "meta" must be terminated by the matching end-tag "</meta>".

warning Error fetching Response: Not Found (404)

warning Error fetching Response: Bad Request (400)

warning Error fetching Response: Not Found (404)

warning Error fetching Response: OK (200). Error: Invalid XML document. Root cause: org.xml.sax.SAXParseException; lineNumber: 5; columnNumber: 37; The entity "rsaquo" was referenced, but not declared.

warning Error fetching Results may be truncated because the run exceeded the allowed max response timeout of 30000ms.

Rather than trying to work with messy data one strategy would be to start with a better data source. I have a couple of thoughts on this I’ve shared in Sketch of a cMOOC registration system.


So what have I shown? Data is messy. Data use often leads to data validation. Making any data available means someone else might be able to do something useful with it. In the context of cMOOCs, getting a filtered feed of content isn’t easy.

Something I haven’t touched upon is how the data is licensed. There are lots of issues with embedding license information in data files. For example, I’m sure technically the OPML file I generated should be licensed CC-BY-NC-SA CFHE12 because this is the license of the source data? I’m going to skip over this point but welcome your comments (you might also want to check the Open Data Licensing Animation from the OERIPR Support Project).

[I’ve also quietly ignored getting data from the course discussion forums and Twitter archive (the latter is here)]

PS Looking forward to next week’s CFHE12 topic, Big data and Analytics ;)


Update: Here's the updated Topsy Media Timeline Google Spreadsheet Template v2.1, which pulls data straight from Topsy. Follow the instructions in this template to make your own Twitter/Topsy media timeline.

Recently I posted an Experiment to dynamically timeline media posted on Twitter using Topsy and Timeline (my contribution to @Arras95) #arras95, which uses Yahoo Pipes to extract tweets with images and videos using the Topsy Otter API. These are then pulled into a Google Spreadsheet before being rendered in a Timeline tool developed by Vérité.

This recipe appears to be working; that is, the timeline is automatically updating with new media. There’s a separate question about the practicality of the timeline and navigation which I’m quietly ignoring; instead I want to highlight some technical hits/misses and present a revised version.

Technical misses

Topsy includes tweets linking to image sites like twitpic and yfrog in its search results. Because these links redirect to the hosting site rather than pointing at an image source URL, they appear in frames (until recently Timeline used the API to convert them into thumbnails, but this was removed because the free service was stopped).


To get around this I’ve modified the source Yahoo Pipe to only let through image URLs (new source Pipe here). This limits results to those uploaded via the official Twitter interfaces (e.g. Web/New Tweetdeck). Update: I've now coded the data collection from Topsy directly in the Google Spreadsheet using Google Apps Script. The new version is available via the link at the end of the post.
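A hypothetical version of that "image URLs only" filter: keep items whose media URL points straight at an image file and drop page links on hosting sites. The `{mediaUrl}` shape is an assumption, and checking the file extension is a stand-in for whatever test the real Pipe uses.

```javascript
// Keep items whose media URL points directly at an image file
// (optionally with a size suffix such as ":large"), dropping links to
// image-hosting pages that Timeline would show in a frame.
// ASSUMPTION: items are {mediaUrl}; the extension check is illustrative.
const IMAGE_EXT = /\.(jpe?g|png|gif)(:\w+)?$/i;

function directImagesOnly(items) {
  return items.filter((item) => {
    try {
      return IMAGE_EXT.test(new URL(item.mediaUrl).pathname);
    } catch {
      return false; // not a valid absolute URL
    }
  });
}
```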

If you wanted to replicate the original experiment, another drawback was that you would have to host the Timeline code somewhere. As not everyone has easy access to a web host, I’ve published an interface which lets you include the published Google Spreadsheet key in the URL. Here’s an example for #cam12.

Here’s a new version of:

*** Topsy Media Timeline Google Spreadsheet Template v2.1 ***


[PS I’m looking forward to seeing what Sam comes up with using Timeline ;)]


Update: New version of this spreadsheet template here

There’s a new kid on the block if you are considering open source timeline tools. For a long time the Simile Exhibit Timeline tool has been the tool of choice, appearing in places like (click on Timeline in this page to see a history of internet search engines).

A nice feature of ‘Timeline’ is its focus on making it easy to embed content from other sites, including individual tweets, videos from YouTube and Vimeo, images hosted on Flickr or with a direct URL, and audio from SoundCloud. Here’s an out-of-the-box example (I tried to use the embed code in this post but it seems to conflict with some of my blog theme code (a jQuery problem)).

I wanted to try out ‘Timeline’ to see how it performed under different use cases. The two I had in mind were: Timeline/Google Spreadsheet as a simple OER creation tool (in part influenced by Pat Lockley’s post on using Google Maps as an OER authoring tool); and using Google Spreadsheet’s built-in functions to scrape and automagically publish information into a dynamic timeline.

The first case is fairly easy to do using the template and instructions on the Timeline site (although it’s more complicated than Pat’s example). A couple of ‘gotchas’ for you: when I changed the spreadsheet setting to United Kingdom formats it messed up the dates on the timeline, and I also had problems using Google Maps with external KML files (I’ve opened an issue). On to the fun bit though: gluing web services together to generate dynamic timelines.

The glue – Google Spreadsheet

Because Google Spreadsheets sit in the cloud and have a number of ways to get live data feeds in, they are great for gluing data streams together and republishing them in different formats. Also, as Timeline likes Google Spreadsheets, all we need to do is get some data in a format Timeline likes and it should happily start updating itself … in theory anyway.

The data side left me scratching my head a bit. There’s lots of data out there; it’s just a matter of finding some with readable timestamps. I had thought about pulling information from Wikipedia but found its tables of dates not particularly machine readable. Then I started reading about the @Arras95 event, which is happening as part of the JISC-funded WW1C project run by the University of Oxford.

Between the 9th April and 16th May 2012 an experiment in social media will take place. We will tweet the events of the Battle of Arras in realtime, from the perspective of a neutral reporter on the field. What makes this Twitter event different from other realtime tweeting initiatives (and there are some great ones out there!) is that @Arras95 will engage online communities, crowdsourcing facts about Arras and the individuals who played a part, asking for reappraisals and additions to the action as it happens.

You can read more about how to get involved in Contribute. Collaborate. Commemorate. I could just scrape the @Arras95 tweets and put them in Timeline, but where would the fun be in that ;) Instead I want to capture some of the visual richness. Whilst I could start to unpick media links to videos and images from the official Twitter stream, there’s no need, as the social web search site Topsy already does this and the data is accessible via the Topsy Otter API.

More glue – Yahoo Pipes

As Arras95 hasn’t started yet, here’s an example call to #ukoer looking for video. The result is in JSON, which is usually great for other mashups, but unfortunately it’s a format Google Spreadsheets doesn’t like (you can handle it with Google Apps Script, but on this occasion I was trying to avoid that route). Instead I turned to Yahoo Pipes, which hopefully won’t disappear just yet despite Yahoo laying off 2,000 of its staff this week.

Pipes is right at home with JSON and, what's more (despite hiding the option), you can output the data in .csv, which Google Spreadsheets does like. Here’s a Pipe which builds a search query for images and videos posted on Twitter for a user-entered search term. I’ve also prepared a slightly different Pipe which has the search hard-coded as well as pulling tweets from the @Arras95 Twitter account (in both of these you can edit/clone the source).
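What Pipes is doing for us here is flattening JSON records into CSV rows. A minimal sketch of that conversion, assuming the records are already flat objects and using the usual RFC 4180 quoting rules (quote a field containing a comma, quote or newline, and double any embedded quotes). It is illustrative only, not how Pipes is implemented.

```javascript
// Convert flat JSON records into CSV text a spreadsheet can import.
// ASSUMPTION: rows are flat objects; `columns` lists the header order.
function toCsv(rows, columns) {
  const cell = (v) => {
    const s = v == null ? "" : String(v);
    // Quote fields containing a comma, double quote or newline; double inner quotes
    return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [columns.map(cell).join(",")];
  for (const row of rows) lines.push(columns.map((c) => cell(row[c])).join(","));
  return lines.join("\r\n");
}
```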

Piecing it together – importing Yahoo Pipes into Google Spreadsheets

From the Timeline site there is a Google Spreadsheet Template. This gives us the format we need to get the data in. For now let’s keep working with #ukoer as this gives us some data to play with. Here’s a copy of the template with an extra sheet called data. In cell B1 of the data sheet is the formula:



This comes from running the Pipe with a search term and copying the ‘Get as RSS’ link, which is:

You’ll see I’ve highlighted two parts of this URL. At _render I’ve changed rss to csv, and in the formula the search term is replaced by a cell value (the latter was so I could share/reuse the template). I should say urlencode is a custom formula I wrote using Apps Script to encode the search term. It’s a nice little ditty that goes like this:

function urlencode(text) {
  // Custom spreadsheet function: URL-encode a search term for use in a query string
  return encodeURIComponent(text);
}

Down column A of data there is another custom function to convert numeric character entities into characters, e.g. turn &#39; into an apostrophe. That particular ditty goes:

function entitydecode(text) {
  // Replace decimal character references such as &#39; with the character they encode
  return text.replace(/&#(\d+);/g, function (match, number) {
    return String.fromCharCode(Number(number));
  });
}

Back in the spreadsheet on the ‘od1’ sheet we start pulling in the bits of data we need for the timeline. This mainly uses ArrayFormulas in row 3 to populate all the data without having to manually fill in the column. For example in D3 we have:

=ARRAYFORMULA(IF(ISBLANK(data!E2:E),"",(data!E2:E/ 86400) + 25569))

which reads as ‘if the cell in column E of data is blank do nothing, otherwise divide by 86400 and add 25569’ (this converts the Unix epoch times used in the Topsy API into a human/spreadsheet-readable format).
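Where the two magic numbers come from: a Unix timestamp counts seconds since 1 January 1970 (UTC), while a spreadsheet date serial counts days since 30 December 1899. Dividing by 86400 (seconds per day) converts to days, and 25569 is the gap between the two epochs in days. The sketch below derives that constant rather than hard-coding it; note the result is in UTC, so it may differ from your local time.

```javascript
// Re-derive the spreadsheet formula's constants.
const SECONDS_PER_DAY = 86400;
// Days between the spreadsheet epoch (1899-12-30) and the Unix epoch (1970-01-01)
const EPOCH_OFFSET_DAYS =
  (Date.UTC(1970, 0, 1) - Date.UTC(1899, 11, 30)) / (SECONDS_PER_DAY * 1000);

function unixToSheetSerial(unixSeconds) {
  return unixSeconds / SECONDS_PER_DAY + EPOCH_OFFSET_DAYS;
}

console.log(EPOCH_OFFSET_DAYS);    // 25569
console.log(unixToSheetSerial(0)); // 25569 (midnight, 1 January 1970 UTC)
```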

Slapping it into a Timeline

All that’s left to do is in the spreadsheet File > Publish to the web… and then find somewhere to host your timeline page. So that you can see what it looks like here’s one for #ukoer.

#ukoer media timeline

@Arras95 Living Timeline

Here is the @Arras95 timeline and the source spreadsheet.

@Arras95 Dynamic Timeline

Nothing much to see now apart from a test tweet. The theory is that this will self-populate over time as data filters into Topsy. It’ll be interesting to see if it actually works or if I need to set up an Apps Script trigger to force a refresh.

If you would like to make your own dynamic timeline from tweeted media here’s:

*** The Topsy Timeline Template ***
[File > Make a copy to use]


The JISC OER Rapid Innovation projects are all quickly finding their feet and most are already fully embracing the open innovation model and blogging their progress. Having attended the programme start-up meeting on the 26th March 2012 and spoken to most of the projects, there are rich pickings for me to blog about over the next couple of months.

In our role (JISC CETIS) supporting this programme we’ve already dusted the programme with some of our wizardry. Phil Barker has aggregated all of the registered project RSS feeds into a single stream using Yahoo Pipes and I’ve bundled an OPML file of registered feeds (if you are a Google Reader user you can subscribe directly here). Note: not all the projects have provided feeds yet. I’ve also started an archive of the #oerri tweets, which is looking sparse now but will grow over time.

Something I was interested in trying out was whether there was a way to dynamically create a word cloud from an RSS feed. does have an option to generate a cloud from a blog feed (shown here), but it looks like it’s a static image, i.e. it won’t update as new project blog posts are created.

So I turned my attention to Jason Davies and his Cloud extension to the D3 JavaScript library. Jason has a demonstration site which lets you experiment with word cloud outputs using data from Twitter and Wikipedia. Here’s an example for the Twitter search term jisccetis (clicking on a word starts a new search for that term).

There is also an option on Jason’s site to use a ‘custom’ URL. This seems to accept a range of sources: HTML pages, RSS feeds and JSON. You can just use the RSS output from Phil’s pipe to get this. This, however, looks a bit suspect to me. For example, the word ‘rapid’ appears in the cloud, but there are just as many occurrences of the word ‘innovation’ in the source text and it doesn’t appear. What I think is happening is that the script is picking up the first 250 words and then counting the occurrences of those words. I haven’t had time to test that theory, but if anyone else does, leave a comment and I’ll update the post.
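For comparison, a frequency count over the whole text (rather than only words seen in the first 250) might look like the sketch below. This is not the cloud generator's code; the tokeniser and the tiny stop-word list are simplifying assumptions.

```javascript
// Count every word in the full text and return the top `limit` as [word, count].
// ASSUMPTION: simple letters/apostrophes tokenisation and a tiny stop-word list.
const STOP_WORDS = new Set(["the", "and", "a", "of", "to", "in", "is", "for", "on", "with"]);

function topWords(text, limit = 250) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // by count, then alphabetically
    .slice(0, limit);
}
```

Counted this way, ‘rapid’ and ‘innovation’ tie whenever they occur equally often, which is what you would expect to see reflected in the cloud.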

Instead I tried a workaround using Yahoo Pipes Term Extract. With this Pipe I take Phil’s Pipe as a source and for each blog post extract terms. I can then output this as json and use as a data source for Jason’s cloud generator creating a wordcloud that will update as more posts are published (although I’ve got no way of embedding it yet):

OER RI Posts using term extract
Dynamic cloud of OER-RI Posts using term extract

Visual inspection would suggest that this version is more reliable. There are however some things to remember:


Would it surprise you that, after falling out with Google Reader and going on a trial separation (I’m having a fling with Tiny Tiny RSS), we’re back together again? We’ve got some new ground rules though, the main one being that when I Google +1 a news item it should also hit my Twitter feed.

“How?” you might ask. Well, it’s not through the Google+ API, which currently omits +1 activity, and there are no hidden RSS feeds on a person’s +1’s page (that I’ve found anyway). No, instead I’ve gone for a good old-fashioned screen scrape.

Using this Google +1s to RSS (Reader to Twitter) Yahoo Pipe that I put together, I can generate a basic RSS feed to put in the service to tweet out on my behalf. The main reason for going to these lengths is that I liked having a share button on Google Reader that wasn’t device dependent and could be hooked up to other services.

If you want to do something similar, here’s how (I should point out that I won’t be using this method myself, as @fstoner has reminded me that, for now anyway, you can use to convert starred items into tweets).

  1. On your Google Plus profile page click on +1’s in the tab and then ‘Edit Profile’ to make sure that you ‘Show this tab on your profile’. While you’re here, also take note of your Google+ user ID, which should be a long string of numbers in your browser address bar.
  2. Visit the Google +1’s Pipe and insert your Google Plus ID, Run Pipe, then copy the Get as RSS link (if you want to skip this step, just replace the plusid in this link with your own).
  3. Head over to and sign in/register, then Add Route specifying the RSS feed from above and set up your Twitter account as the destination.

Some things to be aware of. This will tweet all your plus ones, not just those from Google Reader. To prevent timeouts the pipe only pulls the last 5 items. It ain’t foolproof: because Google shortens post titles, the pipe gets the full title from the link, so it may come a cropper with missing page titles and redirects.
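The "full title from the link" step, with a fallback for the missing-title case, might look like this sketch. It assumes the page HTML has already been fetched; fetching and following redirects are left out, and the fallback (the shortened title) is an assumption about sensible behaviour rather than what the pipe does.

```javascript
// Pull the <title> out of fetched HTML, falling back to a supplied title
// (e.g. the shortened one) when the page has none.
// ASSUMPTION: `html` is the already-fetched page; redirects are not handled here.
function pageTitle(html, fallback) {
  const m = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = m ? m[1].replace(/\s+/g, " ").trim() : ""; // collapse whitespace
  return title || fallback;
}
```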

Hopefully all this will soon be obsolete when a person’s +1s are accessible in the API.   


[This post is probably less about creating a Twitter out of office service and more about an illustration of the power of Yahoo Pipes (and embedding flickr images with notes), I’ll let you read and decide]

Whilst the majority of Twitter users will probably never need to set up an email-style ‘out of office’ messaging service to respond to ‘@replies’, because they are never that far away from their timeline, I’m sure there are emerging cases where certain Twitter accounts might need this feature. In particular, I’m thinking of something like a class Twitter account being used to send notifications (you might like to read this post on free SMS broadcast via Twitter), curate discussions or one of the many other ways you can teach with Twitter (compiled by Steve Wheeler).

In this scenario we want to respond to ‘@replies’ (tweets directed at you from other users) with a message to indicate that you won’t be able to respond immediately. I did a quick ‘Google’ to find if anyone had set up a ‘Twitter – Out of Office’ service and couldn’t see anything (which probably suggests no-one needs this service or they just haven’t thought of it yet).

Starting with Twitter Advanced Search, you’ll see there are a number of options to search for tweets based on keywords, people referenced and dates (as well as some other options). So it is very easy to set up a search which will filter messages sent to a user between dates, tweaking it to remove tweets which include RT or via. Here is an example which ignores RT and via to mhawksey since 29th May until 29th May (Twitter search is limited to the last 7 days, so if you are trying this after the 5th June you won’t see any results, but hopefully you get the idea).
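Behind the Advanced Search form the query is just a string of search operators (`to:`, `since:`, `until:` and `-word` exclusions). A small sketch that assembles one; the dates in the usage line are illustrative only.

```javascript
// Assemble an "out of office" search query: tweets sent to a user within a
// date range, excluding retweets ("RT") and "via" mentions.
function outOfOfficeQuery(user, since, until) {
  return [`to:${user}`, `since:${since}`, `until:${until}`, "-RT", "-via"].join(" ");
}

// Illustrative dates only
console.log(outOfOfficeQuery("mhawksey", "2012-05-29", "2012-06-05"));
// to:mhawksey since:2012-05-29 until:2012-06-05 -RT -via
```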

Twitter search results

So it is easy to set up a search which can identify messages you might want to send an out of office response to, but how can we use this information? The key is that Twitter provides a feed for the search query; that is, it provides the data for the search results in a machine-readable format, RSS.

The next step is to use the data from Twitter to generate a response message. The best service I know for this is Yahoo Pipes. Pipes is a free service which provides a nice graphical interface for manipulating data like RSS.

Below is a screenshot of a pipe output I’ve created which takes a twitter username, date range and custom response message and manipulates it to produce a unique response message.

Yahoo Pipe results
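The pipe itself is built from drag-and-drop blocks, but the logic can be sketched in a few lines of Python. This is my own approximation, not an export of the pipe. It assumes "unique" means each reply is addressed to the person who sent the original tweet. Two further behaviours are choices made only for this sketch: each sender gets at most one auto-reply, and messages are cut to 140 characters, the tweet limit at the time. The function name `build_replies` is hypothetical.

```python
from datetime import datetime

MAX_TWEET = 140  # tweet length limit at the time of writing


def build_replies(replies, message, start, end):
    """Turn (sender, sent_at) pairs into '@sender message' replies.

    Only tweets sent between `start` and `end` (inclusive) are answered,
    and each sender gets at most one auto-reply (a choice made for this sketch).
    """
    seen = set()
    out = []
    for sender, sent_at in replies:
        if not (start <= sent_at <= end) or sender in seen:
            continue
        seen.add(sender)
        # Addressing each reply to its sender keeps the text unique per person.
        out.append(f"@{sender} {message}"[:MAX_TWEET])
    return out


if __name__ == "__main__":
    incoming = [("alice", datetime(2009, 5, 29, 10)),
                ("bob", datetime(2009, 5, 30, 9))]
    for reply in build_replies(incoming, "I'm away until 5 June.",
                               datetime(2009, 5, 29), datetime(2009, 6, 1)):
        print(reply)
```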

If you are interested in how this pipe works, you can click here for the Twitter – Out of Office (Date Range) pipe [Update: this new pipe includes the option for office hours] and view the source, or look at the image below, which contains hotspots explaining what the blocks are doing:

The final step is to get your twitter account to send the 'out of office' message. This is where RSS comes to our rescue: as well as manipulating RSS, Yahoo Pipes can also output in this format. By copying the 'Get as RSS' link after you run your pipe, you can use it with one of the RSS to Twitter services (currently I use either or the 'Publicize –> Socialize' option in Feedburner). It will look something like:


When setting this up, choose to tweet 'Title only' and untick 'include link' or 'post link'. Once you've created your RSS to twitter service you can also reuse it for future holidays: rather than running the pipe again, just edit the start and finish dates in the feed URL.
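Editing the dates by hand is easy enough, but if you reuse the feed often, a small script can do it. This sketch rewrites two query-string parameters and leaves everything else alone. The example URL is modelled on a Pipes run URL (`pipe.run?_id=...&_render=rss`), but `PIPE_ID` is a placeholder and the parameter names `startdate` and `finishdate` and the date format are hypothetical. Check what your own feed URL actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_dates(feed_url, start, finish,
              start_key="startdate", finish_key="finishdate"):
    """Return `feed_url` with its start/finish query parameters replaced.

    The default parameter names are hypothetical; pass the ones your pipe uses.
    """
    parts = urlsplit(feed_url)
    # dict keeps insertion order, so existing parameters stay in place.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params[start_key] = start
    params[finish_key] = finish
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    url = ("http://pipes.yahoo.com/pipes/pipe.run?_id=PIPE_ID&_render=rss"
           "&startdate=2009-05-29&finishdate=2009-06-05")
    print(set_dates(url, "2009-12-21", "2010-01-04"))
```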

There are a lot more things you can do with Yahoo Pipes. For example, here is another pipe which uses a named day to create a recurring out of office message (notes on this pipe are here).
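The test behind a recurring, named-day message is simple: does today fall on one of the days you're away? A minimal sketch follows, assuming the away days are given as English day names. It uses `date.weekday()` rather than `strftime("%A")` so the result doesn't depend on the machine's locale. The function name `is_away` is illustrative.

```python
from datetime import date

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]


def is_away(day: date, away_days) -> bool:
    """True if `day` falls on one of the named `away_days` (case-insensitive).

    `away_days` is a list of English day names, e.g. ["Friday"].
    """
    wanted = {d.strip().lower() for d in away_days}
    # date.weekday() returns 0 for Monday ... 6 for Sunday, independent of locale.
    return DAY_NAMES[day.weekday()] in wanted


if __name__ == "__main__":
    print(is_away(date.today(), ["Friday"]))
```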

Hopefully you get the idea of what is possible. If you are interested in more ‘Pipe’ manipulations I would recommend having a browse through Tony Hirst’s offerings.


A couple of weeks ago I was interested to read Joss Winn's blog post on Creating a PDF or eBook from an RSS feed, in which he highlights using the FeedBooks service. This was ideal timing, as we are always looking for new ways to make RSC NewsFeed readable in as many formats as possible.

The post has generated a number of comments; in particular, James Kwak at baselinescenario mentioned that a limitation of FeedBooks was that it didn't include the post author or date in the automatically generated eBook.

This is very easy to do using Yahoo Pipes. Here is my 'feedbooks pipe'. You can either run this pipe by entering the URL of your blog's RSS feed, which gives you the RSS feed required for FeedBooks (step 4 in Joss's instructions), or alternatively you can just enter{enter your blog rss feed url here}. Feel free to clone this pipe if you would like to experiment with other manipulations. I've already created this extended version for WordPress users to only include last month's posts.

feedbooks pipe

[All this pipe is doing is taking the feed URL, copying the pubDate (item publish date), then using Regex to edit some of the post items. The first regex replaces the long date format (e.g. Fri, 15 Jan 2010 10:03:54 +0000) by extracting the pattern 'digits character digits'. The next 2 entries modify the post description by putting 'the author {dc:creator} | the date {date} plus break return' before the existing content.]
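The same two regex steps can be sketched in Python for anyone who wants to try them outside Pipes. The pattern below is my own stand-in for 'digits character digits' (day, three-letter month, year), not the exact regex used in the pipe. The `creator | date<br/>` prefix mirrors the description above, with `creator` standing in for the item's `dc:creator` value.

```python
import re

# Stand-in for the pipe's 'digits character digits' pattern: "15 Jan 2010".
SHORT_DATE = re.compile(r"\d{1,2} [A-Za-z]{3} \d{4}")


def short_date(pub_date: str) -> str:
    """Reduce an RFC 822 date such as 'Fri, 15 Jan 2010 10:03:54 +0000'
    to '15 Jan 2010'; return the input unchanged if no match is found."""
    match = SHORT_DATE.search(pub_date)
    return match.group(0) if match else pub_date


def add_byline(description: str, creator: str, pub_date: str) -> str:
    """Put 'author | date' plus a line break before the post content."""
    return f"{creator} | {short_date(pub_date)}<br/>{description}"


if __name__ == "__main__":
    print(add_byline("<p>Post body</p>", "Martin Hawksey",
                     "Fri, 15 Jan 2010 10:03:54 +0000"))
```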


Using the festive period to stray slightly away from my core remit I thought I would document a little mashup which allows you to automatically tweet items you share in Google Reader.


I'm a big fan of Google Reader and it's the main way I consume RSS feeds (unsure about RSS? Here it is explained in plain English). I already use the Shared Items Post Plugin to automatically post a digest of my shared Reader items. The idea is that I'm acting as an intelligent filter, sifting through almost 150 subscriptions to pull out the items most likely to be relevant to staff at our supported institutions. The nice thing about Google Reader is that I can share items with a personal note or comment, which has parallels with micro-blogging sites like twitter.

The emergence of twitter, and similar status update sites, is changing the way many people tap into information streams and for me it makes sense to make sure information I produce or find useful is disseminated through as many channels as possible.

How to do it

Go to your Google Reader Shared page (if you haven't set up a public page, or can't remember where it is, log in to Reader, click on 'Your stuff', then 'share settings', as shown below).

Google Reader Screenshot

On the page that opens there should be a link to 'Preview your shared items page in a new window'; on that page, copy your 'Atom feed' link.

At this point you can go straight to an automatic tweeting service called twitterfeed and paste this link in as a new feed (twitterfeed is a free service which lets you submit an RSS feed; new feed items are then 'tweeted' on your behalf). Unfortunately, doing it this way means that any notes you've written about a post are lost.

Not satisfied with this, I decided to create a Yahoo Pipe which extracts my notes, if any, and tweets these instead. If you've never tried Yahoo Pipes, it's a great free service for taking existing RSS feeds, doing some tweaking and outputting a new custom RSS feed. I'll explain how the pipe works at the end of this post. For now:

  1. open this ‘Tweet Google Reader Shared’ yahoo pipe
  2. paste your ‘Atom feed’ link from Google Reader and click ‘Run Pipe’.
  3. copy the 'Get as RSS' link into twitterfeed as a new feed.

Now when you share an item in Google Reader with a note, the note will be tweeted via twitterfeed (if you share an item without a note, the existing item title will be used).

As an example, here is a tweet posted via twitterfeed which was pulled from the Google Reader Shared page shown below:

Google Reader Shared Page Screenshot

How the pipe works

Below is a screenshot of the pipe I created (click here to see it in Yahoo Pipes). The pseudo code is:

  1. Fetch Feed from Google Reader Shared page
  2. If an item contains an annotation, copy it as the title, else do nothing
  3. Sort by date (newest first)
  4. Remove <a href> tags from the title

Yahoo Pipe Screenshot  
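For readers who think better in code than in pipe blocks, steps 2 to 4 of the pseudo code can be sketched in Python. Step 1 (fetching the feed) is left out. Each entry is assumed to be already parsed into a dict with `title`, `published` (a datetime) and an optional `annotation` key. I believe the shared-items feed carried notes in a `gr:annotation` element, but the `annotation` key here is a hypothetical simplification, as is the function name `tweetable_items`.

```python
import re
from datetime import datetime

# Matches opening <a ...> and closing </a> tags, but not e.g. <abbr>.
A_TAG = re.compile(r"</?a\b[^>]*>")


def tweetable_items(entries):
    """Apply steps 2-4: use the note as title, sort newest first, strip <a> tags."""
    out = []
    for entry in entries:
        item = dict(entry)
        # Step 2: if the item has a note (annotation), copy it as the title.
        if item.get("annotation"):
            item["title"] = item["annotation"]
        out.append(item)
    # Step 3: sort by date, newest first.
    out.sort(key=lambda i: i["published"], reverse=True)
    # Step 4: remove <a href> tags from the title, keeping the link text.
    for item in out:
        item["title"] = A_TAG.sub("", item["title"])
    return out


if __name__ == "__main__":
    shared = [
        {"title": "Old post", "published": datetime(2009, 12, 20)},
        {"title": "New post", "published": datetime(2009, 12, 22),
         "annotation": 'Worth a read: <a href="http://example.com">this</a>'},
    ]
    for item in tweetable_items(shared):
        print(item["title"])
```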

Enjoy (and Seasons Greetings)!