Cooking: A Custom Search Engine for … with Edublog 2010 Nominees

In my spare time I like to do a bit of cooking. I’m mainly a recipes kind of a guy, using the experience of recreating something tried and tested to build a foundation of knowledge to explore some of my own personal creations.

The same is true for my interest in educational mash-ups. Trying out other peoples ideas to improve my basic comprehension as well as looking for opportunities for new twists.       

Tonight I’ve been faithfully recreating Tony Hirst’s A Custom Search Engine for the Computer Weekly IT Blog Awards 2010 Nominees using the list of shortlisted nominations for the 2010 Edublog Awards.

Like Tony’s experience with the Computer Weekly awards Edublogs have gone down the route of just having a linked list of nominations in each of the categories. It would have been nice to have an OPML file for the available feeds but with almost 600 shortlisted nominations I guess their focus is on other priorities.

Here’s how I got on. One of Tony’s required ingredients is a lists of nominated blog urls. Fortunately there is a list of these on the Edublogs homepage but the urls are behind text links so copying and pasting into Excel hides the url:

Excel Spreadsheet of 2010 Edublog award nominations

Not wanting to individually copy the url from 600 links I looked to see if there was a way of using a formula to get the links. I didn’t find a formula but ExcelTips.net has a handy macro for Extracting URLs from Hyperlinks. I’m not going to go into macros but if it is something you would like to find out about I’m sure ExcelTips is an ideal place to start. The macro they suggest is:

Sub ExtractHL()
    Dim HL As Hyperlink
    For Each HL In ActiveSheet.Hyperlinks
        HL.Range.Offset(0, 1).Value = HL.Address
    Next
End Sub

 

This cycles through the sheet and for each hyperlink it find it puts the link address in the cell next to it.

Excel spreadsheet with hyperlink addresses extracted

With this I was able to get back to Tony’s recipe. When it came to Step 2 Refinements I hit a bit of a snag as there appears to be a limit of 16 possible refinements in Google CSE and 23 Edublog award categories. My solution to merge the less bloggy type categories into ‘Other’.

In Step 4 Preparing the URLs I also went my own way, modifying Tony’s:

=IF(RIGHT(B2)=”/”,CONCATENATE(B2,”*”),IF(OR(RIGHT(B2,4)=”.com”,RIGHT(B2,6)

formula to:

=IF(RIGHT(B2)="/",B2&"*",B2&"/*")

[basically if end of the url has ‘/’ add ‘*’ else add ‘/*’. This catches .com and .co.uk but doesn’t work if the url ends with a querystring or .file_extension – manual sweep used to pick up any problems]

Here’s the resulting TSV file I uploaded for more example. With Google Spreadsheet you can output in different formats including TSV by changing the output=txt e.g. here’s the same sheets as TSV which got me wondering if you could just point CSE to a Google Spreadsheet. There are options in CSE to host your own annotations XML or point to a feed but I can’t see a way for hosting it as Spreadsheet (I did come across the csesheet project which lets you configure your custom search engine through a Google Spreadsheet, but will let someone else look into that one.

So here is the fruits of my labour a Google Custom Search Engine of the 2010 Edublogs Award nominations and of course using my How to Google Instant(ise) a Custom Search Engine (CSE) the

Instantised Google Custom Search Engine of the 2010 Edublogs Award nominations

Comments are closed.