Another post related to my ‘Hacking stuff together with Google Spreadsheets’ (other online spreadsheet tools are available) session at Dev8eD (A free event for building, sharing and learning cool stuff in educational technology for learning and teaching!!) next week. This time an example to demonstrate importHtml. Rather than reinventing the wheel I thought I’ve revisit Tony Hirst’s Creating a Winter Olympics 2010 Medal Map In Google Spreadsheets (hmm are we allowed to use the word ‘Olympic’ in the same sentence as Google as they are not an official sponsor ;s).
Almost 4 years on the recipe hasn’t changed much. The Winter Olympics 2010 medals page on wikipedia is still there and we can still use the importHTML formula to grab the table [=importHTML("http://en.wikipedia.org/wiki/2010_Winter_Olympics_medal_table","table",3)]
The useful thing to remember is importHtml and it’s cousins importFeed, importXML, importData, and importRange create live links to the data, so if the table on wikipedia was to change the spreadsheet would also eventually update.
Where I take a slight detour with the recipe is that Google now have a chart heatmap that doesn’t need ISO country codes. Instead this is happy try to resolve country names.
Once the data is imported from Wikipedia if you select Insert > Chart and choosing heatmap, using the Nation and Total columns as the data range you should get a chart similar to the one below shown to the right. The problem with this is it’s missing most of the countries. To fix this we need to remove the country codes in brackets. One way to do this is trim the text from the left until the first bracket “(“. This can be done using a combination of the LEFT and FIND formula.
In your spreadsheet at cell H2 if you enter =LEFT(B2,FIND("(",B2)-2) this will return all the text in ‘Canada (CAN)’ up to ‘(‘ minus two characters to exclude the ‘(‘ and the space. You could manually fill this formula down the entire column but I like using the ARRAYFORMULA which allows you to use the same formula in multiple cells without having to manually fill it in. So our final formula in H2 is:
Using the new column of cleaned country names we now get our final map
To recap, we used one of the import formula to pull live data into a Google Spreadsheet, stripped some unwanted text and generated a map. Because all of this is sitting in the ‘cloud’ it’ll quite happly refresh itself if the data changed.
Tony has another Data Scraping Wikipedia with Google Spreadsheets example here