After posting Analysing WordPress post velocity and momentum stats with Google Sheets (Spreadsheet) Tony quickly came back with WordPress Stats in R, which includes a handy custom function for querying the WordPress Stats API. Something I wanted to do in my original post was to generate sparklines for post activity. I wasn’t able to find a way of doing this in Google Sheets because of the number of data points (n. 366) and didn’t think Google Apps Script would handle the dataset because of timeouts. So given the start Tony had made with R it seemed like the ideal opportunity to refresh my R skills.
So below is the result (here’s the interactive version), which is followed by an explanation of how it was made.
Processing the data in R
Thanks to the code provided by Tony it’s easy to get postviews data back from WordPress using:
wp.postviews=wordpress.getstats.demo(APIKEY, BLOGURL, 'postviews', days='366', end='2012-12-31', period='day', limit=-1)
This returns a data frame that looks like this:
Note that 478 different post_titles are returned compared to 178 post_permalinks showing some of the data is missing and in fact wrong. This isn’t important for this particular project. I’ve uploaded a sample of the returned data here. The data returned includes a day stamp, post title and number of views. Posts with no views on that day are not returned.
For the sparklines I initially headed off down a dead end (lines 99-107) making a pivot table in R. This wasn’t entirely wasted effort because the process of casting this and handling null values was used later in the project. Having got stuck at this point searching for more information on sparklines in R turned up this question on stackoverflow and this set of slides referencing the sparkTable package. Initially I tried dropping and renaming some of the columns from the original data but on line 96 when trying the reshapeExt I got:
Error in `[<-.data.frame`(`*tmp*`, , attr(dat, "reshapeLong")[["timevar"]], : replacement has 32504 rows, data has 32538
Searching for variations around the error message didn’t turn anything up and if you know what is wrong I’d be very grateful for guidance. Instead I decided to follow a hunch and instead of using partial time series data filled in the blanks by casting and then melting i.e. from this:
to this (filling in missing data with ‘0’)
before going back to this
Doing this got me past reshapeExt and outputting a sparkTable as html (the other option is LaTeX) embedding 478 sparkBars as png images. The final step was to wrap the html using jQuery DataTables (the main thing was to add <thead> and <tbody> tags).
Was it worth it? I’ll let you decide if the product is any good, the process of doing it was definitely useful. So how would you improve it?