Did you tune into into Donald Clark’s “Don’t Lecture Me!” keynote at ALT-C 2010 or were you at Joe Dale’s FOTE10 presentation? These presentations have two things in common, both Donald and Joe posted their reflections of a ‘hostile’ twitter backchannel (see Tweckled at ALT! and Facing the backchannel at FOTE10) and I provided a way for them to endlessly relive their experience with a twitter subtitle mashup, wasn’t that nice of me ;) (see iTitle: Full circle with Twitter subtitle playback in YouTube (ALT-C 2010 Keynotes) and Making ripples in a big pond: Optimising videos with an iTitle Twitter track)
Something I’ve been meaning to do for a while is find a way to quickly (and freely) analysis a twitter stream to identify the audience sentiment. You can pay big bucks for this type of solution but fortunately viralheat have a sentiment API which gives developers 5,000 free calls per day. To use this you push some text to their API and you’ll get back the text mood and probability that the sentiment detected is correct (more info here).
So here is a second-by-second sentiment analysis of Donald and Joe’s presentations, followed by how it was done (including some caveats about the data).
Donald Clark – Don’t Lecture Me! from ALT-C 2010
Joe Dale – Building on firm foundations and keeping you connected in the 21st century. This time it’s personal! FOTE10
How it was done
Step 1: formatting the source data
This first part is very specific to the data source I had available. If you already have a spreadsheet of tweets (or other text) you can skip to the next part.
All viralheat needs is chunks of text to analyse. When I originally added the twitter subtitles to the videos I pulled the data from the twitter archive service Twapper Keeper but since March this year the export function has been removed (BTW Twapper Keeper was also recently sold to Hootsuite so I’m sure more changes are on the horizon). I also didn’t get a decent version of my Google Spreadsheet/Twitter Archiver working until February so had to find an alternate data source (To do: integrate sentiment analysis into twitter spreadsheet archiver ;).
So instead I went back to the subtitle files I generated in TT-XML format. Here’s an example line:
<p style="s1" begin="00:00:28" id="p3" end="00:01:01" title="Tweeted on 01 Oct 2010 14:45:28">HeyWayne: A fantastic talk by ??? @mattlingard #FOTE10 [14:45GMT]</p>
The format is some metadata (display times, date), then who tweeted the message, what they said and a GMT timestamp. The bits I’m interested in are the message and the date metadata, but in case I needed it later I also extracted who tweeted the message. Putting each of the <p> tags into a spreadsheet cell it’s easy to extract the parts I want using these formula:
Screen name (in column B)
Tweet (in column C)
Date/time (in column D)
- =VALUE(MID(A2,FIND("Tweeted on",A2)+11,20))
You can find more information about the cell formula used elsewhere but briefly:
- MID extracts some subtext based on start point and length;
- FIND finds the first occurrence of some text in text
- SEARCH is used to find the first occurrence of some text after a specific point (in this case I knew ‘:’ marked the end of who tweeted the message but if I used FIND it would have returned the position of the : in begin=
- LEN is the number of characters in a cell
- VALUE is used to convert a text string into another format like a number or in this case a date/time
This gives me a spreadsheet which looks like this:
Step 2: Using Google Apps Script to get the sentiment from viralheat
Google Apps Script is great for automating repetitive tasks. So in the Tools > Script editor… you can drop in the following snippet of code which loops through the text cells and gets a mood and probability from viralheat:
Step 3: Making the data more readable
If the script worked you should have some extra columns with the returned mood (positive or negative) and a probability factor. To make the data more readable I did a couple of things. For the mood I used conditional formatting to turn the cell green for positve, red for negative. To do this select the column with the mood vaules and then open Format > Conditional formatting and add these rules:
In the examples above you’ll see I graphed a sentiment rating over time. To do this I converted ‘negative’ and ‘positive’ into the values –1 and 1 using the formula where the returned mood is in column E:
I also wanted to factor in the probability by multiplying the value by the probability by adding:
- =IF(E2="positive",1,-1)*F2 (where the probability factor is in column F)
These values were then accumulated over time.
Using the time and accumulated sentiment columns you can then Insert > Chart, and if not already suggested, use the Trend > Time line chart.
One last trick I did, which I’ll let you explore in the published spreadsheets I’ll link to shortly, is extract and display certain tweets which are above a threshold probability.
As promised here are the spreadsheets for (you can reuse by File > Make a copy):
Some quick notes about the data
A couple of things to bear in mind when looking at this data
Noise – All of the analysed tweets don’t necessarily relate to the talk so if someone didn’t like the coffee at the break or someone is tweeting about the previous talk they liked that will effect the results
Quoting presenter – Donald’s talk was designed to get the audience to question the value of lectures so if he made negative statements about that particular format that were then quoted by someone in the audience it would be recorded as negative sentiment.
Sometimes it’s just wrong – and let not forget there maybe times when viralheat just get it wrong (there is a training api ;)
There is probably more to say like is there a way to link the playback with a portion of the sentiment chart or should I explore a way to use Google Apps Script and viralheat to automatically notify conference organisers of the good and bad. But that’s why I have a comment box ;)