Using the Viralheat Sentiment API and a Google Spreadsheet of conference tweets to find out how that keynote went down

Did you tune into into Donald Clark’s “Don’t Lecture Me!” keynote at ALT-C 2010 or were you at Joe Dale’s FOTE10 presentation? These presentations have two things in common, both Donald and Joe posted their reflections of a ‘hostile’ twitter backchannel (see Tweckled at ALT! and Facing the backchannel at FOTE10) and I provided a way for them to endlessly relive their experience with a twitter subtitle mashup, wasn’t that nice of me ;) (see iTitle: Full circle with Twitter subtitle playback in YouTube (ALT-C 2010 Keynotes) and Making ripples in a big pond: Optimising videos with an iTitle Twitter track)

Something I’ve been meaning to do for a while is find a way to quickly (and freely) analysis a twitter stream to identify the audience sentiment. You can pay big bucks for this type of solution but fortunately viralheat have a sentiment API which gives developers 5,000 free calls per day. To use this you push some text to their API and you’ll get back the text mood and probability that the sentiment detected is correct (more info here).

So here is a second-by-second sentiment analysis of Donald and Joe’s presentations, followed by how it was done (including some caveats about the data).

Donald Clark – Don’t Lecture Me! from ALT-C 2010


Open Donald Clark’s ALTC2010 Sentiment Analysis in new window

Joe Dale – Building on firm foundations and keeping you connected in the 21st century. This time it’s personal! FOTE10


Open Joe Dale’s Sentiment Analysis in new window

How it was done

Step 1: formatting the source data

This first part is very specific to the data source I had available. If you already have a spreadsheet of tweets (or other text) you can skip to the next part.

All viralheat needs is chunks of text to analyse. When I originally added the twitter subtitles to the videos I pulled the data from the twitter archive service Twapper Keeper but since March this year the export function has been removed (BTW Twapper Keeper was also recently sold to Hootsuite so I’m sure more changes are on the horizon). I also didn’t get a decent version of my Google Spreadsheet/Twitter Archiver working until February so had to find an alternate data source (To do: integrate sentiment analysis into twitter spreadsheet archiver ;).

So instead I went back to the subtitle files I generated in TT-XML format. Here’s an example line:

<p style="s1" begin="00:00:28" id="p3" end="00:01:01" title="Tweeted on 01 Oct 2010 14:45:28">HeyWayne: A fantastic talk by ??? @mattlingard #FOTE10 [14:45GMT]</p>

The format is some metadata (display times, date), then who tweeted the message, what they said and a GMT timestamp. The bits I’m interested in are the message and the date metadata, but in case I needed it later I also extracted who tweeted the message. Putting each of the <p> tags into a spreadsheet cell it’s easy to extract the parts I want using these formula:

Screen name (in column B)

  • =MID(A2,FIND(">",A2)+1,(SEARCH(":",A2,FIND(">",A2))-FIND(">",A2)-1))

Tweet (in column C)

  • =MID(A2,FIND(">",A2)+3+LEN(B2),LEN(A2)-FIND(">",A2)-LEN(B2)-16)

Date/time (in column D)

  • =VALUE(MID(A2,FIND("Tweeted on",A2)+11,20))

You can find more information about the cell formula used elsewhere but briefly:

  • MID extracts some subtext based on start point and length;
  • FIND finds the first occurrence of some text in text
  • SEARCH is used to find the first occurrence of some text after a specific point (in this case I knew ‘:’ marked the end of who tweeted the message but if I used FIND it would have returned the position of the : in begin=
  • LEN is the number of characters in a cell
  • VALUE is used to convert a text string into another format like a number or in this case a date/time

  This gives me a spreadsheet which looks like this:

Extracting old tweets

Step 2: Using Google Apps Script to get the sentiment from viralheat

Google Apps Script is great for automating repetitive tasks. So in the Tools > Script editor… you can drop in the following snippet of code which loops through the text cells and gets a mood and probability from viralheat:

Step 3: Making the data more readable

If the script worked you should have some extra columns with the returned mood (positive or negative) and a probability factor. To make the data more readable I did a couple of things. For the mood I used conditional formatting to turn the cell green for positve, red for negative. To do this select the column with the mood vaules and then open Format > Conditional formatting and add these rules:

Conditional formating 

In the examples above you’ll see I graphed a sentiment rating over time. To do this I converted ‘negative’ and ‘positive’ into the values –1 and 1 using the formula where the returned mood is in column E:

  • =IF(E2="positive",1,-1)

I also wanted to factor in the probability by multiplying the value by the probability by adding:

  • =IF(E2="positive",1,-1)*F2 (where the probability factor is in column F)

These values were then accumulated over time.

Using the time and accumulated sentiment columns you can then Insert > Chart, and if not already suggested, use the Trend > Time line chart.

Insert time line chart

One last trick I did, which I’ll let you explore in the published spreadsheets I’ll link to shortly, is extract and display certain tweets which are above a threshold probability.

As promised here are the spreadsheets for (you can reuse by File > Make a copy):

Some quick notes about the data

A couple of things to bear in mind when looking at this data

Noise – All of the analysed tweets don’t  necessarily relate to the talk so if someone didn’t like the coffee at the break or someone is tweeting about the previous talk they liked that will effect the results

Quoting presenter – Donald’s talk was designed to get the audience to question the value of lectures so if he made negative statements about that particular format that were then quoted by someone in the audience it would be recorded as negative sentiment.

Sometimes it’s just wrong – and let not forget there maybe times when viralheat just get it wrong (there is a training api ;)

There is probably more to say like is there a way to link the playback with a portion of the sentiment chart or should I explore a way to use Google Apps Script and viralheat to automatically notify conference organisers of the good and bad. But that’s why I have a comment box ;)

Last updated by at .

14 Responses to “Using the Viralheat Sentiment API and a Google Spreadsheet of conference tweets to find out how that keynote went down”


Comments are currently closed.

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .