Archive for the 'Twitter' Category

Twitter throws a bone: Increased hits and metadata in Twitter Search API 1.1

Twitter has recently frustrated a number of developers and mashup artists by moving to tighter restrictions on its latest API. Top of the list for many are that all Twitter Search API requests need to be authenticated (you can’t just grab and run; a request has to be made via a Twitter account), the removal of XML/Atom feeds, and reduced rate limits. There are some gains which don’t appear to be widely written about, so I’ll share them here.

#1 Get the last 18,000 tweets instead of 1,500

Reading over the release notes and discussion for the latest version of NodeXL I spotted that:

you now specify how many tweets you want to get from Twitter, up to a maximum of 18,000 tweets

Previously in the old API the hard limits were 1,500 tweets from the last 7 days. This meant that if you requested a very popular search term you’d only get the last 1,500 tweets, making any tweets made earlier in the day inaccessible. In the new API there is still the ‘last 7 days’ limit but you can page back a lot further. Because the API limits you to 100 tweets per call and 180 calls per rate-limit window (15 minutes in API 1.1) this means you could potentially get 18,000 tweets in one hit. If you cache the maximum tweet id and wait for the rate limit to refresh you could theoretically get even more (I’ve removed the 1.5k limit in TAGSv5.0, but haven’t fully tested how much of the 18k you can get before being hit by script timeouts).
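To make the paging idea concrete, here is a minimal sketch in plain JavaScript. It is not the TAGS code: fetchPage is a hypothetical stand-in for an authenticated search call, and the fake data source exists only so the sketch runs on its own. The defaults of 100 and 180 are the per-call and per-window figures mentioned above.

```javascript
// Sketch: page backwards through search results using a max_id style cursor.
// fetchPage is a hypothetical function taking {count, maxId} and returning an
// array of tweets (newest first), each with an id_str property.
function collectTweets(fetchPage, perCall = 100, maxCalls = 180) {
  const tweets = [];
  let maxId = null;
  for (let call = 0; call < maxCalls; call++) {
    const page = fetchPage({ count: perCall, maxId });
    if (page.length === 0) break; // nothing older left to fetch
    tweets.push(...page);
    // Next time ask only for tweets older than the oldest one seen so far.
    // BigInt is used because tweet ids are too large for exact JS numbers.
    const oldest = BigInt(page[page.length - 1].id_str);
    maxId = (oldest - 1n).toString();
  }
  return tweets;
}

// Fake data source for demonstration only: ids count down from `total` to 1.
function makeFakeSource(total) {
  const all = [];
  for (let i = total; i >= 1; i--) all.push({ id_str: String(i), text: "tweet " + i });
  return ({ count, maxId }) =>
    all.filter(t => maxId === null || BigInt(t.id_str) <= BigInt(maxId)).slice(0, count);
}

const collected = collectTweets(makeFakeSource(250));
console.log(collected.length);  // number of tweets gathered from the fake source
console.log(100 * 180);         // upper bound for one full rate-limit window
```

In a real script the loop would also need to stop before the execution time limit, which is the untested part I mention above.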

#2 Increased metadata with a tweet

Below is an illustration of the data returned in a single search result comparing the old and new search API.

Old and new Search API responses

If you look at the old data and the new data the main addition is a lot more profile data. A lot of this isn’t of huge interest (unless you wanted to do a colour analysis of profile colours), but there is some useful stuff. For example, in this case I have profile information for both the original tweeter and the retweeter, as well as friend/follower counts, location and more (I’ve already shown how you can combine this data with Google Analytics for comparative analysis).

Whilst I’m sure this won’t appease the hardcore Twitter devs/3rd parties, for hackademics like myself grabbing extra tweets and richer data has its benefits.

#LAK13: Recipes in capturing and analyzing data – Twitter

I’m enrolled on the Learning Analytics and Knowledge (LAK13) course, which is an open online course introducing data and analytics in learning. As part of my personal assignment I thought it would be useful to share some of the data collection and analysis techniques I use for similar courses and take the opportunity to extend some of these. I should warn you that some of these posts will include very technical information. Please don’t run away, as more often than not I’ll leave you with a spreadsheet where you fill in a cell and the rest is done for you. To begin with let’s start with Twitter.

Twitter basics

Like other courses LAK is using a course hashtag to allow aggregation of tweets, in this case #lak13. Participants can either watch the Twitter Search for #lak13, or depending on their Twitter application of choice, view the stream there. Until recently a common complaint about Twitter search was that it was limited to the last 7 days (Twitter are now rolling out search for a small percentage of older tweets). Whilst this limit is perhaps less of an issue given the velocity of the Twitter stream, for course tutors and students having longitudinal data can be useful. Fortunately the Twitter API (an API is a way for machines to talk to each other) gives developers a way to use Twitter’s data in their applications. Twitter’s API is in transition from version 1 to 1.1, with version 1 being switched off this March, which is making things interesting. The biggest impact for the part of the API handling search results is the:

  • removal of data returned in ATOM feed format; and
  • removal of access without login

This means you’ll soon no longer be able to create a Twitter search which you can watch in an RSS feed aggregator like Google Reader, like this one for #lak13.

All is not lost as the new version of the API still allows access to search results but only as JSON.

 JSON (pron.: /ˈdʒeɪsən/ jay-sun, pron.: /ˈdʒeɪsɒn/ jay-sawn), or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange  – http://en.wikipedia.org/wiki/JSON

I don’t want to get too bogged down in JSON but basically it provides a structured way of sharing data and many websites and web services will have lots of JSON data being passed to your browser and rendered nicely for you to view. Let’s for example take a single tweet:

single tweets as displayed

Whilst the tweet looks like it just has some text, links and a profile image underneath the surface there is so much more data. To give you an idea highlighted are 11 lines from 130 lines of metadata associated with a single tweet. Here is the raw data for you to explore for yourself. In it you’ll see information about the user including location and friend/follower counts; a breakdown of entities like other people mentioned and links; and ids for the tweet and in reply to.

tweet metadata
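If you’d like a feel for how that metadata is reached in code, here is a small sketch. The sample tweet is made up and heavily trimmed; the field names follow the shape of a Search API 1.1 result as I understand it, so treat them as assumptions and check them against the raw data linked above.

```javascript
// Made-up, trimmed sample shaped like a single search result (values are illustrative only).
const rawTweet = `{
  "id_str": "300000000000000001",
  "text": "Looking forward to week 1 of #lak13 with @example_user",
  "in_reply_to_status_id_str": null,
  "user": {
    "screen_name": "sample_tweeter",
    "location": "Edinburgh",
    "friends_count": 120,
    "followers_count": 340
  },
  "entities": {
    "hashtags": [{ "text": "lak13" }],
    "user_mentions": [{ "screen_name": "example_user" }],
    "urls": []
  }
}`;

// Pull out the pieces mentioned above: user details, entities and reply ids.
function summariseTweet(json) {
  const t = JSON.parse(json);
  return {
    id: t.id_str,
    from: t.user.screen_name,
    location: t.user.location,
    followers: t.user.followers_count,
    mentions: t.entities.user_mentions.map(m => m.screen_name),
    hashtags: t.entities.hashtags.map(h => h.text),
    isReply: t.in_reply_to_status_id_str !== null
  };
}

const tweetSummary = summariseTweet(rawTweet);
console.log(tweetSummary);
```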

One other Twitter basic that catches a lot of people out is the Search API is limited to the last 1500 tweets. So if you have a popular tag with over 1500 tweets in a day, at the end of the day only the last 1500 tweets are accessible via the Search API.

Archiving tweets for analysis

So there is potentially some rich data contained in tweets, but how can we capture this for analysis? There are a number of paid-for services like eventifier that allow you to specify a hashtag for archive/analysis. As well as not being free, the raw data isn’t always available either. My solution has been to develop a Google Spreadsheet to archive searches from Twitter (TAGS). This is just one of many solutions; others include pulling data directly using R or Tableau. The main advantage of this solution for me is that I can set it up and it’s happy to automatically collect new data.

Setting this up to capture search results from #lak13 gives us the data in a spreadsheet.

spreadsheet of #lak13 tweets

This makes it easy to get overviews of the data using the built-in templates:

twitter summary

activity over time

… or, as I’d like to spend the rest of this post, quickly looking at ways to create different views.

As you will no doubt discover, using a spreadsheet environment to do this has pros and cons. On the plus side it’s easy to use built-in charts and formulas to analyse the data, identifying queries that might be useful for further analysis. The downside is you are limited in the level of complexity. For example, trying to do things like term extraction, n-grams etc. is probably not going to work. All is not lost, as Google Sheets makes it easy to extract and consume the data in other applications like R, Datameer and others.

Using Google Sheets to import and query data

I’ve got a post on Feeding Google Spreadsheets: Exercises in using importHTML, importFeed, importXML, importRange and importData if you want to learn about other import options, but for now we are going to use importRange to pull data from one spreadsheet into another.

If you open this spreadsheet and choose File > Make a copy it’ll give you a version that you can edit. In cell A1 of the Archive sheet you should see the following formula: =importRange("0AqGkLMU9sHmLdEZJRXFiNjdUTDJqRkNhLUxtZE5FZmc","Archive!A:K")

What this does is pull columns A to K from the sheet where I’m already collecting LAK13 tweets (note this technique doesn’t scale well, so when LAK starts hitting thousands of tweets you are better off doing manipulations in the source spreadsheet than using importRange. I’m doing it this way to get you started and let you try some things out).

FILTER, FREQUENCY and QUERY

On the Summary sheet I’ve extended the summary available in TAGS by including weekly breakdowns. The entire sheet is made with a handful of different formulas used in slightly different ways with a dusting of conditional formatting. I’ve highlighted a couple of these:

  • cell G2 =TRANSPOSE(FREQUENCY(FILTER(Archive!E:E,Archive!B:B=B2),S$15:S$22))
    • FILTER – returns an array of the dates of the tweets made by the person named in cell B2 in the archive
    • FREQUENCY – calculates the frequency distribution of these dates based on the dates listed in S15:S22 and returns a count for each distribution in rows starting from the cell the formula is in
    • TRANSPOSE – converts the values from a vertical to horizontal response so it fills values across the sheet and not down
  • cell P2 =COUNTIF(H2:O2,">0")
    • counts the values in row 2 from column H to O that are greater than zero, giving the number of weeks the user has participated
  • cells H2:O – conditional formatting
  • cell B1 =QUERY(Archive!A:B," Select B, COUNT(A) WHERE B <> '' GROUP BY B ORDER BY COUNT(A) desc LABEL B 'Top Tweeters', COUNT(A) 'No.'",TRUE)
    • QUERY – allows you to use Google’s Query Language which is similar to SQL used in relational databases. In the example using the data source as columns A and B in the archive sheet we select columns B (screen name of tweeter) and count of A (could be any other column with a unique value) where B is not blank. The results are grouped by B (screen name) and ordered by count. The query also renames the columns.
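If spreadsheet formulas aren’t your thing, the same logic can be sketched in plain JavaScript. This is only an illustration of what the formulas above are doing, not how the spreadsheet works internally. The rows are made up, and I’ve used simple day numbers in place of real dates to keep it short.

```javascript
// Made-up rows standing in for the Archive sheet: [screen name, day number of the course].
const rows = [
  ["alice", 1], ["bob", 2], ["alice", 8],
  ["alice", 9], ["", 3], ["bob", 20]
];

// Roughly: Select B, COUNT(A) WHERE B <> '' GROUP BY B ORDER BY COUNT(A) desc
function topTweeters(data) {
  const counts = new Map();
  for (const [name] of data) {
    if (name === "") continue;                      // WHERE B <> ''
    counts.set(name, (counts.get(name) || 0) + 1);  // COUNT ... GROUP BY B
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]); // ORDER BY count desc
}

// Roughly FREQUENCY: one count per bin (values up to and including that bin,
// above the previous bin), plus a final count for values above the last bin.
function frequency(values, bins) {
  const sorted = [...bins].sort((a, b) => a - b);
  const counts = new Array(sorted.length + 1).fill(0);
  for (const v of values) {
    let i = sorted.findIndex(b => v <= b);
    if (i === -1) i = sorted.length; // larger than every bin
    counts[i]++;
  }
  return counts;
}

// Roughly FILTER + FREQUENCY for one person, then COUNTIF(..., ">0").
function weeklyActivity(data, name, weekEnds) {
  const days = data.filter(([who]) => who === name).map(([, day]) => day);
  const perWeek = frequency(days, weekEnds);
  const weeksActive = perWeek.filter(n => n > 0).length;
  return { perWeek, weeksActive };
}

console.log(topTweeters(rows));
console.log(weeklyActivity(rows, "alice", [7, 14, 21]));
```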

QUERY Out

To give you an idea of the queries you can run on the Twitter data, the spreadsheet you copied includes a Query sheet with some examples. Included are sample queries to filter tweets containing ‘?’, which might indicate questions (even if rhetorical), time-based filters, and counts of messages between users.

Query sheet
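For anyone who prefers to see the idea outside the spreadsheet, here is a rough JavaScript sketch of two of those queries: picking out tweets containing ‘?’ and counting messages between users. The sample tweets and the from_user/text property names are assumptions for illustration, and the @mention matching is deliberately simple.

```javascript
// Made-up sample tweets; the property names are assumptions for illustration.
const sampleTweets = [
  { from_user: "alice", text: "@bob have you seen the week 1 readings?" },
  { from_user: "bob", text: "@alice yes, they are on the course site" },
  { from_user: "alice", text: "@bob thanks!" },
  { from_user: "carol", text: "Is anyone else using R for #lak13?" }
];

// Tweets containing a question mark, which might indicate a question.
function possibleQuestions(tweets) {
  return tweets.filter(t => t.text.includes("?"));
}

// Count how often each person mentions each other person.
function messagesBetween(tweets) {
  const counts = {};
  for (const t of tweets) {
    const mentions = t.text.match(/@\w+/g) || [];
    for (const m of mentions) {
      const key = t.from_user + " -> " + m.slice(1).toLowerCase();
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

console.log(possibleQuestions(sampleTweets).map(t => t.from_user));
console.log(messagesBetween(sampleTweets));
```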

Tony Hirst has written more about Using Google Spreadsheets as a Database with the Google Visualisation API Query Language, which includes creating queries to export data.

Other views of the data

The ability to export the data in this way opens up some other opportunities. Below is a screenshot of an ego/conversation centric view of #lak13 tweets rendered using the D3 JavaScript library. Whilst this view onto the archive is experimental, hopefully it illustrates some of the opportunities.

ego/conversation centric view of #lak13 tweets

Summary

Hopefully this post has highlighted some of the limitations of Twitter search, but also how data can be collected and the opportunities to rapidly prototype some basic queries. I’m conscious that I haven’t provided any answers about how this can be used within learning analytics beyond surface activity monitoring, but I’m going to let you work that one out. If you want to see some of my work in this area you might want to check out the following posts:

 

Twitter Archiving Google Spreadsheet TAGS v5

For a couple of years now, to support my research in Twitter community analysis/visualisation, I’ve been developing my Twitter Archiving Google Spreadsheet (TAGS). To allow others to explore the possibilities of data generated by Twitter I’ve released copies of this template to the community.

In September 2012 Twitter announced the release of a new version of their API (the spreadsheet uses this to request data from Twitter). Around the same time Twitter also announced that the old version of their API would be switched off in March 2013. This has required some modification of TAGS to work with the new API. The biggest change for TAGS is that all requests now need authenticated access.

So here it is:

*** Twitter Archive Google Spreadsheet – TAGS v5.0 ***
[If the first link doesn't work try Opening this Spreadsheet and File > Make a copy]

Instructions for setting up TAGSv5

Instructions are included on the Readme/Settings sheet of the template. If you are having problems it’s worth checking the guide written by Stacy Blasiola (@Blasiola) or this modified version by Karen Smith & Shanifa Nasser made for Open Data Day Toronto, available as CC-BY-SA.

What will happen to my existing TAGS sheets that aren’t version 5.0?

When Twitter turn off the old API (test outages this March) all authenticated and unauthenticated search requests will stop working.

How do I upgrade existing versions of TAGS spreadsheets (v3.x to v4.0) to keep collecting beyond March 2013?

As I can’t push an update to existing copies of TAGS you’ll have to manually update by opening your spreadsheet, then opening Tools > Script editor… and replacing the section of code that starts function getTweets() { and finishes 134 lines later (possibly with the line function twDate(aDate){ ) with the code here. [And yes I know that’s a pain in the ass but it’s the best I could do] … or you can just start a new archive using TAGSv5.0

Additional tips and info when I get a chance

Keep your Twitter Archive fresh on Google Drive using a bit of Google Apps Script

Twitter Archive interface

Like a growing number of other people I’ve requested and got a complete archive of my tweets from Twitter … well almost complete. The issue is that while Twitter have done a great job of packaging the archives, even going as far as creating a search interface powered by HTML and JavaScript, as soon as you’ve requested the data it is stale. The other issue is that, unless you have some webhosting, where can you share your archive to give other people access?

Fortunately, as Google recently announced site publishing on Google Drive, by uploading your Twitter archive to a folder and then sharing the folder so that it’s ‘Public on the web’ you can let other people explore your archive (here’s mine). Note: Mark Sample (@samplereality) has discovered that if you have file conversion on during upload this will break your archive. [You can also use the Public folder in Dropbox if you don’t want to use a Google account]

The documentation wasn’t entirely clear on how to do this. Basically it seems that as long as there’s an index.html file in the folder root and links to subdirectories are relative, all you need to do is open the folder in Google Drive and swap the first part of the url with https://googledrive.com/host/ e.g. https://drive.google.com/#folders/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ becomes https://googledrive.com/host/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ/
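The url swap is easy enough to do by hand, but as a quick sketch it could be written like this. The function name is my own invention, and it assumes the folder url has the #folders/ form shown above.

```javascript
// Turn a Google Drive folder url of the form .../#folders/<id> into the hosting url.
// Hypothetical helper; assumes the "#folders/<id>" url pattern from the example above.
function driveFolderToHostUrl(folderUrl) {
  const match = folderUrl.match(/#folders\/([A-Za-z0-9_-]+)/);
  if (!match) throw new Error("No folder id found in: " + folderUrl);
  return "https://googledrive.com/host/" + match[1] + "/";
}

console.log(driveFolderToHostUrl("https://drive.google.com/#folders/0B6GkLMU9sHmLRFk3VGh5Tjc5RzQ"));
```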

So next we need to keep the data fresh. Looking at how Twitter have put the archive together we can see tweets are stored in /data/js/tweets/ with a file for each month’s tweets and some metadata about the archive in /data/js/, the most important being tweet_index.js.

Fortunately not only does Google Apps Script provide an easy way to interface with Drive and other Google Apps/3rd party services, but the syntax is based on JavaScript, making it easy to handle the existing data files. Given all of this it’s possible to read the existing data, fetch new status updates and write new data files, keeping the archive fresh.
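As a rough sketch of the ‘read existing, fetch new, write files’ idea, here is some plain JavaScript that merges newly fetched tweets with existing ones and groups them by monthly file. This is not the code in the template: the YYYY_MM.js file naming is an assumption for illustration (check your own /data/js/tweets/ folder), and the sample tweets use ISO dates to keep date parsing simple.

```javascript
// Hypothetical helper: work out which monthly file a tweet belongs in.
// The "YYYY_MM.js" naming is an assumption, not taken from the archive spec.
function monthlyFileName(isoDate) {
  const d = new Date(isoDate);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return "data/js/tweets/" + d.getUTCFullYear() + "_" + month + ".js";
}

// Merge fetched tweets into the existing ones (skipping ids already held)
// and group the result by the file each tweet would be written to.
function mergeIntoMonths(existing, fetched) {
  const seen = new Set(existing.map(t => t.id_str));
  const fresh = fetched.filter(t => !seen.has(t.id_str));
  const files = {};
  for (const t of existing.concat(fresh)) {
    const name = monthlyFileName(t.created_at);
    (files[name] = files[name] || []).push(t);
  }
  return files;
}

// Made-up sample data.
const existingTweets = [{ id_str: "1", created_at: "2013-01-30T09:00:00Z", text: "old tweet" }];
const fetchedTweets = [
  { id_str: "1", created_at: "2013-01-30T09:00:00Z", text: "old tweet" },
  { id_str: "2", created_at: "2013-02-01T12:00:00Z", text: "new tweet" }
];

const monthFiles = mergeIntoMonths(existingTweets, fetchedTweets);
console.log(Object.keys(monthFiles));
```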

To do all of this I’ve come up with this Google Spreadsheet template:

*** Update Twitter Archive with Google Drive ***
[Once open File > Make a copy for your own copy]

Note: There is currently an open issue which is producing the error message ‘We’re sorry, a server error occurred. Please wait a bit and try again.’ Hopefully the ticket will be resolved soon

The video below hopefully explains how to set it up and use it:

A nice feature of this solution is that even if you don’t publicly share your archive, if you are using the Google Drive app to sync files with your computer the archive stays fresh on your local machine.

The model this solution uses is also quite interesting. There are a number of ways to create interfaces and apps using Google Apps Script. Writing data files to Google Drive and having a static HTML-based interface is ideal for scenarios like this one where you don’t rely on heavy write processes or dynamic content (aware of course that there will be some sanitisation of code).

It would be easy to hook some extra code to push the refreshed files to another webserver or sync my local Google Drive with my webhost but for now I’m happy for Google to host my data ;s

Backup Twitter Status Updates to a Google Spreadsheet

It looks like Twitter are finally rolling out the option to download all your tweets. As well as providing a nice offline search interface it appears that “the archive also includes CSV and JSON files, the latter complete with each tweet’s metadata”. I’m looking forward to see the data visualisations/mashups people come up with around their data. 

The Twitter API has long allowed you to extract a user’s status updates, the limitation being that you could only get the last 3,200 tweets. This is something Sheila MacNeill discovered when she tried vizify’s ‘Year on Twitter’ tool.

the archive of me

Exporting tweets was something I looked at in 2011 in the cryptically titled Google Spreadsheets and floating point errors aka when is 65078736491511804 + 1 = 65078736491511808 (and automatically archiving your Twitter Status Updates). With this I’ve got a record of my tweets going back to April 2010 which is triggered to update itself every week. A reason I do this is often I need to find things I’ve previously said in ‘the archive of me’.
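The floating point problem in that post title is easy to reproduce. Here is a short sketch for a modern JavaScript runtime (BigInt is my choice for the illustration here, not something the 2011 post used): tweet ids are too big to be held exactly as ordinary numbers, so adding 1 to one gives the wrong answer unless the id is kept as text or a BigInt.

```javascript
// Tweet ids are larger than the biggest integer a JS number can hold exactly.
const idText = "65078736491511804";

// Treating the id as an ordinary number silently changes it.
const asNumber = Number(idText);
const survivesRoundTrip = String(asNumber) === idText;

// Keeping it as a BigInt (or as text) preserves every digit.
const nextId = (BigInt(idText) + 1n).toString();

console.log(Number.isSafeInteger(asNumber)); // whether the number form can be trusted
console.log(survivesRoundTrip);              // whether the digits came back unchanged
console.log(nextId);                         // the id plus one, computed exactly
```

This is why the id_str (text) version of an id is the safer one to store.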

Here’s the template (File > Make a copy); follow the instructions if you want to try it (please be aware of the Twitter Developer Rules of the Road). I’ve updated the code to make it compatible with version 1.1 of the Twitter API. One of the options I’ve added is a JSON dump which is saved to your Google Drive. It only took two lines of code using Google Apps Script HT +Romain Vialard 

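  // Serialise the API response (json) and save it to Drive as a file called filename.
  // Note: Utilities.jsonStringify and DocsList have since been retired; JSON.stringify and DriveApp.createFile are the current equivalents.
  var blob = Utilities.newBlob(Utilities.jsonStringify(json), "application/json", filename);
  DocsList.createFile(blob);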

[The JSON dump is a bit buggy – some issues with character escaping somewhere]

Enjoy!

Twitter provides proof of real-time engagement with the public: How to record it as evidence

Alistair Brown has written an interesting post on the LSE Impact of Social Sciences blog – Proving dissemination is only one half of your impact story: Twitter provides proof of real-time engagement with the public. The post highlights the case of how a journal paper was picked up by a university media office, which ended in the author being interviewed on BBC Radio 4’s Today programme. As part of this Alistair highlights that:

REF impact involves an assessment of “significance” as well as “reach,” so the mere fact that research has been disseminated to a wide audience does not constitute an impact by itself; one has also to show the effect it has on those to whom it is disseminated. For this reason, citing the fact that a researcher has appeared on a primetime radio show with several million potential listeners might be one element of an impact statement, but one needs also to evidence that the audience has actively listened to what was being put out, and that it has affected, changed or benefitted them in some way

In the age of the second screen Alistair goes on to highlight how Twitter can be used as evidence of engagement, listeners tweeting personal reflections, feedback or just disseminating the information more widely. But as Alistair points out:

When a piece of academic work receives broadcast media coverage, then, it is useful to have a strategy in place to gather emerging responses, and it is also far easier to do this as it happens rather than retrospectively.

A strategy is required because, as Alistair points out, the Twitter search is limited to the last 7 days. While there are ways to view this activity in realtime, how do you capture the evidence? Here’s my response to the problem:

Here is the spreadsheet template I mention. So have you got a strategy for recording impact evidence from Twitter?

Guest Post for Big Data Week #bdw13: Getting Creative with Big Data and Google Apps

I was recently asked to write a guest post for Big Data Week on using Google Apps as an interface for Big Data. For the post I decided to revisit an old recipe which uses Google Sheets (Spreadsheets) and Google Apps Script to interface the Twitter and Google Analytics API. One of the results is the bubble graph shown below which shows who has been tweeting my blog posts, how many visits their tweet generated, the number of retweets and how many followers the person has (click on the image for the interactive version). You can read more about how this was done and get a copy of the template in Getting Creative with Big Data and Google Apps.

Visits, retweets bubble graph

TAGSExplorer now includes filterable/searchable archive

Search Archive feature in TAGSExplorer

At IWMW12 I made a searchable/filterable version of TAGS Spreadsheets. This feature lets you use the Google Visualisation API to filter tweets stored in a Google Spreadsheet (more about TAGS). It has been available via a separate web interface for some time but I’ve never got around to publicizing it. As TAGSExplorer also uses the Google Visualisation API to wrap the same data in a different visualisation tool (predominantly d3.js) it made sense to merge the two. So now in any existing TAGSExplorer archive (like this one for #jiscel12) you should also have a button to ‘Search Archive’.

The archive view has some basic text filtering from tweeted text and who tweeted the message as well as a time range filter (dragging the handles indicated). The scattered dots indicate when messages were tweeted. The denser the dots, the more tweets made.
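The filtering itself is nothing exotic. As a sketch only (this is not the TAGSExplorer code, which uses the Google Visualisation API), the same three filters could be expressed in plain JavaScript like this; the property names and sample data are assumptions for illustration.

```javascript
// Filter an archive by tweet text, who tweeted it and a time range.
// start and end are millisecond timestamps; every filter is optional.
function filterArchive(tweets, { text = "", from = "", start = -Infinity, end = Infinity } = {}) {
  const needle = text.toLowerCase();
  const who = from.toLowerCase();
  return tweets.filter(t => {
    const when = Date.parse(t.time);
    return t.text.toLowerCase().includes(needle) &&
      t.from_user.toLowerCase().includes(who) &&
      when >= start && when <= end;
  });
}

// Made-up sample archive.
const archive = [
  { from_user: "alice", text: "Keynote starting now #jiscel12", time: "2012-11-13T10:00:00Z" },
  { from_user: "bob", text: "Great keynote slides #jiscel12", time: "2012-11-13T11:30:00Z" },
  { from_user: "alice", text: "Day two begins #jiscel12", time: "2012-11-14T09:00:00Z" }
];

const dayOneKeynote = filterArchive(archive, {
  text: "keynote",
  start: Date.parse("2012-11-13T00:00:00Z"),
  end: Date.parse("2012-11-13T23:59:59Z")
});
console.log(dayOneKeynote.map(t => t.from_user));
```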

I’ve hastily thrown this together so feedback very welcome.

The most comprehensive aggregation and visualisation of #jiscel12 tweets

It’s here folks. The most advanced aggregation and visualisation of tweets for the JISC Innovating e-Learning 2012 online conference taking place next week. Over two years ago I started developing a Google Spreadsheet to archive tweets and since not only have I been evolving the code I’ve been creating tools which use the spreadsheet as a data source. It’s pleasing to see these tools being used for a wide range of projects from citizen journalism,  to a long list of academics, students and community groups, and even TV broadcasters.

I’ve been a little remiss in posting some of the latest developments and I’ll have to cover those soon. For now here’s your #jiscel12 Twitter basecamp.

Overview of features

 #jiscel12 Twitter basecamp

Whilst it probably just looks like another spreadsheet you should explore:

A. The ability to easily filter archive by person

The ability to easily filter archive by person
[Still need to document]

B. The TAGSExplorer conversation overview

TAGSExplorer conversation overview
[TAGSExplorer: Interactively visualising Twitter conversations archived from a Google Spreadsheet]

C. The entire searchable/filterable archive

entire searchable/filterable archive
[Still need to document]

D. The question and answer filter

question and answer filter
[Any Questions? Filtering a Twitter hashtag community for questions and responses]

Dashboard

[Contains a number of summaries – I find ‘most RTs in last 24hrs’ one of the most useful (how this works also needs documenting)]

Currently these are automatically updating every hour, but I’ll probably crank up the frequency next week. Your thoughts on these are always gratefully received ;) 

CFHE12 Week 3 Analysis: Exploring the Twitter network through tweets

For week 3 of cfhe12 analysis I thought I’d turn back to the Twitter data. I’m currently trying to prepare a Shuttleworth Fellowship application which has got me thinking more about the general premise of cMOOCs that “knowledge is distributed across a network of connections, and therefore that learning consists of the ability to construct and traverse those networks”  (from week 1 of cck11).

The aspect, which features in my Shuttleworth application, is providing mechanisms that aggregate data from distributed sub-networks which then can be processed to produce actionable insights for tutors or participants. The process I plan to adopt is to look at the data using heavyweight tools, like NodeXL, or just applying a bit of curiosity (this person has stopped tweeting, why? etc.), and then converting some of these patterns into very lightweight applications or views to remove the complexity and highlight key parts of the data.

Some examples for you:

Summary of CFHE12 participant activity

Tweets

Tweets from CFHE12 are being collected in this Google Spreadsheet. As part of this template there are a number of summary views, one of these being a breakdown of individual participant activity. As part of this, sparklines are used to display someone’s Twitter activity. Looking at gsiemens you can see there is steady activity, posting 45 tweets tagged #cfhe12. Towards the bottom of the graph is ViplavBaxi, who after initial high activity is no longer contributing to the hashtag. So what has happened to ViplavBaxi? There are a number of possible answers but let me highlight a couple, which also highlight the limitations of the technique:

  • they have lost interest in the course or time commitments prevent them from contributing (high drop outs aren’t unexpected in MOOCs)
  • no longer using #cfhe12 hashtag – the archive is only of #cfhe12 so if they have joined a sub community communicating without the hashtag it’s not recorded
  • found a different communication channel – this technique is only looking at Twitter activity, the person may have moved to another network channel like the discussion forum

Another interesting activity summary is for dieGoerelebt. They are one of the top 5 contributors in terms of number of tweets, but recently their activity has trailed off. You can also see that the ‘@s’ column, which is the number of times they’ve been mentioned in tweets, is one of the lowest. Is the decline in activity a result of the lack of engagement?

The next question that springs to my mind is what did these people say. Within the spreadsheet it’s easy to filter what they said. To let you see too I’ve got this simple web interface primed with filtered tweets (I modified an existing tool I’ve developed to do this – unfortunately I’ve never documented it, but as I use it more and more I must get around to it):

Summary of CFHE12 participant activity with RT percentage

From visual inspection dieGoerelebt had a high proportion of retweets. This was confirmed when I added a percentage of tweets that are retweets.
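For the curious, a retweet percentage like this can be sketched in a few lines of JavaScript. The assumption here (mine, for illustration) is that a retweet is any tweet whose text starts with ‘RT @’, which will miss other retweet styles; the sample tweets are made up.

```javascript
// Percentage of a person's tweets that look like retweets.
// Assumption: a retweet is a tweet whose text starts with "RT @".
function retweetPercentage(texts) {
  if (texts.length === 0) return 0;
  const rts = texts.filter(t => /^RT @\w+/.test(t)).length;
  return Math.round((rts / texts.length) * 100);
}

// Made-up sample tweets for one participant.
const someonesTweets = [
  "RT @gsiemens: Week 3 readings are up #cfhe12",
  "Interesting discussion today #cfhe12",
  "RT @example_user: New post on MOOCs #cfhe12",
  "Anyone else behind on the readings? #cfhe12"
];

console.log(retweetPercentage(someonesTweets) + "% retweets");
```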

Something I noted in the filtered view of a person’s tweets was that a lot of the context is lost (I can see they are @replying to someone, but I don’t know what they said).

To help with this I started looking at modifying the twitter questions filter I built to enable a view of the conversation.

This is a start, but as I noted when I published the question filter, clicking through messages like the one shown below reveals there is more of the conversation that is missing.

 Part of the conversation

Bigger picture

Summary

So again I start exploring some ideas that branch off into many more avenues to follow. One thought is that the micro analysis of tweets might not be beneficial or practical, and given the issues with extracting a full conversation from Twitter a macro view might be better. Providing a summary of overall activity and the mode in which Twitter is being used by people may be of the most use to tutors and participants to identify people they might want to connect with. As always your thoughts are greatly appreciated.

In this post I’ve taken an ego-centric approach to contributions. In the next couple of days I’ll share an ego-centric approach to community connections.

About

This blog is authored by Martin Hawksey

JISC CETIS Learning Technology Advisor (OER Programme Support)

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey
