Twitter network analysis and visualisation I: SocialBro

There is still a free version of SocialBro here. You get more features if you go Premium, but I find the free version still has enough for me

Last week a couple of network analysis tools landed in my inbox and after having a quick play I thought they were worth highlighting here. In this post I’m going to have a quick look at SocialBro which is billed as a tool to ‘manage and analyse your Twitter Community.

The main features are:

  • Browse your Community Search your followers and friends using different criteria such as name, location and description.
  • Filter and Sort Number of followers, followers/friends ratio, frequency of tweets, account age, recent activity, language, time zone.
  • Easy Follow Back Tools Discover your new followers and easily follow them back
  • Easy Unfollow Tools Detect noisy friends, potential spammers, inactive friends and easily unfollow them.
  • Track Unfollows Detect who recently unfollowed you and you can easily unfollow them back.
  • Manage Twitter Lists Organize your followers and friends by creating Twitter lists with the search results.
  • Backup your Twitter Community Download all your Twitter followers and friends to a local database which you can consult even when offline.
  • Fast Communication Tweet and direct message from search results.
  • Visualize Statistical Information Time zone charts, languages charts, users by number of followers, users by recent activity, etc.
  • See your Community in Map Visualize the world wide distribution of your community in a map.

SocialBro Dashboard

You’ll find a number of these features in other web services, for example, I already use TwUnfollow.com to see who unfollows me. The big difference with SocialBro is it’s the first downloadable Twitter analytics tool I’ve seen (I’m ignoring the Archivist Desktop because while its great at downloading search terms it doesn’t have much in the way of analysis).

Being a downloaded software program has pros and cons. The main pro is you are downloading data for offline use. The main con is you’ll need to fire the application up to synchronise the data and how often you do this will effect the resolution of time based data (e.g. follower growth)

The client uses Adobe Air which gives it cross-platform support (if you are using TweetDeck Desktop you already have Air installed). Once the software is installed you need a beta account with SocialBro, which for me was processed very quickly. I noticed that each time you start the software it checks that you have an account with them. You can cancel this and still access your locally stored data but you can’t synchronise. I’m guessing once they get out of beta there’ll be a freeium or even just a premium model.

Watching the garden grow

As well as being able to export your friends and followers as a csv file there are a couple of build-in reports for ‘Best time to tweet’ and ‘Insights’.

Here’s my full ‘best time to tweet’ report from SocialBro. Something I’m not sure about is “the free version of “Best Time to Tweet” is generated by analyzing only your top 100 followers”. I’m not sure how they are categorising ‘top’ but I’m guessing they mean ‘last’. It’s interesting to note that SocialBro and online Twitter analytics service Crowdbooster have very similar best time to tweet matrix charts.

SocialBro - best time to tweet Crowdbooster - best time to tweet
Best time to tweet from SocialBro Best time to tweet from Crowdbooster

Here’s a link to my ‘insights report’. The pdf version has breakdowns for language and timezones. The client also includes a map overlay:

SocialBro - follower map

Weeding and seeding

As well as SocialBro giving you a overview and option to filter things like non-reciprocal relationships for your account you can also add additional data sources. These sources can be Twitter searches, other Twitter user’s friends/followers or Twitter lists. For example, I added the eas11-delegate list and I can see that of the 116 members I only follow 20 so their might be some interesting people to checkout in the remaining 96.

SocialBro - Add new source

Final thoughts

SocialBro has some great tools to help you manage and analyse your Twitter community particularly if you are managing a class or community account. Big question for me is how long will it stay free.  Perhaps it’s time to revisit my collection of Twitter Google Spreadsheets …

What I’ve starred this month: August 28, 2011

Here’s some posts which have caught my attention this month:

Automatically generated from my Google Reader Shared Items.

My #eas11 world premier: Creating personalised Google Documents from Form submissions (and some other things)

Tomorrow (Friday 26th August) for my eAssessment Scotland 2011 workshop I’ll be doing a world premier extending the form submit sheet split example to let tutors create individual student Documents from submitted data (e.g. taking a sheet like this and at a click of a button produce this document.

Update: Still some refinement in the instructions below. In the meantime if you want to reverse engineer here is a copy of the final spreadsheet and Google Document Template used. David Sloan from the University of Dundee who was at the workshop has also usefully highlighted some accessibility issues to be aware of when using Google Apps.

[gview file="http://mashe.hawksey.info/wp-content/uploads/2011/08/Using-Google-Forms-for-Assessment-and-More1.pdf"]

And the most engaging OUseful.info post is…

This one Data Scraping Wikipedia with Google Spreadsheets

If you want to know how I arrived at this conclusion it’s mostly the same process as And the most engaging JISC Project is…, but as well as getting social bookmark/share counts I’m also counting the number of post comments (an important part of engagement).

Here’s a copy of the full spreadsheet with 771 analysed links (if you make a copy of this you can see the code in Tools > Script editor) and a summary below:


Counting comments

It actually took two methods. Knowing that WordPress blogs have a comment rss feed for each post I initially grabbed this and snaffled the number of comment items (at the same time I grabbed the post title, as the sitemap.xml doesn’t have this info).  Here’s the code for getting comment counts from comment rss feeds.

What I forgot was that the comment feed is limited to the number of rss items set in the blog administration, usually 10, so if a post has more these are missed. To solve this I wrote this additional code which goes through each post with 10 comments and confirms this count by screen scraping the number from the original HTML page (its ironic that Tony’s top post is also screen scraping using Google Spreadsheets using an alternative method).

Bonus – Topsy tweet data

In my previous post I mentioned how I used the Topsy service to get a secondary tweet count. As well as the link share count Topsy also keep a record of the tweet which made. This data is also accessible via their Topsy (Otter) API as part of trackbacks. As Tony has done a lot of work with community visualisation (some posts here) I thought that as I was querying Topsy I should grab the tweets and dump them in a separate sheet in their JSON format (I should say that unless you start paging the results the maximum you’ll get is 50 tweets). 

Engagement £value£

Something else I mentioned in my my previous post was can a monetary value be attached to a Tweet or a Like. In Sheila MacNeill’s related post I commented that services like PostRank put a suggested value on these forms of engagement. I’m not convinced that a comment is worth $2.50, there are too many factors like who made the comment, who sees the comment and many others to put a flat rate on it.

For example, using the PostRank formula I calculated that OUseful.info has generated $20,393.25 in ‘engagement’ which I think is a gross underestimate. Considering the number of things I’ve learned and followed up as a direct result of Tony’s work he’s due a sizeable cut from any ‘engagement value’ generated from this site.

And the most engaging JISC Project is…

social media engagement ring
social media engagement ring
Originally uploaded by pro1pr

Recently I posted a Google Spreadsheet of all the live JISC funded project websites from the last 3 years. Not too long before that I also posted Google Spreadsheets as a lean mean social bookmark/share counting machine, which used Google Apps Script to query different social network providers for share/like counts for a specified url.

I thought it would be interesting to combine the two and see which JISC funded  projects have been trickling through various social networks (the social engagement monitoring service PostRank did something similar with the TEDTalks, but since their purchase by Google the API, which would have made this a lot more easier, has been closed).

My starting point was the PIMS (2nd Pass) spreadsheet (I chose not to use PROD as I don’t think it has all of the JISC funded projects? – someone correct me if I’m wrong). I could have inserted cell formula for the custom Apps Script functions to getFacebookLike() etc. but as I mentioned in that post you can only use 100 of these before hitting timeouts, and with over 400 project websites it’s not straight forward.

The solution was two fold. Firstly, use Google Apps Script to iterate across the website urls fetching the results and recording it in the appropriate cell as a value. This saves the spreadsheet having to fetch the responses each time it is opened.

The second part of the solution was linked to this. As I was going to record values rather than use live data it made sense to try and aggregate the calls to individual social network services to avoid hitting urlfetch limits (I reckon you get about 400 of these a day?). As I mentioned in the original bookmark counting post I’d come across Yahel Carmon’s Shared Count API, which let you make one request and get a bunch of stats back for that url.

So here’s the code I used and the resulting Google Spreadsheet of JISC Project Social Favourites.

The most engaging JISC Project is…

And here is where the arguments start. The more accurate description is ‘the most engaging  JISC funded project website index page is…’ and even then there is the caveat of (including established websites part funded by JISC). This also excludes all the blog posts, wiki pages, supplemental pages, repository submissions generated by JISC projects and also not forgetting other forms of engagement like other people writing and linking back to JISC project websites/resources. The list goes on. Sheila MacNeill at JISC CETIS has already posted some thoughts in Socially favoured projects, real measures of engagement?.

So is there any value is this data? I’ll let you decide. The important part for me was the process. I now have a method for returning social bookmark/shares for a bunch of websites and a framework using Google Apps Script to start automatically adding urls and collecting data.

Problems encountered

So if the process was more important for me what did I learn along the way.

Shared Count API didn’t like some of the urls

For some reason the Shared Count API spat out the following 2 urls with 500 server errors.

I don’t know why it did this but my quick fix was to use the custom getFacebookLike formulas for these entries.

Twitter counts aren’t reliable

Twitter only recently started providing there own share/count button and as a result polling the official data isn’t always accurate. A separate service which has been monitoring the links people tweet for a lot longer is Topsy and fortunately for us Topsy have an API to pull similar data (these 3rd party APIs are becoming more scarce as the big boys buy up services and switch off APIs – I’m sure it will be a case that Topsy’s API will disappear soon as well :(

An example of the difference is the ticTOCs website which Twitter only thinks has been tweeted 30 times, but Topsy has 96 hits (the other advantage of Topsy is I can see what people said about ticTOCs – this data is also available via their API so I may be revisiting this source). When calculating ‘total engagement’ I took the maximum value between Twitter and Topsys (more about the total further down the page).

[As Topsy results aren’t included in the Shared Count API I grabbed these separately using the getTopsyCount function documented in my other bunch of other bookmark/share code snippets]

Hitting Apps Script urlFetch limits

Even using the Shared Count API (plus Topsy calls) I hit Apps Script urlFetch quota limits (I haven’t seen this documented anywhere but I’m guess its between 400-500). To get around this I shared the Spreadsheet and Script with another one of my Google accounts and was able to continue.

Stumble trip, stumble trip

I collected StumbleUpon stats mainly because they were part of the Shared Count API data, but unlike the other service details these are views rather than share counts so I didn’t include them in the totals as it’s a bit apple and pear-ish.

Buzz off

Buzz, Google’s second… no third… forth(?) attempt at social networking, is being eclipsed by Google+ but if like me you switched it on to automatically push updates from other services buzz counts potentially have a lot of noise in them.

For example, JISC funded projects which are hosted on Google Code (like Shuffl and meAggregator) end up with large Buzz counts (I’m guess each code commit generates a buzz) and not much other social activity. In the spreadsheet when I totalled the different service counts I also included a column excluding the Buzz counts.

Is a Like more valuable than a Tweet

This brings me back to to some of the questions around what does this all mean. I’ve already written/presented about how for services like Eventbrite there is more dollar value in a customer using a Facebook Like than Tweeting event information. So should a Facebook Like get more weighting than a tweet?

Where next

Umm not sure but if anyone has a collection of interesting  urls they’d like a spreadsheet of social counts for get in touch ;)

Open data equals open scrutiny but doesn’t always equal all of the answer

Eyes of Flickr
Eyes of Flickr
Originally uploaded by anyjazz65

Last week I got frustrated at not being able to find some JISC funded project outputs, which was  a little annoying. This led to a small exploration around JISC’s Programme Information Management System (PIMS). The system was originally only available to JISC executive staff but made available to all sometime last year and used to log all the JISC funded programmes and projects. As well as looking up projects via your browser some of the data can be accessed via it’s API.

At the back of my mind was a post by Jonas Eriksson (@joeriks) on Coding my own automated web site monitoring with Google Apps Script and a Spreadsheet. What I wanted to do was pull JISC funded project website addresses from the last 3 years* and automatically test to see if they were still alive.

To do this I first needed a list of project website urls from the last 3 years. Unfortunately the PIMS API doesn’t appear to let you access records based on a date range so instead I just grabbed the lot via http://misc.jisc.ac.uk/pims/api/dev/project which returns the data in XML format. I could have just dumped this straight into a Google Spreadsheet using the importXML formula, but I find this is *very* unreliable so opted to handle it using the Google Apps Script XML Service. [Because of my lack of knowledge of using this service I initially didn’t get very far so tried processing the response as JSON instead by using http://misc.jisc.ac.uk/pims/api/dev/project=json but got into more difficulty because the API returns objects with a duplicate ‘Project’ key so reverted back to XML]

So here is my code to get some selected columns of PIMS data for projects that have finished in the last 3 years:

* Para 27 and 29 of JISC’s general terms and conditions of funding state “27. The institution and its partners must create a web page and web site to explain the project aims and objectives and to disseminate information about project activities and results. …. 29. The lead institution or one of its partners must agree to host the web site on their server for a minimum of 3 years after the end of the project and to assist JISC in archiving it subsequently.”

The data isn’t clean

Any systems that gives the user freedom over the data entered will invariably get some ‘dirty data’. For example it would have been nice to just iterate across the project website urls but: a) not all projects have a url entered; b) not all of the projects are projects (some of the entries are holders for Programme Management or Evaluation); c) urls are entered with leading whitespace; d) the field may have multiple urls or text notes; or e) might just be entered wrong or be an old url.

You can add layers of code to factor some of these out like trim whitespaces or only process urls that begin ‘http:’ but at the end of the day there will always be an error factor.

Regardless of this I was keen to push on and find out how many of these urls were pingable using Apps Script. So here is my next bit of code to ping a spreadsheet of urls:

Using this I found that the UrlFetchApp service would hang if there were over 200 urls and not push it’s results into the sheet so I found that I had to manually adjust the for loop (line 5) to do batches of urls at a time.

The results for this first pass are in the embedded table below (also in this Google Spreadsheet). It’s notable that almost half of the entries don’t have website urls associated with projects. As mentioned earlier not all of the projects entered are externally funder projects a proportion are internal activities.

As there were also quite a few invalid website url entries (n. 32) I tided these up manually (a number of ‘tbc’ and other text entries), and ran again as a 2nd pass also summarised in the table. In the second pass I did some manual checking of the project entries returning 404 and 590 errors.

The 590 errors are all associated with project websites hosted on the HEA website associated with the OER programme. An example is http://www.heacademy.ac.uk/projects/detail/OER_IND_Bradford which returns a HEA themed page with no content. It appears these urls have been entered incorrectly or the HEA have changed the structure of their site as the following url does work  http://www.heacademy.ac.uk/projects/detail/oer/OER_IND_Bradford.

The manual checks only identified 1 or 2 mistaken missing urls indicating that the UrlFetchApp, which automatically follows redirects, is accurate.   


PRODing a different data source

Another data source which has JISC project data is the CETIS PROD directory. This pulls a number of data sources including PIMS and is given some extra TLC from CETIS staff who curate the data manually adding extra pieces of information. Like PIMS, PROD has an API to get data out, but as far as I could see there was no way to get all the data.

I was original made aware of PROD via JISC CETIS’s Wilbert Krann (@wilm) and so a few friendly tweets later I ended up with the following query for the TALIS/PROD data store (I should say I know very little about Linked Data/SPARQL so rather than show my ignorance I’m not even going to mention it):

PREFIX prod: <http://prod.cetis.ac.uk/vocab/>
PREFIX doap: <http://usefulinc.com/ns/doap#>
PREFIX mu: <http://www.jiscmu.ac.uk/schema/muweb/>
PREFIX jisc: <http://www.rkbexplorer.com/ontologies/jisc#>
SELECT DISTINCT *
WHERE {
    ?s a doap:Project .
    ?s doap:name ?project .
    ?s jisc:end-date ?date .
    ?s doap:homepage ?homepage .
}

Just as with the PIMS example it was possible to use Apps Script to fetch the results using this getProdData code. [You’ll see that the fetch query url has been shortened using bit.ly. This is because as Tony discovered Google Apps Script doesn’t like lonnnnnngggg queries, but it is happy to follow redirects.

Below is a summary of the results from pulling and pinging the project homepage urls from PROD (also available in this Google Spreadsheet). There are more than half the number of projects returned from PROD (n. 430), but all the projects with homepage urls are valid without any clean-up. There are still over 30% of projects without a homepage url, but this doesn’t mean that the project doesn’t have some sort of web presence and the PROD data contains other urls if they exist for a JISC page, rss feed, wiki site etc.

Even though PROD gets some extra love and attention 13% of the recorded projects homepage urls hit dead ends (mainly 404 errors). Just as with the PIMS data I had a look at these in a 2nd pass and found that 5 projects actually had an alternative web presence (usually a hosted blog). 

 

Many eyes and many things

What can we take from all of this? A third of JISC funded projects don’t have a project website? One in ten projects with websites aren’t available for 3 years after the project ends? I don’t think it’s as conclusive as that. Those states are based on the assumption that all JISC funded projects have agreed to the general terms and that the general terms have remained unchanged. What interests me more is how the information can be improved and reused.

In terms of improving the quality of the data just as PROD adds an extra level of human curation there is potentially the opportunity to add a wider level of community curation similar to Wikipedia or IBM’s Many Eyes. The challenge is less technical (in this sheet I’ve added a comment link for each entry which links to a comment form) and usually finding the right incentive to get contributions.

In terms of reuse I have one simple use case in mind and I’m sure there are many more. In a couple of hours it was possible to pull this data into a Google Spreadsheet and ping project website urls. It would only be a tiny step to not only automate this but trigger an email to someone when a website went off the radar.

[Here’s a link to the Google Spreadsheet with all the data I pulled]

Free Google Apps Script solutions for education from simpleappssolutions.com

One of the resources I regularly dip into when Apps Scripting is James Ferreira’s
simpleappssolutions.com. When introducing people to Google Apps Script I usually promote James’s work because he offers free consultation to education and non-profits. Today James sent me the following email and with his permission I’m sharing it here:

Hello,

I’ve been giving this a lot of thought and helping you get your job done is really where my heart is at. Working as a public servant for my day job I know that it is often difficult to get anything past purchasing. In my opinion this should not be a road block for you in getting better solutions to help accomplish your mission. Therefore, I am offering up all the apps I’m writing for free to Schools, Non-Profits and Gov. I’m also looking for more apps that can be universally applied to add to the library, suggestions? Please forward this email to teachers as there are many things I can help them automate such as testing and Google Sites.

My latest apps Print All for Gmail and Email Archiver are very easy to install and use and it would be great if you could give them a try. Want to write these kinds of apps yourself? Check out my Script Example site or visit the Google Apps Script help forum.

Donations to keep me fuelled with coffee for late night code writing sprees are gladly accepted. I am still working with Google on a book about Google Apps Script, my main programming language, and with some luck it will be published by O’Reilly.    

Keep up the good work you do.

Thank you,
James Ferreira
http://simpleappssolutions.com

In a follow up email James also highlights:

Google Apps is such a great tool for educators and as you know, Apps Script makes it possible to extend those core services to do just about anything. I would very much like to see educators use Apps Script for teaching basic programming and because it has the same collaborative features as the rest of the Apps suite a teacher could essentially guide students as they work. Have a look at Andy a UI I put together for Google I/O. Can you imagine the fun students would have making their own mini games?

I have also put together a template for making interactive websites that seems perfect for education.   

This resonates they way I see Apps Script too ;)

Exporting more friends or followers from Twitter to a Google Spreadsheet

Back in March 2011 I published Export Twitter Followers and Friends using a Google Spreadsheet. One of the problems with this solution is if you were trying to import a lot of friends or followers users would get the error message “Exceeded absolute-time timeout in milliseconds”. This is because there is a limitation to the amount of execution time each Google Apps Script gets.

Following feedback from users (the nature of my work is perpetual beta so if something isn’t working out for you just ask ;) I’ve come up with a workaround. If you have already setup the original export friend/follower spreadsheet you need to open it and select Tools > Script editor… and paste the following code into it and save.

To use this new code you need to enter the following information directly into the script editor where prompted:

  • sheetName – sheet name you want to import the data to;
  • friendOrFo – whether you want ‘friends’ or ‘followers’; and
  • optScreenName – the screen name of the person

To run the script Run > getLotsOfFriendAndFo manually as many times as needed (the script will import 100 friends/followers at a time) or if you want to go away and make a cup of tea use Triggers > Current project triggers and add getLotsOfFriendAndFo as a time-driven function to run every minute (once you’ve finished your tea best to delete this trigger).

If something goes wrong during the import you can start over manually deleting any data and running resetGetLotsOfFriendAndFo (this resets the cursor used with the Twitter API).

Any additional comments welcome ;)

What I’ve starred this month: July 28, 2011

Here’s some posts which have caught my attention this month:

Automatically generated from my Google Reader Shared Items.

Google Spreadsheets as a lean mean social bookmark/share counting machine

It’s been a whole social metricy couple of weeks. First there was UKOLNEIM where I got a chance to chat to Martin Weller about digital scholar dashboards, and yesterday I flicked through Nicola Osborne’s Listen, Repeat, Learn: How to use Social Media Conversations to Measure and Demonstrate Impact and Improve Engagement session at IWMW11.

In Nicola’s presentation she mentioned that they use a spreadsheet to aggregate stats for EDINA online activity (page views, social reaction etc). This fitted in nicely with the ideas I was forming after talking to Martin Weller about centralising this type of information for digital scholars. There are already services like PostRank and Readermeter which can be used to start aggregating different part of this web of data, but I’ve bigger and more personal ideas in mind.

Perhaps not surprising my glue to prototype this is Google Spreadsheets. Yesterday I started looking at pulling social bookmark/share counts for specified urls. What I’ve come up with (or found from elsewhere) is a bunch of Google Apps Script snippets for getting social bookmark/share counts as cell formula.

This script has already been added to this spreadsheet. If you look in cell D2 you’ll see the formula =getFacebookLike(B2). What this does is look up the Facebook API to see how many times the url in B2 has been liked. Here’s the complete list of formula this spreadsheet has access to:

  • getFacebookLike(url)
  • getFacebookShare(url)
  • getFacebookComment(url) 
  • getTweetCount(url)
  • getBuzzCount(url)
  • getPlusones(url) – this one was published by Tom Anthony
  • getLinkedInCount(url)
  • getDiggCount(url)
  • getDeliciousCount(url)
  • getStumbleCount(url)

Note: To use these formula you must have a copy of the Apps Script code mentioned earlier or make a copy of the spreadsheet

Limitations

It’s worth noting that if you start using these formula in a lot of cells you’ll start getting ‘-‘ as the returned value. This is because the code uses the UrlFetch service which appears to be rate limited. So if you have a sheet with over 100 of these formula in it, when it opens after Thinking… about the cell value and fetching the individual stats from the various service APIs it runs out of UrlFetches and can’t get anymore data (it appears the sheet also cache results but I don’t know for how long).

Where next

For what I have in mind I need over 100 calls to these formula so I’ll be looking for a different method, which might include using the Shared Count API which would aggregate a number of these stats into one call. 

Some things I learned along the way

I hate handling callbacks and this is how you create md5 hashes

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .