Lost in format: html, rss, pdf, email, epub, mobi, Kindle It ~:-s

This morning I’ve managed to confuse myself over which are the best formats and ways to publish this blog. So in this post I review what I already use and show how you can embed ePub, mobi and Kindle links into your blog using the FiveFilters.org Kindle It service.

I’ve always been interested in discovering new ways to automatically publish in as many different formats as possible. I can’t remember in which order these developments came about, but there was:

  • HTML – simple rendering of WordPress posts in HTML, which over the years have been wrapped in various themes including a mobile-friendly WPTouch format
  • RSS – another out-of-the-box WordPress (and general blog/news site) staple
  • PDF (aka MASHezine) – this is a newspaper format of my last 10 posts. Originally I used HP’s Tabbloid service to get an emailed PDF of my RSS feed, which I manually uploaded; then I developed a Make Tabbloid WordPress plugin which used their API to do this automatically … until they pulled their API without notice. In the end this was a good thing, as I rewrote the plugin using the open source FiveFilters.org PDF Newspaper.
  • Email – during my time at the RSC, when I was out and about, it was very clear that a lot of people didn’t use or understand RSS feeds (see Common Craft’s RSS in Plain English). Not wanting to overburden overworked academic staff with my ramblings, I use the MailPress WordPress plugin to distribute a monthly digest wrapped in a custom theme. Here’s an example of last month’s. If you want to subscribe, visit my blog (the full, not mobile, version) and there is a box halfway down the right-hand side (as I’m no longer at the RSC I’m less caring towards overworked academics, so there is also an option for a per-post email update ;)

This is where I start getting confused. As I’m not an ebook reader user I don’t really know the best way for you to consume my content. Maybe you use the free Calibre ebook management software to convert your favourite sites into ebook format and sync with your device? Maybe there is a Kindle service you use to do this?

I’ve gone through a couple of ebook services in the past. First there was FeedBooks, from which you could get a RESTful URL to the latest posts from your RSS feed in mobi/ePub/Kindle formats (this feature was pulled by FeedBooks). Then I experimented with NewsToEbook, but this takes you off site unless you manually update the links to the cached output. Recently I had a quick look at dotepub.com, which has a widget you can drop into your website (or you could do something with the dotepub API), but you are limited to ePub format.

Instead I’ve returned to another FiveFilters.org offering called Kindle It. Here’s why:

  • As well as Kindle, it can generate ePub and mobi
  • For ePub and mobi it looks like I can use a RESTful URL (i.e. I enter it once and Kindle It does the rest of the work keeping the output up-to-date)
  • The service allows you to email a post straight to your Kindle (the use case I have in mind is: Jeff is browsing my blog, spots one of my verbose posts and wants to read it later. Clicking on Kindle It he is able to send it to his Kindle for reading on the train)

If you’d like to use Kindle It on your own site below is the snippet of code I used which automatically passes the current page url to Kindle It. Update: I’ve written this little widget which I can call with <script type="text/javascript" src="http://mashe.hawksey.info/script/kindle-it.js"></script>
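As a rough illustration of the idea (not the snippet referred to above), the core job is to pass the current page address to the service as a URL-encoded query parameter. The Python helper below builds such a link; the base URL `https://kindle-it.example/convert` and the `url` parameter name are hypothetical placeholders, not Kindle It’s documented interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the real Kindle It address and parameter name.
KINDLE_IT_BASE = "https://kindle-it.example/convert"


def kindle_it_link(page_url: str, base: str = KINDLE_IT_BASE) -> str:
    """Build a link that hands the current page URL to a conversion service.

    urlencode percent-escapes ':', '/', '?', '&' and '=' so the page address
    survives as a single query-string value.
    """
    return base + "?" + urlencode({"url": page_url})


if __name__ == "__main__":
    print(kindle_it_link("http://mashe.hawksey.info/?p=123"))
```

Escaping matters because a page URL with its own query string (such as `?p=123`) would otherwise be split into separate parameters.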

So, ebook users, does this option work for you, or is there a better way?

What I’ve starred this month: September 28, 2011

Here are some posts which have caught my attention this month:

Automatically generated from my Google Reader Shared Items.

Using the Viralheat Sentiment API and a Google Spreadsheet of conference tweets to find out how that keynote went down

Did you tune into Donald Clark’s “Don’t Lecture Me!” keynote at ALT-C 2010, or were you at Joe Dale’s FOTE10 presentation? These presentations have two things in common: both Donald and Joe posted their reflections on a ‘hostile’ Twitter backchannel (see Tweckled at ALT! and Facing the backchannel at FOTE10), and I provided a way for them to endlessly relive their experience with a Twitter subtitle mashup, wasn’t that nice of me ;) (see iTitle: Full circle with Twitter subtitle playback in YouTube (ALT-C 2010 Keynotes) and Making ripples in a big pond: Optimising videos with an iTitle Twitter track)

Something I’ve been meaning to do for a while is find a way to quickly (and freely) analyse a Twitter stream to identify the audience sentiment. You can pay big bucks for this type of solution, but fortunately viralheat have a sentiment API which gives developers 5,000 free calls per day. To use it you push some text to their API and get back the text’s mood and the probability that the detected sentiment is correct (more info here).

So here is a second-by-second sentiment analysis of Donald and Joe’s presentations, followed by how it was done (including some caveats about the data).

Donald Clark – Don’t Lecture Me! from ALT-C 2010


Open Donald Clark’s ALTC2010 Sentiment Analysis in new window

Joe Dale – Building on firm foundations and keeping you connected in the 21st century. This time it’s personal! FOTE10


Open Joe Dale’s Sentiment Analysis in new window

How it was done

Step 1: formatting the source data

This first part is very specific to the data source I had available. If you already have a spreadsheet of tweets (or other text) you can skip to the next part.

All viralheat needs is chunks of text to analyse. When I originally added the twitter subtitles to the videos I pulled the data from the twitter archive service Twapper Keeper but since March this year the export function has been removed (BTW Twapper Keeper was also recently sold to Hootsuite so I’m sure more changes are on the horizon). I also didn’t get a decent version of my Google Spreadsheet/Twitter Archiver working until February so had to find an alternate data source (To do: integrate sentiment analysis into twitter spreadsheet archiver ;).

So instead I went back to the subtitle files I generated in TT-XML format. Here’s an example line:

<p style="s1" begin="00:00:28" id="p3" end="00:01:01" title="Tweeted on 01 Oct 2010 14:45:28">HeyWayne: A fantastic talk by ??? @mattlingard #FOTE10 [14:45GMT]</p>

The format is some metadata (display times, date), then who tweeted the message, what they said and a GMT timestamp. The bits I’m interested in are the message and the date metadata, but in case I needed it later I also extracted who tweeted the message. Putting each of the <p> tags into a spreadsheet cell, it’s easy to extract the parts I want using these formulas:

Screen name (in column B)

  • =MID(A2,FIND(">",A2)+1,(SEARCH(":",A2,FIND(">",A2))-FIND(">",A2)-1))

Tweet (in column C)

  • =MID(A2,FIND(">",A2)+3+LEN(B2),LEN(A2)-FIND(">",A2)-LEN(B2)-16)

Date/time (in column D)

  • =VALUE(MID(A2,FIND("Tweeted on",A2)+11,20))

You can find more information about the cell formulas used elsewhere, but briefly:

  • MID extracts a substring based on a start point and length;
  • FIND finds the first occurrence of some text within a string;
  • SEARCH is used here with a start position to find the first occurrence of some text after a specific point (in this case I knew ‘:’ marked the end of who tweeted the message, but searching from the start of the cell would have returned the position of the : in begin=). FIND also accepts an optional start position; the main difference is that SEARCH is case-insensitive;
  • LEN is the number of characters in a cell;
  • VALUE is used to convert a text string into another format like a number or, in this case, a date/time

  This gives me a spreadsheet which looks like this:

Extracting old tweets
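If you would rather do this extraction outside a spreadsheet, here is a rough Python equivalent of the three formulas, using a regular expression instead of MID/FIND/SEARCH. It assumes every line follows the example layout exactly (the title attribute immediately before the closing >, and a [HH:MMGMT] stamp at the end); lines that don’t match return None.

```python
import re
from datetime import datetime

# Matches one subtitle line of the form shown in the example:
# <p ... title="Tweeted on 01 Oct 2010 14:45:28">name: message [14:45GMT]</p>
LINE_RE = re.compile(
    r'title="Tweeted on (?P<when>[^"]+)">'        # date metadata
    r'(?P<name>[^:<]+):\s*'                       # screen name, up to the first ':' after '>'
    r'(?P<tweet>.*?)\s*\[\d{2}:\d{2}GMT\]</p>'    # message, minus the trailing GMT stamp
)


def parse_subtitle_line(line: str):
    """Return (screen_name, tweet, datetime) for one TT-XML <p> element, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("when"), "%d %b %Y %H:%M:%S")
    return m.group("name"), m.group("tweet"), when
```

Unlike the fixed-offset spreadsheet formulas, the regex does not depend on counting characters, so a slightly longer or shorter timestamp would not shift the extracted text.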

Step 2: Using Google Apps Script to get the sentiment from viralheat

Google Apps Script is great for automating repetitive tasks. So in the Tools > Script editor… you can drop in the following snippet of code which loops through the text cells and gets a mood and probability from viralheat:
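As a rough, language-neutral illustration of what such a loop does (not the original Apps Script), the Python sketch below walks a list of tweet texts, calls a sentiment function once per non-empty text and stops once a daily quota (the 5,000 free calls mentioned above) is used up. `classify` is a hypothetical stand-in for the actual viralheat request; its name and its `(mood, probability)` return shape are assumptions, not viralheat’s API.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[str, float]  # (mood, probability), e.g. ("positive", 0.87)


def score_texts(
    texts: List[str],
    classify: Callable[[str], Result],
    daily_quota: int = 5000,
) -> List[Optional[Result]]:
    """Call `classify` on each text until the quota runs out.

    Empty cells and rows beyond the quota are left as None so they can be
    filled in on a later run, much like leaving spreadsheet cells blank.
    """
    results: List[Optional[Result]] = []
    calls = 0
    for text in texts:
        if calls >= daily_quota or not text.strip():
            results.append(None)
            continue
        results.append(classify(text))
        calls += 1
    return results
```

Passing the classifier in as a function keeps the loop testable without network access; in Apps Script the equivalent call would be an HTTP fetch per row.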

Step 3: Making the data more readable

If the script worked you should have some extra columns with the returned mood (positive or negative) and a probability factor. To make the data more readable I did a couple of things. For the mood I used conditional formatting to turn the cell green for positive and red for negative. To do this, select the column with the mood values, then open Format > Conditional formatting and add these rules:

Conditional formatting

In the examples above you’ll see I graphed a sentiment rating over time. To do this I converted ‘negative’ and ‘positive’ into the values -1 and 1 using the following formula (where the returned mood is in column E):

  • =IF(E2="positive",1,-1)

I also wanted to factor in the probability, so I multiplied this value by the probability:

  • =IF(E2="positive",1,-1)*F2 (where the probability factor is in column F)

These values were then accumulated over time as a running total, so each point on the chart shows the net sentiment up to that moment.
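The same calculation, sketched in Python: each `(mood, probability)` row becomes +probability for ‘positive’ and -probability otherwise (as in =IF(E2="positive",1,-1)*F2), and a running sum gives the cumulative series that is charted. It assumes the rows are already in chronological order.

```python
from itertools import accumulate
from typing import List, Tuple


def sentiment_series(rows: List[Tuple[str, float]]) -> List[float]:
    """Running total of signed, probability-weighted sentiment."""
    signed = [(1 if mood == "positive" else -1) * prob for mood, prob in rows]
    return list(accumulate(signed))
```

For example, `sentiment_series([("positive", 0.5), ("negative", 0.25), ("positive", 1.0)])` gives `[0.5, 0.25, 1.25]`.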

Using the time and accumulated sentiment columns you can then Insert > Chart, and if not already suggested, use the Trend > Time line chart.

Insert time line chart

One last trick I did, which I’ll let you explore in the published spreadsheets I’ll link to shortly, is extract and display certain tweets which are above a threshold probability.
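A minimal sketch of that filtering step in Python. The threshold the spreadsheets actually use isn’t stated here, so the default of 0.9 is an arbitrary assumption.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]  # (tweet text, mood, probability)


def confident_tweets(rows: List[Row], threshold: float = 0.9) -> List[Row]:
    """Keep only rows whose sentiment probability is at or above `threshold`."""
    return [row for row in rows if row[2] >= threshold]
```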

As promised here are the spreadsheets for (you can reuse by File > Make a copy):

Some quick notes about the data

A couple of things to bear in mind when looking at this data:

Noise – Not all of the analysed tweets necessarily relate to the talk, so if someone didn’t like the coffee at the break, or someone is tweeting about a previous talk they liked, that will affect the results

Quoting presenter – Donald’s talk was designed to get the audience to question the value of lectures so if he made negative statements about that particular format that were then quoted by someone in the audience it would be recorded as negative sentiment.

Sometimes it’s just wrong – and let’s not forget there may be times when viralheat just gets it wrong (there is a training API ;)

There is probably more to say, such as whether there is a way to link the playback with a portion of the sentiment chart, or whether I should explore using Google Apps Script and viralheat to automatically notify conference organisers of the good and the bad. But that’s why I have a comment box ;)

Embedding .mp3 audio files into Google Sites

I was preparing this for someone else but thought I’d share here. Here’s how to embed a .mp3 audio file in Google Sites.

On the page you want to embed audio enter edit mode and move the cursor to where you want the audio player to appear. Then from the Insert menu, at the very bottom select More gadgets…

Insert > More gadgets

In the search box enter ‘Google Audio Player’ and click Search.

Select google Audio Player

A couple of options should appear. There isn’t an official player gadget as such but ‘google Audio Player’ works well (it’s just a wrapper by Philippe Chappuis for an audio player Google uses in Google Reader – here’s the xml).

For the .mp3 file you need its location online. If you haven’t got an online location for it, you can upload the file to the site via the Manage Site menu (but you might have problems with upload limits). I usually untick ‘Include a border around gadget’.

Enter the file location

Click OK and save the page and that’s it …

Save the page

Evernote – IS a personal e-portfolio solution for students!

evernote-remember-everything.jpg

Back in April 2009 I posted Evernote – a personal e-portfolio solution for students?. In the post I highlighted how the features of this young start-up potentially made it a nice solution for a FREE ‘personal’ e-portfolio (that is, removed from the shackles of institutionally bought systems). At the time, though, I did point out some potential shortcomings:

  • lack of a mobile application for users of devices other than iPhone/iPod Touch and Windows Mobile
  • no easy way to privately share assets
  • notes are stored in a proprietary Evernote format
  • the restriction to PDF as the only document type you could upload with the basic free service

Over time these original issues have been whittled down.

Mobile – In May 2009 it was announced Evernote for BlackBerry Is Here, and then in December came Evernote for Android: It’s here! There have since been numerous software updates, and enhancements for tablet devices as they have come along.

Sharing – From January 2010 there have been several updates adding note sharing to the Mac, web, Windows and mobile apps. Sharing isn’t done privately; instead it relies on ‘security by obscurity’ (publicly available notes accessed via an obscure URL). Update: Oops, you’ll see from the comment below that it is possible to share notebooks privately. From the sharing knowledge base:

Evernote allows both free and premium users to share notebooks privately with other Evernote users. Notebooks shared by premium users have the option of being editable by the users with whom the notebook is shared. In other words, if Bob the premium user shares a notebook with Fred the free user, Bob may choose to allow Fred to edit the contents of his shared notebook.

Export – When I started presenting Evernote as a personal e-portfolio system back in 2009, one of the questions I usually got asked was how a student could back up or export notes stored on Evernote servers. At the time the desktop clients for Mac and Windows, which synchronise with Evernote so that you always have a local and remote copy of your files, could only export your notes in a proprietary XML format. This meant you could import them into another Evernote account, but that was it. However, Evernote then started rolling out HTML export for single notes or batches of notes, starting with Mac (May 2009) and eventually getting around to Windows (November 2010).

File types – Back in April 2009 this was the deal breaker for me. With the free account you could only upload text, image, audio and PDF files. Not having a place to also back up Word documents and other electronic resources, and make them searchable, was the one thing I thought would put most tutors off suggesting Evernote as a tool for their students. Fortunately this month (September 2011) Evernote announced that they had Removed File Type Restrictions for Free Accounts.

So what’s left? Will you be recommending Evernote to your students?

PS Here’s a collection of links from Purdue University on Evernote in Education and not surprisingly Evernote themselves ran an Evernote in Education Series.

PPS I recently downloaded the free Android app Droid Scan Lite, which lets me snap and reshape pictures of documents that I can then share to Evernote as a JPEG (Evernote OCRs images to make them searchable ;)

The art of discovery: Looking at how UK Web Focus, OUseful.info and MASHe interconnect using Google Spreadsheets and NodeXL

UKWebOUMash-nolabels.gif

Moving away from social networks I wanted to discover how Brian Kelly’s UK Web Focus, Tony Hirst’s OUseful.info and my MASHe are linked together. I chose Brian’s and Tony’s blogs for this because they have highlighted my work in the past so I know I’ll get some data. The main reason for this exercise was to learn more about NodeXL (learning through enquiry).

So where to begin? The biggest lesson from the last week is that force-directed network diagrams generally just need an edge list, which contains a start and end vertex for each link. So to see how the three blogs are linked together I need to build a list of links that take you from A to B.

Previously, in my other work using Google Spreadsheets to capture social engagement, I’ve imported a site’s sitemap to get a list of posts. I’ve also used Google Apps Script to iterate across this list, fetching each post to get additional information. It’s therefore relatively straightforward to grab each post and extract all the links the author has made. (One trick, though, is to get the XML version of each post rather than the public webpage, which has lots of other links for navigation. Fortunately Tony has recorded how to get Single Item RSS Feeds from WordPress Blogs by appending ?feed=rss2&withoutcomments=1 to the post URL; it’s also fortunate that all three of us use WordPress blogs.)
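Here is a minimal Python sketch of those two steps: building the single-item feed URL from a post URL, and collecting the href of every <a> tag in a post body. It assumes you have already pulled the post’s HTML out of the feed item’s content element; fetching and XML parsing are left out, and this is not the Apps Script used for the spreadsheet.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlsplit, urlunsplit


def single_item_feed_url(post_url: str) -> str:
    """Append WordPress's single-item feed parameters to a post URL.

    Uses '&' rather than '?' if the URL already has a query string
    (e.g. a ?p=123 style permalink).
    """
    parts = urlsplit(post_url)
    extra = "feed=rss2&withoutcomments=1"
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a usable href
                self.links.append(href)


def outbound_links(post_html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(post_html)
    parser.close()
    return parser.links
```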

Google Spreadsheet showing link data for each post

Having added a cell of outbound links for each post, we can use some Google Apps Script to build a list of As (the post URL) to Bs (the outbound link), which for my blog looks like this. I was a little surprised that in 430-odd posts I’ve used over 6,200 links, which when plotted look like this (it makes you appreciate the job search engines have to do):

MASHe outbound links
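The A-to-B list itself is just a flattening step. As a Python sketch (assuming the per-post links are held as a dictionary rather than spreadsheet cells), this turns `{post_url: [outbound links]}` into two-column rows and writes them as CSV. The "Vertex 1"/"Vertex 2" header mirrors the column names of NodeXL’s Edges sheet; other tools may expect different headers.

```python
import csv
import io
from typing import Dict, List, Tuple


def build_edge_list(post_links: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn {post_url: [outbound links]} into (A, B) rows."""
    return [(post_url, link) for post_url, links in post_links.items() for link in links]


def edges_to_csv(edges: List[Tuple[str, str]], header=("Vertex 1", "Vertex 2")) -> str:
    """Serialise the edge list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(edges)
    return buf.getvalue()
```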

Initially I tried importing this data (one file for each blog) into NodeXL and filtering it for only the three blogs, but it appears Excel only allows two custom filter criteria. You can get around this by applying some filters and then deleting rows, but I found it easier just to add a couple of lines of Apps Script to filter the generated list (also, because Tony and I have changed domain names in the last couple of years, additional criteria are required for old domain URLs).
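The filtering step, sketched in Python rather than Apps Script: keep only edges whose target host is one of a given set of blog domains, which can list old and current domain names side by side. The domain names in any call are up to you; nothing here encodes the actual old or new addresses, and matching is by hostname only (a blog hosted in a subdirectory of a shared domain would need a path check as well).

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased hostname with any leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def filter_edges(edges: Iterable[Tuple[str, str]], domains: Set[str]) -> List[Tuple[str, str]]:
    """Keep edges whose target is on one of the listed blog domains."""
    wanted = {d.lower() for d in domains}
    return [(a, b) for a, b in edges if host_of(b) in wanted]
```

Unlike Excel’s AutoFilter, a set lookup has no limit on the number of criteria.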

Here’s the final data set of links between UK Web Focus, OUseful.info and MASHe. From this point you can pretty much follow my Getting started with the @WiredUK friends network post to generate a basic graph. Here’s what I came up with:

UKOUMash
[Key: red – MASHe; dark blue – UK Web Focus; light blue – OUseful.info]

Initially, because the nodes are URLs, adding labels completely obliterated the network diagram. NodeXL has an option to truncate labels, but this would just give domain names, so I added a lookup converting URLs to post titles and truncated these instead (still not ideal).

Within NodeXL you can add tooltips and menu actions making navigation and analysis of the data a lot easier. Here’s a quick screencast to show you how this is done:

[It would be great if NodeXL had a way of publishing graphs whilst maintaining some of this interactivity, a bit like the way I can embed basic Twitter networks using the Hirst-Hawksey Protovis (Friendviz) Google Gadget. After a tip from Tony I had a look at the D3.js library, which has superseded Protovis. I had a go at changing the data source in this example by adding a custom column to the Edges sheet in NodeXL with ="{source: """&[@[Vertex 1]]&""", target: """&[@[Vertex 2]]&""", type: ""licensing""}," which generated something, but more tweaking is required]
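An alternative to assembling the object literals with Excel’s doubled quotes is to let a JSON library do the quoting. This Python sketch assumes the D3 example takes an array of `{source, target, type}` objects, as the formula above implies; JSON output is valid JavaScript, so it can be pasted in as the links array.

```python
import json
from typing import List, Tuple


def d3_links_json(edges: List[Tuple[str, str]], link_type: str = "licensing") -> str:
    """Serialise an edge list as an array of {source, target, type} objects.

    json.dumps handles quoting and escaping, which is the fiddly part when
    building the same string with Excel's "" quotes.
    """
    links = [{"source": a, "target": b, "type": link_type} for a, b in edges]
    return json.dumps(links)
```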

Here’s a clean view of how the posts are linked. The grouping on the left is the collection of posts we did on the Twitter Video Subtitling concept, which shows a progression and branching of the topic.

UKWebOUMash-nolabels

Exploding the right cluster into a grid view (shown below) reveals that while there are post threads, these are clusters of three or four posts (again this is a bit meaningless because I’ve got no way of labelling the nodes). The remaining posts have a single edge.

Right cluster

What does this all mean?

Who cares, pretty pictures ;) I was more interested in learning the process, practical application to follow. Unless you have any ideas?

If you want to do something similar, make a copy of this spreadsheet and have a look at Tools > Script editor…; here’s also a copy of the NodeXL file, and the same graph as .graphML

PS Turns out Tony was doing something similar about a year ago (doh!). Tony said:

Here’s a script i tried once to look at internal trackbacks in wordpress…http://bit.ly/cLSQIu

I even considered trying to start thinking “academically” around it at one point… http://bit.ly/cgk7zO

original reason for me looking at graph was to in uncourse justification context http://bit.ly/oq4IpB

[I’ve pre-expanded those links for trackback ;)]

PPS Brian also has a related guest post today, Web archives: more useful than just a ‘historical snapshot’

Using Google Spreadsheet/Apps Script and Google Social Graph to get Twitter edges for visualizing in NodeXL and Gephi

GraphImage.png

One of the things I really liked about the network analysis and visualisation tool NodeXL which I wrote about last week was the built-in tools for grabbing data from Twitter. I said:

The advantage of NodeXL, particularly for graphing Twitter communities, is it has built-in features for grabbing the data for you. Not only that the coding is clever enough to handle the data collection for mere mortals, so when you hit your rate limit NodeXL waits until it should be able to get more data.

What I didn’t mention at the time was that it can take a long time to get complex network data (as in, set it running overnight), and I was also having problems getting this to work. I haven’t looked closely at how NodeXL is generating the data, but say, for example, I wanted to find out if the people I follow also follow each other. I can get all the user ids of the people I follow using https://dev.twitter.com/docs/api/1/get/friends/ids, which gives me 497 ids.

To see if @psychemedia also follows any of these I could get his list of friend ids and see how many ids match. Assuming everyone I follow has fewer than 5,000 friends (the maximum the Twitter API can return in one call), I can do this in 496 API calls. NodeXL also captures more user information (friend/follower counts, bio etc.), which it can do in batches of 100 using https://dev.twitter.com/docs/api/1/get/users/lookup

So in summary:

  • 1 x my following ids
  • 496 x friend ids for the people I follow (more if any of them follow over 5,000 accounts)
  • 5 x user details lookups (497 ids in batches of 100)

Which means at least 502 calls (with a limit of 350 API calls per hour, this has to be done in two batches). And if you want to look beyond that and see the friends of your friends, it’s a lot more.
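A quick check of the arithmetic in Python, using the figures given (497 ids, 496 friend-list calls, user lookups in batches of 100, 350 calls per hour). Looking up 497 users in batches of 100 takes ceil(497/100) = 5 calls.

```python
import math


def api_calls_needed(friend_list_calls: int, users_to_lookup: int, lookup_batch: int = 100) -> int:
    """1 call for my own friend ids, one friend-ids call per person,
    plus user lookups in batches of up to `lookup_batch`."""
    return 1 + friend_list_calls + math.ceil(users_to_lookup / lookup_batch)


def hours_needed(calls: int, per_hour: int = 350) -> int:
    """Number of hourly rate-limit windows the calls span."""
    return math.ceil(calls / per_hour)
```

`api_calls_needed(496, 497)` gives 502 and `hours_needed(502)` gives 2, which is why the collection has to be split across two hourly windows.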

Fortunately there is a way to get this information a lot quicker. A separate API which is mentioned at the very end of Tony’s Visualising Twitter Friend Connections Using Gephi is the Google Social Graph API. The Social Graph API “makes information about public connections between people easily available and useful” and importantly includes connection information from Twitter. Here’s a demo page from Social Graph using my Twitter id as a starting point.

As I’ve already ported Tony’s Protovis example, which uses Social Graph, to a Google Spreadsheet (see Ported: Tony Hirst’s Using Protovis to Visualise Twitter Connections), it was a quick hack to modify this to take a list of Twitter usernames you’re interested in and create the two-column edge list required by NodeXL, Gephi and other network visualisation tools.
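The core of that edge-list step can be sketched in Python: given who follows whom, keep only the (follower, followed) pairs where both people are in your list of usernames, which is the "connections among the people in the list" view. The `follows` mapping is a hypothetical input (for example assembled from Social Graph responses); it is not the Social Graph API’s response format. Using a set of seen pairs also merges duplicate rows, similar in spirit to NodeXL’s ‘Merge Duplicate Edges’.

```python
from typing import Dict, Iterable, List, Set, Tuple


def edges_within(usernames: Iterable[str], follows: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Directed (follower, followed) edges where both people are in `usernames`.

    Screen names are compared case-insensitively; self-links and repeated
    pairs are dropped.
    """
    wanted = {u.lower() for u in usernames}
    seen: Set[Tuple[str, str]] = set()
    edges: List[Tuple[str, str]] = []
    for follower, followed_set in follows.items():
        a = follower.lower()
        if a not in wanted:
            continue
        for followed in followed_set:
            b = followed.lower()
            if b in wanted and b != a and (a, b) not in seen:
                seen.add((a, b))
                edges.append((a, b))
    return edges
```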

How-to get Twitter edges using Google Social Graph

Before I start I should say that the end point in this example is NodeXL, but Gephi is also able to import the edge CSV file generated with this spreadsheet.

  1. Grab a copy of this Google Spreadsheet template for Get Twitter Edges from Screen Names

    If you are having problems getting a copy of the template try opening this spreadsheet and File > Make a copy
  2. Paste a list of Twitter usernames you want to graph connections for into the source sheet (I’ve a collection of Google Spreadsheets I’ve developed which can be used to get friends/followers, search terms and more. I also recently wrote about SocialBro, which can do a .csv export of usernames (but as pointed out by Angelo you can’t export other people’s data)). If you want to play along, here’s a list from #eas11
  3. In the Twitter Edges spreadsheet open Tools > Script editor…  and follow the instructions for registering for a Twitter API key and secret (if you have used any of my other Twitter/Google Spreadsheet mashups you can use the same key/secret).
  4. Once you’ve entered the key and secret don’t forget to Run > authorize
  5. Next, on line 24 you need to enter the level of data. For this example let’s stick with 1.5
  6. Now Run > getConnections.
  7. Once the yellow status bar with Running getConnections disappears, close the script window.
  8. Back in the spreadsheet view open the ‘Edges’ sheet which should be populated with data
  9. Select File > Download as … > CSV (Current Sheet)
  10. Start a new NodeXL template
  11. Open the CSV in Excel and copy the edges data to your new NodeXL template Edges sheet
  12. From the NodeXL ribbon menu click on Prepare Data (left hand side) and select ‘Merge Duplicate Edges’ and then again from Prepare Data select ‘Get Vertices from Edge List’

And now you can do the rest of your analysis of the data (if you haven’t generated Twitter network visualisations in NodeXL before you can follow parts of the basic recipe here).

and here’s one I made earlier

#eas11 hashtag community

One very important caveat – the data from Social Graph isn’t 100% accurate and as I posted in Social media wars: Measuring the battle lines since Google+ has come along this data might be becoming less reliable.

PS In making this I found that when I passed 50 usernames to the Social Graph API I only got data back for 15 of them. I wasn’t able to find any documentation on why. Reducing the call to 10 names at a time seemed to work, but this means there are bugs in my other Social Graph bits and pieces :(
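The workaround amounts to splitting the username list into fixed-size chunks and making one request per chunk. A Python sketch follows; the batch size of 10 is simply the figure that worked in practice, not a documented limit.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example, 23 usernames produce chunks of 10, 10 and 3.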

Some Google webinars exploring Google Apps Script in Education

If you’re interested in learning more about Google Apps Script in Education, Google are doing some webinars as part of their Google Apps for Education series. Here’s a copy of the dates/times that were emailed to me (seminars are usually delivered in the afternoon American time, so if it’s past your bedtime you might prefer to catch the recording. It’s also interesting to see Google push the Chromebook for Education):

Tuesday, September 6, 2011
Digital Science Notebooks part 2
In this webinar, you will learn how to use forms, spreadsheets, charts, and docs to create a digital science notebook.
Register

Friday, September 9, 2011
Chromebooks for Education Overview
Join us for a deep dive on Chromebooks for Education, new computers that offer a faster, simpler and more secure experience with fewer IT hassles. During this web seminar we’ll explore the use of Chromebooks in the classroom as part of 1-to-1 or 1-to-many programs and describe the Chromebook management capabilities and support offering from Google.
Register

Tuesday, September 13, 2011
Google Apps Scripts tutorial
In this webinar you will walk through mail merge and calendar creation tutorials using Google Apps Scripts.
Register

Wednesday, September 21, 2011
Automating school processes with Google Apps Scripts
This session will showcase several examples from schools who are using Google Apps Scripts to automate workflows using docs, spreadsheets, forms, calendar, email, and sites.
Register

Monday, September 26, 2011
Chromebooks for Education Overview
Join us for a deep dive on Chromebooks for Education, new computers that offer a faster, simpler and more secure experience with fewer IT hassles. During this web seminar we’ll explore the use of Chromebooks in the classroom as part of 1-to-1 or 1-to-many programs and describe the Chromebook management capabilities and support offering from Google.
Register

Monday, September 26, 2011
Introduction to javascript for Google Apps Scripts
Learn the basics of javascript to understand the tutorials and get started coding your own automated Google Apps Scripts.
Register

You can also access all past webinars, slide presentations, and Q&A transcripts on the Google Apps for Education Webinars Page.

We hope to see you there!
Dana Nguyen Google Apps for Education Team

Twitter network analysis and visualisation II: NodeXL – Getting started with the @WiredUK friends network

The other tool that I got wind of just after SocialBro was Network Overview, Discovery and Exploration for Excel – NodeXL. As indicated in the title NodeXL is an add-on for Microsoft Excel (Windows version) but the code is free and open source. Here’s the description from the website:

For a while I’ve been admiring Tony Hirst’s work visualising large networks like Twitter communities using the open source and cross-platform tool Gephi. Tony has lots of great posts for getting you started with Gephi including Visualising Twitter Friend Connections Using Gephi: An Example Using the @WiredUK Friends Network.

I’d been put off cooking something up myself until now because a) Tony has been doing a great job and I couldn’t see what I could add b) large network visualisations need large amounts of data (Tony has previously published his Twitter Community Grabbing Code – newt.py, but as I’m not whitelisted with the Twitter API I only get 350 hits/hr and not 20,000 which can be somewhat of a hindrance when getting follower relationships).

The advantage of NodeXL, particularly for graphing Twitter communities, is it has built-in features for grabbing the data for you. Not only that the coding is clever enough to handle the data collection for mere mortals, so when you hit your rate limit NodeXL waits until it should be able to get more data. NodeXL also has “built-in connections for getting networks from Flickr, YouTube, and your local email. Additional importers for Exchange Email, Facebook, and Hyperlink networks are available”.  

To let you see how to use NodeXL and to allow me to make comparisons with Gephi I thought I’d re-run Tony’s WiredUK example (besides why should I break my habit of only ever building on Tony’s work ;).

In Tony’s original post the beginning (getting the data) is at the end. Fortunately with NodeXL we can start here. I’m assuming you’ve downloaded and installed NodeXL, so we begin by starting a new template; I do this from the Windows Start menu by selecting the NodeXL Excel Template shortcut from the Microsoft NodeXL application folder. From the NodeXL ribbon select Import > From Twitter User’s Network. In the import dialog box enter:

  • Get the Twitter Network of the user with the username: wiredUK
  • Add a vertex for each: Person followed by the user
  • Levels to include: 1.5
  • and what level of authentication you want to use

NodeXL - get data from a user's network

Once the data has been collected (you can see updates in the status bar of the import dialog box), when you click ‘Show Graph’ you’ll get the raw form:

NodeXL - raw form

At this point Tony highlights that:

Sometimes a graph may contain nodes that are not connected to any other nodes. (For example, protected Twitter accounts do not publish – and are not published in – friends or followers lists publicly via the Twitter API.) Some layout algorithms may push unconnected nodes far away from the rest of the graph, which can affect generation of presentation views of the network, so we need to filter out these unconnected nodes. The easiest way of doing this is to filter the graph using the Giant Component filter.

NodeXL has some ‘Dynamic Filters’ that include bounding the graph by x and y, which could be used to crop the image, but I couldn’t find a component filter.

NodeXL - Dynamic Filters
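One workaround is to filter the edge list before importing it. The Python sketch below keeps only the largest connected component (ignoring edge direction), which is the idea behind Gephi’s Giant Component filter; it is not a NodeXL feature. Vertices with no edges at all never appear in an edge list, so they drop out automatically.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple


def giant_component(edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Vertices of the largest connected component, treating edges as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    best: Set[str] = set()
    seen: Set[str] = set()
    for start in list(adj):
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_to_giant(edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Edges whose endpoints lie in the giant component."""
    keep = giant_component(edges)
    return [(a, b) for a, b in edges if a in keep]
```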

Next Tony colours the graph using “the modularity statistic. This algorithm attempts to find clusters in the graph by identifying components that are highly interconnected.” NodeXL doesn’t have a built-in function for calculating ‘modularity’ but we can cluster nodes into groups using other algorithms, in this case Clauset-Newman-Moore. From the Groups menu make sure this algorithm is selected then click ‘Group by Cluster’

NodeXL - Group by Cluster

When you Refresh Graph you’ll see the nodes have been colour coded as per group.

NodeXL - Cluster colour applied 

If you navigate to the Groups sheet there is a column where this colour is set (the right-click to set the colour doesn’t work for me but with the cell highlighted you can use the color picker within the Visual Properties part of the ribbon (top-right of the screenshot below)):

NodeXL - group colour

In Tony’s example he says: “While we have the Statistics panel open, we can take the opportunity to run another measure: the HITS algorithm. This generates the well known Authority and Hub values which we can use to size nodes in the graph.” NodeXL doesn’t have a statistics panel as such, but it can calculate some graph metrics, though not as many as Gephi.

NodeXL - calculating metrics

Next Tony looks at graph layout. NodeXL doesn’t have as many layout options, but there are enough to get started with (I stuck with Fruchterman-Reingold). To label nodes with Twitter IDs and vary node size we Autofill the Visual Properties. As NodeXL doesn’t have a HITS algorithm I’m using Betweenness Centrality (for an explanation of this see Sheila MacNeill’s Betweenness Centrality – helping us understand our networks post).
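Betweenness centrality counts how often a node lies on the shortest paths between other pairs of nodes. One standard way to compute it for an unweighted graph is Brandes' algorithm, sketched below. It assumes an undirected graph supplied as a dict of neighbour sets, and it uses no normalisation, so NodeXL's reported numbers may be scaled differently.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm; adj maps node -> set of neighbours (undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both ends
    return {v: c / 2 for v, c in bc.items()}
```

In a star, the centre sits on every leaf-to-leaf path, which is why such "bridging" accounts get the largest nodes.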

NodeXL - node size and labelling

Within the Graph Options there are some further adjustments you can make, like changing the joining lines to curves and adjusting the label font (unfortunately the font size is fixed; it’s just the node icon that scales relative to the betweenness centrality).

NodeXL - graph options

It’s still hard to see what is going on, but we have some more layout tricks. To start with we can lay out each group in its own box, and we can also adjust the strength of the repulsive force.
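To show what the "repulsive force" setting acts on, here is a compact sketch of a Fruchterman-Reingold style layout. Every pair of nodes pushes apart, edges pull their endpoints together, and the maximum step shrinks as the layout "cools". The `repulsion` multiplier is a hypothetical stand-in for NodeXL's slider, and the square frame, cooling rate and seed are assumptions. This is not NodeXL's implementation.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, iterations=100,
                         repulsion=1.0, seed=42):
    """Lay out nodes in a width x width square centred on the origin.

    repulsion is a hypothetical multiplier on the repulsive force
    (1.0 gives the textbook forces).
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    half = width / 2
    pos = {n: [rng.uniform(-half, half), rng.uniform(-half, half)] for n in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between nodes
    temp = width / 10                  # maximum step, shrinks each round
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair: repulsion * k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = repulsion * k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node at most `temp`, then keep it inside the frame
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
            pos[n][0] = min(half, max(-half, pos[n][0]))
            pos[n][1] = min(half, max(-half, pos[n][1]))
        temp *= 0.95  # cool down
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Raising `repulsion` spreads clusters further apart, which is roughly the effect you are after when a graph looks too bunched up.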

NodeXL - Layout options 

Once you’re happy, right-click on the graph and there is an option to save it as an image.

NodeXL - save image

And here is the final result

NodeXL - WiredUK

and for comparison here’s what Tony produced

Which is better, Gephi or NodeXL? For entry-level use (if such a thing exists given the number of different algorithms and theories in network analysis) NodeXL ticks a lot of the boxes. It’s easy to grab data and do basic processing. If you want to do more you might want to switch to Gephi. The good news is NodeXL can export data files in Gephi-supported formats, so potentially you can get the best of both worlds.
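GraphML is one interchange format that both tools can read. The sketch below is a hypothetical minimal writer built on Python's standard library. It emits only node ids and edges, with no colours, sizes or other attributes, just to show how little structure the format needs.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges, directed=True):
    """Serialise an edge list as a minimal GraphML document (ids only)."""
    ET.register_namespace("", NS)  # write plain <graphml>, not <ns0:graphml>
    root = ET.Element(f"{{{NS}}}graphml")
    graph = ET.SubElement(root, f"{{{NS}}}graph", id="G",
                          edgedefault="directed" if directed else "undirected")
    for n in sorted({n for e in edges for n in e}):
        ET.SubElement(graph, f"{{{NS}}}node", id=str(n))
    for i, (u, v) in enumerate(edges):
        ET.SubElement(graph, f"{{{NS}}}edge", id=f"e{i}",
                      source=str(u), target=str(v))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.graphml` file gives something Gephi can open, though anything NodeXL computed (groups, metrics) would need extra `<key>`/`<data>` elements.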

Twitter network analysis and visualisation I: SocialBro

There is still a free version of SocialBro. You get more features if you go Premium, but I find the free version still has enough for me.

Last week a couple of network analysis tools landed in my inbox, and after having a quick play I thought they were worth highlighting here. In this post I’m going to have a quick look at SocialBro, which is billed as a tool to ‘manage and analyse your Twitter Community’.

The main features are:

  • Browse your Community: Search your followers and friends using different criteria such as name, location and description.
  • Filter and Sort: Number of followers, followers/friends ratio, frequency of tweets, account age, recent activity, language, time zone.
  • Easy Follow Back Tools: Discover your new followers and easily follow them back.
  • Easy Unfollow Tools: Detect noisy friends, potential spammers, inactive friends and easily unfollow them.
  • Track Unfollows: Detect who recently unfollowed you and you can easily unfollow them back.
  • Manage Twitter Lists: Organize your followers and friends by creating Twitter lists with the search results.
  • Backup your Twitter Community: Download all your Twitter followers and friends to a local database which you can consult even when offline.
  • Fast Communication: Tweet and direct message from search results.
  • Visualize Statistical Information: Time zone charts, languages charts, users by number of followers, users by recent activity, etc.
  • See your Community in Map: Visualize the world wide distribution of your community in a map.
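Several of these features (follow back, track unfollows) come down to comparing sets of account IDs between two synchronisations. The sketch below is a guess at the idea, not SocialBro's code. The function name and the assumption that each snapshot is a set of user IDs are hypothetical.

```python
def compare_snapshots(old_followers, new_followers, friends):
    """Compare two follower snapshots (sets of user IDs).

    friends is the set of accounts you follow. Names are illustrative.
    """
    gained = new_followers - old_followers
    lost = old_followers - new_followers
    return {
        "new_followers": gained,
        "unfollowed_you": lost,
        "to_follow_back": gained - friends,  # new followers you don't follow
        "to_unfollow_back": lost & friends,  # left, but you still follow them
    }
```

Because each comparison only sees two snapshots, anyone who follows and unfollows between syncs is invisible.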

SocialBro Dashboard

You’ll find a number of these features in other web services; for example, I already use TwUnfollow.com to see who unfollows me. The big difference with SocialBro is that it’s the first downloadable Twitter analytics tool I’ve seen (I’m ignoring the Archivist Desktop because, while it’s great at downloading search terms, it doesn’t have much in the way of analysis).

Being a downloadable program has pros and cons. The main pro is that your data is downloaded for offline use. The main con is that you need to fire up the application to synchronise the data, and how often you do this will affect the resolution of time-based data (e.g. follower growth).
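The resolution point is easiest to see with a local store of snapshots. The sketch below uses an in-memory SQLite database with a hypothetical one-table schema; SocialBro's real database layout is unknown. Each synchronisation adds one timestamped snapshot, so follower growth can only be read off at the moments you actually synced.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a hypothetical local snapshot store (schema is illustrative)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS follower_snapshot ("
                 "taken_at TEXT, follower_id TEXT)")
    return conn


def record_sync(conn, taken_at, follower_ids):
    """Store every follower seen during one synchronisation."""
    conn.executemany("INSERT INTO follower_snapshot VALUES (?, ?)",
                     [(taken_at, f) for f in follower_ids])
    conn.commit()


def follower_counts(conn):
    """Follower count per sync, oldest first: one data point per sync."""
    rows = conn.execute("SELECT taken_at, COUNT(*) FROM follower_snapshot "
                        "GROUP BY taken_at ORDER BY taken_at")
    return rows.fetchall()
```

If you only open the application weekly, you get weekly data points and nothing in between.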

The client uses Adobe Air, which gives it cross-platform support (if you are using TweetDeck Desktop you already have Air installed). Once the software is installed you need a beta account with SocialBro, which for me was processed very quickly. I noticed that each time you start the software it checks that you have an account with them. You can cancel this and still access your locally stored data, but you can’t synchronise. I’m guessing once they get out of beta there’ll be a freemium or even just a premium model.

Watching the garden grow

As well as being able to export your friends and followers as a CSV file, there are a couple of built-in reports: ‘Best time to tweet’ and ‘Insights’.
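Once the friends/followers CSV is exported, you can do your own filtering and sorting outside the client. The sketch below assumes hypothetical column names (`screen_name`, `followers`, `friends`, `language`); check the headers in your own export before relying on it. It ranks accounts by followers/friends ratio and can optionally filter by language.

```python
import csv
import io


def rank_by_ratio(csv_text, language=None):
    """Rank accounts by followers/friends ratio (column names are assumed)."""
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if language and row["language"] != language:
            continue
        followers, friends = int(row["followers"]), int(row["friends"])
        # Accounts that follow nobody get an infinite ratio
        ratio = followers / friends if friends else float("inf")
        ranked.append((row["screen_name"], ratio))
    return sorted(ranked, key=lambda r: r[1], reverse=True)
```

For a file on disk, read its contents with `open(path, newline="")` and pass the text in.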

Here’s my full ‘best time to tweet’ report from SocialBro. Something I’m not sure about is that “the free version of ‘Best Time to Tweet’ is generated by analyzing only your top 100 followers”. I’m not sure how they are categorising ‘top’, but I’m guessing they mean ‘last’. It’s interesting to note that SocialBro and the online Twitter analytics service Crowdbooster have very similar best-time-to-tweet matrix charts.
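A best-time-to-tweet matrix is essentially a day-of-week by hour-of-day count of when your followers are active. Neither service documents its exact method here, so this is only a sketch of the general idea. It assumes you already have a list of follower tweet timestamps converted to the time zone you care about.

```python
from collections import Counter
from datetime import datetime


def activity_matrix(timestamps):
    """Count activity per (weekday, hour); weekday 0 is Monday."""
    return Counter((t.weekday(), t.hour) for t in timestamps)


def best_slots(timestamps, n=3):
    """The n busiest (weekday, hour) cells of the matrix."""
    return [slot for slot, _ in activity_matrix(timestamps).most_common(n)]


tweets = [datetime(2011, 4, 4, 9, 15),   # Monday 09:15
          datetime(2011, 4, 4, 9, 40),   # Monday 09:40
          datetime(2011, 4, 5, 14, 0)]   # Tuesday 14:00
print(best_slots(tweets, 1))  # [(0, 9)]: Monday, 9am
```

Analysing only 100 followers, as the free report does, makes each cell's count small and the matrix noisy.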

Best time to tweet from SocialBro
Best time to tweet from Crowdbooster

Here’s a link to my ‘insights report’. The pdf version has breakdowns for language and timezones. The client also includes a map overlay:
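The language and time zone breakdowns in the insights report are percentage tallies over follower profiles. As a minimal sketch, assuming you have one language (or time zone) value per follower, for example from the CSV export:

```python
from collections import Counter


def breakdown(values):
    """Percentage share of each value, largest first, to one decimal place."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: round(100 * c / total, 1) for k, c in counts.most_common()}
```

The same function works for time zones. Followers with no value set would need filtering out or grouping under something like "unknown" first.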

SocialBro - follower map

Weeding and seeding

As well as giving you an overview of your own account, with the option to filter things like non-reciprocal relationships, SocialBro lets you add additional data sources. These sources can be Twitter searches, other Twitter users’ friends/followers or Twitter lists. For example, I added the eas11-delegate list and I can see that of the 116 members I only follow 20, so there might be some interesting people to check out in the remaining 96.
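The list comparison is a simple set intersection and difference between the list's members and the accounts you follow. This is a sketch with hypothetical names, reproducing the 116 / 20 / 96 arithmetic above with made-up IDs.

```python
def list_overlap(list_members, friends):
    """Return (how many list members you follow, sorted members you don't)."""
    members = set(list_members)
    followed = members & set(friends)
    return len(followed), sorted(members - followed)


members = [f"user{i:03d}" for i in range(116)]  # hypothetical list of 116
friends = members[:20] + ["someone_else"]       # you follow 20 of them
followed, to_check_out = list_overlap(members, friends)
print(followed, len(to_check_out))  # 20 96
```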

SocialBro - Add new source

Final thoughts

SocialBro has some great tools to help you manage and analyse your Twitter community, particularly if you are managing a class or community account. The big question for me is how long it will stay free. Perhaps it’s time to revisit my collection of Twitter Google Spreadsheets …

About

This blog is authored by Martin Hawksey.

JISC CETIS Learning Technology Advisor (OER Programme Support)

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine




Copyright License

This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).
