Twitter network analysis and visualisation II: NodeXL – Getting started with the @WiredUK friends network

The other tool that I got wind of just after SocialBro was Network Overview, Discovery and Exploration for Excel (NodeXL). As the name indicates, NodeXL is an add-on for Microsoft Excel (Windows version), but the code is free and open source. Here’s the description from the website:

For a while I’ve been admiring Tony Hirst’s work visualising large networks like Twitter communities using the open source and cross-platform tool Gephi. Tony has lots of great posts for getting you started with Gephi including Visualising Twitter Friend Connections Using Gephi: An Example Using the @WiredUK Friends Network.

I’d been put off cooking something up myself until now because a) Tony has been doing a great job and I couldn’t see what I could add, and b) large network visualisations need large amounts of data. Tony has previously published his Twitter Community Grabbing Code (newt.py), but as I’m not whitelisted with the Twitter API I only get 350 hits/hr rather than 20,000, which can be somewhat of a hindrance when getting follower relationships.

The advantage of NodeXL, particularly for graphing Twitter communities, is that it has built-in features for grabbing the data for you. Not only that, the data collection is clever enough to work for mere mortals: when you hit your rate limit, NodeXL waits until it should be able to get more data. NodeXL also has “built-in connections for getting networks from Flickr, YouTube, and your local email. Additional importers for Exchange Email, Facebook, and Hyperlink networks are available”.
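To make the wait-until-reset idea concrete, here is a minimal Python sketch. It is not NodeXL's code: `fetch_friends` and `seconds_until_reset` are hypothetical callables standing in for whatever API client you use, and the only assumption is that each response tells you how many calls you have left.

```python
import time


def fetch_all(user_ids, fetch_friends, seconds_until_reset, sleep=time.sleep):
    """Fetch a friend list per user, pausing whenever the quota is used up.

    Hypothetical interfaces (assumptions, not a real Twitter client):
      fetch_friends(user_id) -> (list_of_friends, calls_remaining)
      seconds_until_reset()  -> seconds to wait before the quota refills
    """
    results = {}
    calls_remaining = None  # unknown until the first response arrives
    for user_id in user_ids:
        if calls_remaining == 0:
            # Quota exhausted: wait for the reset instead of failing.
            sleep(seconds_until_reset())
        friends, calls_remaining = fetch_friends(user_id)
        results[user_id] = friends
    return results
```

Passing `sleep` in as a parameter is only there so the waiting can be swapped out when trying the function without a real API.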

To let you see how to use NodeXL, and to allow me to make comparisons with Gephi, I thought I’d re-run Tony’s WiredUK example (besides, why should I break my habit of only ever building on Tony’s work ;).

In Tony’s original post the beginning (getting the data) is at the end. Fortunately with NodeXL we can start here. I’m assuming you’ve downloaded and installed NodeXL, so we begin by starting a new template: from the Windows Start menu, I select the NodeXL Excel Template shortcut in the Microsoft NodeXL application folder. From the NodeXL ribbon select Import > From Twitter User’s Network. In the import dialog box enter:

  • Get the Twitter Network of the user with the username: wiredUK
  • Add a vertex for each: Person followed by the user
  • Levels to include: 1.5
  • and what level of authentication you want to use

NodeXL - get data from a user's network
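For anyone wondering what a 1.5-level network means in practice, my understanding is that it is the user, the people they follow, and any follow relationships among those people (but no new people beyond them). A rough Python sketch of that idea, assuming you already have a dictionary of who follows whom (the `friends_of` structure and the function name are made up for illustration):

```python
def ego_network_1_5(ego, friends_of):
    """Return directed (follower, followed) edges for a 1.5-level network.

    friends_of is an assumed dict: username -> list of accounts they follow.
    """
    # Level 1: the ego plus everyone the ego follows.
    vertices = {ego} | set(friends_of.get(ego, []))
    edges = []
    for user in sorted(vertices):
        for followed in friends_of.get(user, []):
            # The extra 0.5: keep links among existing vertices only,
            # so nobody new is pulled into the graph.
            if followed in vertices and followed != user:
                edges.append((user, followed))
    return edges
```

Here an account followed by a friend but not by the ego itself never becomes a vertex, which is what keeps the data collection manageable.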

Once the data has been collected (you can see updates in the status bar of the import dialog box), click ‘Show Graph’ and you’ll get the raw form:

NodeXL - raw form

At this point Tony highlights that:

Sometimes a graph may contain nodes that are not connected to any other nodes. (For example, protected Twitter accounts do not publish – and are not published in – friends or followers lists publicly via the Twitter API.) Some layout algorithms may push unconnected nodes far away from the rest of the graph, which can affect generation of presentation views of the network, so we need to filter out these unconnected nodes. The easiest way of doing this is to filter the graph using the Giant Component filter.

NodeXL has some ‘Dynamic Filters’ that include bounding the graph by x and y, which could be used to crop the image, but I couldn’t find a component filter.

NodeXL - Dynamic Filters
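If you have the edge list outside NodeXL, the idea behind a giant component filter is simple enough to sketch by hand. The following Python is an illustration of the concept only (not Gephi's implementation); it ignores edge direction and keeps the largest set of nodes that are all reachable from one another.

```python
from collections import defaultdict, deque


def giant_component(nodes, edges):
    """Return the largest connected set of nodes, ignoring edge direction."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Unconnected nodes, such as protected accounts, each form a component of size one and so drop out.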

Next Tony colours the graph using “the modularity statistic. This algorithm attempts to find clusters in the graph by identifying components that are highly interconnected.” NodeXL doesn’t have a built-in function for calculating ‘modularity’ but we can cluster nodes into groups using other algorithms, in this case Clauset-Newman-Moore. From the Groups menu make sure this algorithm is selected, then click ‘Group by Cluster’.

NodeXL - Group by Cluster

When you Refresh Graph you’ll see the nodes have been colour coded as per group.

NodeXL - Cluster colour applied 
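To give a feel for what this kind of clustering does, here is a naive Python sketch of greedy modularity merging, the idea behind Clauset-Newman-Moore. It is not NodeXL's implementation and skips the data structures that make the real algorithm fast. It assumes a simple undirected graph (no self-loops or duplicate edges), so follow direction is ignored.

```python
from collections import defaultdict


def greedy_modularity_groups(edges):
    """Merge groups greedily while a merge still increases modularity."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    groups = {node: {node} for node in degree}    # group id -> members
    group_of = {node: node for node in degree}    # node -> group id
    while True:
        # Count the edges running between each pair of distinct groups.
        between = defaultdict(int)
        for u, v in edges:
            if group_of[u] != group_of[v]:
                between[frozenset((group_of[u], group_of[v]))] += 1

        best_gain, best_pair = 0.0, None
        for pair, count in between.items():
            gi, gj = pair
            a_i = sum(degree[n] for n in groups[gi]) / (2 * m)
            a_j = sum(degree[n] for n in groups[gj]) / (2 * m)
            # Change in modularity if the two groups were merged.
            gain = 2 * (count / (2 * m) - a_i * a_j)
            if gain > best_gain:
                best_gain, best_pair = gain, (gi, gj)

        if best_pair is None:
            break  # no merge improves modularity any further
        gi, gj = best_pair
        groups[gi] |= groups.pop(gj)
        for node in groups[gi]:
            group_of[node] = gi
    return sorted(sorted(group) for group in groups.values())
```

On two triangles joined by a single edge, the merging stops with each triangle as its own group, which matches the intuition of finding highly interconnected clusters.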

If you navigate to the Groups sheet there is a column where this colour is set. Right-clicking to set the colour doesn’t work for me, but with the cell highlighted you can use the color picker within the Visual Properties part of the ribbon (top-right of the screenshot below):

NodeXL - group colour

In Tony’s example he says: “While we have the Statistics panel open, we can take the opportunity to run another measure: the HITS algorithm. This generates the well known Authority and Hub values which we can use to size nodes in the graph.” NodeXL doesn’t have a statistics panel as such, but it can calculate some graph metrics, though not as many as Gephi.

NodeXL - calculating metrics
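For reference, the HITS calculation that Tony mentions is easy to sketch outside either tool. This is a plain Python illustration of the standard iteration (not Gephi's code), assuming edges are (follower, followed) pairs and that there is at least one edge: a good authority is followed by good hubs, and a good hub follows good authorities.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for directed (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of everyone pointing at you.
        auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        norm = math.sqrt(sum(score * score for score in auth.values()))
        auth = {node: score / norm for node, score in auth.items()}

        # Hub: sum of the authority scores of everyone you point at.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        norm = math.sqrt(sum(score * score for score in hub.values()))
        hub = {node: score / norm for node, score in hub.items()}
    return hub, auth
```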

Next Tony looks at graph layout. In NodeXL there aren’t as many options but enough to get started with (I stuck with Fruchterman-Reingold). To add Twitter IDs and have a varying node size we Autofill the Visual Properties. As NodeXL doesn’t have a HITS algorithm I’m using Betweenness Centrality (for an explanation of this see Sheila MacNeill’s Betweenness Centrality - helping us understand our networks post).

NodeXL - node size and labelling
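If you want to see what sits behind the betweenness centrality number, here is a short Python sketch of Brandes' algorithm for an unweighted, undirected graph. It is an illustration rather than NodeXL's implementation, and it assumes `adjacency` lists every node as a key with neighbours recorded in both directions. The score counts how many shortest paths between other pairs of nodes pass through each node.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        order = []
        preds = {node: [] for node in adjacency}
        sigma = {node: 0 for node in adjacency}
        dist = {node: -1 for node in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, sharing out path dependencies.
        delta = {node: 0.0 for node in adjacency}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected path was counted once from either end.
    return {node: value / 2 for node, value in score.items()}
```

For a simple chain a, b, c, only b lies between the other two, so it is the only node with a non-zero score.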

Within the Graph Options there are some further adjustments you can make, like changing the joining lines to curves and adjusting the label font. Unfortunately the font size is fixed; it’s just the node icon that scales relative to the betweenness centrality.

NodeXL - graph options

It’s still hard to see what is going on, but we have some more layout tricks. To start with we can layout graphs for groups in separate boxes and also adjust the strength of the repulsive force.

NodeXL - Layout options 
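To show what the repulsive force setting is doing, here is a toy Python version of a Fruchterman-Reingold style layout. It is a sketch of the general technique, not NodeXL's code: the `repulsion` multiplier, the unit-square frame and the cooling rate are all assumptions chosen for illustration. Every pair of nodes pushes apart, connected nodes pull together, and the allowed movement shrinks each round.

```python
import math
import random


def fruchterman_reingold(nodes, edges, repulsion=1.0, iterations=50, seed=0):
    """Toy force-directed layout in the unit square (nodes must be a list)."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = repulsion * math.sqrt(1.0 / len(nodes))  # preferred node spacing
    temperature = 0.1                            # largest move per round
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes pushes apart with force k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push
        # Connected nodes pull together with force distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull
        # Move each node, capped by the temperature and kept inside the frame.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(1.0, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(1.0, max(0.0, pos[n][1] + disp[n][1] / length * step))
        temperature *= 0.95
    return pos
```

Raising `repulsion` increases the preferred spacing `k`, which strengthens the push between nodes and weakens the pull along edges, so clusters spread out more.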

Once you’re happy, right-click on the graph and there is an option to save it as an image.

NodeXL - save image

And here is the final result

NodeXL - WiredUK

and for comparison here’s what Tony produced

Which is better, Gephi or NodeXL? For entry level (if such a thing exists given the number of different algorithms and theories in network analysis) NodeXL ticks a lot of the boxes. It’s easy to grab data and do basic processing. If you want to do more you might want to switch to Gephi. The good news is NodeXL can export the data files in Gephi supported formats, so potentially you can get the best of both worlds.
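If your edge list lives somewhere other than NodeXL, moving it into Gephi is not hard either. As a rough sketch, assuming GraphML as the interchange format (it is one that Gephi can open), a few lines of Python will write a minimal file. The function name is made up and no node or edge attributes are included.

```python
import xml.etree.ElementTree as ET


def edges_to_graphml(edges):
    """Serialise directed (source, target) string pairs as minimal GraphML."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    # Declare each node once, then the edges that connect them.
    for node in sorted({name for edge in edges for name in edge}):
        ET.SubElement(graph, "node", id=node)
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file with a .graphml extension should give you something to try opening in Gephi.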

14 thoughts on “Twitter network analysis and visualisation II: NodeXL – Getting started with the @WiredUK friends network”

  1. Excellent walk through, Martin:-) I've been keen to see how NodeXL works in practice, but I'm rarely in a Windows environment (which means I also miss out on Tableau ...;-)

    Are there other options among the dynamic filters? The x/y filter looks handy: I often find I have some nodes that whizz off to the sides (often unconnected nodes) which I usually prune with a filter on degree (i.e. I filter out nodes with degree 0). Something else I haven't worked out how to do in Gephi (no idea if it's possible) is to select a bunch of nodes and move them together (a bit of hand tweaking can often help the layout...) Does NodeXL let you drag nodes around, or does it just create a flat image? (I suspect the ability to act on node layout in Gephi is one of the reasons why it's sensitive in terms of memory on large networks.)

    Is there a way of sizing the labels in NodeXL? This can help make the network visualisation "glanceable" [ http://ouseful.open.ac.uk/blogarchive/010298.html ]

    PS can you inspect the macros that NodeXL uses? The ability to harvest data nicely from the Twitter API without hitting the rate limiter sounds interesting. Can it grab list memberships too?

    1. Post author

      It was interesting to follow your recipe with different ingredients. The big thing it highlighted for me was there is a whole world of social network analysis I'm not tapping into yet and I was even tempted to buy a book on it (something I haven't done since I taught myself 3D Studio Max).

      The dynamic filters are interesting. With the Twitter data I have options to filter on followed/follower counts, tweets, favourites, timezone offset and joined date. Once you've calculated the graph metrics (in-degree, out-degree) these also all become available filters. There is also an option to filter on node size. All of these can be controlled with sliders mapped to the distribution histogram.

      Node selection is also easy to do in a number of ways. You can click on nodes in the graph view (which is dynamic) or select nodes from the sheet by selecting cells. The cells can be from any of the sheets (vertices, edges, groups). This screencast http://screenr.com/eDWs demonstrates how you can do this (changes are lost when you refresh the graph).

      There's no way to scale the node label (that I've found anyway) but you can autofill the node image with the user's Twitter avatar, which is scalable, e.g. http://flic.kr/p/ahkekR It only slightly improves glanceability and relies on you knowing the community you are looking at.

      Here's the source code for NodeXL. I think it's all open source (I have had problems with the rate limit handling but the community support is 1st class). And yes it can grab lists too ;)

      Definitely some more personal discovery required in this area

      Martin

  2. I'm starting to think I maybe need to fire up my Windows partition ;-) As far as books go, I think I need to find one too. I've got a copy of http://www.amazon.co.uk/Networks-Crowds-Markets-Reasoning-Connected/dp/0521195330/ that I keep dipping in to, but I'm not sure how much it focusses just on the SNA stuff? This new O'Reilly book - Mining the Social Web [ http://www.amazon.co.uk/Mining-Social-Web-Analyzing-Facebook/dp/1449388345/ ] - is also on my "to get" list because I suspect (and hope!) that it blends bits of theory with practical and pragmatic examples...
    In the meantime: bluffers guide [ http://www.orgnet.com/sna.html ], or tutorial [ http://www.faculty.ucr.edu/~hanneman/nettext/ ]

  6. pooja

    hey how can we visualise the hashtag-conversation based graph in NodeXL that Martin has generated?
