Start exploring Twitter’s Social Graph today a 4pm GMT from the comfort of your Windows PC/Excel #NodeXL

At 4pm GMT today (see in your own timezone) I’ll be doing a Google Hangout On Air with Marc Smith one of NodeXL’s creators and coordinators. NodeXL is a handy free add-in for MS Excel (Windows) allowing you to generate, explore and analyse network graphs. I started using NodeXL a couple of years ago as I found it was an easy way to extract a social graph (friend/follower relationships) from Twitter. In the session we’ll go through extracting a social graph from a Twitter hashtag and analysing the results.

You can view/participate in a number of ways. I’ll embed the livestream in this post, or you can see/join the hangout when it appears in my Google+ timeline. You can ask question via Twitter or Google+ using the combined hashtags #lak13 and #nodexl  e.g. ‘What the best place to find out more about NodeXL? #lak13 #nodexl’ (this is a LAK13 fringe event)

#LAK13: Recipes in capturing and analyzing data – Using SNA on Canvas Discussions with NodeXL (for when it’s not a SNAPP)

In my last post on Canvas Network Discussion Activity Data I mentioned I was a little disappointed to not be able to use social network analysis (SNA) modelling techniques on the Canvas Network discussion boards. My main barrier was accessing the data via the Canvas API using my preferred toolset. Fortunately Brian Whitmer at instructure.com (the company behind Canvas) posted a comment highlighting that as a student on the course it was easy for me to get access to this data using a token generated on my Canvas profile page. With this new information in this post I’ll cover three main areas:

  • a very quick introduction into techniques/opportunities for analysing threaded networks using SNA;
  • how I retrieved data from the Canvas platform for the #LAK13 discussions; and finally
  • some analysis using the NodeXL add-on for Microsoft Excel (Windows).

On Friday 1st March at 4pm GMT I’ll also be having a live Hangout on Air with Marc Smith, one of the original creators and continued project coordinator of NodeXL. The live feed will be embedded below embedded here and you can ask question via Twitter or Google+ using the combined hashtags #lak13 and #nodexl  e.g. ‘What the best place to find out more about NodeXL? #lak13 #nodexl’. For the hangout session we’ll look at how easy it is to use NodeXL to analyse a Twitter hashtag community in a couple of clicks. [The rest of this post is introducing a more advanced use of NodeXL so if I lose you in the rest of this post fear not as the session will be a lot easier going]

Opportunities for analysing threaded networks using SNA

Hello graph

WiredUK friend/follower graphOne of the dangers of SNA is people see images like the one to the right and miss the point of using this modelling technique (the image is from one of my first goes at using NodeXL so I’d be the first to admit there’s room for improvement). If you do there are couple of things to bear in mind. Often these images are exhaust, generated as a snapshot of a visual and/or exploratory analytic. A certain level of literacy is required to understand the underlying structure of the graph. Taking this last point a threshold concept for me was understanding that a basic concept of these graphs are more often than not constructed from an edge list which is essentially two columns of data representing and start point and end point to a node. For example, all the names in columns Vertex 1 and Vertex 2 are nodes and each row represents an edge between the nodes so in this case Beth –> Adam generates the highlighted edge.

Basic edge list and force layout

There is a lot more to understand about these types of graphs, but this basic concept means I know if I have any relationship data its easy to graph and explore.

Hello SNAPP

At this point it’s worth mentioning the browser plugin SNAPP.

The Social Networks Adapting Pedagogical Practice (SNAPP) tool performs real-time social network analysis and visualization of discussion forum activity within popular commercial and open source Learning Management Systems (LMS). SNAPP essentially serves as a diagnostic instrument, allowing teaching staff to evaluate student behavioural patterns against learning activity design objectives and intervene as required a timely manner.

Valuable interaction data is stored within a discussion forum but from the default threaded display of messages it is difficult to determine the level and direction of activity between participants. SNAPP infers relationship ties from the post-reply data and renders a social network diagram below the forum thread. The social network visualization can be filtered based upon user activity and social network data can be exported for further analysis in NetDraw. SNAPP integrates seamlessly with a variety of Learning Management Systems (Blackboard, Moodle and Desire2Learn) and must be triggered while a forum thread is displayed in a Web browser.

The social network diagrams can be used to identify:

  1. isolated students
  2. facilitator-centric network patterns where a tutor or academic is central to the network with little interaction occurring between student participants
  3. group malfunction
  4. users that bridge smaller clustered networks and serve as information brokers

The paper referencing SNA research supporting these areas was presented at LAK11 (if you don’t have access also available in the presentation’s slidedeck).  The paper Visualizing Threaded Conversation Networks: Mining Message Boards and Email Lists for Actionable Insights (Hansen, Shneiderman & Smith, 2010) also highlights simple ways to identify question people, answer people and discussion starters which are all potentially very useful within courses for identifying network clusters individuals might want to join/follow.

Retrieving data from Canvas

Hopefully with that quick intro you can see there might be some value in using SNA from threaded discussion analysis. Reading the SNAPP overview hopefully you spotted that it currently doesn’t support extracting data from Canvas discussion boards. This is an opportunity to understand some of the analysis SNAPP is doing behind the scenes.

Hello Google Apps Script

If you have been following my posts you’ll see that I favour using Google Apps Script as a lightweight tool for extracting data. Thanks to Brian (Instructure) I’ve got a way to access the Discussion Topics API. Looking at the API documents I decided the best way to proceed was to get all of the LAK13 discussion topics (top level information) and use this to get the full topic data. If you speak JSON we are essentially turning this:

Data as rendered in Canvas

into this:

Data in JSON

finally getting this (web version here):

Data in spreadsheet

The code to do this is available here. I’m going to spare you the details of the code but here are the instructions is you’d like to export data from other Canvas hosted discussion boards. If you’re not interested in that you can just jump to the next section.

Generating an edge list (extracting data) from Canvas to Google Sheets

  1. Create a new Google Spreadsheet and then in Tool > Script editor copy in the code from here
  2. If you are not pulling data from LAK13 you need to edit values in lines 2-4. If you visit your course homepage hopefully you can decode the url pattern based on the example for LAK13 https://learn.canvas.net/courses/33 (I should also point out you need to be enrolled on the course to receive data. Also read Canvas API Policy)
  3. Next you need an access token which is generated from your Canvas Profile Settings page. Scroll down to the bottom and click New Access Token, filling in a purpose and leaving expires blank. Make a copy of the token as it’s needed for the next step (I added a copy to a .txt file just in case the next step didn’t work
  4. Back in the Script Editor in Google Spreadsheets click File > Project Properties. In the ‘Project properties’ tab click  ‘+ Add row’ and replace (name) with access_token and (value) with the token you got from Canvas before clicking Save
    Project Properties
  5. Make sure everything is saved in the Script Editor and then Run > getCanvasDiscussionEdges, wait for the script to finish and on Sheet1 you should have a bunch of data to play with.

Using NodeXL to analyse Canvas Discussions

There are a number of different questions we could ask of the LAK13 data. The particular one I want to look at is who are the core community members stimulating/facilitating discussion (e.g. applying a connectivist theory who are the people you might want to connect with). To do this we need to (I’m assuming you’ve already installed NodeXL):

  1. Download the data extracted to the Google Spreadsheet (File > Download as > Microsoft Excel). [If you just want the data I’ve extracted here’s the download link – the data is automatically refreshed nightly]
  2. Open the download file in Excel and in the created_at column select all and Format Cells as General (I needed to do this because NodeXL was miss formating dates on import)
  3. Start a new NodeXL Template (I use the NodeXL Excel Template option from my windows Start menu)
  4. From the NodeXL ribbon you want to Import > From Open Workbook
    Import > From Open Workbook
  5. In the import dialog vert1_name and vert2_name are edges, anything else prefixed with ‘vert’ is assigned to the corresponding Vertex n property column and everything else is an  Vertex 1 property:
    import dialog
  6. Once imported you can open the Edges sheet, select the created_at column and Format Cells reassigns a date/time format.
  7. In Prepare Data chose ‘Count and merge duplicate edges’ and select Count and Vertex1 and Vertex 2
    Count and merge duplicate edges
  8. In the Graph section of the NodeXL ribbon we want to make this a directed graph (replies are directed) and choose you layout algorithm (I usually go Harel-Koren Fast Multiscale)
    make this a directed graph
  9. Next we want to prepare the data we want to analyse. In the Autofill Columns (within Visual Properties portion of the ribbon) set Edge Visibility to ‘topic_id’ and in Edge Visibility Options set ‘If the source column number is: Not equal to 558’ Show otherwise Skip (this will skip edges that are responses to the Pre-course discussion forum – I’ll let you question this decision in the comments/forum)
    Edge Visibility Options
  10. Click Ok then Autofill
  11. Next open the Vertices sheet and select all the rows (Ctrl+A) and from the Visibility option select ‘Show if in an Edge’
    Show if in an Edge
  12. Now find the row in the Vertices sheet for George Siemens and Skip (doing this were creating a ‘what if George wasn’t there’ scenario
    ‘what if George wasn’t there’ scenario
  13. Open the Graph Metrics window and add Vertex in-degree, vertex out-degree, Vertex betweenness and closeness centrality and Top items (in the Top items options you’ll need to add these as the metrics you want top 10s for), finally click Calculate metrics.
    Graph Metrics window

At this point you could use the calculated metrics to weight nodes in a graph, but for now I’m going to skip that. You should now have a Top Items sheet with some useful information. In the Betweenness Centrality list you should have these names:

  • Martin Hawksey
  • Simon Knight
  • Alex Perrier
  • Khaldoon Dhou
  • Rosa Estriégana Valdehita
  • Maha Al-Freih
  • Suzanne Shaffer
  • Maxim Skryabin
  • Bryan Braul
  • Peter Robertso

Excluding the pre-course discussion forum and George Siemens the discussions these people have engaged with provide the shortest paths to other people engaging in discussions on the Canvas site. Strategically these are potentially useful people  within the network that you might want to follow, question or engage with.

Getting to this point obliviously hasn’t been straight forward and had SNAPP been available in this instance it would have turned this in to a far shorter post. Programmatically using tools like R we could have arrived at the same answer with a couple of lines of code (that might be my challenge for next week ;). What it has hopefully illustrated is if you have data in an edge format (two column relationships) tools like NodeXL make it possible for you use SNA modelling techniques to gain insight. (I’m sure it also illustrates that data wrangling isn’t always straight forward, but guess what that’s life).

NodeGL for LAK13 discussionsSo at this point some of you might be wondering what does the LAK13 reply network look like. I could give you a flat image but why don’t you use my NodeXL online graph viewer to explore it yourself or download the data from the NodeXL Graph Gallery.

What I’ve starred this month: February 28, 2013

Here's some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.

#LAK13: Recipes in capturing and analyzing data – Canvas Network Discussion Activity Data

In my last post I looked at the data available around a course hashtag from Twitter. For this next post I want to start looking at what’s available around the Canvas Network platform which is being used to host Learning Analytics and Knowledge (LAK13). Sizing up what was available I did come across the Canvas LMS API documentation, which provides a similar method of accessing data as the Twitter API. I wasn’t sure if this extended to Canvas Network but because the authentication method it uses (oAuth2) isn’t possible using my dev tools of choice (mainly Google Apps Script) I looked for something else.

Whilst browsing on the discussion page for the course I noticed that my browser was auto-detecting a feed:

image

an excerpt of this feed is below:

https://learn.canvas.net/courses/33/discussion_topicsLearning Analytics and Knowledge Discussion Feed
  2013-02-20T12:39:02+00:00
  	Discussion: Week 8 Discussion Forum
    tag:canvas.instructure.com,2013-02-01:/discussion_topics/discussion_topic_580
    2013-02-01T21:15:20+00:00
    2013-02-01T21:15:02+00:00
    	George SiemensDiscussion: Week 3 Discussion Forum
    tag:canvas.instructure.com,2013-02-01:/discussion_topics/discussion_topic_575
    2013-02-01T21:15:56+00:00
    2013-02-01T21:13:24+00:00
    	

      George Siemens

Looking at the raw feed I could see it wasn’t limited (often feeds only contain the last 10 entries) and contained the entire content of messages.

Update: Below I use Google Apps Script to extract the data. Since then I’ve created a ‘no code’ solution that only uses existing Spreadsheet formula. There is an accompanying presentation (the webinar should eventually appear here)

Looking for an easy way to consume this I first turned to the importFeed formula in Google Spreadsheet’s but unfortunately it only returned the last 20 results. A trick I’ve used in the past is to put feeds through Yahoo Pipes to get a JSON/CSV to work with, but as working with dates this way isn’t straight forward I opted for some Google Apps Script which would create a custom formula to fetch the feed from Canvas Network and enter the results into a sheet.  The 12 lines of code for the main part of this are below:

function getCanvasDiscussions(url) {
  var response = UrlFetchApp.fetch(url);
  var contentHeader = response.getHeaders();
  if (response.getResponseCode() == 200) {
    var d = Xml.parse(response.getContentText()).feed.entry;
    var output = [['published','updated','title','author','link','id','content']];
    for (i in d){
      output.push([getDateFromIso(d[i].published.Text),getDateFromIso(d[i].updated.Text),d[i].title.Text,d[i].author.name.Text,d[i].link.href,d[i].id.Text,d[i].content.Text]);
    }
    return output;
  }
}

The getDateFromIso is a subfunction I use quite often and is available in this stackoverflow answer. Adding the above code to a Google Sheet (via Tools > Script editor..) allows me to use a custom formula to fetch the data.

Below is a quick look at the data returned (here it is published in a table). In the columns we have publish dates, title, author, link, uri and post content. As it goes this isn’t too bad. The big thing that is missing is whilst we can see which topic the message is in the reply threading is lost.

Canvas Network Discussion Data

Even with this like the #lak13 Twitter dashboard from last week I can quickly add some formulas to process the data and get an overview of what is going on (for the live view visit this spreadsheet – File > Make a copy is you want to edit).

CanvasNetworkDashboard

This obviously isn’t a complicated analytic and it wouldn’t surprise me if the course tutors didn’t have something similar on the backend of Canvas Network. As a student it’s useful for me to see how I’m doing compared to others on the course and get a sense of who else is contributing. [Adam Cooper has a great post on How to do Analytics Right... with some tips he picked up for John Campbell who is behind Purdue’s Signals Project which fits in nicely here.]

Summary

So with a bit of ken and a couple lines of code I can see how the #lak13 discussions are going. Again I’ve avoided any deep analytics such as analysing what has been said, to who, at what time, but hopefully now that I’ve highlighted and freed the data you can do something else with it. Not being able to extract the conversation thread is a little disappointing as it would have been nice to fire up SNAPP or NodeXL, but I’ll have to save those for another day ;)

#LAK13: Recipes in capturing and analyzing data – Twitter

I’m enrolled on the Learning Analytics and Knowledge (LAK13) which is an open online course introducing data and analytics in learning. As part of my personal assignment I thought it would be useful to share some of the data collection and analysis techniques I use for similar courses and take the opportunity to extend some of these. I should warn you that some of these posts will include very technical information. Please don’t run away as more often than not I’ll leave you with a spreadsheet where you fill in a cell and the rest is done for you. To begin with let’s start with Twitter.

Twitter basics

Like other courses LAK is using a course tag hashtag to allow aggregation of tweets, in this case #lak13. Participants can either watch the Twitter Search for #lak13, or depending on their Twitter application of choice, view the stream there. Until recently a common complaint of the Twitter search is it was limited to the last 7 days (Twitter are now rolling out search for a small percentage of older tweets). Whilst this limit is perhaps less of an issue given the velocity of the Twitter stream for course tutors and students having longitudinal data can be useful. Fortunately the Twitter API (API is a way for machines to talk to each other) gives developers a way to use Twitter’s data and use it in their applications. Twitter’s API is in transition from version 1 to 1.1, version 1 being switched off this March, which is making things interesting. The biggest impact for the part of the API handling search results is the:

  • removal of data returned in ATOM feed format; and
  • removal of access without login

This means you’ll soon no longer to be able to create a Twitter search which you can watch in an RSS Feed Aggregator like Google Reader like this one for #lak13.

All is not lost as the new version of the API still allows access to search results but only as JSON.

 JSON (pron.: /ˈsən/ jay-sun, pron.: /ˈsɒn/ jay-sawn), or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange  – http://en.wikipedia.org/wiki/JSON

I don’t want to get too bogged down in JSON but basically it provides a structured way of sharing data and many websites and web services will have lots of JSON data being passed to your browser and rendered nicely for you to view. Let’s for example take a single tweet:

single tweets as displayed

Whilst the tweet looks like it just has some text, links and a profile image underneath the surface there is so much more data. To give you an idea highlighted are 11 lines from 130 lines of metadata associated with a single tweet. Here is the raw data for you to explore for yourself. In it you’ll see information about the user including location and friend/follower counts; a breakdown of entities like other people mentioned and links; and ids for the tweet and in reply to.

tweet metadata

One other Twitter basic that catches a lot of people out is the Search API is limited to the last 1500 tweets. So if you have a popular tag with over 1500 tweets in a day, at the end of the day only the last 1500 tweets are accessible via the Search API.

Archiving tweets for analysis

So there is potentially some rich data contained in tweets, but how can we capture this for analysis? There are a number of paid for services like eventifier that allow you to specify a hashtag for archive/analysis. As well as not being free the raw data isn’t also always available. My solution has been to develop a Google Spreadsheet to archive searches from Twitter (TAGS). This is just one of many other solutions like pulling data directly using R and Tableau the main advantage with this solution for me is I can set it up and it’s happy to automatically collect new data.

Setting this up to capture search results from #lak13 gives use the data in a spreadsheet.

spreadsheet of #lak13 tweets

This makes it easy to get overviews of the data using the built-in templates:

twitter summaryactivity over time

… or, as I’d like to spend the rest of this post, quickly looking at ways to create different views.

As you will no doubt discover using a spreadsheet environment to do this has pros and cons. On the plus side it’s easy to use built-in charts and formula to analyse the data, identifying queries that might be useful for further analysis. The downside is you are limited in the level of complexity. For example, trying to do things like term extraction, n-grams etc is probably not going to work. All is not lost as Google Sheets makes it easy to extract and consume the data in other applications like R, Datameer and others.

Using Google Sheets to import and query data

I’ve got a post on Feeding Google Spreadsheets: Exercises in using importHTML, importFeed, importXML, importRange and importData if you want to learn about other import options, but for now we are going to use importRange to pull data from one spreadsheet into another.

If you open this spreadsheet and File > Make a copy it’ll give you a version that you can edit. In cell A1 of the Archive sheet you should see the following formula  =importRange(“0AqGkLMU9sHmLdEZJRXFiNjdUTDJqRkNhLUxtZE5FZmc”,”Archive!A:K”)

What this does is pull the first couple of columns from this sheet where I’m already collecting LAK13 tweets (Note this techniques doesn’t scale well, so when LAK starts hitting thousands of tweets you are better doing manipulations in the source spreadsheet than using importRange. I’m doing it this way to get you started and try some things out).

FILTER, FREQUENCY and QUERY

On the Summary sheet I’ve extended the summary available in TAGS by including weekly breakdowns. The entire sheet is made with a handful of different formula used in slightly different ways with a dusting of conditional formatting. I’ve highlighted a couple of these:

  • cell G2 =TRANSPOSE(FREQUENCY(FILTER(Archive!E:E,Archive!B:B=B2),S$15:S$22))
    • FILTER – returns an array of dates the person named in cell B2 has made in the archive
    • FREQUENCY – calculates the frequency distribution of these dates based on the dates listed in S15:S22 and returns a count for each distribution in rows starting from the cell the formula is in
    • TRANSPOSE – converts the values from a vertical to horizontal response so it fills values across the sheet and not down
  • cell P2 =COUNTIF(H2:O2,">0")
    • counts if the values in row 2 from column H to O are greater than zero giving number of weeks the users has participated
  • cells H2:O – conditional formatting
    • conditional formating
  • cell B1 =QUERY(Archive!A:B," Select B, COUNT(A) WHERE B <> '' GROUP BY B ORDER BY COUNT(A) desc LABEL B 'Top Tweeters', COUNT(A) 'No.'",TRUE)
    • QUERY – allows you to use Google’s Query Language which is similar to SQL used in relational databases. In the example using the data source as columns A and B in the archive sheet we select columns B (screen name of tweeter) and count of A (could be any other column with a unique value) where B is not blank. The results are grouped by B (screen name) and ordered by count. The query also renames the columns.

QUERY Out

To give you some examples of possible queries you can use with data from Twitter in the spreadsheet you copied is a Query sheet with some examples. Included are some sample queries to filter tweets with ‘?’, which might indicate questions (even if rhetorical), time based filters and counts of messages between users.

Query sheet

Tony Hirst has written more about Using Google Spreadsheets as a Database with the Google Visualisation API Query Language, which includes creating queries to export data.

Other views of the data

The ability to export the data in this way opens up some other opportunities. Below is a screenshot of a ego/conversation centric view of #lak13 tweets rendered using the D3 javascript library. Whilst this view onto the archive is experimental hopefully it illustrates some of the opportunities.

ego/conversation centric view of #lak13 tweets

Summary

Hopefully this post has highlighted some of the limitations of Twitter search, but also how data can be collected and the opportunities to rapidly prototype some basic queries. I’m conscious that I have provided any answers about how this can be used within learning analytics beyond the surface activity monitoring but I’m going to let you work that one out. If you want so see some of my work in this area you might want to check out the following posts:

 

Twitter Archiving Google Spreadsheet TAGS v5

For a couple of years now to support my research in Twitter community analysis/visualisation I’ve been developing my Twitter Archiving Google Spreadsheet (TAGS). To allow other to explore the possibilities of data generated by Twitter I’ve released copies of this template to the community.

In September 2012 Twitter announced the release of a new version of their API (the spreadsheet uses this to request data from Twitter). Around the same time Twitter also announced that the old version of their API would be switched off in March 2013. This has required some modification of TAGS to work with the new API. The biggest change for TAGS is that all requests now need authenticated access.

So here it is:

*** Twitter Archive Google Spreadsheet – TAGS v5.0 ***
[If the first link doesn't work try Opening this Spreadsheet and File > Make a copy]

Instructions for setting up TAGSv5

Instructions are included on the Readme/Settings sheet of the template. If you are having problems it’s worth checking written by Stacy Blasiola (@Blasiola) or this modified version by Karen Smith & Shanifa Nasser made for Open Data Day Toronto available as CC-BY-SA.

What will happen to my existing TAGS sheets that aren’t version 5.0?

When Twitter turn off the old API (test outages this March) all authenticated and unauthenticated search requests will stop working.

How do I upgrade existing versions of TAGS spreadsheets (v3.x to v4.0) to keep collecting beyond March 2013?

As I can’t push an update to existing copies of TAGS you’ll have to manually update by opening your spreadsheet, then opening Tools > Script editor… and replacing the section of code that starts function getTweets() { and finishes 134 lines later (possiblly with the line function twDate(aDate){ ) with the code here. [And yes I know that’s a pain in the ass but best I could do] … or you can just start a new archive using TAGSv5.0

More additional tips and info when I get a chance

Academic Activity Streams: A look at Diigo Groups ‘Popular’ resources (and adding social share counts to feeds using Yahoo Pipes)

The University of Phoenix have been in the news recently having been awarded US Patent US8341148 B1 for the ‘Academic activity stream’:

Abstract
A method and computer-readable medium for generating an activity stream is provided. The activity stream includes a ranked set of objects that are presented to one or more users. The ranking of objects is updated to reflect events associated with objects.

I was alerted to this news in this post by Phil Hill on e-Literate in which he highlighted:

“The patent lists several “embodiments” of the concept – examples of approaches that could be pursued to implement the activity stream. These embodiments include re-ranking of a book chapter based on recent student comments or preferences and notifications when 75% of students have completed an assigned reading.”

As pointed out by Scott Wilson in the comments it looks hard for this patent to stick given the amount of prior art knocking about (especially as I used to use PostRank and still use Google Reader ‘sort by magic’ to make sure the important information stays at the top of my reader). Also the Talis Aspire Reading List Dashboard may address the last case, although it’s unclear if this is prior art.

diigo group popularAnother example of ranking based on associated event I stumbled on today was in bookmarking site Diigo. As part of Diigo’s service you can create groups to collaboratively collect and curate resources. As well as a chronological view there is an option to sort by ‘popular’. Here’s an example of a Diigo Group for #etmooc.

I’m not entirely sure what the ranking is based on but it doesn’t appear to solely be on page views. I’d imagine number bookmarks by other users is another factor or the number of times someone hits the ‘Like’ button.  Another example of prior art?

An aside: Using Yahoo Pipes to add social share counts to a feed

I had a quick go with pulling resources from the Diigo ‘popular’ page and using social share counts as another factor in ranking results. I got as far as this pipe which scrapes the top 20 popular items (no api/feed available) and then loops through the results using this sub-pipe, which uses the sharedcount.com API to take a url and see how many times it’s been shared across social networks (here’s a more general pipe that adds social counts to any RSS feed).

pipes feed with social counts

On one hand because a lot of the bookmarked resource aren’t brand new the counts might be a weak indicator of something (you may want to normalise the weighting based on age of resource), but overall this feels like a rabbit hole (not least because I couldn’t use the social share data to rank the results), so I’m tagging this as a failed idea.

It did get me wondering if within a open course (cMOOC) context pulling bookmarked resources into your course hub using FeedWordPress and then getting participants to rate might be … umm interesting  (GD Star Ratings plugin looks good for this although I did spot on the etmooc hub they are using WP PostRatings).

What I’ve starred this month: January 28, 2013

Here’s some posts which have caught my attention this month:

Automatically generated from my Diigo Starred Items.

Peer Interview for Networked Researcher

Peer Interview: Martin Hawksey on Networked ResearcherI was recently asked by Ernesto Priego if I’d like to do an interview for Networked Researcher. I’ve been an admirer of Ernesto’s work (I can even tell you the day I became aware of him 22nd September 2011. He was tweeting from the #studentexp event organised by @GdnHigherEd. If you’d like to see what he said it’s always been the default graph on TAGSExplorer ;), so it was an honour to be asked.

I inhabit a weird world where I no longer know what I do is classified as so I’m not sure I’d call myself a researcher in a traditional sense, but it was great to have the opportunity to share what I do and acknowledge some of the people who have influenced/inspired me along the way.

Here’s the Peer Interview: Martin Hawksey on Networked Researcher

Registering blog addresses and generating a OPML file (Notes on FeedWordPress and MOOC-In-a-Box)

Last week I got a number of enquiries about using Google Forms. One of these was a question from Martin Weller (OU) who is in the process of setting up H817 which is an Open University open course (MOOC) on Openness in Education. He asked:

how do I go from this [Google Form] to an OPML file that feedwordpress can use (also if you have any ideas about how to do autodiscovery of feeds so if people only put in their main url it’ll find the feed, that would be great).

My response was to direct Martin to http://opml-generator.appspot.com/ which I used in Generating an OPML RSS bundle from a page of links using Google Spreadsheets. This takes a Google Spreadsheet of RSS feeds and generates an OPML bundle. Part of this solution relies on the user entering their blog RSS feed, which isn’t always known. It’s possible to use RSS auto-discovery, which at the time I didn’t have a scalable solution, but Martin appeared to have enough information to edge forward a bit more. I’ve subsequently come up with this

*** Blog Registration OPML Generator Template ***
[File > Make a copy for your own version]

With this a person can enter their blog url, some code tries to fetch the rss feed and uses the data to generate an OPML file. If you’d like to see it in action fill in this form (embedded below) … and see the generated OPML here (this should update every 5 minutes. It’s also *.tsv but the URL is enough to trick most feed aggregators).

I’ll let you discover uses for this template other than open course registrations.

Registering feeds with FeedWordPress (FWP)

Back on Martin’s Weller’s blog he’s written about is IT support woes. As part of this Alan Levine has kindly added some comments about how FeedWordPress handles blog registrations. The comment that caught my eye was:

I am not sure why you are fiddling with the OPML. You should should copy the URLs and paste them into FWP.

I’m assuming Martin had assumed that generating a OPML file accessible by url would mean FWP would keep checking for new blog registrations. As Martin has subsequently discovered this does not appear to be the case. When you import feeds from a url FWP still prompts you to select which feeds you want to include. This suggests to me that it’s not a fully automated process.

add opml

Alan suggested just copying the urls into FWP. Lets look at how that works. Clicking the ‘add multiple’ source button in the FWP admin lets you dump a blog url, one per line.

add multiple

When you click add FWP visits the blog and tries to auto-discover the feed using the first RSS feed it finds (this is the same trick used in my Google Form). So in short as FWP doesn’t appear to keep checking the OPML file and because you can just dump blog urls Martin may as well just collect these and do any cleaning up after.

So as Martin is discovering there are some very fiddle bits with using FeedWordPress as a open course aggregator and he ends his post with:

One last plea – I joked with Alan that I needed DS106 out of a box. I think I’m serious though – it would be great to have a step by step, idiots guide to installing and setting up a DS106-like environment. The rest of us don’t have Alan and Jim’s tech skills, so getting to the starting line is difficult. I know they’ll say you should invent your own way, but they done so much great work that I don’t think they realise just how much expertise they have. A simple installation that let the rest of us get started, would mean we could all go off in different directions then.

This takes me back to Jim Groom’s post on Integrating FeedWordPress with BuddyPress:

No where in the raging discussion around MOOCs is there anyone talking about sharing the infrastructural/architectural work they’ve done freely with others.  CUNY’s Commons-In-a-Box project makes the innovative development work they have been doing for years freely available for others to experiment with. That is the spirit of sharing that seems to me should characterize the groundswell for open, online education

+1 for ds106-In-a-Box

About

This blog is authored by Martin Hawksey Google+

JISC CETIS Learning Technology Advisor (OER Programme Support)
jisc cetis logo

The MASHezine (tabloid)

It's back! A tabloid edition of the latest posts in PDF format (complete with QR Codes). Click here to view the MASHezine

Preview powered by:
Bluga.net Webthumb

The MASHebook

You can also download this post as:

Subscribe to monthly email digest of posts

Loading...Loading...


Subscribe to per post email updates

Enter your email address:

Delivered by FeedBurner

Copyright License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution 3.0 Unported License. CC-BY mhawksey

Privacy /Cookies

This blog uses Google Analytics (which makes use of 'cookie' technologies) to provide information on usage. Here's an overview of Google Analytics Privacy and how to opt-out (other 3rd party services like Twitter might also be tracking you via this site, but as far as possible I try and prevent this by removing official tweet buttons).

Badges

. . .