My post titles just get better and better. As part of my research into Twitter subtitling I’ve focused on integrating Twitter search and the Twapper Keeper archive into the Twitter subtitle generator tool, but I’m aware there is a wider world of timed data for subtitlizing. When Tony contacted me on Friday with some timed data he had as part of his F1 data junkie series, it seemed like the ideal opportunity to see what I could do.
The data Tony provided was in *.csv format. The first few lines are included below:
2010-04-18 08:01:54,PIT,Lewis last car's coming into position now.,PW
2010-04-18 08:02:05,PIT,All cars in position.,PW
2010-04-18 08:02:59,COM,0802: The race has started,CM
My first thought was to just format it in Excel, but I quickly got frustrated with the way it handles dates and times, so I uploaded it to Google Spreadsheet instead. Shown below is how the same data appears:
Having played around with the timed-text XML format, I knew the goal was to convert each row into something like the following (wrapped, of course, in the obligatory XML header and footer):
<p style="s1" begin="00:00:00" id="p1" end="00:00:11">PIT: Lewis last car's coming into position now.</p>
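To make the target concrete, here is a minimal sketch of that conversion in plain JavaScript (the language Google Apps Script is based on). It is not the script from the spreadsheet: the function names, the 5-second duration given to the last row, and the header/footer markup (modelled on W3C TTML 1) are all my assumptions. Each row begins at its offset from the first timestamp and ends when the next row begins.

```javascript
// Sketch only: names, the last-row duration and the TTML wrapper are
// assumptions, not taken from the original spreadsheet script.

// Parse "YYYY-MM-DD HH:MM:SS" into seconds (read as UTC to avoid timezone shifts).
function toSeconds(stamp) {
  var m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(stamp);
  if (!m) throw new Error("Unrecognised timestamp: " + stamp);
  return Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]) / 1000;
}

// Format a whole number of seconds as HH:MM:SS.
function toClock(totalSeconds) {
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  var h = Math.floor(totalSeconds / 3600);
  var m = Math.floor((totalSeconds % 3600) / 60);
  return pad(h) + ":" + pad(m) + ":" + pad(totalSeconds % 60);
}

// Escape the characters that are special inside XML text.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turn rows of [timestamp, type, message, source] into <p> elements.
function rowsToParagraphs(rows, lastDuration) {
  var start = toSeconds(rows[0][0]);
  return rows.map(function (row, i) {
    var begin = toSeconds(row[0]) - start;
    var end = i + 1 < rows.length
      ? toSeconds(rows[i + 1][0]) - start
      : begin + lastDuration;
    return '<p style="s1" begin="' + toClock(begin) + '" id="p' + (i + 1) +
      '" end="' + toClock(end) + '">' + escapeXml(row[1] + ": " + row[2]) + "</p>";
  });
}

// Wrap the paragraphs in an assumed TTML-style header and footer.
function wrapTimedText(paragraphs) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<tt xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling" xml:lang="en">',
    "  <head>",
    "    <styling>",
    '      <style xml:id="s1" tts:textAlign="center"/>',
    "    </styling>",
    "  </head>",
    "  <body>",
    "    <div>"
  ].concat(
    paragraphs.map(function (p) { return "      " + p; }),
    ["    </div>", "  </body>", "</tt>"]
  ).join("\n");
}

var rows = [
  ["2010-04-18 08:01:54", "PIT", "Lewis last car's coming into position now.", "PW"],
  ["2010-04-18 08:02:05", "PIT", "All cars in position.", "PW"],
  ["2010-04-18 08:02:59", "COM", "0802: The race has started", "CM"]
];
var paragraphs = rowsToParagraphs(rows, 5);
var xml = wrapTimedText(paragraphs);
```

With the three sample rows, the first element of `paragraphs` matches the example line above. In Apps Script the `rows` array would come from the sheet rather than being typed in by hand.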
Previously I’ve played with Google Apps Script to produce an event booking system, which uses various components of Google Apps (spreadsheet, calendar, contacts and site), so it made sense to use the power of Scripts for timed text. A couple of hours later I came up with this spreadsheet (once you open it, click File –> Make a copy to allow you to edit).
On the first sheet you can import your timed data (it doesn’t have to be *.csv, it only has to be readable by Google Spreadsheet). Then, on the XMLOut sheet, clicking ‘Subtitle Gen –> Timed Data to XML’ generates the timed text XML.
Below is the main function which is doing most of the work, the comments indicating what’s going on:
If your timed data has different headers you can tweak this by clicking ‘Tools –> Script –> Script editor …’ and changing how the str on line 18 is constructed.
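One way to make that tweak less fragile is to look columns up by header name rather than by position. The snippet below is only a sketch of the idea in plain JavaScript: the header names "Time", "Type", "Message" and "Source" and the helper names are hypothetical, and the real script may build its string quite differently.

```javascript
// Sketch only: header names and helper names are hypothetical,
// not taken from the original spreadsheet script.

// Find the position of a named column, failing loudly if it is absent.
function columnIndex(headers, name) {
  var i = headers.indexOf(name);
  if (i === -1) throw new Error("Missing column: " + name);
  return i;
}

// Build the subtitle text for one row from the named columns.
function buildText(headers, row) {
  return row[columnIndex(headers, "Type")] + ": " + row[columnIndex(headers, "Message")];
}

var headers = ["Time", "Type", "Message", "Source"];
var row = ["2010-04-18 08:02:05", "PIT", "All cars in position.", "PW"];
var text = buildText(headers, row);
```

If your sheet calls its columns something else, only the names passed to `columnIndex` would need to change.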
I’m the first to admit that this spreadsheet isn’t the most user friendly and that it only includes the tt-XML format, but hopefully there is enough structure for you to go, play and expand (if you do, please use the post comments to share your findings).