Tuesday, July 14, 2009

Using Google Spreadsheets and RSS feeds to edit IPNI

One thing I find myself doing a lot is creating Excel spreadsheets and filling them with lists of taxonomic names and bibliographic references, for which I then try to extract identifiers (such as DOIs). This is a tedious business, but the hope is that by doing it once I can create a useful resource. However, I often get bored and the spreadsheets lie forgotten in some deep recess of my computer's hard drive.

It occurs to me that making these spreadsheets publicly available would be useful, but how to do this? In particular, how to do this in a way that makes it easy for me to extract recent edits, and to update the data from new sources? Google Spreadsheets seems an obvious answer, but I wasn't aware of just how obvious until I started playing with the spreadsheet APIs. These let you add data programmatically (using HTTP PUT and Atom), which means that I can easily push new data to the spreadsheet.
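To give a flavour of what pushing a row via Atom involves, here is a minimal sketch that only builds the XML payload and does not send anything. It assumes a convention of one gsx: element per spreadsheet column; the function name row_to_atom_entry, the column names, and the sample values are made up for illustration, and the authentication and HTTP request are left out entirely.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: namespace used for per-column elements in a spreadsheet row entry.
GSX_NS = "http://schemas.google.com/spreadsheets/2006/extended"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("gsx", GSX_NS)


def row_to_atom_entry(row):
    """Serialise a dict of column -> value as a single Atom entry (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for column, value in row.items():
        # One child element per column, named after the column header.
        cell = ET.SubElement(entry, f"{{{GSX_NS}}}{column}")
        cell.text = value
    return ET.tostring(entry, encoding="unicode")


# Made-up example row: a placeholder name and a placeholder DOI.
example_entry = row_to_atom_entry({"name": "Aus bus", "doi": "10.1000/example"})
print(example_entry)
```

The resulting string would form the body of the HTTP request; which URL to send it to, and with which verb, depends on the feed being targeted.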

As a test, I've harvested the IPNI RSS feeds I created earlier (see http://bioguid.info/rss), extracted basic details about each name and any bibliographic identifiers my RSS generator had found, and sent these directly to a Google Spreadsheet. Some IPNI references didn't parse, so I can edit these manually, and many references lack an identifier (my tools usually find those with DOIs). Often with a bit of searching one can turn up a URL or a Handle for a paper, or even simply expand on the bibliographic details (which are a bit skimpy in IPNI). I'm also toying with using Zotero as an online bibliographic store for references that don't have an online presence.
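The harvesting step could be sketched roughly as follows. This is not the actual code behind the feeds: the sample feed, its item layout (title, link, description), the DOI regular expression, and the function name feed_to_rows are all assumptions made for illustration. The idea is simply to turn each feed item into one spreadsheet row and to leave the identifier blank when none is found, so those rows can be spotted and edited by hand.

```python
import re
import xml.etree.ElementTree as ET

# Loose pattern for DOI-like strings; an assumption, not a full DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Made-up feed with placeholder names and references.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Aus bus Smith</title><link>http://example.org/name/1</link>
<description>J. Example Bot. 1: 1 (2009). doi:10.1000/example.1</description></item>
<item><title>Cus dus Jones</title><link>http://example.org/name/2</link>
<description>Example Bull. 2: 34 (2009).</description></item>
</channel></rss>"""


def feed_to_rows(feed_xml):
    """Convert RSS items into row dicts, one per name (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.iter("item"):
        description = item.findtext("description", default="")
        match = DOI_PATTERN.search(description)
        rows.append({
            "name": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "reference": description,
            # Strip trailing sentence punctuation that the loose pattern may capture.
            "doi": match.group(0).rstrip(".,;") if match else "",
        })
    return rows


rows = feed_to_rows(SAMPLE_FEED)
# Rows without an identifier are the candidates for manual editing.
needs_manual_edit = [row["name"] for row in rows if not row["doi"]]
print(needs_manual_edit)
```

Keeping the unmatched rows in the output, rather than dropping them, is what makes the spreadsheet useful as a to-do list for manual annotation.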

So, what I've got now is a spreadsheet that can be edited, updated, and harvested, and that will persist beyond any short-term enthusiasm I have for trying to annotate IPNI.