Monday, December 21, 2009


Today I finally got a project out the door. BioStor is my take on what an interface to theBiodiversity Heritage Library (BHL) could look like. It features the visualisations I've mentioned in earlier posts, such as Google maps based on extracted localities, and tag trees. It also has a modified version of my earlier BHL viewer.

There are a number of ideas I want to play with using BioStor, but the main goal this site is to provide article-level metadata for BHL. As I've discussed earlier (see also Chris Freeland's post But where are the articles?), BHL has very little article-level metadata, making searching for articles a frustrating experience. BioStor aims to make this easier by providing an OpenURL resolver that tries to find articles in BHL.

BioStor supports the OpenURL standard, which means it can be used from
within EndNote and Zotero. Web sites that support COinS (such as Drupal-based Scratchpads and EOL's LifeDesks) can also be uses BioStor (see for details).

My approach to finding articles in BHL is to take existing metadata from bilbiographies and databases, and use this to search BHL using techniques ranging from reasonably elegant (Smith-Waterman alignment on words to match titles) to down-and-dirty regular expression matching. Since this metadata may contain errors, BioStor provides basic editing tools (using reCAPTCHA rather than user logins at this point).

There's much to be done, the article finding is somewhat error-prone, and the search requires a local copy of BHL, and mine is rather out of date. However, it is a start.

To get a flavour of BioStor, try browsing some references:

or view information for a journal:

or an author:

or a taxon name:

Friday, December 11, 2009

BHL interface ideas

I've been buried in programming (and it's exam time at Glasgow) so I've not blogged for a month (gasp). I've been playing with ways to visualise Biodiversity Heritage Library content for a while (click here for a list of previous posts), and have occasionally surfaced to tweet a screenshot via twitpic. The more I play with the BHL content the more I think it's a gold mine, and that many of the ideas I played with for my ill-fated Elsevier Challenge entry (website here, background paper at hdl:10101/npre.2009.3173.1) are taking on a new life with this project.

I'm hoping to release my BHL article finding and visualising web site by the end of the month, but meantime I'm gathering the screenshots here.

The first shows a Google map generated from latitude and longitudes extracted from OCR text using some simple regular expressions from page 7705952 in the BHL.There's quite a bit of latitude and longitude information in BHL, and that's before trying georeferencing tools.


The idea is to display this map next to the article so that user get's an immediate sense of what region in the world the article covers, such as this article about Riekia wasps:


I'm also interested in useful ways to display search results. Here's an experiment using TileBars to visualise how relevant a search result is. The width of the bar is a function of how many pages are in the article, the vertical stripes indicate pages that have the search term. The idea is to get a quick visual impression of whether the article mentions the term in parsing, or treats it in some detail.


TileBars were developed by Marti Hearst, whose web site has some great resources. Partly inspired by her BioText projec, as well as the thumbnail page display in JSTOR I'm now experimenting with showing thumbnails in search results. For example, here's a search for the deep sea octopus Graneledone pacifica, showing two articles:


I display thumbnails for pages that (a) have the name on the page, and (b) have what look like figure captions on them. The idea is that an article that figures a taxon is likely to be a fairly important article to look at, so displaying thumbnails will highlight those articles. The second article in the search results is the paper that published the name Graneledone pacifica, and the figures illustrate the taxon.

These are all pretty rough, but they give some idea of what I've been working on the last month.