Wednesday, June 22, 2016

What happens when open access wins?

The last few days I've been re-reading articles about Ted Nelson's work (including the ill-fated Project Xanadu), reading articles celebrating his work (brought together in the open access book "Intertwingled"), playing with Hypothes.is, and thinking about annotation and linking. One of the things that distinguishes Nelson's view of hypertext from the current web is that for Nelson links are first-class citizens: they are persistent, they are bidirectional, and they can connect not just documents but parts of documents. On the web, links are unidirectional: when I link to something, the page I link to has no idea that I've made that link. Knowing who links to you turns out to be both hard to work out and very valuable. In the academic world, links between articles (citations) form the basis of commercial databases such as the Web of Science. And of course, the distribution of links between web pages forms the basis of Google's search engine. Just as attempts to build free and open citation databases have come to nothing, there is no free and open search engine to compete with Google.
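To make the one-way versus two-way distinction concrete, here is a minimal Python sketch. The page names are made up for illustration. Each page knows only its own outgoing links, so answering "who links to me?" means collecting everyone's outgoing links and inverting them, which is roughly the job a citation database or a search engine has to do at scale.

```python
from collections import defaultdict

def build_backlinks(outgoing):
    """Invert a map of page -> pages it links to into page -> pages linking to it."""
    backlinks = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            backlinks[target].add(source)
    # Sort the sources so the result is stable and easy to read.
    return {page: sorted(sources) for page, sources in backlinks.items()}

# Hypothetical pages; each one records only its own outgoing links.
outgoing = {
    "blog-post": ["paper-a", "paper-b"],
    "paper-b": ["paper-a"],
    "paper-c": ["paper-a"],
}

backlinks = build_backlinks(outgoing)
print(backlinks["paper-a"])  # every page that links to paper-a
```

The inversion itself is trivial. The hard part is that nobody holds the complete `outgoing` map unless they have crawled or licensed all of the pages, which is why the resulting index is valuable.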

The chapters in "Intertwingled" make clear that hypertext had a long and varied history before being subsumed by the web. One project which caught my eye was Microcosm, which led me to the paper "Dynamic link inclusion in online PDF journals" (doi:10.1007/BFb0053299; there's a free preprint here). This article tackles the problem of adding links to published papers. These links could be to other papers (citations), to data sets, to records in online databases (e.g., DNA sequences), to names of organisms, etc. The authors outline four different scenarios for adding these links to an article.

In the first scenario the reader obtains a paper from a publisher (either open access or from behind a paywall), then uses a "linkbase" that they have access to in order to add links to the paper.

[Figure: Links1, the reader applies their own linkbase to the paper]

This is very much what Hypothes.is offers: you use their tools to add annotations to a paper, and those annotations remain under your control.
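As a rough illustration of the reader-side scenario, here is a small Python sketch of a linkbase applied to a paper. Everything in it is an assumption made for the example: the linkbase is just a dictionary from phrases to target URLs (placeholder example.org addresses), and matching is plain exact-string search. It is not how Microcosm or Hypothes.is actually work. The point is that the links are held separately from the paper and refer to it by position, so the paper itself is never modified.

```python
import re

def find_links(text, linkbase):
    """Return stand-off links as (start, end, phrase, target) tuples, in text order."""
    links = []
    for phrase, target in linkbase.items():
        # re.escape keeps any punctuation in a phrase from being read as a pattern.
        for match in re.finditer(re.escape(phrase), text):
            links.append((match.start(), match.end(), phrase, target))
    return sorted(links)

# Hypothetical paper text and a hypothetical reader-owned linkbase.
paper = "We sequenced Homo sapiens samples and compared them with Pan troglodytes."
linkbase = {
    "Homo sapiens": "https://example.org/taxon/1",
    "Pan troglodytes": "https://example.org/taxon/2",
}

links = find_links(paper, linkbase)
for start, end, phrase, target in links:
    print(f"{start}-{end}: {phrase} -> {target}")
```

Whoever holds the `linkbase` dictionary decides which links the reader sees, which is what the remaining scenarios turn on.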

In the second scenario, the publisher owns the linkbase and provides the reader with an annotated version of the paper.

[Figure: Links2, the publisher holds the linkbase and supplies the linked paper]

This is essentially what tools like ReadCube offer. The two remaining scenarios cover the case where the reader doesn't get the paper from the publisher but instead gets the links. In one of these scenarios (shown below) the reader sends the paper to the publisher and gets the linked paper back in return; in the other (not shown) the reader gets the links but uses their own tools to embed them in the paper.

[Figure: Links3, the reader sends the paper to the publisher and gets the linked paper back]

If you're still with me at this point you may be wondering how all of this relates to the title of this essay ("What happens when open access wins?"). Well, imagine that academic publishing eventually becomes overwhelmingly open access, so that publishers are making content available for free. Is this a sustainable business model? Might a publisher, seeing the writing on the wall, start to think about what they can charge for, if not articles? (I'm deliberately ignoring the "author pays" model of open access as I'm not convinced this has a long-term future.)

In the diagrams above the "linkbase" is on the publisher's side in two of the three scenarios. If I were a publisher, I'd be looking to assemble proprietary databases and linking tools to create value that I could then charge for. I'm sure this is happening already. I suspect that the growing trend to open access for publications is not going to be enough to keep access to scientific knowledge itself open. In many ways publications themselves aren't terribly useful; it's the knowledge they contain that matters. Extracting, cross-linking, and interpreting that knowledge is going to require sophisticated tools. The next challenge is going to be ensuring that the "linkbases" generated by those tools remain free and open, or an "open access" victory may turn out to be hollow.