PhysMath Central Blog

Why machine-readable data should matter to you
One of the things we do here at PhysMath Central (and our sister companies BioMed and Chemistry Central) that not all publishers do is format our full-text articles in freely available XML and MathML. From a production point of view this makes sense, as we can generate HTML and PDF versions of the article from the same source. Beyond that, machine-readability opens up a plethora of possibilities that anyone could exploit. So far, though, machine-readable documents have yet to find an enthusiastic audience beyond a few data-mining specialists.
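To make the single-source idea concrete, here is a minimal Python sketch. The XML below is a made-up, heavily simplified stand-in and not our actual article schema; the element names and the helper functions parse_article, to_html and to_citation are assumptions for illustration only. It shows one parsed source feeding two different outputs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified article markup (not a real publisher schema).
ARTICLE_XML = """
<article>
  <title>Spin waves in thin films</title>
  <author>A. Researcher</author>
  <author>B. Colleague</author>
  <abstract>We measure spin-wave dispersion in thin films.</abstract>
</article>
"""


def parse_article(xml_text):
    """Read the XML once into a plain dictionary that any output format can use."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title", default="").strip(),
        "authors": [a.text.strip() for a in root.findall("author") if a.text],
        "abstract": root.findtext("abstract", default="").strip(),
    }


def to_html(article):
    """Render a human-readable HTML fragment from the parsed article."""
    authors = ", ".join(escape(name) for name in article["authors"])
    return (
        f"<h1>{escape(article['title'])}</h1>\n"
        f"<p class=\"authors\">{authors}</p>\n"
        f"<p>{escape(article['abstract'])}</p>"
    )


def to_citation(article):
    """Render a plain-text citation line from the same parsed article."""
    return f"{'; '.join(article['authors'])}. {article['title']}."


article = parse_article(ARTICLE_XML)
print(to_html(article))
print(to_citation(article))
```

The same dictionary could just as easily feed a search index or a text-mining tool, which is much harder to do when the only source is a PDF.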
However, a recent presentation by Mike Ellis entitled 'don't think websites: think data' re-ignited my belief that the future lies not in PostScript, TeX or PDF, but in a complete cultural shift in what we consider to be a publishable unit (I resist using the term 'article' or 'manuscript' here, as that ideologically harks back to the era of preparing data for print publications).
The following is a quote from the associated blog entry, 'Pushing MRD out from under the geek rock':
MRD (That’s Machine Readable Data – I couldn’t seem to find a better term..) is probably about as important as it gets. It underpins an entire approach to content which is flexible, powerful and open. It embodies notions of freely moving data, it encourages innovation and visualisation. It is also not nearly as hard as it appears – or doesn’t have to be.
...
The problem isn’t the geeks. The problem is that MRD needs to move beyond the realm of the geek and into the realm of the content owner, the budget holder, the strategist, for these technologies to become truly embedded. We need to have copyright holders and funders lined up at the start of the project, prepared for the fact that our content will be delivered through multiple access routes, across unspecified timespans and to unknown devices. We need our specifications to be focused on re-purposing, not on single-point delivery. We need solution providers delivering software with web API’s built in. We need to be prepared for a world in which no-one visits our websites any more, instead picking, choosing and mixing our content from externally syndicated channels.
In short, we now need the relevant people evangelising about the MRD approach.
In this case, Ellis is envisioning a future where the data is open, interoperable and reusable, but there is no reason why raw scientific data and the associated text (a publishable unit) should not be so right now.
We are envisioning a future where semantic markup and interoperable, reusable text, code and 'other' files are constructed to deliver a peer-reviewed scientific advance that is as far removed from today's standard HTML/PDF article as HTML itself is from Gutenberg's press.
To make this work, we'll need an army of evangelists in the right fields. Once it is working, researchers in all fields should take notice: if you want your work to be relevant, cited and reused in 10 years' time, you should be making MRD your priority. Either do it yourself, or publish the work with someone who will do it for you, but (and this is a bit of a personal hobby horse, rather than the views of my employer) please don't rely on TeX/PS/PDF as the best way to present your data and results. Their usefulness is finite and their days are numbered.
Posted by Chris Leonard at 13:02
