PhysMath Central Blog

Friday Apr 04, 2008

Open Access is not just a free pdf!

There is an exceptional editorial published recently in PLoS Computational Biology which goes into some detail about the benefits of making the full text of open access articles available to all as XML. The authors (Philip Bourne, Lynn Fink and Mark Gerstein) sometimes border on evangelism, but that is what is needed to inspire programmers and researchers not only to make use of this data (for that's what it is), but also to publish their results in open access journals which convert the full text to XML. From the editorial:

 Papers published as PDFs do not lend themselves to easy manipulation by computer. HTML is better, but the markup has more to do with presentation on a Web page than the semantic content of the paper, which is where the great opportunities lie. XML versions of the paper offer the most promise. When publishers make XML versions available, most conform to the National Library of Medicine (NLM) Document Type Definition (DTD) (http://dtd.nlm.nih.gov). In addition, several markup languages have been developed, such as CellML (http://www.cellml.org) and MathML (http://www.w3.org/Math), which can be used in addition to the NLM DTD to further describe the semantic content of a paper. Semantically aware markup is further elaborated in a systematic fashion in the construction of the semantic Web, where the XML tags are related to each other in explicit ontologies. The analogy between an XML file of content offered by a publisher and XML content provided by a database provider should not be missed. As a community, we have been at the forefront of using the latter; will we be at the forefront of using the former? While the DTD and markup languages provide for extensions to meet the needs of each discipline, publishers and researchers have made little use of them to date. This is somewhat of a chicken-and-egg situation. When significant markup is available, it will be used; then again, why go to the trouble of adding significant markup if there are no applications demanding it? The best way out would seem to be to do something significant with the markup we have, which may then inspire authors, publishers, and others to see the research and commercial potential of the corpus.

The use of such markup is a hallmark of Web 2.0 and is manifest in the idea of a mashup. Simply put, a mashup is an integration of Web content from multiple sources to provide a new and more powerful service beyond what can be achieved by any of the individual sources of information it comprises. This type of integration is facilitated if the semantic content from each information source can be identified and thus allow meaningful integration to take place. Specifically in relation to publishing, the mashup manifests the blurring of the distinction between databases and journals, which will continue in future.

Hear, hear! That is why we ensure that all papers published by BioMed Central, Chemistry Central and PhysMath Central are available as XML, and we implore people to make use of this data in ways that could allow for forward leaps, rather than steps, in scientific research.
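To give a flavour of why XML is so much easier to work with than a PDF, here is a minimal sketch in Python using only the standard library. The XML fragment is a hypothetical, heavily simplified example written in the style of the NLM DTD (element names such as `article-title`, `contrib` and `sec` follow that style, but the content is invented and a real article file will be far richer). The `summarize` function is likewise just an illustrative helper, not part of any publisher's tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified fragment in the style of the NLM DTD.
# The article, authors and sections are invented for illustration.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Roe</surname><given-names>Richard</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Background</title><p>First paragraph.</p></sec>
    <sec><title>Results</title><p>Second paragraph.</p><p>Third paragraph.</p></sec>
  </body>
</article>"""


def summarize(xml_text):
    """Pull the title, author names and per-section paragraph counts
    out of an NLM-style article (illustrative helper only)."""
    root = ET.fromstring(xml_text)

    # The title is addressed by its element name, not by page position.
    title = root.findtext(".//article-title")

    # Authors are the <name> children of contributors marked as authors.
    authors = [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for name in root.findall(".//contrib[@contrib-type='author']/name")
    ]

    # Map each top-level section title to its number of paragraphs.
    sections = {
        sec.findtext("title"): len(sec.findall("p"))
        for sec in root.findall("./body/sec")
    }
    return {"title": title, "authors": authors, "sections": sections}


summary = summarize(SAMPLE)
print(summary)
```

Because every piece of the article is labelled by what it is rather than by how it looks, the same few lines could be run across a whole corpus of articles, which is exactly the kind of reuse the editorial is calling for.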

Open Access: Taking Full Advantage of the Content
PLoS Comput Biol 4(3): e1000037 (March 28, 2008)
 

 

 
