BioMed Central Blog

On the unbearable lightness of mandatory data sharing
Guest blog by Tommi Nyman (Department of Biology, University of Eastern Finland), Winner of the Open Data Award at the BioMed Central 5th Annual Research Awards
One of the most pleasant surprises of this spring was that yours truly, with coauthors Veli Vikberg, David R. Smith, and Jean-Luc Boevé, received BioMed Central’s Open Data Award for our article ‘How common is ecological speciation in plant-feeding insects? A 'Higher' Nematinae perspective’. (The other highlight was naturally Finland’s phenomenal victory in the Ice Hockey World Championship Final last Sunday.)
We were very happy to receive the prize, as we don’t get awards as frequently as we’d like to! At the same time, we fully realized that a large portion of the credit in our case must go to a persistent, anonymous referee of our paper, who demanded (twice) that we also publish the background data used in our phylogeny-based ecological analyses, not just the sequence data that we used to reconstruct the phylogenetic trees. So, since the reviewer didn’t give up (and the editor sided with the referee), we sat down and did what we should have done voluntarily in the first place: we gathered all relevant ecological information (host-plant associations and species numbers of various sawfly taxa) into a (hopefully) coherent table, and included it as an additional online appendix at the end of the article.
Online archiving of original data has been standard practice for a long time in research on phylogenetics and population genetics, and scientific journals typically will release articles only after all DNA sequences used in the analyses have been submitted to a public database such as GenBank. Now this mode of data sharing is making its way to more ecologically oriented journals as well; for example, Evolution introduced a mandatory data archiving policy in 2011, and now requires that raw data be presented in a way that makes it possible to repeat all statistical analyses used in an accepted article. The ongoing rise of open-access online journals will make data sharing easier than ever, since page space is no longer a limiting factor.
To the individual researcher, preparing and referencing background data for archiving may feel like an unwelcome addition to the already tedious publication process but, at least in our case, we knew at heart that the reviewer was entirely right in their demands. In general, mandatory archiving is a way to bring rigour to data collection and management and will, at the same time, improve transparency of research and publishing.
There are also other benefits to science that will become evident only in the long run. In my field of ecological phylogenetics, statistical methods are improving with astonishing speed, meaning that re-analyses of previously published datasets will undoubtedly be common in the future, and may lead to reassessments of old results. Meta-analyses combining results of multiple studies will also benefit greatly from access to raw original data rather than to a few statistical indices extracted from earlier analyses. In particular, I expect that archived datasets will be useful in higher-level education, as repeating the statistical analyses used in a published article is an efficient way of learning how science is done in practice. Naturally, the end goal of such exercises must be that students come up with solutions that are better than the ones used by the original researchers.
Posted by Kim West at 15:12



