GigaBlog

Saturday Jan 28, 2012

GigaScience Journal Part of Global Data-Sharing Effort: New Standards Allow Disparate Data Sets to Integrate

Led by researchers at the University of Oxford, a group of more than 30 scientific organizations around the globe has worked to produce a common standard that will make possible the consistent description of enormous and radically different databases compiled in fields ranging from genetics to stem cell science to environmental studies. One of the contributors playing a role in the project is GigaScience, as we feel it is potentially very useful in handling the wide variety of data types covered by our scope.

The new standard provides a way for scientists in widely disparate fields to co-ordinate each other’s findings by allowing behind-the-scenes combination of the mountains of data produced by modern, technology-driven science.

This standards-compliant data-sharing effort and the establishment of its online presence, the ISA Commons (www.isacommons.org), are described in a Commentary (and highlighted in the editorial) published today in the journal Nature Genetics.

“We are now working together to provide the means to manage enormous quantities of otherwise incompatible data, ranging from the biomedical to the environmental,” says Susanna-Assunta Sansone, Team Leader of the project at the Oxford e-Research Centre, and founder of the BioSharing Network (of which BMC and GigaScience are both members).

“An example of how this works at the Harvard Stem Cell Institute is that we can now find a relationship between experiments involving normal blood stem cells in fish and cancers in children”, says Winston Hide, Professor of Bioinformatics at the Harvard School of Public Health (for more see this related publication).

It was necessary to establish common data standards, say the commentary’s authors, because of the tsunami of data and technologies washing over the sciences. “There are hundreds of new technologies coming along but also many ways to describe the information produced”, said Sansone, noting that “we can take a jigsaw puzzle of different sciences and now fit the many pieces together to form a complete picture”.

“One of the things that I find most empowering about this effort is that now small research groups can begin to store laboratory data using this framework, complying with community standards, without their own dedicated bioinformatic support. It is a bit like Facebook allowing everyone to create their own website pages: suddenly you don't need to be an expert in computing to get your data out to the rest of the world”, says Dr. Jules Griffin, of the University of Cambridge.

“What we like about it is its unifying nature across different bioscience fields and institutions”, says Dr. Christoph Steinbeck, of the European Bioinformatics Institute.

And “it also has the potential to work for large centers too”, says Scott Edmunds, of the BGI and GigaScience. As GigaScience aims to take as many types of “large-data” as possible, the ability to handle as many formats as possible is essential, and the large number of data types supported by ISA-commons, together with the ability to create new configurations, potentially addresses this very important issue. This has led to GigaScience being the first journal to offer authors the option to submit data in ISA-commons format, and these resources have also been made available to the BGI (the world's largest genomics institute) to release their enormous quantities of data more quickly to the wider research community through the associated GigaDB database.

For more on the aims and goals of GigaScience please see this previous BMC Blog posting, and for news and updates follow GigaBlog and the @GigaScience Twitter feed. The journal is now taking submissions for “big-data” associated research, tools and software for handling large-scale data, and reviews and commentary on issues dealing with data handling and standards.

References:

1. ISA Commons: isacommons.org
2. It's not about the data. Nature Genetics 44, 2 (2012).
3. Sansone, S-A. et al. Toward interoperable bioscience data. Nature Genetics 44, 2 (2012).
4. Ho Sui SJ et al. The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons. Nucleic Acids Res. 1;40(D1):D984-D991. (2012).

Monday Jan 23, 2012

Data Citation Enters the (year of the) Dragon

Today marks the first day of the Chinese Lunar New Year, and as we enter the supposedly auspicious year of the Dragon, now is a good opportunity to look towards developments in the nascent field of data publication over the upcoming year. This week marked important announcements of new and improved data publication platforms. Those lucky enough to attend Science Online (or filter through the nearly 30,000 tweets produced by the meeting's end!) will have seen the new-look Figshare website promoted in the "Dealing with Data" session, and there has also been good coverage online of the platform's launch, including in the Wellcome Trust blog. Roughly a year after the launch of the original website, recent support from Digital Science (a sister company of Nature Publishing Group) has allowed them to release a much improved front end, increased storage (currently 250MB, but potentially unlimited), and, importantly where data citation is concerned, citable DataCite DOIs.

Following on from the many developments in the last year (see our posting from last month's IDCC meeting), another publisher has just thrown their hat into the data publishing ring, with Hindawi announcing the launch of "Datasets International", a new platform for "archiving, documenting, and distributing scholarly research datasets". Like Figshare, Dryad and the other platforms already announced (including our associated GigaDB), they follow best practice by asking authors to provide data under a Creative Commons CC0 license, although it is currently unclear how much (if any) data hosting is included in their $300 article processing charge.

As we've written previously in this blog, how you cite data is important in tracking and maximizing its use. Ultimately the adoption of data publication will be greatly aided by publishers, authors and the indexing services correctly carrying out best practice for data citation, and citing the dataset DOIs in the references. CrossRef have just aided this with a supportive call for publishers to cite DataCite DOIs in the reference sections of articles, although the article they use as an example illustrates the problem by not following this practice. Our recent success in getting one of our GigaDB dataset DOIs integrated into the references of a Genome Biology article is a great example that this can be done. The utility of this example is further highlighted this month, as BioMed Central (our publisher alongside BGI) has now used this paper as an example in BioMed Central’s reference style guide, found in any journal's instructions for authors. The guide now explicitly mentions datasets and gives this paper's dataset as an example of a dataset citation:

“Only articles, datasets and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited

...

Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. http://dx.doi.org/10.5524/100012."
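For anyone assembling such references programmatically, here is a minimal Python sketch that builds a string in the pattern of the dataset example quoted above. The function name `format_dataset_citation` and its parameters are our own hypothetical choices for illustration, not part of any BioMed Central or DataCite tool, and the layout is simply inferred from the single example shown.

```python
def format_dataset_citation(authors, year, title, publisher, doi):
    """Build a dataset reference in the pattern of the quoted example.

    authors: list of (surname, initials) tuples, e.g. ("Zheng", "L-Y").
    doi: bare DOI such as "10.5524/100012" (no resolver prefix).
    This layout is an assumption inferred from one example, not an
    official specification of the style guide.
    """
    # Authors are written "Surname, Initials" and separated by semicolons.
    author_part = "; ".join(
        f"{surname}, {initials}" for surname, initials in authors
    )
    # The DOI is rendered as a resolvable link, followed by a full stop.
    return f"{author_part} ({year}): {title}. {publisher}. http://dx.doi.org/{doi}."


citation = format_dataset_citation(
    authors=[("Zheng", "L-Y"), ("Guo", "X-S")],  # shortened author list
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)
print(citation)
```

Keeping the DOI as a separate field, rather than baking it into free text, is what lets the same record be rendered as a resolvable link in a reference list.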

Springer, BioMed Central's parent publisher, is also providing examples of correctly cited data, with this recent example correctly citing a Dryad dataset in its references.

Those interested in GigaScience journal and its integrated GigaDB database should contact us at editorial@gigasciencejournal.com. Currently the database is being populated with datasets from BGI collaborations and projects, but upon launch of the journal we will be hosting and issuing DOIs to datasets associated with GigaScience articles, giving an extra form of credit, and increasing the discoverability and impact of an author's work. Please contact us via the above email or submit your big-data associated research articles through our submission page.

Gong Xi Fa Chai! Happy New Year of the Dragon!

References

1. Zheng LY, Guo XS, He B, Sun LJ, Peng Y, Dong SS, Liu TF, Jiang S, Ramachandran S, Liu CM, Jing HC. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol. 2011 Nov 21;12(11):R114.

2. Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. http://dx.doi.org/10.5524/100012.

3. Hodkinson BP, Uehling JK, Smith ME (2012) Lepidostroma vilgalysii, a new basidiolichen from the New World. Mycological Progress, online in advance of print. doi:10.1007/s11557-011-0800-z

4. Hodkinson BP, Uehling JK, Smith ME (2012) Data from: Lepidostroma vilgalysii, a new basidiolichen from the New World. Dryad Digital Repository. doi:10.5061/dryad.j1g5dh23

*UPDATE* 24/1/12: Hindawi have confirmed they will include data hosting (see comments).