BioMed Central Blog

Journal of Translational Medicine launches Nutrition & Metabolism section

The Journal of Translational Medicine has just launched a new section dedicated to Nutrition & Metabolism. Led by Section Editor Laura Soldati, and with a dedicated editorial board, the section aims to provide a platform for the interaction and mutual validation between basic and applied research in the field of nutrition. To mark the launch of the section, Prof Soldati has co-written the editorial “Introducing the Nutrition & Metabolism section of Journal of Translational Medicine”.
Nutrition is a topic of growing importance in the medical community, with over 60% of deaths worldwide now attributed to nutrition-linked diseases such as diabetes, cardiovascular disease, cancer, and Alzheimer’s. In translational medicine in particular, many hold that nutrition ought to be taken into greater account in order to generate advanced applications for diagnosis and therapy and, at the same time, to develop new instruments of investigation.
Further information about the Nutrition & Metabolism section can be found on the journal’s website, where you can also submit your article.
Posted by Philip Dooner at 15:50 Comments (0)
Open Knowledge Foundation launch Panton Fellowships
Guest blog post from Laura Newman, Community Coordinator at the Open Knowledge Foundation
The Open Knowledge Foundation is delighted to announce the launch of the Panton Fellowships. Funded by Open Society Foundations, Panton Fellowships will be awarded to scientists who actively promote open data in science.
• Visit the Panton Fellowships home page for more information, including details of how to apply.
We firmly believe that “open data means better science”. In 2009, the Panton Principles were formulated in order to encourage and assist scientists in placing scientific data in the public domain. Now in 2012, the Panton Fellowships represent a major step forward towards this goal.
Thanks to the support of Open Society Foundations, the Open Knowledge Foundation is able to offer two Panton Fellowships in 2012. The nature of the Fellowships makes them ideally suited to graduate students and early-stage career scientists, although anyone with an active interest in open science is encouraged to apply.
Fellowships will be held for one year, and will have a value of £8k p.a. Fellows will have the freedom to undertake a range of activities, which could include e.g. exploring practical solutions for making data open, facilitating discussions about openness, and catalysing the scientific community. Fellows will continue to be employed and/or study at their current institution throughout the Fellowship, and activities undertaken for the Panton Fellowship should ideally complement and enhance their existing endeavours. Applicants are strongly encouraged to propose their own work plan.
Dr Cameron Neylon, of the Panton Fellowships Advisory Board, commented on the ‘real potential’ of the Fellowships to influence practice surrounding open data in the scientific community. ‘Panton Fellowships will allow those who are still deeply involved in research to think closely about the policy and technical issues surrounding open data’, observed Dr Neylon. By allowing scientists the scope both to explore the ‘big picture’ – gathering evidence to promote discussion throughout the community – and also to work on specific technical solutions to individual problems, the Panton Fellowship scheme has the potential to make a real impact upon the practice of open data in science.
Posted by Iain Hrynaszkiewicz at 19:24 Comments (0)
OSTP publishes public comments in response to RFIs on public access to publications and data
The US Office of Science and Technology Policy has published the comments received as part of the latest phase of its public consultations on Public Access to Peer-Reviewed Scholarly Publications Resulting From Federally Funded Research and Public Access to Digital Data Resulting From Federally Funded Scientific Research.
BioMed Central responded to both Requests for Information, and our contributions are now publicly available online.
Posted by Matthew Cockerill at 12:24 Comments (0)
Connecting the evidence: an “ontology” for Threaded Publications
Unpublished research is a serious problem for evidence-based decision making in healthcare, and this was recently highlighted on BBC Radio 4’s Today programme and in an entire issue of the BMJ. Systematic reviews aim to present the totality of the evidence, and a problem for those preparing and maintaining these reviews is how to find unpublished studies and data. But even when clinical trials are reported in journals and their supplements, the formats and descriptions are widely heterogeneous, and studies can remain difficult to discover and challenging to compare with similar trials.
Clearly connecting trial-related publications is a way to help with this problem and is a major goal of BioMed Central’s Threaded Publications initiative. To achieve its fundamental aim of connecting all digital published content relating to the evidence about a particular trial, however, Threaded Publications must go beyond a single journal or publisher. Through our partnership with CrossRef – an organisation founded by publishers, for publishers – and engagement with editors and publishers, we hope to achieve interoperability across different publishing platforms. The desired outcome is that articles reporting the protocol or the findings of a trial published in different journals or by different publishers will be linked in a thread, which should also include the trial’s entry in a research register.
The Threaded Publications concept and prototype, which was demonstrated at a number of medical communications and publishing meetings in 2011, builds on CrossRef’s widely-established digital object identifier (DOI) infrastructure and more recently-developed CrossMark tool. CrossMark, which has recently been implemented by the Royal Society, conveys additional information, such as corrections and updates, about journal articles in a standard way. This ‘non-bibliographic article-level metadata’, to use its more technical term, can be displayed for any article which uses CrossMark – including, in principle, the publication thread – by a reader clicking on the CrossMark logo (see figure; but noting that the terms used in the sample thread need further work).
With cross-publisher interoperability comes a need for standardization and, specifically, a simple “ontology” (a controlled vocabulary – a bit like a dictionary – of agreed terms on a specific topic) of different article types that might be included in a thread of publications. This may sound complicated or difficult to implement but, if we know what each element of the thread is, and where it sits in the thread, we stand a good chance of making that information usefully available to a reader, patient, practitioner, or researcher, including those preparing systematic reviews. Agreeing on the terminology for describing the literature will help achieve this.
Any journal or publisher participating in the Threaded Publications scheme will need to agree to and use these standard terms – whether they publish the study protocol, results, methodology, an editorial discussing the trial, the dataset, or something else. Trial registration records, such as those in the ISRCTN register, are central to transparent reporting but these are not currently citable and discoverable in the same way as journal articles. A logical development to help implement Threaded Publications would be to assign DOIs to trial registration records. This would better enable citation of these records – and possibly academic credit for them – in the same fashion as journal articles, and might potentially provide more motivation to ensure the completeness of the registry entries.
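To make the idea of a typed thread more concrete, here is a minimal Python sketch of how a publication thread might be modelled. It is an illustration only: the class names, the field names and the identifiers are all hypothetical, and the article types are simply the ones mentioned in this post, not the agreed terms of the draft ontology.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArticleType(Enum):
    # Illustrative terms taken from the prose above; NOT the agreed ontology.
    TRIAL_REGISTRATION = "trial registration record"
    PROTOCOL = "study protocol"
    RESULTS = "results"
    METHODOLOGY = "methodology"
    EDITORIAL = "editorial"
    DATASET = "dataset"


@dataclass(frozen=True)
class ThreadItem:
    identifier: str            # a DOI or registry number (placeholders below)
    article_type: ArticleType  # one agreed term per item
    publisher: str             # threads may span several publishers


@dataclass
class PublicationThread:
    trial_id: str                               # the trial the thread is about
    items: list = field(default_factory=list)   # kept in insertion order

    def add(self, item: ThreadItem) -> None:
        self.items.append(item)

    def of_type(self, article_type: ArticleType) -> list:
        """Return every item in the thread carrying the given term."""
        return [i for i in self.items if i.article_type is article_type]


# Made-up identifiers, purely for illustration.
thread = PublicationThread(trial_id="ISRCTN00000000")
thread.add(ThreadItem("ISRCTN00000000", ArticleType.TRIAL_REGISTRATION, "Registry"))
thread.add(ThreadItem("10.1234/example-protocol", ArticleType.PROTOCOL, "Publisher A"))
thread.add(ThreadItem("10.5678/example-results", ArticleType.RESULTS, "Publisher B"))

protocols = thread.of_type(ArticleType.PROTOCOL)
```

Because every item carries exactly one term from a shared vocabulary, a reader entering the thread at the results paper could, in principle, be pointed straight to the protocol or the registry entry, whichever publisher hosts it.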
The team at BioMed Central, working closely with our external and recently expanded advisory group (see Acknowledgements below), has drafted a concept document for standard terms describing publications that might be related to a clinical trial.
Professor Mike Clarke, Director of the All-Ireland Hub for Trials Methodology Research, put the current complexity of the literature into context: “A research study is so much more than the few thousand words that make it into a journal article describing its findings and, yet, this is all that is readily available for most studies. By providing the means to thread together a published ‘life story’ for a study, BioMed Central are helping to release its full potential to influence practice and future research. Users will be able to enter the thread at any point and travel in either direction to find out more about the design, conduct and outcome of the study.”
To be part of a threaded publication, a document must be explicitly relevant to the published evidence associated with a particular trial. This sets the functionality apart from currently available features that link to [possibly] ‘related articles’. We believe we have captured all the types of article or record that currently exist in the literature and scientific databases, but we may have missed some, or named some inappropriately. We invite the clinical research and publishing communities to download the document and share their comments on our first step towards an ontology of publications related to clinical trials.
Acknowledgements: Thanks to Prof Doug Altman, Sir Iain Chalmers, Prof Mike Clarke and Dr Ben Goldacre for their comments on an earlier draft of the ontology, and this blog.
Footnote: The ontology concept document is licensed under a Creative Commons Attribution License.
Posted by Iain Hrynaszkiewicz at 18:36 Comments (0)
GigaScience part of global data-sharing effort: new standards allow disparate data sets to integrate
Guest blog post by the editors of GigaScience, which is now accepting submissions. This post has also been published on the GigaScience journal blog. Follow @GigaScience on Twitter.
Led by researchers at the University of Oxford, a group of more than 30 scientific organizations around the globe has worked to produce a common standard that will make possible the consistent description of enormous and radically different databases compiled in fields ranging from genetics to stem cell science to environmental studies. One of the contributors to the project is GigaScience, as we feel the standard is potentially very useful in handling the wide variety of data types covered by our scope.
The new standard provides a way for scientists in widely disparate fields to co-ordinate each other’s findings by allowing behind-the-scenes combination of the mountains of data produced by modern, technology-driven science.
This standard-compliant data sharing effort and the establishment of its online presence, the ISA Commons – www.isacommons.org, is described in a Commentary (and highlighted in the Editorial) published on 27th January 2012 in the journal Nature Genetics.
“We are now working together to provide the means to manage enormous quantities of otherwise incompatible data, ranging from the biomedical to the environmental,” says Susanna-Assunta Sansone, Team Leader of the project at the Oxford e-Research Centre, and founder of the BioSharing Network (of which BMC and GigaScience are both members).
“An example of how this works at the Harvard Stem Cell Institute is that we can now find a relationship between experiments involving normal blood stem cells in fish and cancers in children”, says Winston Hide, Professor of Bioinformatics at the Harvard School of Public Health (for more see this related publication).
It was necessary to establish common data standards, say the Commentary’s authors, because of the tsunami of data and technologies washing over the sciences. “There are hundreds of new technologies coming along but also many ways to describe the information produced,” said Sansone, noting that “we can take a jigsaw puzzle of different sciences and now fit the many pieces together to form a complete picture”.
"One of the things that I find most empowering about this effort is that now small research groups can begin to store laboratory data using this framework, complying to community standards, without their own dedicated bioinformatic support. It is a bit like facebook allowing everyone to create their own website pages - suddenly you don't need to be an expert in computing to get your data out to the rest of the world", says Dr Jules Griffin, of the University of Cambridge.
“What we like about it is its unifying nature across different bioscience fields and institutions”, says Dr Christoph Steinbeck, of the European Bioinformatics Institute.
And “it also has the potential to work for large centers too”, says Scott Edmunds, of the BGI and GigaScience. As GigaScience aims to take as many types of “large-data” as possible, the ability to handle as many formats as possible was essential, and the large number of data types supported by ISA-commons, together with the ability to create new configurations, potentially addresses this very important issue. This has led to GigaScience being the first journal to offer authors the option to submit data in ISA-commons format, and these resources have also been made available to the BGI (the world's largest genomics institute) to release its enormous quantities of data more quickly to the wider research community through the associated GigaDB database.
For more on the aims and goals of GigaScience, please see this previous BMC Blog posting, and for news and updates follow GigaBlog and the @GigaScience twitter feed. The journal is now taking submissions for “big-data” associated research, tools and software for handling large-scale data, and reviews and commentary on issues dealing with data-handling and standards.
References:
1. ISA Commons: isacommons.org
2. It's not about the data. Nature Genetics 44, 2 (2012).
3. Sansone, S-A. et al. Toward interoperable bioscience data. Nature Genetics 44, 2 (2012).
4. Ho Sui SJ et al. The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons. Nucleic Acids Res. 1;40(D1):D984-D991. (2012).
Laurie Goodman, Editor-in-Chief
Scott Edmunds, Editor
Alexandra Basford, Assistant Editor
Posted by Gabriella Anderson at 17:08 Comments (0)

“Journal of Negative Results in Biomedicine's immediate goal is to provide scientists and physicians with responsible and balanced information in order to improve experimental designs and clinical decisions”, comments Prof Bjorn Olsen, Editor-in-Chief of this journal.
The importance and usefulness of negative results is something that is arguably overlooked in the scientific arena; they are often perceived as less important due to the fact that they fail to confirm various hypotheses. This view however is gradually changing, with a growing awareness of how constructive and useful they can actually be to science.
Journal of Negative Results in Biomedicine promotes the publication of negative results and data, and supports the idea that scientists should be provided with balanced information which can offer a more complete scientific record, thereby reducing the risk of publication bias or later rebuttal of research. Prof Olsen and JNRBM co-founder, Dr Christian Pfeffer, also strongly believe that “such "negative" observations and conclusions, based on rigorous experimentation and thorough documentation, ought to be published in order to be discussed, confirmed or refuted by others”.
Perhaps reflecting this rising awareness of the importance of publishing negative results, 2011 saw an increase in submissions to JNRBM, leading to the recruitment of a number of Associate Editors to provide their scientific expertise to assist with the peer-review process. The Associate Editor model is a strategy which BioMed Central successfully introduced in 2008 in order to improve the speed and quality of peer review on its BMC series journals. The hope is that adopting a similar model for JNRBM will lead not only to a more efficient peer-review process, but also to an improved capacity to publish even more of this incredibly valuable research.
To submit your manuscript documenting negative data or results, please click here. If you are interested in writing a Commentary article about your views on negative results, please email bjorn_olsen@hms.harvard.edu to discuss your proposal.
Posted by Gabriella Anderson at 17:39 Comments (0)
Citing and linking data to publications: more journals, more examples...more impact?
Since BioMed Central introduced additional data sharing resources for authors and editors last year, there have been a number of further developments in the field that have necessitated an update to our supporting data information.
Eight further journals, including Retrovirology, Cell & Bioscience, and Frontiers in Zoology, have introduced the ‘Availability of supporting data’ section to either encourage or require all authors to consistently link their supporting data to their publication, or to clearly indicate that supporting data are included within the article and its additional files. As articles submitted after the introduction of these policies have begun to be published, we now have a growing number of examples from a variety of biomedical domains.
In BMC Research Notes, which was amongst the first journals to introduce this article section, Schulz et al. have included their programming script within the additional files of their article, which describes a software tool for automated assessment of cardiopulmonary resuscitation skills.
Anderson and Elizur, in their study of hepatic reference genes in female Atlantic salmon also in BMC Research Notes, have deposited all their supporting data in the PANGAEA repository for adult and juvenile samples they collected. PANGAEA specializes in publishing geo-referenced data for earth and environmental sciences and helps to ensure permanence and citation of data by assigning digital object identifiers (DOIs) issued by DataCite.
It’s particularly pertinent to see links to PANGAEA from BMC Research Notes, having just returned from the EuroMarine workshop on Scientific Data Integration in Bremen, which focused on linking scientific data to journal publications. At the workshop, session chair Dr. Michael Diepenbroek, who heads up PANGAEA's systems development, alerted attendees, who included publishers, editors, researchers and software developers, to a new study of the impact of sharing data underlying publications.
The study – an abstract presented at the American Geophysical Union 2011 meeting – reported a 35% increase in citations to articles published in the journal Paleoceanography, when supporting data were freely available. Of 1,331 articles sampled over the 18-year study period, the 171 articles with publicly-available data received nearly 20% (8,056) of the aggregate citations.
Similarly, a study deposited in the ArXiv pre-print repository in November 2011 and distributed on Connotea also found citation rates in the astronomy field were higher for articles with links to supporting data.
These studies are, of course, limited to specific fields or journals – and those yet to be published in journals will likely be subject to further peer review – but providing evidence of the benefits of data sharing for individual researchers and research groups is undoubtedly important. We already know that sharing detailed microarray data is associated with increased citations to the papers reporting the results and that there are many benefits of data sharing for society as a whole but a common barrier to data sharing is lack of credit and incentives for individuals. The possibility of increased research impact may provide further motivation to those producing but not necessarily reusing data. Another desirable development is for citations to datasets assigned DOIs or equivalent persistent identifiers to contribute to measures of researcher impact, as is established for citations to journal articles and measured by a number of common tools, such as Web of Science.
As well as increasing links between articles and data, another aim of the ‘Availability of supporting data’ section is to help address this issue – to increase academic credit for data sharing by encouraging data citation. This month we have encouraged data citation even more strongly with an update to BioMed Central’s reference style guide, found in any journal's instructions for authors. It now explicitly mentions datasets and provides an example of a dataset citation.
“Only articles, datasets and abstracts that have been published or are in press, or are available through public e-print/preprint servers, may be cited
...
“Dataset with persistent identifier
Zheng, L-Y; Guo, X-S; He, B; Sun, L-J; Peng, Y; Dong, S-S; Liu, T-F; Jiang, S; Ramachandran, S; Liu, C-M; Jing, H-C (2011): Genome data from sweet and grain sorghum (Sorghum bicolor). GigaScience. http://dx.doi.org/10.5524/100012."
Data citation is recommended according to the standards proposed by DataCite, where persistent identifiers are displayed as linkable, permanent URLs. Finally, the ‘Availability of supporting data’ resources page has been updated with more information on citing and linking to data, in particular a link to a comprehensive guide from the Digital Curation Centre.
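For anyone assembling such citations programmatically, the layout of the example above can be sketched in a few lines of Python. This is a sketch only: the function name and its parameters are hypothetical, and it simply mirrors the "authors (year): title. publisher. URL" pattern of the sorghum example rather than implementing the DataCite recommendations in full.

```python
def format_dataset_citation(authors, year, title, publisher, doi):
    """Build a dataset citation in the layout of the style-guide example.

    All parameter names are illustrative assumptions. The DOI is shown as a
    linkable URL using the resolver prefix that appears in the example.
    """
    author_list = "; ".join(authors)
    return f"{author_list} ({year}): {title}. {publisher}. http://dx.doi.org/{doi}"


# Reproduce the sorghum dataset citation quoted above.
citation = format_dataset_citation(
    authors=[
        "Zheng, L-Y", "Guo, X-S", "He, B", "Sun, L-J", "Peng, Y", "Dong, S-S",
        "Liu, T-F", "Jiang, S", "Ramachandran, S", "Liu, C-M", "Jing, H-C",
    ],
    year=2011,
    title="Genome data from sweet and grain sorghum (Sorghum bicolor)",
    publisher="GigaScience",
    doi="10.5524/100012",
)
print(citation)
```

Keeping the DOI as a separate field and rendering it as a URL only at output time means the same record can be re-rendered if the preferred resolver address changes.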
Posted by Iain Hrynaszkiewicz at 14:04 Comments (0)
New database for surgical trials
“Randomised control trials (RCTs) have played a role in the assessment of surgical innovations and there is scope and need for greater use”. While common in other areas of medical research, RCTs are often underused in evaluating surgical interventions because of the practical and methodological issues they present to researchers.
One of the most vexing issues that RCTs throw up is the so-called 'clustering effect'. Because surgeons differ in experience, training and level of practice, multiple patients operated on by the same surgeon often experience similar outcomes to their procedures. This phenomenon, known as 'clustering', can lead to a loss of precision when evaluating the results of RCTs and means that researchers must be extremely careful when choosing the sample size of their studies.
A new study published in Trials describes the creation of a database of previous surgical trials to quantify clustering effects at both institution and surgeon levels. Calculating the intracluster correlation coefficients (ICCs) in 10 multicenter surgical trials for a possible 108 outcomes, the authors found evidence for a clustering effect in a large number of possible outcomes. Lead investigator and Editorial Board member of Trials Jonathan Cook noted, "Our data on clustering effect for multicentre trials of surgical interventions suggests it is more of an issue than has previously been acknowledged."
At the moment researchers have a shortage of data on which to assess the impact of clustering. This inability to judge the level of clustering present in RCTs makes it difficult for researchers to adjust their trial designs to compensate for any loss of precision. The authors hope that by adding datasets from future surgical trials to the database, researchers will one day be able to access a valuable resource that will help to inform and improve the design of surgical trials.
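To give a feel for why the ICC matters when choosing a sample size, the short Python sketch below applies the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, where m is the number of patients per surgeon or centre. The function names and all the numbers are illustrative assumptions and are not taken from the Trials study.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Standard design effect for clusters of equal size: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def inflated_sample_size(n_unclustered: int, cluster_size: float, icc: float) -> int:
    """Sample size needed once clustering is allowed for.

    n_unclustered is the size a conventional calculation would give if
    patients were independent. The result is rounded to 9 decimal places
    before taking the ceiling so that floating-point noise cannot push it
    up by one.
    """
    return math.ceil(round(n_unclustered * design_effect(cluster_size, icc), 9))


# Hypothetical trial: 400 patients needed if outcomes were independent,
# 21 patients per surgeon, and an assumed ICC of 0.05.
needed = inflated_sample_size(400, cluster_size=21, icc=0.05)
print(needed)  # 800
```

Even a modest ICC can therefore double the number of patients required when each surgeon treats many of them, which is why reliable ICC estimates from previous trials are so useful at the design stage.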
Posted by Jack Cochrane at 15:56 Comments (0)
“Research is all about testing and re-testing ideas, tools, methods and concepts. In the past it wasn't easy to make available the details of how research was done but today we have the ability to share software, analysis, data, and computational tools”, says Open Research Computation Editor-in-Chief and open data proponent Cameron Neylon.
With the growth of open movements during the last decade, the biomedical world has seen increasing awareness of the benefits of open data and data-sharing. This is in part due to a shift towards data-intensive science, but also due to the realization that much can be achieved by the pooling of resources and through scientific collaboration.
To promote and support open data in science, BMC Research Notes has been publishing an ongoing thematic series to endorse the practice of publishing underlying data files associated with journal articles in standard, reusable formats. This endorsement is further supported by BioSharing, which encourages authors to submit educational Data Notes that can then be linked to the BioSharing catalogue.
A recently published article which examines the BrainMap project demonstrates how neuroimaging data standards are now being utilized. This project provides the human brain mapping community with datasets and computational tools to establish a basis for neuroimaging-based models of healthy brain function. Another recent study provides data standardization for cancer therapy development for the first time; the Guidelines for Information About Therapy Experiments (GIATE) is a minimum information checklist which provides a consistent framework to transparently report the purpose, methods and results of the therapeutic experiments.
Other well-annotated and reusable datasets highlight the advantages of data-sharing and standardization, such as the immense benefits of cost savings which can then be re-invested into further research; the comparisons that can be drawn between different models; and the flexibility of form and function of such web-based Data Notes.
Despite continuing obstacles to the development of data-sharing, such as patient confidentiality and an eagerness on the investigator’s part to protect their financial investment and rights to their data, it seems that a lot can be accomplished through data standardization and scientific collaboration. In light of this, we are seeking to collate even more of these high-quality examples of novel datasets, in order to build on the promising achievements of this thematic series so far.
To discuss submission of your standardized dataset or to propose a contribution on other aspects of data-sharing and open data, please contact researchnotes@biomedcentral.com.
Posted by Gabriella Anderson at 18:23 Comments (0)
GigaScience – a repository for large datasets
The recent explosion of genomics technology has revolutionized biology, but it is only really of use if people are able to analyze and use the resulting sequences. Storage of such vast quantities of data is problematic, as the ongoing uncertainty over the future of NCBI’s arm of the Sequence Read Archive (SRA) shows. The BGI, in conjunction with BioMed Central, recently launched GigaScience, a journal aimed specifically at projects generating a lot of data, which can accommodate such large datasets alongside the articles describing them. GigaScience also anticipates becoming a repository for stand-alone datasets such as those resulting from genome sequencing projects. One such dataset has just been released, and it contains the assembled and annotated sequences of genomes from three strains of sorghum, a plant of huge economic importance in the developing world as a source of food, fodder, fuel and fiber. The article describing these data has been published in Genome Biology; the raw reads are available from the SRA, and the assembled reads from GigaScience. This is the first time that a genome dataset has been cited with a DOI in an article's reference list, so it is the first step in the process leading to researchers getting citation credit for the data they generate.
Posted by Andrew Cosgrove at 12:12 Comments (4)
Sharing data from clinical trials: where there’s a will there’s a way
More organizations with interests in clinical research – most recently leaders in evidence-based medicine the Cochrane Collaboration – are calling for better access to research data. So what’s getting in the way?
In human health research the benefits of increased transparency – more reliable, efficient research to better inform clinical practice – are arguably the most tangible, yet it’s been reported that researchers in this discipline are amongst the least likely to share their data. The case for sharing data and results from all clinical trials is made by Peter C. Gøtzsche (Nordic Cochrane Centre) in a commentary published today in Trials.
Gøtzsche reviews a number of past high-profile cases where lack of access to clinical trial data has been detrimental to human health (such as celecoxib, rosiglitazone and reboxetine) or led to misplaced tax-payer resources (oseltamivir). Gøtzsche argues that commercial, academic and regulatory interests can lead to a lack of transparency, and that the benefits of data sharing greatly outweigh the potential harms – such as selective interpretation by competing scientists.
Going further than a code of conduct for data sharing previously proposed in Trials, Gøtzsche calls for data to be made available to other researchers for “any relevant purpose”, without needing to pre-specify analyses and obtain permission from the original researchers. Maintaining competitiveness in drug development has been put forward as a reason not to share all data immediately, but Gøtzsche argues that transparency regarding failures in drug research and development could be beneficial, if costly research and development is not unnecessarily duplicated.
BioMed Central encourages and supports the sharing and publication of underlying research data, but recognizes the challenges and caveats associated with different data types and research domains. Requirements for sharing have been successfully established in some fields, for example in genomics where mechanisms and policies and a clear need for collaboration on large datasets exist. In this field data must be released, often immediately, and researchers allowed relatively short periods in which to exclusively analyze the data. Gøtzsche proposes a similar, but legally enforced, supranational model for clinical trials: “It should be a legal requirement to provide all results and raw data within an appropriate period of time, which, in accordance with most calls for data sharing, should be no later than 12 months after the randomized phase of the trial ended”.
The article aims to convince those with doubts that data sharing is needed and does not focus on the associated practical issues to be overcome, but recognizes efforts aimed at developing guidance and best practice to overcome these practical issues, such as those initiated by Trials.
“It is a moral imperative and we should act now, as it [full access to trial data] will empower citizens and convey tremendous scientific, economic, and social benefits,” concludes Gøtzsche.
Central to, or rather the starting point of, transparency in clinical trials is prospective study registration in databases such as the ISRCTN register. Trial registration is required by law in the US and under international ethical guidelines in the Declaration of Helsinki, parts of which are considered as 'customary international law norm'. Incentivizing and providing platforms for sharing of all trial-related publications is a major aspect of BioMed Central’s Threaded Publications initiative.
This substantial, non-commissioned commentary forms part of Trials’ growing thematic series on Sharing clinical research data, edited by Dr Andrew Vickers.
Posted by Iain Hrynaszkiewicz at 12:00 Comments (0)
One of the essays in the fascinating – and open access – text on data-intensive science, The Fourth Paradigm, envisages an age of instantaneous knowledge translation, where scientific discovery can be instantly applied to clinical practice. This concept was named the ‘healthcare singularity’ and its achievement will involve real-time generation, integration and processing of human genetic and clinical disease data. The axis (pictured) is approaching but will remain elusive without fundamental changes to the way science is conducted, and communicated.
Image adapted from: Gillam et al.: The Healthcare Singularity and the Age of Semantic Medicine. In The Fourth Paradigm (2009)
Open access to scientific data is a means to achieve the healthcare singularity’s end – of more efficient, reliable and reproducible research which will ultimately improve human health. Dr Eric Schadt (Mount Sinai School of Medicine and Editor-in-Chief of BioMed Central journal Open Network Biology) and John Wilbanks (Senior Fellow at the Kauffman Foundation, Research Fellow at Lybba and Open Network Biology Editorial Board member) are two scientists working to transform research on human disease.
Major barriers to effective sharing of genetic and clinical data are suboptimal processes for obtaining informed consent from patients, and intellectual property restrictions placed on individual patient data obtained through research. One of the outcomes of Workgroup D at this spring's Sage Commons Congress, attended by BioMed Central, was to develop a suite of legal tools to empower research participants' control over the use of, and access to, their samples and data. John Wilbanks – who recently left his full-time post as Vice President for Science at Creative Commons to focus on data sharing – is taking on this challenge with his latest project, Consent to Research, which celebrates its alpha release today.
And once we have more ready access to these complex, heterogeneous and voluminous data types – genotype, gene expression, clinical, and others – they need to be interpreted and put to effective use for fighting disease. Eric Schadt is no stranger to this kind of “extreme science” and recently joined Mount Sinai Medical School to further his study of complex human disease biology, and apply it to medical treatments.
Amongst their moves to new positions, occurring at each end of the summer, Eric and John have been leading and supporting, respectively, the launch of the aforementioned new BioMed Central journal, Open Network Biology. Here they answer some questions about their latest projects, the new journal, and how we might achieve more universal and effective personalized healthcare.
Posted by Iain Hrynaszkiewicz at 18:43 Comments (0)
Understanding Alzheimer’s disease by data mining the Donepezil Data Repository
Large clinical trial databases represent a wealth of clinical and scientific information which could potentially hold the key to new discoveries and breakthroughs, and help to fill gaps in knowledge of a particular disease or its treatment and management. In a recent study published in Trials, the value of data mining the Donepezil Data Repository in furthering the understanding of Alzheimer’s disease was demonstrated.
A team of leading researchers from academia and industry came together to investigate the nature of Alzheimer’s disease and the effectiveness of treatment using donepezil, marketed under the trade name Aricept®. The Donepezil Data Repository, which consisted of 18 randomized, controlled trials conducted between 1991 and 2005, was also used to address questions beyond the aims and scope of the original individual trials. This work resulted in six scientific papers being published in leading journals covering different topics such as rates of cognitive change in Alzheimer’s disease, predicting cognitive decline and the effect of donepezil in reducing clinical symptoms of mild, moderate and severe Alzheimer’s disease.
The value of data sharing and collaboration has previously been demonstrated in this field, in particular in August 2010, when The New York Times published an article titled Sharing of Data Leads to Progress on Alzheimer’s. In this high-profile initiative, a group of scientists and executives from the National Institutes of Health, Food and Drug Administration, industries, universities and non-profit organisations collaborated in a project to investigate the biological markers involved in the progression of Alzheimer’s disease in the human brain. The purpose was to share all the data and make the findings publicly available. This effort has not only produced numerous scientific papers on the early diagnosis of Alzheimer’s, but also led to more than 100 studies to test drugs which could slow or stop the disease.
Whilst projects sponsored by the pharmaceutical industry can sometimes fail to spark interest in the academic research community due to concerns over commercial goals, the article in Trials reported a successful and fruitful collaboration between industry and academia. Furthermore, it demonstrates the value of data mining a clinical trial database, where data from multiple studies can be combined and analyzed to accelerate the advancement of medical knowledge.
Adeline Siew - Assistant Editor
Posted by Iain Hrynaszkiewicz at 15:55 Comments (0)
Journal of Biomedical Semantics at BioHackathon 2011
Guest blog post by Mark Wilkinson and Philippe Rocca-Serra
Following in the footsteps of the original Open-Bio Foundation's “BioHackathons”, the Japanese Database Centre for Life Sciences (DBCLS) initiated a series of BioHackathons beginning in 2008. These yearly events provide an opportunity for open-source bioinformatics code projects to come together with the goal of sharing ideas and experiences, and coordinating their efforts to allow their respective tools to more easily work together. A few weeks ago, the 2011 BioHackathon was held in Kyoto, Japan, sponsored by the Japanese Agency for Scientific Technology's (JST) National Bioscience Database Centre (NBDC). In a natural progression, the theme of each BioHackathon has evolved from Web Service interoperability (2008), to integration of Web Services into visualization tools and mashups (2009), to the utilization and adoption of Linked Data standards (2010) to this year's theme of the Semantic Web in life sciences. With more than 65 participants, the activities of the 2011 BioHackathon were diverse and numerous, but we will attempt to highlight some of the achievements here to showcase the excellent work that is done at these events.
General Information and How-To's
A large number of participants were involved in discussions around “how do I set up my legacy database as a Semantic Web/Linked-Data resource”, and expressed curatorial concerns (how easily are RDF databases maintained? How does their responsiveness compare to that of traditional relational databases?). From these discussions came a series of tutorials, best-practice suggestions, and examples that can act as a resource for others who wish to begin publishing data using these new technologies: https://github.com/dbcls/bh11/wiki/ConstructionOfLinkedDataDB.
Uniform Resource Identifier (URI) Standards
To truly achieve the Semantic Web vision, there should be agreement on the identifier used for every entity in the bioinformatics space. While there are a number of “shared names” initiatives, none has been widely accepted to date. To this end, members of the newly established Identifiers.org project proposed a standard for identifier structure and resolution, backed by a funded curatorial process. In collaboration with curators from the Life Sciences Resource Names (LSRN) project, an agreement was reached about what Resource Description Framework (RDF) metadata should be returned when resolving an Identifiers.org URI, and the decision was made to sunset the competing LSRN initiative. In addition, the PSICQUIC, Bio2RDF and SADI projects all agreed to support Identifiers.org URIs in their infrastructure. We remain hopeful that the backing of these highly visible Semantic Web projects will lead to a much wider adoption of the Identifiers.org URI schema in the community, which would significantly advance the aims of the Semantic Web in Life Sciences.
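To make the idea of a shared identifier structure concrete, here is a minimal sketch in Python. It assumes URIs of the form http://identifiers.org/<namespace>/<accession>; the base URL, the function name and the example namespaces are illustrative assumptions rather than the project's published specification, and the sketch does not resolve anything over the network.

```python
from urllib.parse import quote

# Assumed base URL and layout; check the Identifiers.org documentation
# for the authoritative URI pattern.
BASE = "http://identifiers.org"


def identifiers_uri(namespace: str, accession: str) -> str:
    """Build a URI of the assumed form <BASE>/<namespace>/<accession>."""
    if not namespace or not accession:
        raise ValueError("namespace and accession must be non-empty")
    # Namespaces are lower-cased here (an assumption); accessions are kept
    # verbatim apart from percent-encoding of unsafe characters.
    return f"{BASE}/{quote(namespace.lower(), safe='.')}/{quote(accession, safe=':.')}"


# Hypothetical usage: the same entity always yields the same URI,
# whichever tool generates it.
print(identifiers_uri("taxonomy", "9606"))
print(identifiers_uri("UniProt", "P12345"))
```

The point of such a helper is only that every project derives identical URIs from the same namespace and accession, which is what makes data from different sources join up.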
Semantic Web Service Standards
The SADI project gave a one-day workshop on how to publish SADI-compliant Semantic Web Services, and subsequently worked with other groups on modeling data and services that were important to them. Of note was agreement on the OWL ontology describing a BLAST result – the most complex data structure modeled by the SADI project to date, and one that will act as a template for a large number of other algorithmic services in the bioinformatics space (e.g. HMMER). Care was taken to ensure that the model contained both curatorial information (database information, etc.) as well as the biological information semantically relating query sequences and hit sequences. Importantly, the goal was to represent the semantics of the information contained in the BLAST report, not the structure of the report. As such, the resulting data structures should be usable, verbatim, by downstream tools that consume a diverse array of data-types, including sequences, alignments, or species information. Several SADI-based BLAST services were published during the BioHackathon based on these models, and the Open-Bio participants began building support for these models into their tools.
Tooling
Open-Bio participants undertook a survey of Semantic Web technology support in each of their languages, and then focused on providing support for serializing and de-serializing RDF into their respective object models. Since the Open-Bio objects use the same object model for a wide range of similar data-types (e.g. EMBL vs. GenBank sequence files), achieving this goal would greatly facilitate interoperability by ensuring that all structured bioinformatics data has the same semantic representation regardless of its origin.
Visualization
The Cytoscape 3 visualization environment was targeted for enhancement of its ability to represent RDF data and OWL ontological information. Support for SPARQL was added to Cytoscape, where the resulting output can be visualized in the Cytoscape environment using SPARQL CONSTRUCT queries. This was tested over the RDF version of BioMart, and enhancements were made to simplify SPARQL query building. An open question remaining for this team was what to do with blank nodes in RDF (which are extremely common in, for example, the RDF representation of a BLAST report described above!).
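As a rough illustration of why CONSTRUCT queries suit network visualization: a CONSTRUCT query returns RDF triples, and each triple maps naturally onto an edge between two nodes. The Python sketch below is not the Cytoscape implementation; the query text, the ex: predicates and the function name are hypothetical, and the query is shown only for context and is not executed. Blank nodes are flagged so that a viewer could decide how to treat them.

```python
# Hypothetical CONSTRUCT query; the ex: predicates are placeholders.
# Shown for illustration only, it is not executed by this sketch.
EXAMPLE_QUERY = """
PREFIX ex: <http://example.org/>
CONSTRUCT { ?query ex:hasHit ?hit }
WHERE     { ?query ex:hasAlignment ?aln . ?aln ex:hitSequence ?hit . }
"""


def triples_to_network(triples):
    """Turn (subject, predicate, object) triples into node and edge lists."""
    nodes = {}
    edges = []
    for subject, predicate, obj in triples:
        for term in (subject, obj):
            # Blank nodes are written "_:label" in Turtle and N-Triples.
            nodes.setdefault(term, {"blank": term.startswith("_:")})
        edges.append({"source": subject, "target": obj, "label": predicate})
    return nodes, edges


# A two-triple graph in which the alignment is a blank node.
triples = [
    ("ex:query1", "ex:hasAlignment", "_:b0"),
    ("_:b0", "ex:hitSequence", "ex:hit1"),
]
nodes, edges = triples_to_network(triples)
blank_nodes = [name for name, attrs in nodes.items() if attrs["blank"]]
print(blank_nodes)
```

A query like the hypothetical one above sidesteps the blank node by constructing a direct edge from query to hit, which is one possible answer to the open question, at the cost of hiding the intermediate alignment.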
Vocabularies
This year, the Ontology group focused on a number of practical cases, ranging from conversion from RDF to OWL, to more specific conversions such as from the GFF3 file format to OWL. Groundwork was also laid to deliver tools for carrying out functional enrichment analysis with any OWL-formatted resource. This gave the opportunity to evaluate the various OWL reasoners now available. Another 'tour-de-force' achieved by the group consisted of exploring alignment by means of semantic features as if they were sequences. The work spanned four days, with almost two days required for the computation of transitive closures. Further work as a follow-up to the BioHackathon meeting will be needed to mine the results.
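For readers unfamiliar with transitive closures, the following Python sketch shows the brute-force idea on a toy hierarchy: for every term, follow its parent links until nothing new is reached. It is a generic illustration under the assumption of simple (child, parent) pairs, not the code the group ran, and the term names are invented.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Map each child term to the set of all its direct and indirect parents.

    `edges` is an iterable of (child, parent) pairs, e.g. subclass links.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = {}
    for node in list(parents):
        seen = set()
        stack = list(parents[node])
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited; also guards against cycles
            seen.add(current)
            stack.extend(parents.get(current, ()))
        closure[node] = seen
    return closure


# Invented toy hierarchy: kinase -> enzyme -> protein -> molecule.
toy_edges = [
    ("kinase", "enzyme"),
    ("enzyme", "protein"),
    ("protein", "molecule"),
]
closure = transitive_closure(toy_edges)
print(sorted(closure["kinase"]))
```

Each term triggers its own traversal, so the cost grows roughly with the number of terms multiplied by the size of the graph, which helps explain why the computation can take so long on large ontologies.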
BioDBcore session - Resource Description and Discovery
The BioDBcore team was dedicated to reviewing and finalizing descriptors of database resources to provide key information about data resources, expressing licensing terms, access points and their protocols as well as the nature of datatypes stored. BioDBcore information will be made available as RDF graphs, building on the biositemap information model. This should ensure smooth exchange with existing registries. The BioDBcore meeting allowed alignment with Medals, the Japanese resource cataloguing effort. Cross-pollination occurred, and reliance on Identifiers.org URIs for referencing key information (taxonomic information, bibliographic records, annotation standards as provided by the Biosharing catalog) was also discussed and will be channeled back to the BioDBcore group, a joint effort of the International Society of Biocuration and the Biosharing initiative, for final vetting. Additional topics addressed record internationalization and dealing with information in languages other than English.
As a testament to the benefit of events such as the BioHackathon, the work on BioDBcore triggered the creation of an RDF data sharing requirement group, which surveyed the nature of information to embed in a named graph to enable dynamic catalog generation from distributed sources simply based on metadata declaration.
G-language group
Following up on earlier work, the G-language group met again and explored two tracks. The first concerned an identifier conversion service aimed at simplifying bioinformatics data integration tasks. This G-language based REST service accepts regular and common identifiers as input but, more interestingly, is also capable of dealing with sequence input (BLAT algorithm) and returns associated identifiers and persistent URLs. All this information is available in classic flavors (e.g. GenBank or tab-delimited) but, most relevant to this audience, the RDF flavor is the sweetest and was the result of efforts carried out during BioHackathon 2011.
The second stroll took the group on the path of visualization, with the goal of providing creative methods to reduce the 'hair-ball' effect, which often limits visibility of Linked Data visual rendering. To this end, the group used filtering and enrichment techniques applied to an E. coli gene set to query the linked data space, taking advantage of the restauro-g.v2 service we just described. Cramer's V for nominal data and Spearman's rank correlation for continuous data were used to assess relatedness of information, while a simple Fisher's exact test was used for evaluating enrichment of the top 25% of continuous data (in comparison to all genes) against nominal data (categories). Implementation relied on the JavaScript InfoVis Toolkit and the group is now looking into ways to make this service available to a JSON feed from a SPARQL endpoint. We look forward to their progress.
BioHackathon 2011 is over but will probably be remembered as a turning point that saw many of the components required for realizing the Linked Data Vision in the Life Science domain being created, refined or delivered by a group where enthusiasm and humor mix with craftsmanship and patience. These will be captured in a "thematic series" of manuscripts written by the BioHackathon 2011 participants over the next few months, and published in Journal of Biomedical Semantics. While transitive closures, just like watermelons, were cracked by brute force, more subtle methods offered an exquisite assortment of technical solutions. Could this rival the incredible 14-course meal of Japanese gastronomy the participants had the opportunity to experience? Some of the attendees are still debating the issue. With all the work done this year, one can only eagerly await the next BioHackathon in 2012 and use the winter months to refine use cases and questions to be thrown at this fast-evolving infrastructure.
The success of BioHackathon 2011 owes so much to the impeccable organization and sagacity of our Japanese hosts, who went to great lengths to ensure that all the cats were herded safely at all times, probably helped in their task by the careful watch of Inari. We are very grateful and can only say one thing: Okini.
Mark Wilkinson (The Wilkinson Laboratory; Editorial Board Member of Journal of Biomedical Semantics)
Philippe Rocca-Serra (Technical Project Leader of the Standards and Data Sharing Infrastructure)
Posted by Gabriella Anderson at 12:55 Comments (0)
Data sharing: lessons from the Wellcome Trust Sanger Institute
Sharing of scientific data has many benefits. It boosts research, speeds up translation and helps ensure that good practice is maintained. The teams that generate the data are rewarded too: studies show that sharing data increases citation rates.
The Wellcome Trust Sanger Institute in the UK, a key player in the Human Genome Project, has often led the way in this area, and in the latest issue of Genome Medicine, Tim Hubbard and Stephanie Dyke from the Wellcome Trust Sanger Institute explain how they developed and implemented the Institute’s policy.
The Human Genome Project was groundbreaking in many ways, one of which was the decision, known as the “Bermuda Agreement” or “Bermuda Principles”, to release data as soon as possible and well ahead of publication. Over the years, early data sharing became established as an important aspect of genomic research.
The success of the Human Genome Project led to the launch of many new initiatives and studies based on genomic science and different types of data. In addition, more and more human data were used and this had implications for confidentiality. A consensus was reached around 2006-2007 that the way forward was to share data in a managed way that limited access to the data to approved purposes. In recognition of this, the Wellcome Trust Sanger Institute began consulting on its own data-sharing policy.
Tim Hubbard and colleagues approached the task by breaking it into three steps: guidance, facilitation and oversight. The aim of the guidance part was to understand the types of data being generated, the quality required to be useful to colleagues and the timelines involved. During the facilitation stage, the barriers to sharing – both technical (such as the time required to get the data in the correct format) and those concerning credit for the work – were addressed.
The final step required was oversight: a way of monitoring data sharing was needed and a governance body, the data-sharing working group, was established.
As the field of genomic medicine moves on, probably with the integration of electronic health records with genetic data, policies for data sharing will evolve and need to be updated. The three steps followed by the Wellcome Trust Sanger Institute are likely to provide a framework for other institutes implementing their own policy.
Posted by Maria Hodges at 13:43 Comments (0)



