GigaBlog

Monday Apr 30, 2012

Shanghai (Epigenomics) Surprise

With genetics and genomics being complicated enough, epigenomics adds even more layers of control and regulation of gene expression, and high-throughput global analyses of epigenetic changes add further to the reams of biological information many people are already referring to as the "data-deluge". As the field is a key part of our “big-data” scope, the GigaScience team was on the road last week for the Shanghai International Conference of Epigenetics in Development and Disease (SICEDD). This was the third time the meeting has been held in Shanghai, and this year it also merged with the 7th Asian Epigenome Alliance meeting, giving it a very interesting flavor of the field across the Asian region and beyond. On top of speakers from the Asia-Pacific, with the Cold Spring Harbor Asia Epigenetics, Chromatin and Transcription meeting held just after in nearby Suzhou, a very international and broad audience and group of speakers was assembled for the 46 talks and 6 themed sessions.

Our sister BioMed Central journal Genome Medicine organized a particularly strong workshop on the epigenetics/epigenomics of disease, with an initial session on the epigenetics of cancer expertly co-chaired by Genome Medicine's Becky Furlong, followed by a session on other diseases such as immune disorders, schizophrenia and cardiovascular disease. The conference was put together by BGI collaborator Jingde Zhu (see here for his recent nice write-up of our talk at BGI's ICG conference); this, coupled with the BMC involvement, gave us further reason to attend, and BMC even had a booth to help promote us and their other journals.

As large-scale "big-data" producing projects are our focus, there was plenty on display, with a number of speakers being involved in the International Human Epigenome Consortium (IHEC) to produce reference maps of human epigenomes for key cellular states relevant to health and diseases. Since the NIH Roadmap Epigenomics Program was announced in 2008, many other global funders and centers have joined and internationalized the effort. Bing Ren plugged the NIH program and resources such as the UCSC epigenome browser, and Henk Stunnenberg flew the flag for European IHEC efforts and the Blueprint project focusing on hematopoietic epigenomes. Large collaborative projects such as IHEC require very standardised operating procedures, data-pipelines and techniques, and steering committee member Susan Clark presented some of her contributions. On top of the more established MeDIP-Seq data presented by many of the speakers, Susan presented new data using a novel bisulphite-treated chromatin immunoprecipitated DNA sequencing technique (BisChIP-seq), and Henk presented on his related sequential ChIP-bisulfite-sequencing (ChIP-BS-seq) technique. Despite the similarities in application and name, both techniques have just been published back-to-back in Genome Research.

On top of DNA-methylation, there were many talks on other epigenetic changes, with Bin-Tean Teh covering chromatin remodeling in cancer, Craig Peterson presenting on chromatin dynamics in genome integrity, and many talks covering histone modification. Probably the most surprising area covered was trans-generational responses and epigenetic inheritance of complex behaviors, with fascinating, if unexplained, phenomena seeming to pass in a Lamarckian manner. Marcus Pembrey presented fascinating population-based data coming from the ALSPAC study, showing that fathers taking up smoking before the age of 11 seem to have affected the BMI of their subsequent children. Unable to explain the mechanisms behind this seemingly epigenetic inheritance, Marcus appealed to experimentalists in the audience to try to explain the basis for these findings. Attempting to tackle this issue using animal models, Isabelle Mansuy presented work on depressive phenotypes passed on transgenerationally due to maternal stress in mice, and has MeDIP data that is currently being analyzed to see whether alterations in DNA-methylation could be responsible. Moshe Szyf presented related studies looking at whether early life experience alters adult DNA methylation in animal and human brains, and took a more network modelling approach.

You can follow coverage of the meeting on the @gigascience twitter feed, and (the slightly noisy) #SICEDD hashtag. As epigenomics is a key area in our scope, we currently have some example epigenomic data hosted in our GigaDB database, and are already reviewing and handling related papers that will hopefully come out closely timed with our upcoming launch issue. If you have related work on epigenomics (or other areas covered by our broad big-data scope) that you are interested in submitting to GigaScience, please contact us at editorial@gigasciencejournal.com or use our online submission system.

References

1. Statham AL, Robinson MD, Song JZ, Coolen MW, Stirzaker C, Clark SJ. Bisulphite-sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA. Genome Res. 2012 Mar 30.

2. Brinkman AB, Gu H, Bartels SJ, Zhang Y, Matarese F, Simmer F, Marks H, Bock C, Gnirke A, Meissner A, Stunnenberg HG. Sequential ChIP-bisulfite sequencing enables direct genome-scale investigation of chromatin and DNA methylation cross-talk. Genome Res. 2012 Mar 30.

Tuesday Apr 03, 2012

The State of the Curation Nation

Of the many issues needing addressing in this era of the so-called "data deluge" (apologies genomics bingo), on top of the well documented difficulties in computing power, bandwidth and storage keeping pace with data production, less attention has been paid to the efforts required to present and package this biological information to users. The key people managing and integrating this data are Biocurators, and this week is the International Society for Biocuration's annual get together at the Biocuration 2012 meeting in Washington DC. With growing challenges in data volumes and heterogeneity - particularly from sequencing technologies and with the promise of nanopore looming on the horizon - the meeting is a good opportunity to discuss some of the downstream consequences of these rapid developments amongst the people really harnessing the "data-tsunami".

With our publisher BioMed Central as one of the sponsors of the meeting, and with its relevance to our big-data scope and associated GigaDB database, GigaScience has been pleased to be present at the Georgetown University venue. Our new Biocurator Tam Sneddon has been representing the database side of GigaScience, and our Editor-in-Chief Laurie Goodman has also been there on behalf of the journal. The first day has covered many topics essential to keeping on top of these large data-volumes such as community annotation, and workflows and tools to aid and automate tasks for data curators, producers and users.

As we were involved in the crowdsourcing of the genome of the deadly 2011 outbreak E. coli O104:H4 strain, community annotation is a subject close to our hearts, and it was fantastic to see similar moves to open up and share the burden of annotation and analyses for species as diverse as Skates and Rays, with Cathy Wu presenting on SkateBase. Wikis are the obvious platform to handle these types of tasks, and Andrew Su presented on one of the most successful examples of these with GeneWiki. Whilst we have written about this in a previous meeting report, the user base continues to grow, and Andrew's most recent slides are available here.

On top of the distributed "many-eyes"/"many-hands" approach, better automation of curation tasks is essential, and the workflows and tools session provided insight into the current state of the art in curation management, with excellent examples on show in particular from Reactome and PRIDE. The benefits of this were clearly shown by Attila Csordas (of personal proteomics fame) from the EBI, who showed that the PRIDE proteomics database's semi-automated pipeline and tools reduce curation time to one sixth.

Being both a journal and database, GigaScience was well placed to take part in the "Databases & Journals – How to have a sustainable long term plan for journals and databases?" panel co-organized by our editorial board member Francis Ouellette and Mike Cherry (Stanford). With both audience and panel being quite partisan in favor of open access, Laurie joined other Editors-in-Chief including Thomas Lemberger from Molecular Systems Biology, David Landsman from DATABASE, and Michael Galperin representing the NAR Database issue, and all were equally in favor of open data, pushing the need for all supporting data in a paper to be available to aid reproducibility and usability and to prevent fraud. Discussion also turned to altmetrics and data citation, with Laurie in particular plugging our work with DataCite to give datasets independently citable DOIs. Bringing a curation perspective to the discussion was the final panelist Pascale Gaudet (chairperson of the ISB), who discussed BioDBcore, the community-defined checklist of the core attributes of biological databases that allows users to fully evaluate the scope and relevance of available resources.

A large proportion of the talks presented databases built on the open-source GMOD (generic model organism database) platform, and the conference is followed by the satellite GMOD meeting, which also includes a Galaxy workshop. Laurie and Tam will be on hand all week to answer your questions about GigaScience, so feel free to grab them or contact us at editorial@gigasciencejournal.com. Many of the talks have been published in a virtual Biocuration issue of DATABASE, and you can also follow the action over the rest of the week on twitter at #isb2012.

Further Reading

1. Howe D et al., (2008). Big data: The future of biocuration Nature, 455 (7209), 47-50 DOI: 10.1038/455047a

2. Sanderson K (2011). Bioinformatics: Curation generation Nature, 470 (7333), 295-296 DOI: 10.1038/nj7333-295a

3. Burge S et al., (2012). Biocurators and Biocuration: surveying the 21st century challenges Database, 2012 DOI: 10.1093/database/bar059

4. Csordas A et al., (2012). PRIDE: Quality control in a proteomics data repository Database, 2012 DOI: 10.1093/database/bas004

Tuesday Mar 20, 2012

Genomic Standards Community Go Shenzhen: GigaScience session overview from #GSC13

Policies and Standards for Reproducible Research: from theory to practice

This month GigaScience co-hosted a session at the Genomic Standards Consortium meeting in Shenzhen on "Policies and Standards for Reproducible Research: from theory to practice". The session brought together a diverse group of speakers with different roles in the production, dissemination and use of data, to discuss all of the issues surrounding the role of policies and standards in enabling reproducible research and data sharing. Co-chaired by our editorial board member Susanna-Assunta Sansone, the panel represented the different stakeholders involved in the data-production cycle, including those able to enforce data policies such as research funders and journal editors, users and managers of databases, data producers, and facilitators of all of these processes.

Susanna opened proceedings and set the scene with an overview of BioSharing, of which BioMed Central and GigaScience are both members (slides here). Her talk gave an overview of the evolving portfolio of data sharing enablers, and this was a very relevant forum as BioSharing aims to strengthen collaborations between exactly the groups present in this session, and to discourage redundant (accidental) competition between standards-generating groups.

The second scene-setting talk was from our editor Scott Edmunds, covering the issues and additional incentives needed to enable and encourage data dissemination (see the slides below and video here). Covering work that GigaScience and the BGI have done to release datasets with citable DOIs, the talk showed the utility of releasing genomes pre-publication through the resulting crowd-sourcing of the deadly 2011 E. coli O104:H4 outbreak genome, sequenced by the BGI (and partners in Hamburg and Birmingham) and released by us in this manner. A further recent development relating to this is a new study in PLoS One that used draft unassembled genome sequence data to directly develop a targeted bactericidal agent to kill O104-positive E. coli. To enable data-citation to be a recognized form of credit and a viable incentive to encourage faster data dissemination in this manner, journals need to allow data to be cited in the references in the same way as articles. Scott then presented many of the recent examples highlighted in this blog of journals, including Genome Biology and Nature Biotechnology (both represented at this meeting), now carrying this out.

As key gatekeepers able to enforce and influence data policies and standards, funders gave their perspective next, with Paula Olsiewski representing the Alfred P Sloan Foundation, and Professor Rita Colwell drawing on her wealth of experience as former director of the NSF to highlight the elephants in the room that need solving. A group of talks then covered “Breaching the Bio-Domain”, providing a more hands-on point of view from data-producers, curators and database managers. Philippe Rocca-Serra brought a 'data commoning' perspective and presented on the ISA-Commons system, with which we have recently collaborated on a publication. Folker Meyer talked of his experience running MG-RAST, highlighting that of the 41,000 datasets in the database only a minority were publicly accessible, and appealing for funders to insist more on this. Srikrishna Subramanian (Institute of Microbial Technology, India) gave a similarly open-data talk, giving examples and a structural genomics perspective on data sharing (for an example see TOPSAN). Yong Zhang gave a final “data-producer” and BGI perspective, outlining the scale of the challenges ahead and giving a preview of some of the work underway to build biobanks and datacenters that hope to become the China National Genebank.

The session ended with final perspectives from journal editors, with Genome Biology editor Clare Garvey and Craig Mak from Nature Biotechnology giving overviews of their journal policies and examples of their publishers' schemes for encouraging and aiding data sharing and standardization.

On top of this session, the meeting was well attended and covered on twitter and google+ (a first for us), and for a genomics-focussed meeting participants were remarkably restrained in dropping any "genomics bingo" buzzwords. The organizers hope to be posting slides on the conference wiki, and we have also archived some (such as the opening address from GSC president Dawn Field) on our slideshare account. There is further coverage on the BGI news page, pictures on their Flickr page, and videos are currently being posted on the BGI Youtube and GSC Scivee pages.

GSC13 special series: call for papers highlighting best practice in genomics research

As mentioned in a previous posting, to tie in with the meeting we are launching a call for submissions to a thematic series of discussion and research from the conference and wider community highlighting best practice in genomics research, and we are currently reviewing a number of candidate submissions. BGI is generously covering the open-access article-processing charges for the journal’s first year, so please contact us at editorial@gigasciencejournal.com if you have related work you would like to submit to this series or journal, or submit a manuscript here.

References

1. Rohde, H. et al. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med. 365(8):718-24. (2011)

2. Scholl, D. et al. Genome Sequence of E. coli O104:H4 Leads to Rapid Development of a Targeted Antimicrobial Agent against This Emerging Pathogen. PLoS ONE 7(3): e33637. (2012)

3. Sansone, S-A. et al. Toward interoperable bioscience data. Nature Genetics 44, 2 (2012).

Sunday Mar 04, 2012

GigaScience at the Genomic Standards Consortium meeting – call for papers for a special series

To tie in with this week’s Genomic Standards Consortium (GSC) meeting in Shenzhen, GigaScience is launching a call for submissions to a thematic series of discussion and research from the conference and wider community highlighting best practice in genomics research.

As the 13th meeting of the GSC, the topic this year is “Genomes to Interactions to Communities to Models” - all areas key to the scope of the journal. As the conference is hosted by BGI Shenzhen, the world’s largest Genomics Organization, the overarching theme this year, “the rise of the megasequencing project”, was part of the rationale for BGI to launch GigaScience with BioMed Central.

With Susanna-Assunta Sansone (co-founder of BioSharing – of which BioMed Central and GigaScience are both members), GigaScience is co-chairing a panel on Policies and Standards for Reproducible Research: from Theory to Practice, attended by a wide spectrum of stakeholders in the production and handling of data such as editors (including Clare Garvey of Genome Biology), funders, databases and data-producers. On top of its novel publication format combining large-scale biological dataset hosting on the GigaDB database, the GigaScience journal hopes this series can be a forum for discussion that can help further its aims to revolutionize data dissemination, organization, understanding, and use. GigaScience is also taking submissions on cloud computing, software for data handling, and research highlighting best practices in use of standards and data-sharing for the series, as these are all key to opening up research as we enter the era of "big-data".

BGI is generously covering the open-access article-processing charges for the journal’s first year, so please contact us at editorial@gigasciencejournal.com if you have related work you would like to submit to this series or journal, or submit a manuscript here.

Follow the conference on Twitter at @gigascience and the #gsc13 hashtag. Slides will be made available from slideshare, GigaBlog, and the conference wiki.

Friday Dec 23, 2011

Data Citation December

Despite the approaching holidays it's been another busy month in the GigaScience office, with Alexandra attending the InCoB/ISMB-Asia meeting in Kuala Lumpur (see her talk slides here) and the Human Variome Project meeting in Beijing, and Scott attending a number of meetings and workshops in the UK, including the International Digital Curation Conference (IDCC) in Bristol. The "Digital" in the meeting title was a bit of a giveaway of the level of technological savvy of the attendees, as it was heavily tweeted (see #idcc and this storify), blogged (see here for a good example), and videos are also available for many of the talks, so we will not repeat what is already well covered.

With additional workshops on data impact and reuse, Bristol was the center of the Data Citation universe in December, with representatives and talks from many data publishing projects, databases and issuing bodies such as our DataCite collaborators, so it was an excellent opportunity to assess where things currently stand. Interesting new infrastructure was presented by Mark Hahnel, giving a preview of the new design of the FigShare platform launching in the new year, which for the first time will use citable DOIs for its datasets. Brian Hole from Ubiquity Press presented on "Publication and Citation", and mentioned data publishing platforms coming from them, and the representatives of other publishers present showed that there are obviously other commercial projects in the pipeline (for example this from F1000).

Being a curation conference, researcher-driven approaches were also on display, and the Environmental Sciences community in particular has been publishing datasets with DOIs for many years, both from the well established Pangaea database, and by individual data centers (Sarah Callaghan's talk representing NERC’s environmental data centers being a great example). Phillip Bourne's excellent talk imagined the possibilities that mixing open data stores with well integrated widgets and tools to mashup and produce new analyses could bring, and he mentioned that the very well established PDB (Protein Data Bank) uses DOIs as accessions, but these are not integrated and cited in associated publications. This is a bit of a missed opportunity, and Mark Hahnel (video here) and Heather Piwowar (slides and video) both highlighted the need for proper attribution and impact tracking for datasets to incentivise sharing of data. Our recent examples of DOIs linked to datasets from our GigaDB database getting integrated into articles in Nature Biotechnology (see more here), and Genome Biology (see here) demonstrate that it is feasible to link datasets with global, resolvable identifiers into articles.

Whilst Pangaea and the Environmental Science community have managed to do this for a number of years (including examples from as far back as 2005), the integration of data DOIs into the references of the Genome Biology article was, as far as we are aware, the first time this has been accomplished in the field of genomics. This is a great example of the practicalities of how data can be cited (following the best practice guidelines of the DCC), but until the bibliometric indices properly track them this is only a first step. With this important next step likely to finally happen in the new year, this meeting was a good opportunity for the data DOI producers and publishers to compare notes and ready themselves for the important year ahead. As December comes to a close, we at GigaScience would like to wish you all season's greetings, and we look forward to an exciting 2012 for the field of data publishing!

Sunday Dec 11, 2011

HVP Beijing: dealing with variation

The Human Variome Project (HVP) Beijing Meeting has officially ended (though a number of delegates will be busy tomorrow at the Advisory Council meeting). The energy and commitment towards better understanding and treatment of heritable diseases displayed by both the speakers and participants was great to see.

Peter Taschner’s talk on the Leiden Open (source) Variation Database (LOVD) system was very well received, and a number of other speakers were using LOVD for their locus-specific databases. I enjoyed Peter Robinson’s presentation on phenotype ontology and representation in gene and disease specific databases. He discussed his free differential diagnosis tool Phenomizer, which integrates OMIM and the Orphanet rare disease nosology (Orphanet was also the subject of an earlier talk by Mariana Jovanovic).

Ethics and curation were recurring themes at this HVP meeting as well.  Many of the ethical issues concerning the sharing of human genetic data were raised by Sue Povey and Carol Isaacson Barash: consent; the tradeoff regarding the release of a carrier’s or affected individual’s geographical and ethnic information, which is both potentially identifying and of great scientific use; the impact of culture on the idea of genetic privacy; and the use of databases for diagnosis and treatment decisions. They are all thorny issues that will be discussed again and again as the field continues to change.

I was very impressed by the universal recognition of the value of curation. Most presentations discussed curation procedures and the various challenges of curating potentially sensitive and potentially diagnostic human genetic, genomic and medical data. Arleen Auerbach’s talk on the Fanconi Anemia Mutation Database (which now uses LOVD) dealt with curation issues exclusively. Anthony Brookes spoke on the technical standards and data models necessary for system interoperability, which fit well with the mood of the audience who wanted to share data without enforcing a one-system rule. He also described the open access software Cafe Variome, which shares the existence of data but not necessarily the data itself, and this generated a lot of interest. Mauno Vihinen introduced VariO, an ontology for variation at the DNA and RNA level, and took the curation discussion in a different direction; Vihinen proposed an independent evaluation and rating system for human gene variation databases akin to hotel stars or Michelin stars.

The meeting ended with a brainstorming session for new recommendations for action by the Human Variome Project. It sounds like the HVP group has ambitious plans to start on before they meet again in Paris this spring!

Monday Dec 05, 2011

InCoB/ISMB-Asia: keynotes and curation

I recently returned from the InCoB/ISMB-Asia meeting. The meeting officially ended a couple of days ago but I am still digesting the good food, the good conversations and the good science, all of which I know will be with me a good while.  In the interest of avoiding a copious monograph, I’ll try to stick to a few personal high points. However, I encourage you to check out the supplemental issues in Immunome Research and our fellow BioMed Central journals BMC Bioinformatics and BMC Genomics for a more complete view of the meeting.

I would like to compliment the conference organizers for generating an excellent lineup of keynote speakers. Minoru Kanehisa gave an update on the new developments in the KEGG databases, including their ambitious new resource KEGG MEDICUS that aims to integrate medical, pharmaceutical and genomic information for use by researchers, clinicians, pharmacists and the public. Pascale Gaudet spoke on the ever-increasing need for biocuration and the importance of biocurators, the ongoing efforts of the International Society for Biocuration, and community standards and BioDBCore. Several of her themes were echoed in the later sessions “Standards in Bioinformatics” and “BioCloud/Grid Computing for Sharing Bioinformatics Resources.” Jun Wang talked about three “Million Genomes” projects underway at BGI, leading some members of the audience (at least those of a certain age who were raised in the States) to conclude that BGI may want to invest in a signboard similar to the red one that used to appear in conjunction with golden arches. Alex Bateman discussed the ways in which Pfam and Rfam have been working with the Wikipedia community to the mutual benefit of all parties. He also gave a brief how-to for scientists looking to get involved in Wikipedia and a prod to those among us (including myself) who lack social responsibility, using but not editing Wikipedia.

My favorite keynote was Arthur Olson’s. While I generally find myself to be a highly visual learner who derives little additional benefit from other types of teaching aids, I freely admit that a set of tinker toys got me through O. Chem. Had the models and Tangible User Interface in development at the Molecular Graphics Laboratory been available when I was still in school, my scientific trajectory might have been quite different. They are doing some seriously cool stuff. And my informal survey suggested that Olson’s shake-and-play self-assembling viral model would be a welcome present for the scientist on your holiday gift list.

I also enjoyed Janet Kelso’s presentation on ancient genomics and evolution, Susanna-Assunta Sansone’s talk on the continuing progress of the BioSharing and ISAcommons communities (GigaScience is involved in both efforts), and the series of talks by Tin Wee Tan, Shoba Ranganathan and their collaborators on database standards development and their push for archive-able and easily reinstate-able databases. I am extremely grateful to have been invited to speak amongst so many prominent scientists at InCoB/ISMB-Asia (slides available on slideshare). My only real complaint about the meeting was the lack of network connections that kept me from Tweeting.

Thanks to all of you (including the many that remain unmentioned here as, despite my promises and best efforts, I’ve already produced a tome-like blog post) who made the meeting both fun and productive. I had a great time in KL!

Saturday Nov 19, 2011

GigaScience at #ICG6: announcing the release of GigaDB and new datasets

Another busy week for the GigaScience team, with the release of a new-look database, more datasets, and a number of talks and announcements at BGI's annual International Conference of Genomics in Shenzhen. It was a great (if exhausting) meeting this year, with the state of the art in genomics science on display, announcements of three exciting "Million Genomes" projects to come from the BGI and their many collaborators, and a chance to catch up with many members of our editorial board and friends.

GigaDB - a new look website
The biggest news at the meeting was the launch of our new-look GigaDB.org website and additional datasets at the pre-conference data release workshop and press-conference. This is still very much in beta-form (comments and feedback greatly appreciated at editorial@gigasciencejournal.com), but builds upon our original release of datasets in July and presents them together in a single portal. Following the success of the outbreak E. coli O104:H4 and Macaque genome datasets in demonstrating the practicalities of data citation, we have released another 20 datasets with citable DOIs. These span most of the tree of life, and include previously unsupported data-types.

New Data from across the Tree of Life
Following on from the release of seven vertebrate genomes from the Genome10K project in July, we have now added genomic data from the Sheep, Tibetan Antelope and Naked Mole Rat. Genome, transcriptome and methylome data is provided from an Asian Individual, and we are currently uploading data from Ancient DNA studies on an Eskimo and Aboriginal Australian. We now have plant genomes from the Potato, Foxtail Millet, Sorghum, Cucumber, Chinese Cabbage and Pigeon Pea, and invertebrate genomes from three species of ants, many strains of silkworm and a pathogenic pig roundworm. Many of these datasets (including the Sheep, Tibetan Antelope, Millet, Sorghum and transcriptome data) are previously unpublished, and this novel and more rapid release of data should potentially speed up research in these important model and commercial species, and in human health.

For more coverage on the meeting check out the #icg6 hashtag on twitter, and reporting on the software and data release in Bio-IT World. Laurie's slides are available here, and slides from Scott's talk on data issues in the Bioinformatics session are also available here. To see a video of Laurie's talk you can also see the following clip on youtube.

Friday Nov 11, 2011

Genomicists go Shenzhen: GigaScience at the International Conference on Genomics VI

After many months on the road visiting conferences it's nice when one comes to you. This weekend marks BGI's annual big bash: the 6th International Conference on Genomics, this year held in the mock-Swiss splendor of the Shenzhen OCT East resort. With a great line-up featuring many of our editorial board members (including Stephan Beck, Wang Jun, Ming Qi, Sumio Sugano and Richard Durbin), there are sessions spanning many key areas of our scope, including cloud computing, metagenomics, epigenomics, and personalized medicine. Follow this blog, our twitter page, and the hashtag #icg6 for live updates from the meeting, and stay tuned for some important announcements regarding GigaScience.

After GigaScience was announced at last year's meeting (and a nice plug in Nature Genetics coming from it), this year we will be making some important announcements at the welcome reception and press-conference on Saturday, as well as presenting on Tuesday 15th November in the Bioinformatics session. Our editor-in-chief Laurie Goodman will also be chairing workshops on the final afternoon, so watch this space for news on how they go, as well as news and updates from the conference.

Wednesday Oct 19, 2011

ICHG2011: Genetics and Genomics Gets Personal

GigaScience was on hand to witness plenty of lively discussion last week at the annual American Society of Human Genetics jamboree: the International Congress of Human Genetics in Montreal. As always, the meeting had a strong medical genetics presence, but the rapid growth and uptake of genomics technologies in the field meant there was much fascinating work on display this year. However, some amongst the heavy clinical contingent were obviously uncomfortable with the lack of clinical validation of much of this work, and debate was heated in many of the plenary sessions. This can be followed if you are patient enough to trawl the >4700 tweets utilizing the hashtag #ichg2011 or, fortunately, through a growing number of ICHG 2011 webcasts.

The scene was set in the opening "Whole Genome Sequencing: To Do It or Not to Do It?" panel (involving the always controversial James Watson memorably talking about "Genetic Losers") and the technology vs. medicine debate was particularly polarized in the "Current and Emerging Sequencing Technologies" panel the following day (nicely summarized by Luke Jostins in his blog). Whilst there was a consensus that sequencing will become a standard tool in the diagnosis of genetic diseases, the second panel was divided on whether this approach should be a purely targeted one, restricted to finding the pathogenic mutations causing a disease. Some of the more clinically focused members argued that medical genome sequencing was "hype", held back by the lack of genetic counselors, the lack of clear policies from healthcare providers and insurance companies, and a very poor level of genetic training of clinicians in general.

The concerns raised have much merit, but the circular arguments and calls for further debate didn't really acknowledge that technological advances and events on the ground are threatening to make them redundant. It was obvious that panelists such as Rade Drmanac from Complete Genomics were going to argue for genomics technology in the clinic, but some of the clinicians on the panel provided evidence that many physicians are already using it on a large scale. Joris Veltman provided examples from his recent work using exome sequencing rather than single gene tests on 500 individuals, and our editorial board member Ming Qi also admitted to doing similar work in the clinic. With one lucky attendee at the conference winning the chance to have their exome sequenced by 23andMe (market value of 999 USD from 23andMe, or BGI), many on Twitter pointed out that the market will likely decide the debate's outcome.

Also on display at the meeting were many examples of larger and larger projects utilizing exomic or even whole genome sequencing. Announcements were made at the meeting by Autism Speaks and our colleagues at the BGI about a new project to sequence 10,000 individuals with autism spectrum disorders. Initial data was presented by Tim Spector from his EpiTwin project, a BGI collaboration to sequence the epigenomes of 5,000 twins. Cisca Wijmenga also presented an overview of the "Genome of the Netherlands" (GoNL), another BGI collaboration that has already sequenced 250 Dutch trios. With many similar scale projects presented at the meeting, such as Nick Schork's work with Complete Genomics to produce 1000 human reference genomes of the "Wellderly", it's clear that the field is having to deal with bigger and bigger datasets. A nice visual representation of this came when Cisca Wijmenga presented a slide showing the number of discs needed to transfer 770 whole genomes' worth of GoNL project data from BGI back to the Netherlands.

The particularly challenging issues of scale remain. Hints of future ways this is likely to be tackled included an announcement at the meeting from DNAnexus about their tie-in with Google to host a mirror of the (no longer defunct) Short Read Archive in the cloud. The prickly topic of patient data security also needs resolving, and there were promising posters on display trying to improve utility and security of this type of data with tools such as MedSavant by Marc Fiume and GWAS data encryption protocols from Itsik Pe'er. All of these issues surrounding data handling are very relevant to the scope of GigaScience, and we are currently commissioning papers covering the many issues surrounding the handling of medical data. If you have an interesting point of view you would like to put forward as a commentary or review in this area or if you have useful research or tools relating to this, please contact us at editorial@gigasciencejournal.com about submitting to the journal. Whilst there is a huge amount still to resolve and do, these areas are of great interest and we are keen to follow and be part of the debate in the future.

Tuesday Sep 20, 2011

Beyond the Genome: taking GigaScience into the Clouds

With the summer's conference season over, GigaScience is still keeping mobile, and this week Laurie is taking in "Beyond the Genome", the meeting organized by our BioMed Central stablemates Genome Biology and Genome Medicine in Washington DC. Now in its second year, and with its great line-up voted by GenomeWeb as one of the top three genomics meetings, it covers key parts of our "big-data" scope and has our editorial board members Mike Schatz and Karen Nelson on the scientific committee, so it was obvious we had to attend.

Monday kicked off proceedings with the Genome Informatics pre-Meeting, excellently chaired by Mike, who put together a great line-up of talks. These included Cloud Computing (Matt Wood from AWS, and Ben Langmead plugging his Myrna and Crossbow tools), Lincoln Stein giving interesting and extremely "big-data" insights into the handling of the enormous ICGC datasets, reproducible workflows and Galaxy from James Taylor (a subject close to our hearts, his slides here), and a BGI perspective from our very own Yingrui Li (slides here), amongst others.

Technical Notes - call for Cloud computing tools

With Cloud computing becoming such a key tool in data-intensive science, and with our BGI roots putting us in the unique position of being a journal with its own Cloud (BGI-Cloud), today was a good opportunity to announce our call for submissions and volunteers to work with us on a new type of Cloud computing article - Technical Notes. By using BGI-Cloud as a test environment, GigaScience would like to particularly highlight tools, methods or procedures for the analysis or handling of large-scale data that are optimized to run in a cloud environment. Whilst there are already several hubs and platforms for useful cloud-based tools and workflows (CloudBioLinux being an excellent example), our series/hub hopes to combine some of the advantages of these with the visibility and quality assessment of the more traditional journal article.

By offering reviewers and editors access and free time to review and test these articles and tools in a standard environment, we hope to increase reproducibility and ease-of-testing of research, and take a first step towards what many hope will be a future of "executable articles". To trial this we are offering initial volunteers with tools of interest the opportunity of some free time in the BGI cloud (on top of BGI's already generous covering of the open-access article-processing charges for the journal's first year), so please contact us at editorial@gigasciencejournal.com if you would like to talk to us about submitting a Technical Note and associated application.

With days on Cancer, Exomes (nicely tied in with the Genome Biology special issue) and Microbiomes still to come, Beyond the Genome has already been interesting and insightful, and it will be hard to top the first day. For those not fortunate enough to attend, you can follow the action on Twitter with the hashtag #BtG11 or from Oliver Hofmann's fantastic notes. We'd like to thank Mike, Yingrui and Lincoln for the nice GigaScience mentions and plugs in their talks, and our colleagues at BMC for letting us attend.

Tuesday Sep 13, 2011

HUPO 2011: lessons for Proteomics from the Genomics Tsunami

Our whistlestop summer conference tour circumnavigating the globe has come to a jetlagged end, with the final conference being last week's HUPO (Human Proteome Organization) congress in Geneva. With it being the 10th anniversary meeting it was a good opportunity to look back on how Proteomics has progressed over the past decade, from its early gel-based origins to its current more mass-spectrometry based incarnation as a key high-throughput "Omics" technology. Whilst there have been huge challenges and some criticism relating to issues with reproducibility (leading even to a "fix-proteomics" campaign), the several sessions relating to standards, data and repositories were good opportunities to observe how these are currently being addressed.

The many talks from members of HUPO-PSI (Proteomics Standards Initiative), including four from our editorial board member Henning Hermjakob, demonstrated how organized the community has been to systematically divide up and produce standards, formats, tools and repositories for a diverse range of data types. The HUPO-PSI Initiative Program session followed the full spectrum, from 2D-gels (Juan Pablo Albar presenting on his recent BMC Research Notes paper on best practice for data sharing in Proteomics) to Molecular Interaction data (Sandra Orchard presenting on the IMEx consortium).

Many of the biggest challenges seemed to be economic and cultural rather than technical, with much discussion on the closing of Peptidome by NCBI, and recent stability issues at the main ProteomeXchange raw data portal - Tranche. Whilst this is unfortunate, there seemed to be much work in progress to rectify issues with raw data hosting, and processed and annotated data seemed to be in safe hands with the PRIDE and PeptideAtlas repositories. Whilst adoption and journal compliance are still building up (for an example see our last GigaBlog posting), PRIDE in particular offers authors and reviewers great visualization and quality assessment tools (PRIDE Inspector), and in light of this our editorial policies strongly recommend deposition of suitable data in this database.

With a 9-year history and over 50 publications and white papers produced to date, HUPO-PSI has tried to follow many of the lessons learned by Proteomics' slightly older "big-brother", the Genomics community. With this subject in mind GigaScience presented a talk at the "Proteomics Repositories and Journals - a partnership made in heaven/hell?" session specifically focusing on lessons learned for the Proteomics community from the Genomics "Tsunami" (slides here). Whilst Proteomics data-volumes are still smaller than the petabytes that the genomics community is currently struggling with, it's reassuring that the growing Proteomics community is trying to preempt these issues. There were interesting talks on show demonstrating very "genomics-esque" cloud-based workflow systems such as ISB's TPP (Trans-Proteomic Pipeline) amztpp command line tool. It was also interesting to see areas where the two fields are coalescing, with Mike Snyder presenting a fantastic personalized-medicine oriented multi-"Omics" talk on what he terms Whole "Omics" Profiling (and BGI calls "Trans-Omics").

Whilst there are obviously huge challenges that lie ahead, it is clear Proteomics has come a long way in the last decade, and as a key part of the scope of GigaScience we hope to be there to cover much of what will progress as the field matures in the decades to come. Please contact us at editorial@gigasciencejournal.com if you have Proteomics data related research, reviews and comment you would like us to consider for the journal. Looking forward to meeting many of you at HUPO 2012 in Boston!

Saturday Sep 03, 2011

Exercises in blogging at #solo11. Show me the data!

Latest stop on the GigaScience magical mystery conference tour is Science Online London, and this year they have tried to make the format more interactive by organizing several workshops and breakout sessions, including one on blogging that this posting is a product of. One of the main themes running through the meeting has of course been open science (especially in the great keynote by Michael Nielsen), and open-data (particularly on the "linking with the literature" and "dealing with data" panels). Data-citation was raised on several occasions, and we did our part promoting it and the recent crowdsourcing of E. coli in the Microattribution session.

The theme of the second day has been the better use of tools and information in the context of the rare genetic disease SMA (Spinal Muscular Atrophy). Maryann Martone from the Neuroscience Information Framework (NIF) initially set the scene, discussing the state of play in Neuroscience data-sharing, and outlining the technical and cultural challenges of sharing this type of data, which in comparison to genomics or proteomics has a much wider and more heterogeneous set of data types. The afternoon consisted of several workshops relevant to SMA research, and the "Beyond Scholarly Publication" workshop aimed to showcase scholarly HTML and the WordPress platform by producing blog postings on SMA research.

Due to a combination of geographic and inertia issues I've not written this up on WordPress, but thought I would contribute on our normal blog with some relevant issues regarding data-sharing. Despite data-sharing being one of the themes of the meeting, none of the suggested SMA articles for discussion had their raw data available in public repositories. Whilst this was mostly due to the papers being based on data types without well-recognized repositories, such as electrophysiological data, one of the papers (Wu et al., BMC Neuroscience) was based on Proteomics data, which does have the Tranche and PRIDE databases to hold raw and processed data. With us heading off to the HUPO proteomics meeting tomorrow (and due to talk at the repositories session there), it's timely to highlight the challenges this particular field still faces, and it will be very interesting to hear this week how these resources are being accepted by their communities.

Despite all of the positive words regarding data-sharing at Science Online, this particular exercise has been a bit of a reality check, and it's very clear that there is still an enormous amount of work to do. With the meeting about to end, and the attendees all going their separate ways around the world, one positive thing to take is that they and organizations such as the NIF are at least using their combined energies to address this. Let's all hope they succeed, and #solo12 will be a nice opportunity to assess how this has progressed. See you then, and don't forget to follow the huge amounts of coverage on Twitter (follow #solo11) and the soon-to-be-posted videos.

References

Wu CY, Whye D, Glazewski L, Choe L, Kerr D, Lee KH, et al. Proteomic assessment of a cell model of spinal muscular atrophy. BMC Neuroscience. 2011;12:25. Available from: http://dx.doi.org/10.1186/1471-2202-12-25.


Friday Sep 02, 2011

The complexity of life

ICSB 2011 left me with a greater than ever appreciation for, to borrow from the title of the last plenary session, the complexity of life.  I was impressed by the increasingly complex and explanative models that are being built, and faster and more detailed imaging methods under development, not to mention the exciting new applications of systems biology for disease treatment.  I am sure I’m not the only one who feels this way. 

I want to especially thank the participants in the Plant Systems Biology and the Systems Neuroscience Sessions.  These parallel sessions were smaller, but both the audience and the speakers were dedicated.

This year’s ICSB was a great meeting.  Facetiously, my only disappointment was that there were only awards for posters and not for the best models of the poster winners as well.

For those of you who are already missing the meeting, or who missed out on the meeting, or who just can’t wait until Toronto next year, you can relive the meeting here.

Finally, a reminder that Scott and I will be at the HUPO World Congress in Geneva starting on Sunday.  Hope to see some of you there!

Monday Aug 29, 2011

Systems, synthetics and semantics

As those of you who have been paying attention well know, I am currently attending this year’s International Conference on Systems Biology (#ICSB) meeting in either Heidelberg or Mannheim (there’s ongoing debate about the meeting locale).

The meeting opened yesterday with a plenary talk by Jean Peccoud on DNA "grammar," making linguistic models of yeast cell cycle regulatory network genetics, and the open source application (with a database too) GenoCAD, which I really recommend you check out.  It was a great opening act.

Continuing on the language theme, today I really enjoyed two talks that both had a bit of a semantics bent.  The first was Frank Bergmann’s “Emerging Standards in Systems Biology.”  He gave an overview of markup languages for systems biology (such as the Systems Biology Markup Language, SBML) and other standardized notation efforts.  He then went on to describe a framework of tools, the Systems Biology Workbench, that was developed to make SBML-associated standards more accessible.  Frank also runs a blog on these topics.

The other talk I wanted to highlight was Dagmar Waltemath’s on concepts for model management for model re-use and the reproducibility of model results.  She stressed the importance of good meta-data in model storage for future localization and retrieval, tracking model evolution, and linking simulation experiments with the model.  She discussed some tools and techniques developed at Rostock for handling these issues, the code for which is freely available: http://bives.sourceforge.net and http://sombi.sourceforge.net .

And the conference is not even half over...