Authored by Baylor College of Medicine, Results Published in Cell Systems and Introduce Open-Source Juicer Tool
SAN DIEGO, August 1, 2016 – Edico Genome’s DRAGEN™ Bio-IT processor accelerated the analysis of massive data sets generated from the study of three-dimensional (3-D) structures of DNA by nearly 20-fold, as published in Cell Systems. In the paper, Dr. Erez Lieberman Aiden, assistant professor of molecular and human genetics and director of The Center for Genome Architecture (TC4GA) at Baylor College of Medicine and Rice University, and his colleagues introduced Juicer, an open-source tool for use in 3-D genome sequencing (“Hi-C”) experiments.
Hi-C experiments generate terabases of data to create high-resolution contact maps and to comprehensively map the loops that the genome forms when it folds up inside the nucleus of a cell. Identifying these loops is crucial to understanding genetic regulation, which could improve our understanding of genetic diseases or inform drug development. Juicer features a fully automated pipeline that allows users with little or no computational background to transform raw next-generation sequence data into genome-wide maps of looping. The DRAGEN hardware greatly accelerated the processing of the extensive datasets generated by the sequencing pipeline, with the analysis times for each system summarized in the table below.
||Total Time to Process 1.5 Billion Paired-End Hi-C Reads (Hour: Minute)
|Amazon Web Services g2.8xlarge
|Broad Univa Grid Engine
|Rice PowerOmics + FPGA (DRAGEN)
In 2009, Dr. Aiden and collaborators at the Broad Institute of MIT and Harvard and at UMass Medical School invented Hi-C, a method that produces a genome-wide measure of the probability of contact between pairs of loci. Hi-C combines high-throughput sequencing with earlier technologies, including Nuclear Ligation Assay and Chromosome Conformation Capture. In 2014, members of Dr. Aiden’s team showed that it was possible to use Hi-C to create a genome-wide map of loops, in which the genome bends backward, bringing genes close to crucial regulatory elements that lie far away in one dimension. But there was a catch: even a single map requires billions of reads to generate. The researchers immediately recognized that new hardware solutions for analyzing massive sequencing datasets would be crucial to the fledgling field.
In the current study, researchers tested Juicer by creating the deepest 3-D map to date, spanning over three terabytes of sequence data drawn from a single experimental condition. The team, led by Neva Durand, Ph.D., Muhammad Shamim, and Ido Machol, also benchmarked the performance of Juicer on four different cluster systems, including a system based on Edico’s DRAGEN platform and IBM’s Power8 architecture. The DRAGEN-based system yielded the fastest analysis times of all the systems tested.
“The study published in Cell Systems describes our team’s new, end-to-end system for analysis of 3-D genome sequencing data. It is the first system of its kind, making it possible to map the loops in a mammalian genome in a fully automated fashion,” said Dr. Durand, a senior scientist at TC4GA and co-first author of the study.
Mr. Machol, a co-author of the study, added, “When we ran our pipeline on a hybrid DRAGEN/Power system, the data analysis was 20-fold faster than running the pipeline on an industry standard cluster. That kind of difference opens the door to many analyses that would have been very impractical before.”
“Dr. Aiden and his team’s application of DRAGEN to accelerate Juicer is a great example of DRAGEN’s effectiveness in processing massive amounts of raw sequencing data in minimal time and without requiring any additional training or post-graduate degree. In addition, one DRAGEN/Power system replaces a cluster of servers, making for a very compact and economic bioinformatics solution,” said Pieter van Rooyen, Ph.D., chief executive officer of Edico Genome. “We are continually working to optimize DRAGEN and expect the next version to be even faster than the speed we have already achieved.”
DRAGEN is highly reconfigurable, using a field-programmable gate array (FPGA) to provide hardware-accelerated implementations of genome pipeline algorithms, such as BCL conversion, compression, mapping, alignment, sorting, duplicate marking and haplotype variant calling. The flexible DRAGEN platform allows users to develop custom algorithms as well as refine and improve existing pipelines. Updated versions are made available for customers through simple remote downloads.
Although pipelines for Hi-C data analysis exist, current solutions are not designed to annotate loops or process data at the terabase scale. Juicer features the ability to automatically annotate loops and contact domains, and is compatible with multiple cluster operating systems and with Amazon Web Services. Juicer is available at http://aidenlab.org/juicer/.
“Given the dramatic acceleration that we observed, we are excited about the extraordinary potential of FPGA technology in 3-D genomics,” said Mr. Shamim, who is co-first author of the study and currently working towards an M.D.-Ph.D. at Baylor College of Medicine.
The Cell Systems paper can be found by visiting http://dx.doi.org/10.1016/j.cels.2016.07.002.
Other contributors to this work include James T. Robinson, Jill P. Mesirov, and Eric S. Lander of the Broad Institute of MIT and Harvard, and Suhas Rao and Miriam Huntley of The Center for Genome Architecture.
About Edico Genome
Edico Genome has created the world’s first bioinformatics processor designed to analyze next-generation sequencing data, DRAGEN™. The use of next-generation sequencing is growing at an unprecedented pace, creating a need for a technology that can process this big data rapidly and accurately. Edico Genome’s computing platform has been shown to speed whole genome data analysis from hours to minutes, while maintaining high accuracy and reducing costs, enabling clinicians and researchers to reveal answers more quickly. For more information, visit www.EdicoGenome.com or follow @EdicoGenome.
About Baylor College of Medicine
Baylor College of Medicine (www.bcm.edu) in Houston is recognized as a premier academic health sciences center and is known for excellence in education, research and patient care. It is the only private medical school in the greater southwest and is ranked 20th among medical schools for research and 9th for primary care by U.S. News & World Report. Baylor is listed 20th among all U.S. medical schools for National Institutes of Health funding and number one in Texas. Located in the Texas Medical Center, Baylor has affiliations with seven teaching hospitals and jointly owns and operates Baylor St. Luke’s Medical Center, part of CHI St. Luke’s Health. Currently, Baylor trains more than 3,000 medical, graduate, nurse anesthesia, physician assistant and orthotics students, as well as residents and post-doctoral fellows. Follow Baylor College of Medicine on Facebook (http://www.facebook.com/BaylorCollegeOfMedicine) and Twitter (http://twitter.com/BCMHouston).
About the Aiden Lab and The Center for Genome Architecture at Baylor College of Medicine
Directed by Erez Lieberman Aiden, The Center for Genome Architecture is a world leader in the study of 3-D genomics. In 2009, Dr. Aiden and colleagues introduced the Hi-C technology, the first method for sequencing entire genomes in 3-D. In 2014, researchers at TC4GA published the first reliable map of loops across the human genome. In 2015, researchers at TC4GA performed the first successful surgery on the human genome, changing how it is folded inside the nucleus of a cell by means of ultra-targeted DNA modifications. Their work has appeared on the cover of Nature and Science; the laboratory has also been recognized on the floor of the U.S. House of Representatives for its discoveries about the structure of DNA. For more information, visit www.tc4ga.com or follow @theaidenlab on Twitter.