Autism Research Database - Principal Investigator Project Details | IACC

Project Element	Element Description
Project Title	Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools
Principal Investigator	Berger, Bonnie
Description	High-throughput experimental technologies are generating increasingly massive and complex genomic sequence data sets. While these data hold the promise of uncovering entirely new biology, their sheer enormity threatens to make their interpretation computationally infeasible. The continued goal of this project is to design and develop innovative compression-based algorithmic techniques for efficiently processing massive biological data. We will branch out beyond compressive search to address the imminent need to securely store and process large-scale genomic data in the cloud, as well as to gain insights from massive metagenomic data.
The key underlying observation is that genomic data is highly structured, exhibiting high degrees of self-similarity. In our previous funding period, we exploited its high redundancy and low fractal dimension to enable scalable compressive storage and accelerated search of sequence data, as well as of other biological data types relevant to structural bioinformatics and chemogenomics. In this renewal, we will continue to capitalize on the structure (i.e., compressibility) of genomic data to: (i) overcome privacy concerns that arise in sharing sensitive human data (e.g., on the cloud); (ii) address new challenges, beyond search, posed by metagenomic data; and (iii) widen the adoption of the previous and newly proposed compressive algorithms for industry, research, and clinical use. We will demonstrate the utility of our compressive techniques for characterizing human genomic and metagenomic variation.
We will collaborate with co-I Sahinalp's lab (Indiana University, Bloomington) on developing and applying these tools to high-throughput data sets, including those for autism spectrum disorder (with Isaac Kohane and Evan Eichler), cancer (with PCAWG, the Pan-Cancer Analysis of Whole Genomes), and the microbiome (with Eric Alm and Jian Peng), as well as for human variation analysis (GATK, with Eric Lander and Eric Banks). The broad, long-term goal is to apply our compressive approach to massive biological data sets to elucidate the still-obscure molecular landscape of diseases.
Successful completion of these aims will result in computational methods and tools that significantly increase our ability to securely store, access, and analyze massive data sets. The work will also reveal fundamental aspects of genetic variation and generate testable hypotheses for experimental investigations. Not only will all developed software be made publicly available, but, as part of our integration aim, we will also ensure that the research community can use our innovations with minimal effort. Through our research collaborations, we will both build these tools and demonstrate their relevance to the characterization of human health and disease.
Funder	National Institutes of Health
Funding Country	United States
Fiscal Year Funding	$372,014
Current Award Period	2013-2020
Strategic Plan Question	Question 2: What is the Biology Underlying ASD?
Funder’s Project Link	NIH RePORTER Project Page
Institution	Massachusetts Institute of Technology
Institute Location	United States
Project Number	2R01GM108348-04A1
Government or Private	Government
History/Related Projects	N/A
