Interagency Autism Coordinating Committee (IACC)
Autism Research Database
Project Element Element Description

Project Title

Compressive Genomics for Large Omics Data Sets: Algorithms, Applications and Tools

Principal Investigator

Berger, Bonnie

Description

High-throughput experimental technologies are generating increasingly massive and complex genomic sequence data sets. While these data hold the promise of uncovering entirely new biology, their sheer enormity threatens to make their interpretation computationally infeasible. The continued goal of this project is to design and develop innovative compression-based algorithmic techniques for efficiently processing massive biological data. We will branch out beyond compressive search to address the imminent need to securely store and process large-scale genomic data in the cloud, as well as to gain insights from massive metagenomic data.

The key underlying observation is that genomic data is highly structured, exhibiting high degrees of self-similarity. In our previous granting period, we exploited its high redundancy and low fractal dimension to enable scalable compressive storage and acceleration for search of sequence data as well as other biological data types relevant to structural bioinformatics and chemogenomics. In this renewal, we will continue to capitalize on the structure (i.e., compressibility) of genomic data to: (i) overcome privacy concerns that arise in sharing sensitive human data (e.g. on the cloud); (ii) address new challenges, beyond search, with metagenomic data; and (iii) seek to widen the adoption of the previous and newly-proposed compressive algorithms for industry, research, and clinical use. We will demonstrate the utility of our compressive techniques to the characterization of human genomic and metagenomic variation.

We will collaborate with co-I Sahinalp's lab (Indiana University, Bloomington) on developing and applying these tools to high-throughput data sets including autism spectrum disorder (with Isaac Kohane and Evan Eichler) and cancer (with PCAWG, Pan Cancer Analysis of Whole Genomes), the microbiome (with Eric Alm and Jian Peng), as well as human variation analysis (GATK, with Eric Lander and Eric Banks). The broad, long-term goal is to apply our compressive approach to massive biological data sets to elucidate the still obscure molecular landscape of diseases.

Successful completion of these aims will result in computational methods and tools that will significantly increase our ability to securely store, access and analyze massive data sets and will reveal fundamental aspects of genetic variation, as well as testable hypotheses for experimental investigations. Not only will all developed software be made publicly available, but as part of our integration aim, we will also ensure that the research community can make use of our innovations with minimal effort. Through our research collaborations, we will both build these tools and demonstrate their relevance to the characterization of human health and disease.

Funder

National Institutes of Health

Funding Country

United States

Fiscal Year Funding

372014

Current Award Period

2013-2020

Strategic Plan Question

Question 2: What is the Biology Underlying ASD?

Funder’s Project Link

NIH RePORTER Project Page

Institution

Massachusetts Institute of Technology

Institute Location

United States

Project Number

2R01GM108348-04A1

Government or Private

Government

History/Related Projects

N/A
