Grant Details

Grant Number: 5R35CA197449-06
Primary Investigator: Lin, Xihong
Organization: Harvard School Of Public Health
Project Title: Statistical Methods for Analysis of Massive Genetic and Genomic Data in Cancer Research
Fiscal Year: 2020


DESCRIPTION (provided by applicant): With advances in technology, the cancer research enterprise is rapidly becoming data-intensive and data-driven. One example is the explosion of biotechnologies and the generation of massive genetic and genomic data, such as whole genome sequencing data. Another example is health informatics, which allows rapid availability of large administrative health care databases, such as electronic medical records and Medicare claim data. Cancer data science has become increasingly important in cancer research. Indeed, massive data provide unprecedented opportunities for new discovery in cancer. This project aims to develop and apply statistical and computational methods for analysis of massive and complex genetic and genomic data, together with epidemiological and clinical data, in population and medical science of cancer research. Our ultimate goal is to use rich data sources to understand cancer etiology, risk, and prognosis, and to discover new effective strategies for cancer prevention, intervention and treatment. It has become increasingly evident that the scarcity of methods suitable for analyzing massive data has emerged as a bottleneck in effectively translating rich information into meaningful knowledge. There is a pressing need to develop statistical and computational methods for massive cancer data to bridge the technology and information transfer gap, and to accelerate innovations in cancer prevention and treatment. This Project aims to narrow this gap. Specifically, to advance genetic and genomic cancer epidemiology, we will develop statistical and computational methods for (a) analysis of whole genome sequencing association studies; (b) integrative analysis of genetic, genomic, and environmental data; (c) study of gene-environment interactions; and (d) risk prediction using whole genome genetic and genomic data and environmental data.
To advance cancer genomic medicine, we will develop statistical and computational methods for integrative analysis of genetic, genomic and clinical data to understand cancer prognosis and advance precision medicine using (a) data from genetic epidemiological cohort studies and (b) data from genetic epidemiological cohort studies combined with administrative databases such as electronic medical records and Medicare claim data. We have assembled a strong collaborative interdisciplinary team of researchers involving biostatisticians, computational biologists, health informaticians, genetic epidemiologists and clinical scientists. We will apply the proposed methods to lung, breast and nasopharynx cancer genetic epidemiological and clinical studies. We will develop open-access, user-friendly software to be distributed to the research community, and open online educational modules for training cancer researchers in using the methods developed in this Project.