Grant Details

Grant Number: 5P01CA196569-08
Primary Investigator: Gauderman, William
Organization: University of Southern California
Project Title: Statistical Methods for Integrative Genomics in Cancer
Fiscal Year: 2023


OVERALL ABSTRACT

The overall goal of this Program Project is to develop novel statistical methods for integrating multi-omic data to address the etiology, prognosis, and treatment of cancer through a collaboration of four closely related projects and four shared cores. The four projects span a range of analysis challenges: feature selection, mediation, interaction, and characterization. The first, "High-Dimensional Regression for Data Integration," develops new strategies for analyzing longitudinal -omic data that incorporate external functional information while maintaining a rigorous inferential foundation. The second, "Integration of Omic Data to Estimate Mediation or Latent Structures," develops novel latent factor and mediation models that use high-dimensional omic data or GWAS summary statistics to identify and distinguish genotype, exposure, and omic effects. The third, "Integration of Omic Data in the Analysis of Gene x Environment Interaction," incorporates gene expression and other -omic data into powerful multi-step approaches that scan for interactions by leveraging marginal associations with exposure or disease. Project 3 will also develop novel approaches to identify transcriptional interactions, hierarchical GxE models with heredity constraints (i.e., requiring that an interaction be included only together with its corresponding main effects), and extensions to longitudinal, survival, and quantitative traits. The fourth, "Statistical Methods for Genome Characterization," automates annotation of gene function using phylogenetic inference to identify new cancer-specific regions of conserved DNA methylation. Project 4 also proposes a novel approach for agnostic pathway and gene set enrichment analysis. These projects will be supported by four cores: the Administrative Core (A), the Functional Annotation Core (B), the Computation and Software Development Core (C), and the Data Analysis and Research Translation Core (D).
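The multi-step interaction scans in Project 3 are described only at a high level here. As a hedged illustration of the general idea, the sketch below implements a generic two-step "subset" screen from the GxE literature, not necessarily the procedure this project develops. A step-1 marginal-association p-value (for example, from a test of the variant's association with disease or with exposure) filters variants, and only the survivors are tested for GxE, with a Bonferroni correction over the number of survivors rather than over all variants. The function name, thresholds, and SNP IDs are hypothetical; the p-values are assumed to be precomputed, and this procedure controls the error rate only if the step-1 statistic is independent of the step-2 interaction test under the null.

```python
from typing import Dict, List


def two_step_gxe_screen(
    step1_pvalues: Dict[str, float],
    step2_pvalues: Dict[str, float],
    screen_alpha: float = 1e-3,
    overall_alpha: float = 0.05,
) -> List[str]:
    """Hypothetical two-step subset screen for GxE (illustrative sketch only).

    Step 1: keep variants whose marginal-association p-value is below screen_alpha.
    Step 2: test GxE only for those variants, Bonferroni-correcting over the
    number that passed step 1 (assumes step-1 and step-2 tests are independent
    under the null).
    """
    passed = [snp for snp, p in step1_pvalues.items() if p < screen_alpha]
    if not passed:
        return []  # nothing survives the screen, so no GxE tests are run
    threshold = overall_alpha / len(passed)
    return sorted(snp for snp in passed if step2_pvalues[snp] < threshold)


if __name__ == "__main__":
    # Hypothetical precomputed p-values for three made-up variants.
    step1 = {"rs1": 1e-5, "rs2": 0.2, "rs3": 5e-4}
    step2 = {"rs1": 0.01, "rs2": 1e-9, "rs3": 0.04}
    print(two_step_gxe_screen(step1, step2))  # ['rs1']
```

Here rs1 and rs3 pass the screen, so each is tested at 0.05 / 2 = 0.025; rs2 is never tested for interaction despite its small step-2 p-value, because it fails the step-1 screen. Restricting the multiple-testing burden to screened variants is what gives such approaches more power than a single exhaustive GxE scan when the screen is informative.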
Core B will maintain up-to-date copies of key bioinformatics resources and will develop a software application that provides a single, unified portal for creating annotation files that integrate data from multiple resources. Core C will support high-volume computing needs and will develop user-friendly software packages that implement the novel methods. Core D will focus on translating new methods, both by supporting their application to real cancer datasets and by developing materials to train outside investigators in the use of our methods and software. Our proposed work will have both methodological and substantive importance. Methodologically, we will develop novel statistical methods applicable to a wide range of cancer epidemiology studies and clinical trials; for example, these methods will allow more powerful discovery of genetic associations and interactions by leveraging biological information from other sources. Substantively, they will have translational significance for risk prediction and targeted interventions. Our program is designed to be highly integrative, with interrelated projects and cores that together will be more informative than any one of them could be on its own. Program members have access to extraordinary data resources at USC and elsewhere, ensuring that the methods we develop will be motivated by, and applicable to, important questions arising in current cancer research.