Grant Details
Grant Number: 5R03CA183006-02
Primary Investigator: Long, Qi
Organization: Emory University
Project Title: Feature Selection for Genomic Data Using Known and Novel Biological Information
Fiscal Year: 2015
Abstract
DESCRIPTION (provided by applicant): Our long-term goal is to reduce cancer risk by building accurate prediction models for cancer risk and prognosis and by developing individualized prevention and treatment strategies based on diverse data, including clinical and genomic data. Our immediate goal in the current study is to develop innovative statistical methods that identify genomic features related to cancer risk and prognosis while incorporating biological information, including known gene pathways and a novel microRNA (miRNA) regulatory network. The underlying rationale for this research is that: 1) high-dimensional data such as genomic biomarkers have been obtained in many research studies and will likely be readily available in practice in the foreseeable future; 2) feature selection is imperative for building good prediction models from high-dimensional genomic data; 3) incorporating known and novel biological information allows information borrowing in feature selection, resulting in greater power; and 4) semiparametric methods are more robust to model misspecification than the parametric methods that have dominated the feature-selection literature.
These considerations lead to four specific aims: 1) develop hierarchical feature selection of high-dimensional biomarkers in semiparametric accelerated failure time (AFT) models for cancer outcomes (e.g., time to cancer recurrence or death), incorporating known and novel biological information; 2) develop Bayesian feature selection of high-dimensional biomarkers in AFT models for the same outcomes, with integrative analysis of the miRNA regulatory network and incorporation of known and novel biological information; 3) develop efficient algorithms and user-friendly software and disseminate them to cancer researchers; and 4) systematically evaluate the proposed methods through extensive numerical studies, including simulations and real data analyses. Our proposed methods differ from existing approaches in that they use both known and novel biological information to guide feature selection, and we investigate two alternative approaches, semiparametric and fully Bayesian joint modeling, each with its own strengths and weaknesses. Progress on all aims will be guided and evaluated by motivating prostate cancer and brain tumor data and by extensive simulation studies. The proposed methods will allow investigators to identify key genomic signatures and biological pathways that are predictive of cancer risk and prognosis, potentially revealing drug targets and, subsequently, effective personalized treatments. They promise similar benefits across a wide range of biomedical settings where comparable data and biological information are common.
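The abstract does not specify the estimators behind Aim 1. The sketch below illustrates the general idea of pathway-guided feature selection in a semiparametric AFT model, using two generic techniques chosen for illustration. It is not the investigators' hierarchical or Bayesian method. The model is log T = b0 + x'beta + error with an unspecified error distribution. Right censoring is handled by Stute's Kaplan-Meier-weighted least squares. "Known biological information" is represented by an assumed partition of genes into pathways, and a group-lasso penalty then keeps or drops each pathway as a unit. The names `stute_weights`, `group_lasso_aft`, `groups`, and `lam` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def stute_weights(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Kaplan-Meier (Stute) weights for least squares under right censoring.

    events[i] is 1 for an observed event and 0 for a censored time.
    Censored subjects get weight 0; at tied times, events sort first.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    weights = [0.0] * n
    survivor = 1.0  # product of ((n - j) / (n - j + 1)) ** delta_j so far
    for rank, i in enumerate(order, start=1):
        at_risk = n - rank + 1
        weights[i] = survivor * events[i] / at_risk
        if events[i]:
            survivor *= (at_risk - 1) / at_risk
    return weights


def group_lasso_aft(
    X: Sequence[Sequence[float]],
    times: Sequence[float],
    events: Sequence[int],
    groups: Sequence[Sequence[int]],
    lam: float,
    n_iter: int = 500,
) -> Tuple[float, List[float]]:
    """Fit log(T) = b0 + x'beta with a pathway (group-lasso) penalty.

    Minimizes 0.5 * sum_i w_i * (log T_i - b0 - x_i'beta)^2
              + lam * sum_g sqrt(|g|) * ||beta_g||
    by proximal gradient descent. `groups` is a hypothetical pathway map
    that must partition the feature indices 0..p-1.
    Assumes at least one observed event.
    """
    n, p = len(X), len(X[0])
    w = stute_weights(times, events)
    y = [math.log(t) for t in times]
    sw = sum(w)
    # Weighted centering profiles out the unpenalized intercept b0.
    x_bar = [sum(w[i] * X[i][j] for i in range(n)) / sw for j in range(p)]
    y_bar = sum(w[i] * y[i] for i in range(n)) / sw
    xc = [[X[i][j] - x_bar[j] for j in range(p)] for i in range(n)]
    yc = [y[i] - y_bar for i in range(n)]
    # trace(X'WX) bounds the Lipschitz constant, so 1/trace is a safe step.
    trace = sum(w[i] * sum(v * v for v in xc[i]) for i in range(n))
    step = 1.0 / trace if trace > 0 else 1.0
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [yc[i] - sum(xc[i][j] * beta[j] for j in range(p)) for i in range(n)]
        grad = [-sum(w[i] * xc[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = [beta[j] - step * grad[j] for j in range(p)]
        # Group soft-thresholding: a whole pathway is kept or zeroed together.
        for g in groups:
            norm = math.sqrt(sum(beta[j] ** 2 for j in g))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in g:
                beta[j] *= scale
    b0 = y_bar - sum(x_bar[j] * beta[j] for j in range(p))
    return b0, beta
```

Consider a toy design in which only the first pathway drives log survival time. A moderate `lam` keeps that pathway and sets the other to exactly zero, and a large enough `lam` removes both. The group penalty is the design choice that lets pathway membership "borrow information" across genes. A hierarchical scheme would add a second, within-pathway penalty, which this sketch omits.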
Publications
None