Grant Details

Grant Number: 1R03CA183006-01
Primary Investigator: Long, Qi
Organization: Emory University
Project Title: Feature Selection for Genomic Data Using Known and Novel Biological Information
Fiscal Year: 2014


DESCRIPTION (provided by applicant): Our long-term goal is to reduce cancer risk by building accurate prediction models for cancer risk and prognosis and developing individualized prevention and treatment strategies based on diverse data including clinical and genomic data. Our immediate goal in the current study is to develop innovative statistical methods to identify genomic features in relation to cancer risk and prognosis with incorporation of biological information including prior-knowledge gene pathways and novel microRNA (miRNA) regulatory networks. The underlying rationale for this research is that: 1) high-dimensional data such as genomic biomarkers have been obtained in many research studies and will likely be readily available in practice in the foreseeable future; 2) feature selection is imperative in order to build good prediction models using high-dimensional genomic data; 3) incorporating known and novel biological information allows information-borrowing in feature selection, resulting in greater power; and 4) semiparametric methods are more robust to model misspecification than parametric methods that have dominated the literature in feature selection.
These considerations lead to four specific aims: 1) develop hierarchical feature selection of high-dimensional biomarkers in semiparametric accelerated failure time (AFT) models for cancer outcomes (e.g., time to cancer recurrence or death) with incorporation of known and novel biological information; 2) develop Bayesian feature selection of high-dimensional biomarkers in AFT models for cancer outcomes (e.g., time to cancer recurrence or death) with integrative analysis of the miRNA regulatory network and incorporation of known and novel biological information; 3) develop efficient algorithms and user-friendly software with the goal of disseminating them to cancer researchers; and 4) perform systematic evaluation of the proposed methods through extensive numerical studies including simulations and real data analyses. Our proposed methods distinguish themselves from existing approaches in that we use both known and novel biological information to guide feature selection, and we investigate two alternative approaches, semiparametric and fully Bayesian joint-modeling, each of which has its own strengths and weaknesses. Progress on all aims will be guided by and evaluated on motivating prostate cancer and brain tumor data, and by extensive simulation studies. The proposed methods will allow investigators to identify key genomic signatures as well as biological pathways that are predictive of cancer risk and prognosis, leading to potential drug targets and subsequently effective personalized treatments. They promise similar benefits to a wide range of biomedical science settings where similar data and biological information are often encountered.


Instruments for determining clinically relevant fatigue in breast cancer patients during radiotherapy.
Authors: Andic F., Miller A.H., Brown G., Chu L., Lin J., Liu T., Sertdemir Y., Torres M.A.
Source: Breast cancer (Tokyo, Japan), 2020 Mar; 27(2), p. 197-205.
EPub date: 2019-09-06.
PMID: 31493295

Impact of Regional Nodal Irradiation and Hypofractionated Whole-Breast Radiation on Long-Term Breast Retraction and Poor Cosmetic Outcome in Breast Cancer Survivors.
Authors: Wang D., Yang X., He J., Lin J., Henry S., Brown G., Chu L., Godette K.D., Kahn S.T., Liu T., et al.
Source: Clinical breast cancer, 2020 Feb; 20(1), p. e75-e81.
EPub date: 2019-09-18.
PMID: 31780378

Full axillary lymph node dissection and increased breast epidermal thickness 1 year after radiation therapy for breast cancer.
Authors: Lin J.Y., Yang X., Serra M., Miller A.H., Godette K.D., Kahn S.T., Henry S., Brown G., Liu T., Torres M.A.
Source: Journal of surgical oncology, 2019 Dec; 120(8), p. 1397-1403.
EPub date: 2019-11-08.
PMID: 31705561

Scalable Bayesian variable selection for structured high-dimensional data.
Authors: Chang C., Kundu S., Long Q.
Source: Biometrics, 2018 Dec; 74(4), p. 1372-1382.
EPub date: 2018-05-08.
PMID: 29738602

Bayesian Multiresolution Variable Selection for Ultra-High Dimensional Neuroimaging Data.
Authors: Zhao Y., Kang J., Long Q.
Source: IEEE/ACM transactions on computational biology and bioinformatics, 2018 Mar-Apr; 15(2), p. 537-550.
PMID: 29610102

Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information.
Authors: Safo S.E., Li S., Long Q.
Source: Biometrics, 2018 Mar; 74(1), p. 300-312.
EPub date: 2017-05-08.
PMID: 28482123

Impact of Selection Bias on Estimation of Subsequent Event Risk.
Authors: Hu Y.J., Schmidt A.F., Dudbridge F., Holmes M.V., Brophy J.M., Tragante V., Li Z., Liao P., Quyyumi A.A., McCubrey R.O., et al.
Source: Circulation. Cardiovascular genetics, 2017 Oct; 10(5).
PMID: 28986451

Incorporating biological information in sparse principal component analysis with application to genomic data.
Authors: Li Z., Safo S.E., Long Q.
Source: BMC bioinformatics, 2017-07-11; 18(1), p. 332.
EPub date: 2017-07-11.
PMID: 28697740

Evaluation of a 24-gene signature for prognosis of metastatic events and prostate cancer-specific mortality.
Authors: Pellegrini K.L., Sanda M.G., Patil D., Long Q., Santiago-Jiménez M., Takhar M., Erho N., Yousefi K., Davicioni E., Klein E.A., et al.
Source: BJU international, 2017 Jun; 119(6), p. 961-967.
EPub date: 2017-02-11.
PMID: 28107602

Bayesian modeling and prediction of accrual in multi-regional clinical trials.
Authors: Deng Y., Zhang X., Long Q.
Source: Statistical methods in medical research, 2017 Apr; 26(2), p. 752-765.
EPub date: 2014-11-03.
PMID: 25367100

Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic.
Authors: Wang M., Long Q.
Source: Biometrics, 2016 Sep; 72(3), p. 897-906.
EPub date: 2016-01-12.
PMID: 26756274

The Impact of Axillary Lymph Node Surgery on Breast Skin Thickening During and After Radiation Therapy for Breast Cancer.
Authors: Torres M.A., Yang X., Noreen S., Chen H., Han T., Henry S., Mister D., Andic F., Long Q., Liu T.
Source: International journal of radiation oncology, biology, physics, 2016-06-01; 95(2), p. 590-6.
EPub date: 2016-01-23.
PMID: 27055397

Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence.
Authors: Zhao Y., Chung M., Johnson B.A., Moreno C.S., Long Q.
Source: Journal of the American Statistical Association, 2016; 111(516), p. 1427-1439.
EPub date: 2017-01-04.
PMID: 28435175

Variable selection in the presence of missing data: resampling and imputation.
Authors: Long Q., Johnson B.A.
Source: Biostatistics (Oxford, England), 2015 Jul; 16(3), p. 596-610.
EPub date: 2015-02-18.
PMID: 25694614

Temporal changes in serum biomarkers and risk for progression of gastric precancerous lesions: a longitudinal study.
Authors: Tu H., Sun L., Dong X., Gong Y., Xu Q., Jing J., Long Q., Flanders W.D., Bostick R.M., Yuan Y.
Source: International journal of cancer, 2015-01-15; 136(2), p. 425-34.
EPub date: 2014-06-19.
PMID: 24895149

Global transcriptome analysis of formalin-fixed prostate cancer specimens identifies biomarkers of disease recurrence.
Authors: Long Q., Xu J., Osunkoya A.O., Sannigrahi S., Johnson B.A., Zhou W., Gillespie T., Park J.Y., Nam R.K., Sugar L., et al.
Source: Cancer research, 2014-06-15; 74(12), p. 3228-37.
EPub date: 2014-04-08.
PMID: 24713434

A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data.
Authors: Hsu C.H., Long Q., Li Y., Jacobs E.
Source: Journal of biopharmaceutical statistics, 2014; 24(3), p. 634-48.
PMID: 24697618
