Grant Details

Grant Number: 5R01CA204120-03
Primary Investigator: Ma, Shuangge
Organization: Yale University
Project Title: Novel Methods for Identifying Genetic Interactions in Cancer Prognosis
Fiscal Year: 2018

DESCRIPTION (provided by applicant): Project Summary: In cancer prognosis, beyond the main effects of environmental/clinical (E) and genetic (G) risk factors, the interactions between G and E factors (G*E interactions) and those between G and G factors (G*G interactions) also play critical roles. The existing findings are insufficient, and there is a strong need to identify more prognostic interactions. Most of the existing effort has been focused on data collection. In contrast, the development of effective analysis methods has lagged behind. Compared to data collection, methodological development takes far fewer resources but is equally critical in making reliable findings. Most of the existing interaction analysis methods share the limitation of lacking robustness properties. In practice, data contamination and model mis-specification are not uncommon and can lead to severely biased model parameter estimation and false marker identification. The development of robust genetic interaction analysis methods is very limited. There are a few methods for case-control data, but they are not applicable to prognosis data. For prognosis data and interaction analysis, there is some very recent progress in quantile regression and rank-based methods, but the development has been limited and unsystematic. Last but not least, the existing robust methods share the drawback of adopting ineffective marker selection techniques. Our group has been at the frontier of developing robust interaction analysis methods. Our statistical investigations and simulations have provided convincing evidence that robust methods using the penalization technique outperform alternatives, with significantly more accurate marker identification and model parameter estimation. In data analysis, important interactions missed by the existing analyses have been identified for multiple cancer types.
However, we have also found that the scope of the existing studies needs to be significantly expanded in terms of both methodological development and data analysis. This project has been motivated by the importance of interactions in cancer prognosis and the limitations of the existing studies. Our objectives are as follows. (Aim 1) Develop novel marginal analysis methods that are robust to data contamination and model mis-specification for identifying important interactions. (Aim 2) Develop novel joint analysis methods that are robust to data contamination and model mis-specification for identifying important interactions. (Aim 3) Develop tailored inference approaches to draw more definitive conclusions on the identified interactions. (Aim 4) Develop public R software and a dynamic project website. Identify prognostic interactions for multiple cancers. For the identified interactions, we will conduct extensive bioinformatic and statistical analysis, evaluations, and comparisons. With our unique expertise, extensive experience, and promising preliminary studies, this project has a high likelihood of success.


Assisted gene expression-based clustering with AWNCut.
Authors: Li Y., Bie R., Teran Hidalgo S.J., Qin Y., Wu M., Ma S.
Source: Statistics In Medicine, 2018-12-20; 37(29), p. 4386-4403.
EPub date: 2018-08-09.
PMID: 30094873

Identification of cancer omics commonality and difference via community fusion.
Authors: Sun Y., Jiang Y., Li Y., Ma S.
Source: Statistics In Medicine, 2018-11-12.
EPub date: 2018-11-12.
PMID: 30421444

Overlapping clustering of gene expression data using penalized weighted normalized cut.
Authors: Teran Hidalgo S.J., Zhu T., Wu M., Ma S.
Source: Genetic Epidemiology, 2018-10-09.
EPub date: 2018-10-09.
PMID: 30302823

A Forward and Backward Stagewise Algorithm for Nonconvex Loss Functions with Adaptive Lasso.
Authors: Shi X., Huang Y., Huang J., Ma S.
Source: Computational Statistics & Data Analysis, 2018 Aug; 124, p. 235-251.
EPub date: 2018-03-28.
PMID: 30319163

Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach.
Authors: Xu Y., Wu M., Zhang Q., Ma S.
Source: Genomics, 2018-07-16.
EPub date: 2018-07-16.
PMID: 30009922

Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures.
Authors: Wu C., Jiang Y., Ren J., Cui Y., Ma S.
Source: Statistics In Medicine, 2018-02-10; 37(3), p. 437-456.
EPub date: 2017-10-16.
PMID: 29034484

Analysis of cancer gene expression data with an assisted robust marker identification approach.
Authors: Chai H., Shi X., Zhang Q., Zhao Q., Huang Y., Ma S.
Source: Genetic Epidemiology, 2017 Dec; 41(8), p. 779-789.
EPub date: 2017-09-14.
PMID: 28913902

Integrative sparse principal component analysis of gene expression data.
Authors: Liu M., Fan X., Fang K., Zhang Q., Ma S.
Source: Genetic Epidemiology, 2017 Dec; 41(8), p. 844-865.
EPub date: 2017-11-08.
PMID: 29114920

Sparse boosting for high-dimensional survival data with varying coefficients.
Authors: Yue M., Li J., Ma S.
Source: Statistics In Medicine, 2017-11-19.
EPub date: 2017-11-19.
PMID: 29152776

Identifying gene-gene interactions using penalized tensor regression.
Authors: Wu M., Huang J., Ma S.
Source: Statistics In Medicine, 2017-10-16.
EPub date: 2017-10-16.
PMID: 29034516

Inferring gene regulatory relationships with a high-dimensional robust approach.
Authors: Zang Y., Zhao Q., Zhang Q., Li Y., Zhang S., Ma S.
Source: Genetic Epidemiology, 2017 Jul; 41(5), p. 437-454.
EPub date: 2017-05-02.
PMID: 28464328

Accommodating missingness in environmental measurements in gene-environment interaction analysis.
Authors: Wu M., Zang Y., Zhang S., Huang J., Ma S.
Source: Genetic Epidemiology, 2017-06-28.
EPub date: 2017-06-28.
PMID: 28657194

Analyzing large datasets with bootstrap penalization.
Authors: Fang K., Ma S.
Source: Biometrical Journal. Biometrische Zeitschrift, 2017 Mar; 59(2), p. 358-376.
EPub date: 2016-11-21.
PMID: 27870109

Focused Information Criterion and Model Averaging with Generalized Rank Regression.
Authors: Zhang Q., Duan X., Ma S.
Source: Statistics & Probability Letters, 2017 Mar; 122, p. 11-19.
EPub date: 2016-10-31.
PMID: 28566799
