Grant Details

Grant Number: 5R21CA191383-03
Primary Investigator: Ma, Shuangge
Organization: Yale University
Project Title: Penalization Methods for Identifying Gene Environment Interactions and Applications to Melanoma and Other Cancer Types
Fiscal Year: 2017


DESCRIPTION (provided by applicant): Considerable effort has been devoted to developing statistical methods for identifying G*E interactions in cancer GWAS. The existing methods suffer from serious limitations. First, most of them take a model-based approach. The model assumptions are difficult to verify in data analysis, and there is a high risk of model misspecification, which leads to false marker identification. The existing robust methods have limited applicability. Second, the existing methods adopt ineffective statistical techniques. Recently, we and others introduced effective penalization techniques for identifying important G*E interactions and showed that they significantly outperform the existing techniques. However, the existing penalization methods also have limitations. They adopt an estimation-based marker identification strategy, which is sensitive to tuning parameter selection, lacks stability, and does not have direct false discovery rate control. In addition, they incur prohibitively high computational cost. The aforementioned limitations can mask important effects, lead to inconsistent findings across studies, and result in suboptimal predictive models. In this study, we will develop novel methods for detecting G*E interactions in the analysis of cancer etiology, prognosis, and biomarker data. The proposed methods will have a robustness property not shared by the model-based approach. They will adopt novel penalization techniques and advance beyond the existing penalization methods by adopting and directly comparing multiple marker identification strategies. They will be able to conduct both marginal and joint analyses and both individual marker- and pathway-level analyses. By adopting a progressive approach, they will be computationally affordable with whole-genome data.
Specifically, we will (Aim 1) Develop robust penalization methods for identifying important environmental, genetic, and G*E risk factors associated with cancer risk, survival, and biomarkers. We will develop effective computational algorithms and rigorously prove the robustness and consistency properties. Extensive simulations and comparisons will be conducted. (Aim 2) Develop user-friendly software and a project website. We will make the software and other research results easily accessible. (Aim 3) Analyze data on melanoma and other cancer types and identify important G*E interactions. We will comprehensively evaluate the identified markers and compare them with the results obtained using existing methods. This study will deliver a set of novel methods that will have superior statistical and numerical properties and identify important markers missed by existing methods. They will be broadly applicable to a large number of cancer types and to multiple types of genetic, genomic, and epigenetic measurements. In data analysis, the identified markers will provide important insights into the biological mechanisms underlying melanoma and other cancers and serve as a basis for future validation studies and clinical practice.


Semiparametric Bayesian variable selection for gene-environment interactions.
Authors: Ren J. , Zhou F. , Li X. , Chen Q. , Zhang H. , Ma S. , Jiang Y. , Wu C. .
Source: Statistics in medicine, 2020-02-28; 39(5), p. 617-638.
EPub date: 2019-12-21.
PMID: 31863500

Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach.
Authors: Xu Y. , Wu M. , Zhang Q. , Ma S. .
Source: Genomics, 2019 09; 111(5), p. 1115-1123.
EPub date: 2018-07-17.
PMID: 30009922

Identifying gene-environment interactions incorporating prior information.
Authors: Wang X. , Xu Y. , Ma S. .
Source: Statistics in medicine, 2019-04-30; 38(9), p. 1620-1633.
EPub date: 2019-01-13.
PMID: 30637789

Authors: Chai H. , Zhang Q. , Huang J. , Ma S. .
Source: Statistica Sinica, 2019 Apr; 29(2), p. 877-894.
PMID: 31073263

Robust network-based regularization and variable selection for high-dimensional genomic data in cancer prognosis.
Authors: Ren J. , Du Y. , Li S. , Ma S. , Jiang Y. , Wu C. .
Source: Genetic epidemiology, 2019 04; 43(3), p. 276-291.
EPub date: 2019-02-11.
PMID: 30746793

Integrative Interaction Analysis using Threshold Gradient Directed Regularization.
Authors: Li Y. , Li R. , Qin Y. , Wu M. , Ma S. .
Source: Applied stochastic models in business and industry, 2019 Mar-Apr; 35(2), p. 354-375.
EPub date: 2018-05-29.
PMID: 33071651

A Selective Review of Multi-Level Omics Data Integration Using Variable Selection.
Authors: Wu C. , Zhou F. , Ren J. , Li X. , Jiang Y. , Ma S. .
Source: High-throughput, 2019-01-18; 8(1).
EPub date: 2019-01-18.
PMID: 30669303

Sparse boosting for high-dimensional survival data with varying coefficients.
Authors: Yue M. , Li J. , Ma S. .
Source: Statistics in medicine, 2018-02-28; 37(5), p. 789-800.
EPub date: 2017-11-19.
PMID: 29152776

Identifying gene-gene interactions using penalized tensor regression.
Authors: Wu M. , Huang J. , Ma S. .
Source: Statistics in medicine, 2018-02-20; 37(4), p. 598-610.
EPub date: 2017-10-16.
PMID: 29034516

Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures.
Authors: Wu C. , Jiang Y. , Ren J. , Cui Y. , Ma S. .
Source: Statistics in medicine, 2018-02-10; 37(3), p. 437-456.
EPub date: 2017-10-16.
PMID: 29034484

Integrative sparse principal component analysis of gene expression data.
Authors: Liu M. , Fan X. , Fang K. , Zhang Q. , Ma S. .
Source: Genetic epidemiology, 2017 12; 41(8), p. 844-865.
EPub date: 2017-11-08.
PMID: 29114920

Analysis of cancer gene expression data with an assisted robust marker identification approach.
Authors: Chai H. , Shi X. , Zhang Q. , Zhao Q. , Huang Y. , Ma S. .
Source: Genetic epidemiology, 2017 12; 41(8), p. 779-789.
EPub date: 2017-09-14.
PMID: 28913902

Identifying gene-environment interactions for prognosis using a robust approach.
Authors: Chai H. , Zhang Q. , Jiang Y. , Wang G. , Zhang S. , Ahmed S.E. , Ma S. .
Source: Econometrics and statistics, 2017 Oct; 4, p. 105-120.
EPub date: 2016-11-16.
PMID: 31157309

Accommodating missingness in environmental measurements in gene-environment interaction analysis.
Authors: Wu M. , Zang Y. , Zhang S. , Huang J. , Ma S. .
Source: Genetic epidemiology, 2017 09; 41(6), p. 523-554.
EPub date: 2017-06-28.
PMID: 28657194

Inferring gene regulatory relationships with a high-dimensional robust approach.
Authors: Zang Y. , Zhao Q. , Zhang Q. , Li Y. , Zhang S. , Ma S. .
Source: Genetic epidemiology, 2017 07; 41(5), p. 437-454.
EPub date: 2017-05-02.
PMID: 28464328

Greedy outcome weighted tree learning of optimal personalized treatment rules.
Authors: Zhu R. , Zhao Y.Q. , Chen G. , Ma S. , Zhao H. .
Source: Biometrics, 2017 06; 73(2), p. 391-400.
EPub date: 2016-10-04.
PMID: 27704531

Focused Information Criterion and Model Averaging with Generalized Rank Regression.
Authors: Zhang Q. , Duan X. , Ma S. .
Source: Statistics & probability letters, 2017 Mar; 122, p. 11-19.
EPub date: 2016-10-31.
PMID: 28566799

A penalized robust semiparametric approach for gene-environment interactions.
Authors: Wu C. , Shi X. , Cui Y. , Ma S. .
Source: Statistics in medicine, 2015-12-30; 34(30), p. 4016-30.
EPub date: 2015-08-03.
PMID: 26239060