2R01CA204120-05A1
Novel Methods for Identifying Genetic Interactions for Cancer Prognosis
For the prognosis of melanoma, lung cancer, and many other cancers, G-E (gene-environment) interactions
have important implications. Through a series of studies, our group has taken a unique robustness perspective
and played a leading role in developing the foundations of G-E interaction analysis using cutting-edge high-dimensional
and regularized statistical methods. More recently, our group pioneered I-E (histopathological imaging-environment) interaction
analysis, significantly expanding the scope of cancer analytics. We have made important discoveries for NHL (non-Hodgkin lymphoma),
melanoma, and lung cancer, advancing their translational research and clinical practice.
Our overarching goal is to construct more powerful prognosis models and more accurately identify G-E/I-E
interactions, so as to faithfully describe cancer biology and inform clinical decision-making. In this
project, we will be the first to develop paradigm-shifting SDL (statistically principled deep learning) techniques
tailored to G-E/I-E interaction analysis for cancer prognosis. The proposed methods are designed to combine the strengths of
existing deep learning and regression techniques and to outperform both. We will continue analyzing data
on melanoma and lung cancer, further enhancing the translational and clinical impact of our study.
We will: (Aim 1) Develop foundational SDL techniques tailored to G-E/I-E interaction analysis. We will
first develop “benchmark” nonrobust losses and then advance to losses that are robust to model
mis-specification and to long-tailed distributions and contamination. A novel penalization technique will be applied for
architecture construction; it will accommodate the distinct characteristics of the main G/I effects, main E
effects, and their interactions in a customized manner, screen out noise, and respect the “main effects,
interactions” hierarchy. (Aim 2) Boost performance by incorporating additional information. We will
cost-effectively improve SDL performance by incorporating additional information on (a) the interconnections between
prognosis and G-E/I-E interactions as well as main G/I effects, and (b) the interconnections among G/I variables.
(Aim 3) Expand the analysis scope and integrate multiple types of G/I measurements. Because these measurement types carry overlapping
but also independent information for prognosis, we will develop novel SDL methods and be the first to integrate
multiple types of molecular and imaging measurements in interaction analysis. (Aim 4) Analyze the Yale SPORE
and TCGA data on melanoma and lung cancer. Analysis will be conducted on multiple prognosis outcomes.
Demographic/clinical/environmental risk factors, multiple types of molecular measurements (protein, gene
expression, mutation, methylation, and microRNA), and histopathological imaging features will be analyzed. The
analysis results will be thoroughly and rigorously evaluated, extensively compared with those from alternative methods,
and validated in multiple ways.
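As a rough illustration of two ideas named in Aim 1, the Python sketch below contrasts a nonrobust squared-error loss with the Huber loss, one common loss that is robust to long-tailed errors. It also checks whether estimated coefficients respect a “main effects, interactions” hierarchy. This is not the SDL methodology proposed in this project. The function names, the choice of the Huber loss, and the strong-hierarchy rule (a G x E interaction may be nonzero only if its main G effect is nonzero) are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only; not the project's SDL method.
# Function names and the strong-hierarchy rule are hypothetical choices.

def squared_loss(residuals):
    """Nonrobust 'benchmark' loss: half the mean squared residual."""
    return float(0.5 * np.mean(residuals ** 2))

def huber_loss(residuals, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it,
    so a few contaminated observations have bounded influence."""
    a = np.abs(residuals)
    quad = 0.5 * a ** 2
    lin = delta * (a - 0.5 * delta)
    return float(np.mean(np.where(a <= delta, quad, lin)))

def respects_hierarchy(main_g, inter_ge, tol=1e-12):
    """Strong hierarchy check: interaction G_j x E_k may be nonzero
    only if the main effect G_j is nonzero.
    main_g has shape (p,); inter_ge has shape (p, q)."""
    main_active = np.abs(main_g) > tol
    inter_active = np.abs(inter_ge) > tol
    # Each active interaction must sit in a row whose main effect is active.
    return bool(np.all(~inter_active | main_active[:, None]))

residuals = np.array([0.1, -0.2, 10.0])
print(squared_loss(residuals), huber_loss(residuals))
```

With residuals [0.1, -0.2, 10.0], the single outlier dominates the squared loss (about 16.68) but contributes only linearly to the Huber loss (about 3.18), which is the kind of contamination robustness that motivates moving beyond the benchmark losses.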