Grant Details

Grant Number: 1R01CA158113-01
Primary Investigator: Johnson, Valen
Organization: University of Texas MD Anderson Cancer Center
Project Title: Consistent Model Selection in the P>>n Setting
Fiscal Year: 2011


Abstract

DESCRIPTION (provided by applicant): Among the most fundamental and commonly encountered statistical problems in medical research is the problem of model selection. Model selection is the process by which researchers identify the relationships between measured quantities; thus it plays a central role in the analysis of essentially all high-throughput screening data. Model selection procedures represent the primary analytical mechanism through which the associations between diseases and large numbers of biochemical, genetic and pharmacological variables are discovered. The fundamental hypothesis tested in this application is that a new class of model selection procedures can be used to effectively identify associations between biological variables and disease outcomes, even in settings where there are many more potential biological correlates than there are observations on each variable. The goals of this project are to develop these variable selection procedures so that they can be applied to high-throughput screening data, and to apply the resulting methodology in three important application areas.

To achieve these goals, the following specific aims will be addressed. Known theoretical properties of the proposed model selection procedures will be extended to cases in which there are many more biological measurements available than there are observations on each measurement (i.e., the p >> n setting). Constraints on the number of variables that can be included in final models for outcome variables will be determined, and efficient numerical algorithms will be developed so that these methods can be applied to actual high-throughput screening data. The new model selection procedures will be used to define binary classification algorithms that can predict clinical outcomes from high-dimensional gene expression data sets. The new model selection procedures will be used to identify and analyze interactions between genes that are associated with cancer and other diseases in genome-wide association studies using single-nucleotide polymorphism data. The new model selection procedures will be used to analyze biological pathways as informed by high-throughput molecular interrogation data.

The algorithms developed during this project constitute a major innovation in the field of model selection and will provide medical researchers with a new and unique set of tools for effectively identifying biological associations among biomarkers, disease attributes, and patient outcomes from high-throughput screening data.

PUBLIC HEALTH RELEVANCE: Model selection procedures are statistical techniques that allow researchers to discover the associations between disease and the large number of variables that are measured in emerging high-throughput screening technologies. For example, model selection techniques are used to discover which genes are associated with particular forms of cancer. This project proposes a new class of model selection procedures that will make it easier for researchers to discover such associations.



Publications

Bayes factor functions for reporting outcomes of hypothesis tests.
Authors: Johnson V.E., Pramanik S., Shudde R.
Source: Proceedings of the National Academy of Sciences of the United States of America, 2023-02-21; 120(8), p. e2217331120.
EPub date: 2023-02-13.
PMID: 36780516

Efficient alternatives for Bayesian hypothesis tests in psychology.
Authors: Pramanik S., Johnson V.E.
Source: Psychological Methods, 2022-04-14.
EPub date: 2022-04-14.
PMID: 35420854

Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer.
Authors: Wang Z., Kaseb A.O., Amin H.M., Hassan M.M., Wang W., Morris J.S.
Source: Journal of the American Statistical Association, 2022; 117(538), p. 533-546.
EPub date: 2022-01-05.
PMID: 36090952

A Hyperparameter-Free, Fast and Efficient Framework to Detect Clusters From Limited Samples Based on Ultra High-Dimensional Features.
Authors: Rahman S., Johnson V.E., Rao S.S.
Source: IEEE Access: Practical Innovations, Open Solutions, 2022; 10, p. 116844-116857.
EPub date: 2022-11-01.
PMID: 37275750

Single-cell ATAC and RNA sequencing reveal pre-existing and persistent cells associated with prostate cancer relapse.
Authors: Taavitsainen S., Engedal N., Cao S., Handle F., Erickson A., Prekovic S., Wetterskog D., Tolonen T., Vuorinen E.M., Kiviaho A., et al.
Source: Nature Communications, 2021-09-06; 12(1), p. 5307.
EPub date: 2021-09-06.
PMID: 34489465

A Modified Sequential Probability Ratio Test.
Authors: Pramanik S., Johnson V.E., Bhattacharya A.
Source: Journal of Mathematical Psychology, 2021 Apr; 101.
EPub date: 2021-03-04.
PMID: 35496657

On the Existence of Uniformly Most Powerful Bayesian Tests With Application to Non-Central Chi-Squared Tests.
Authors: Nikooienejad A., Johnson V.E.
Source: Bayesian Analysis, 2021 Mar; 16(1), p. 93-109.
EPub date: 2020-01-07.
PMID: 34113418

A pedigree-based prediction model identifies carriers of deleterious de novo mutations in families with Li-Fraumeni syndrome.
Authors: Gao F., Pan X., Dodd-Eaton E.B., Recio C.V., Montierth M.D., Bojadzieva J., Mai P.L., Zelley K., Johnson V.E., Braun D., et al.
Source: Genome Research, 2020 Aug; 30(8), p. 1170-1180.
EPub date: 2020-08-18.
PMID: 32817165

BAYESIAN VARIABLE SELECTION FOR SURVIVAL DATA USING INVERSE MOMENT PRIORS.
Authors: Nikooienejad A., Wang W., Johnson V.E.
Source: The Annals of Applied Statistics, 2020 Jun; 14(2), p. 809-828.
EPub date: 2020-06-29.
PMID: 33456641

Penetrance Estimates Over Time to First and Second Primary Cancer Diagnosis in Families with Li-Fraumeni Syndrome: A Single Institution Perspective.
Authors: Shin S.J., Dodd-Eaton E.B., Gao F., Bojadzieva J., Chen J., Kong X., Amos C.I., Ning J., Strong L.C., Wang W.
Source: Cancer Research, 2020-01-15; 80(2), p. 347-353.
EPub date: 2019-11-12.
PMID: 31719099

Penetrance of Different Cancer Types in Families with Li-Fraumeni Syndrome: A Validation Study Using Multicenter Cohorts.
Authors: Shin S.J., Dodd-Eaton E.B., Peng G., Bojadzieva J., Chen J., Amos C.I., Frone M.N., Khincha P.P., Mai P.L., Savage S.A., et al.
Source: Cancer Research, 2020-01-15; 80(2), p. 354-360.
EPub date: 2019-11-12.
PMID: 31719101

Functional Horseshoe Priors for Subspace Shrinkage.
Authors: Shin M., Bhattacharya A., Johnson V.E.
Source: Journal of the American Statistical Association, 2020; 115(532), p. 1784-1797.
EPub date: 2019-09-17.
PMID: 33716358

Transformed low-rank ANOVA models for high-dimensional variable selection.
Authors: Jung Y., Zhang H., Hu J.
Source: Statistical Methods in Medical Research, 2019 Apr; 28(4), p. 1230-1246.
EPub date: 2018-01-31.
PMID: 29384042

GWASinlps: non-local prior based iterative SNP selection tool for genome-wide association studies.
Authors: Sanyal N., Lo M.T., Kauppi K., Djurovic S., Andreassen O.A., Johnson V.E., Chen C.H.
Source: Bioinformatics (Oxford, England), 2019-01-01; 35(1), p. 1-11.
PMID: 29931045

statistics.
Authors: Johnson V.E.
Source: The American Statistician, 2019; 73(Suppl 1), p. 129-134.
EPub date: 2019-03-20.
PMID: 31123367

Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration.
Authors: Wang Z., Cao S., Morris J.S., Ahn J., Liu R., Tyekucheva S., Gao F., Li B., Lu W., Tang X., et al.
Source: iScience, 2018-11-30; 9, p. 451-460.
EPub date: 2018-11-02.
PMID: 30469014

Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-dimensional Settings.
Authors: Shin M., Bhattacharya A., Johnson V.E.
Source: Statistica Sinica, 2018 Apr; 28(2), p. 1053-1078.
PMID: 29643721

Tractable Bayesian variable selection: beyond normality.
Authors: Rossell D., Rubio F.J.
Source: Journal of the American Statistical Association, 2018; 113(524), p. 1742-1758.
EPub date: 2018-06-28.
PMID: 30906086

Bayesian block-diagonal variable selection and model averaging.
Authors: Papaspiliopoulos O., Rossell D.
Source: Biometrika, 2017 Jun; 104(2), p. 343-359.
EPub date: 2017-04-24.
PMID: 29861501

On the Reproducibility of Psychological Science.
Authors: Johnson V.E., Payne R.D., Wang T., Asher A., Mandal S.
Source: Journal of the American Statistical Association, 2017; 112(517), p. 1-10.
EPub date: 2016-10-07.
PMID: 29861517

NON-LOCAL PRIORS FOR HIGH-DIMENSIONAL ESTIMATION.
Authors: Rossell D., Telesca D.
Source: Journal of the American Statistical Association, 2017; 112(517), p. 254-265.
EPub date: 2017-05-03.
PMID: 29881129

Bayesian variable selection for binary outcomes in high-dimensional genomic studies using non-local priors.
Authors: Nikooienejad A., Wang W., Johnson V.E.
Source: Bioinformatics (Oxford, England), 2016-05-01; 32(9), p. 1338-45.
EPub date: 2016-05-01.
PMID: 26740524

A robust Bayesian dose-finding design for phase I/II clinical trials.
Authors: Liu S., Johnson V.E.
Source: Biostatistics (Oxford, England), 2016 Apr; 17(2), p. 249-63.
PMID: 26486139

Designing alternative splicing RNA-seq studies. Beyond generic guidelines.
Authors: Stephan-Otto Attolini C., Peña V., Rossell D.
Source: Bioinformatics (Oxford, England), 2015-11-15; 31(22), p. 3631-7.
EPub date: 2015-11-15.
PMID: 26220961

Predictive classification of correlated targets with application to detection of metastatic cancer using functional CT imaging.
Authors: Wang Y., Hobbs B.P., Hu J., Ng C.S., Do K.A.
Source: Biometrics, 2015 Sep; 71(3), p. 792-802.
PMID: 25851056

A Unified Family of Covariate-Adjusted Response-Adaptive Designs Based on Efficiency and Ethics.
Authors: Hu J., Zhu H., Hu F.
Source: Journal of the American Statistical Association, 2015-04-22; 110(509), p. 357-367.
PMID: 26120220

Detecting differential patterns of interaction in molecular pathways.
Authors: Yajima M., Telesca D., Ji Y., Müller P.
Source: Biostatistics (Oxford, England), 2015 Apr; 16(2), p. 240-51.
PMID: 25519431

Estimating and Identifying Unspecified Correlation Structure for Longitudinal Data.
Authors: Hu J., Wang P., Qu A.
Source: Journal of Computational and Graphical Statistics: A Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America, 2015-04-01; 24(2), p. 455-476.
PMID: 26361433

A K-fold Averaging Cross-validation Procedure.
Authors: Jung Y., Hu J.
Source: Journal of Nonparametric Statistics, 2015; 27(2), p. 167-179.
PMID: 27630515

BIG DATA AND STATISTICS: A STATISTICIAN'S PERSPECTIVE.
Authors: Rossell D.
Source: Metode Science Studies Journal: Annual Review, 2015; 5, p. 143-149.
PMID: 27722040

Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA.
Authors: Jung Y., Huang J.Z., Hu J.
Source: Journal of the American Statistical Association, 2014-12-01; 109(508), p. 1355-1367.
PMID: 25642005

Evaluation of image registration spatial accuracy using a Bayesian hierarchical model.
Authors: Liu S., Yuan Y., Castillo R., Guerrero T., Johnson V.E.
Source: Biometrics, 2014 Jun; 70(2), p. 366-77.
PMID: 24575781

QUANTIFYING ALTERNATIVE SPLICING FROM PAIRED-END RNA-SEQUENCING DATA.
Authors: Rossell D., Stephan-Otto Attolini C., Kroiss M., Stöcker A.
Source: The Annals of Applied Statistics, 2014 Mar; 8(1), p. 309-330.
PMID: 24795787

On Numerical Aspects of Bayesian Model Selection in High and Ultrahigh-dimensional Settings.
Authors: Johnson V.E.
Source: Bayesian Analysis, 2013-12-01; 8(4), p. 741-758.
PMID: 24683431

Revised standards for statistical evidence.
Authors: Johnson V.E.
Source: Proceedings of the National Academy of Sciences of the United States of America, 2013-11-26; 110(48), p. 19313-7.
EPub date: 2013-11-26.
PMID: 24218581

Bayesian adaptive phase II screening design for combination trials.
Authors: Cai C., Yuan Y., Johnson V.E.
Source: Clinical Trials (London, England), 2013; 10(3), p. 353-62.
PMID: 23359875

UNIFORMLY MOST POWERFUL BAYESIAN TESTS.
Authors: Johnson V.E.
Source: Annals of Statistics, 2013; 41(4), p. 1716-1741.
PMID: 24659829

Reno: regularized non-parametric analysis of protein lysate array data.
Authors: Li B., Liang F., Hu J., He A.X.
Source: Bioinformatics (Oxford, England), 2012-05-01; 28(9), p. 1223-9.
EPub date: 2012-05-01.
PMID: 22467912

Goodness-of-fit diagnostics for Bayesian hierarchical models.
Authors: Yuan Y., Johnson V.E.
Source: Biometrics, 2012 Mar; 68(1), p. 156-64.
PMID: 22050079

Bayesian Model Selection in High-Dimensional Settings.
Authors: Johnson V.E., Rossell D.
Source: Journal of the American Statistical Association, 2012; 107(498).
PMID: 24363474


