Grant Details

Grant Number: 5R21CA152460-02
Primary Investigator: Dobbin, Kevin
Organization: University Of Georgia
Project Title: Evaluation of Sample Sizes Used to Train Classifiers and Prognostic Predictors
Fiscal Year: 2012


DESCRIPTION (provided by applicant): The overall goal of this project is to produce methods that will improve the development of models for cancer prognosis and diagnosis. These improvements may expedite the translation of novel technologies into clinically useful tools. Recent years have seen the development of many biological assays that measure hundreds or thousands of analytes in parallel. Examples include gene expression microarrays, microRNA assays, sequencing assays, and SNP chips. Two common objectives of studies using these assays are 1) to develop prognostic predictors of cancer patient survival or recurrence, and 2) to develop classifiers that may be useful in patient treatment selection. Development of a prognostic predictor or classifier requires a training set, which is a collection of samples used to formulate the prognostic prediction or classification rule.

This R21 project will develop methods for establishing the sample size required to train prognostic predictors and classifiers in high-dimensional settings. Critical to evaluation of the methods will be assessment of training performance on large datasets. The methods will be validated on microarray datasets because this high-dimensional technology is relatively well studied and there are publicly available cancer microarray datasets with the required clinical data. The specific aims of this proposal are therefore to 1) develop novel methods for sample size estimation in high-dimensional training studies, 2) develop novel methods for removing batch effects from high-dimensional datasets, and 3) validate the training sample size methodology on large agglomerated datasets that used the same microarray platform and studied similar patient populations.

Long-term objective: It is foreseen that this R21 will develop into a suite of sample size methods for the design of studies to train high- and medium-dimensional classifiers and prognostic predictors. While the application in this R21 focuses on microarray data, expansion of the sample size and batch effect elimination methods to other technologies is foreseen as an important future direction of this research.


Proportional Hazards Model with Covariate Measurement Error and Instrumental Variables.
Authors: Song X., Wang C.Y.
Source: Journal of the American Statistical Association, 2014-12-01; 109(504), p. 1636-1646.
PMID: 25663724

Covariance adjustment for batch effect in gene expression data.
Authors: Lee J.A., Dobbin K.K., Ahn J.
Source: Statistics in medicine, 2014-07-10; 33(15), p. 2681-95.
EPub date: 2014-03-28.
PMID: 24687561

Sample size requirements for training high-dimensional risk predictors.
Authors: Dobbin K.K., Song X.
Source: Biostatistics (Oxford, England), 2013 Sep; 14(4), p. 639-52.
EPub date: 2013-07-19.
PMID: 23873895

Nonparametric receiver operating characteristic-based evaluation for survival outcomes.
Authors: Song X., Zhou X.H., Ma S.
Source: Statistics in medicine, 2012-10-15; 31(23), p. 2660-75.
PMID: 22987578
