Grant Details

Grant Number: 5R03CA176760-02
Primary Investigator: Sinha, Samiran
Organization: Texas A&M University
Project Title: Innovative Approaches for Analyzing SEER Breast Cancer Data
Fiscal Year: 2014


DESCRIPTION (provided by applicant): The Surveillance, Epidemiology and End Results (SEER) Program is a premier source for cancer statistics in the United States. Proper and efficient use of the resources available from the SEER program is of public and national interest. Therefore, we propose innovative methods for estimating 5-year survival probability, identifying important predictors of survival, and estimating the effect of predictor variables on the survival time of cancer patients using the SEER data. In particular, we consider breast cancer survival data, as breast cancer is the most common type of cancer among women. Modeling survival time in terms of several disease characteristics and demographic factors is challenging due to the censored nature of the data and the presence of many parameters (a high-dimensional problem). In Aim A, we consider an accelerated failure time (AFT) type model and propose a nonparametric Bayesian solution to this problem. The solution involves modeling the mean in terms of many parameters corresponding to the disease characteristics and demographic factors, and modeling the variance as a smooth nonparametric function of the mean. The nonparametric error distribution of the AFT model is handled via a constrained Dirichlet process prior. Because the mean involves a large number of parameters, a variable selection technique is adopted to reduce the effective dimension of the problem. The main innovation is treating the AFT model from such a realistic and general perspective, which has not been done before. Many of the disease characteristics in the SEER database contain a significant proportion of missing values. Ignoring the subjects with missing values in any disease characteristic may distort the conclusions, and would definitely reduce the power to detect a potential association between the survival time and predictor variables.
In Aim B, we propose a semiparametric method for handling a missing predictor variable in the linear transformation model, a semiparametric model that contains the proportional hazards and the proportional odds models as two special cases. The main innovation of this part is how we handle missing data and make inference about a finite-dimensional parameter in the presence of an infinite-dimensional parameter. Finally, our proposed methods permit a useful and accurate interpretation of the results of the analysis from a modern epidemiological perspective. Our models are broad, and we seek a distribution-free procedure to estimate the model parameters either in the presence of many predictors or in the presence of a missing predictor.
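
For orientation, the two model classes named in the description can be written in a generic textbook form. This is only a sketch: the symbols (T_i, x_i, z_i, beta, theta, sigma, H, epsilon_i) are notation assumed here for illustration and are not taken from the application, whose exact formulation may differ.

```latex
% Aim A (sketch): AFT-type model for survival time T_i with covariates x_i.
% The mean is linear in many parameters; the scale is a smooth, unspecified
% function of the mean; the error distribution is left nonparametric.
\log T_i = \mu_i + \sigma(\mu_i)\,\varepsilon_i,
\qquad \mu_i = x_i^{\top}\beta .

% Aim B (sketch): linear transformation model with an unknown increasing
% function H (the infinite-dimensional parameter) and a finite-dimensional
% parameter \theta; \varepsilon_i has a specified distribution.
H(T_i) = -z_i^{\top}\theta + \varepsilon_i .

% Special cases of the Aim B model:
%   \varepsilon_i \sim \text{extreme value}      => proportional hazards model
%   \varepsilon_i \sim \text{standard logistic}  => proportional odds model
```

In both cases the survival time is typically observed only up to right censoring, which is the censoring the description refers to.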


Bayesian variable selection in the accelerated failure time model with an application to the surveillance, epidemiology, and end results breast cancer data.
Authors: Zhang Z., Sinha S., Maiti T., Shipp E.
Source: Statistical methods in medical research, 2018 Apr; 27(4), p. 971-990.
EPub date: 2016-07-20.
PMID: 28034170

Frequentist Standard Errors of Bayes Estimators.
Authors: Lee D., Carroll R.J., Sinha S.
Source: Computational statistics, 2017 Sep; 32(3), p. 867-888.
EPub date: 2017-01-30.
PMID: 28943721

Functional Mixed Effects Model for Small Area Estimation.
Authors: Maiti T., Sinha S., Zhong P.S.
Source: Scandinavian journal of statistics, theory and applications, 2016 Sep; 43(3), p. 886-903.
EPub date: 2016-03-15.
PMID: 27795610

Semiparametric approach for non-monotone missing covariates in a parametric regression model.
Authors: Sinha S., Saha K.K., Wang S.
Source: Biometrics, 2014 Jun; 70(2), p. 299-311.
EPub date: 2014-02-26.
PMID: 24571224

Semiparametric analysis of linear transformation models with covariate measurement errors.
Authors: Sinha S., Ma Y.
Source: Biometrics, 2014 Mar; 70(1), p. 21-32.
EPub date: 2013-12-18.
PMID: 24350758

Analysis of Multivariate Disease Classification Data in the Presence of Partially Missing Disease Traits.
Authors: Miao J., Sinha S., Wang S., Diver W.R., Gapstur S.M.
Source: Journal of biometrics & biostatistics, 2014; 5.
PMID: 25530913