Grant Details

Grant Number: 1R43CA080484-01A1
Primary Investigator: Mehta, Cyrus
Organization: Cytel, Inc
Project Title: Software for Missing Data in Cancer Clinical Trials
Fiscal Year: 1999


The problem of missing covariate data is a common one in cancer clinical trials, as the numerical values of some prognostic factors are frequently lost or unrecorded for some of the study subjects. When confronted by missing covariate values, the major commercial software packages resort to a complete-case analysis, whereby a subject's entire record is dropped if even a single covariate value is missing. This approach is known to produce biased, inconsistent and inefficient estimates. A few packages utilize imputation techniques to fill in the missing values and then treat the resulting dataset as though it were complete. These approaches are ad hoc, and their statistical properties are not well understood, for they do not correspond to maximum likelihood estimates (MLE) (see Little and Rubin, 1987). The MLE approach is rightly considered the gold standard for the missing data problem. Firmly grounded in statistical theory, it produces consistent estimates with known distributional properties. The advent of the EM algorithm, combined with Markov chain Monte Carlo sampling techniques, makes it computationally feasible to perform MLE-based inference on regression models with categorical, continuous or mixed covariates. There exists no commercial software to take advantage of this methodological breakthrough. The goal of this SBIR proposal is to fill the gap with software that performs MLE-based inference for generalized linear models, longitudinal models, and survival models in the presence of missing covariate data. The final software package will handle both ignorable and non-ignorable missingness. It will be available as external SAS procedures, as external S-plus functions, as an option within the new LogXact for Windows package and as an option within the new Egret for Windows package.

PROPOSED COMMERCIAL APPLICATION: Missing covariate data is a common property of biomedical data. In current commercial software an entire record is dropped even if it is missing data on only one covariate, thus leading to inconsistent and inefficient estimates for the regression model. There is a commercial opportunity to develop a user-friendly software package that incorporates the latest computational and methodological advances for fitting regression models by maximum likelihood methods in the presence of missing data.
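
The contrast the abstract draws between complete-case analysis and maximum likelihood estimation can be illustrated with a toy sketch. This is not the proposed software or its method. It assumes a deliberately simple setting that the abstract does not specify: two jointly normal variables, a variable z that is always observed and a covariate x that is sometimes missing, with ignorable missingness. The function names and the data are hypothetical. The EM loop below estimates the mean of x using all records, while the complete-case estimate uses only the fully observed ones.

```python
def complete_cases(records):
    """Keep only records with no missing (None) values."""
    return [r for r in records if all(v is not None for v in r)]


def em_mean_with_missing_x(records, n_iter=500):
    """EM estimate of (mu_z, mu_x) for bivariate normal (z, x) pairs.

    Assumptions of this sketch: z is always observed, x may be None,
    and missingness is ignorable. Variances use the 1/n (ML) divisor.
    """
    n = len(records)
    cc = complete_cases(records)
    # z is fully observed, so its ML mean and variance never change.
    mu_z = sum(z for z, _ in records) / n
    s_zz = sum((z - mu_z) ** 2 for z, _ in records) / n
    # Start the x parameters from the complete cases.
    mu_x = sum(x for _, x in cc) / len(cc)
    s_xx = sum((x - mu_x) ** 2 for _, x in cc) / len(cc)
    s_zx = 0.0
    for _ in range(n_iter):
        # E-step: expected x and x^2 given z under current parameters.
        beta = s_zx / s_zz
        resid_var = s_xx - beta * s_zx
        sum_x = sum_xx = sum_zx = 0.0
        for z, x in records:
            if x is None:
                ex = mu_x + beta * (z - mu_z)
                exx = ex * ex + resid_var
            else:
                ex, exx = x, x * x
            sum_x += ex
            sum_xx += exx
            sum_zx += z * ex
        # M-step: update the parameters from the expected statistics.
        mu_x = sum_x / n
        s_xx = sum_xx / n - mu_x ** 2
        s_zx = sum_zx / n - mu_z * mu_x
    return mu_z, mu_x


# Hypothetical data: x is missing for the two records with the largest z.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, None), (6.0, None)]
cc = complete_cases(data)
cc_mean_x = sum(x for _, x in cc) / len(cc)
mu_z, em_mean_x = em_mean_with_missing_x(data)
print(len(cc), "of", len(data), "records kept by complete-case analysis")
print("complete-case mean of x:", round(cc_mean_x, 3))
print("EM estimate of mean of x:", round(em_mean_x, 3))
```

Because x is missing only where z is large, the complete-case mean of x is pulled downward, whereas the EM estimate uses the observed z values of the incomplete records to correct for this. Regression models with missing covariates, as targeted by the proposal, require considerably more machinery than this two-variable case.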


