Grant Details

Grant Number: 5R01CA085848-15
Primary Investigator: Davidian, Marie
Organization: North Carolina State University Raleigh
Project Title: Flexible Statistical Methods for Biomedical Data
Fiscal Year: 2015


DESCRIPTION (provided by applicant): An ongoing challenge in health sciences research is the development of statistical models for studying how subject characteristics and interventions relate to disease onset, recurrence, and progression and to other health outcomes. This enterprise has become considerably more complex as new technologies, the quest to discover new biomarkers, and improved resources for handling vast repositories of data have led to the collection of high-dimensional information, and there has been extensive research on formal methods for identifying important prognostic variables to include in a model to be used, e.g., to assess population risk. The objective of the first two specific aims of this renewal application is to develop new methods for such variable selection in model-building. Many key health status variables collected in studies of chronic disease, e.g., blood pressure or serum biomarker levels, are imprecise measurements of a "true" quantity relevant to understanding risk, such as long-term blood pressure. The first aim is to develop methods for variable selection when some covariates are subject to this kind of measurement error. Linear and generalized linear models for independent data and their mixed-effects counterparts for longitudinal and other clustered data are widely used, but these parametric models may not be sufficiently flexible to approximate the complex relationships involved. The second aim is to extend advances made in our previous project period toward new methods for simultaneous parameter estimation and variable selection in more flexible semiparametric versions of such models, developing new techniques that allow for arbitrary numbers of both parametric and nonparametric covariate effects, general outcome variables (e.g., continuous, binary), and adaptive identification of such effects.
A key objective in many studies is to elucidate the association between features of longitudinal profiles of biomarkers or other continuous measures and a primary health outcome using so-called joint models. Standard joint models represent the subject-specific profiles via a mixed-effects model, e.g., as straight lines with random subject-specific intercepts and slopes, whose random parameters are included as covariates in a model for the primary outcome. In some settings, interest may focus on the association between the outcome and not only features such as slopes but also intra-subject variation in the longitudinal measure. The third aim is to develop new methods for joint models involving both random intra-subject mean and variance parameters, exploiting techniques developed in the previous project period. Many longitudinal measures are censored due to limits of quantification of the assay used in their determination, and longitudinal analysis must account for this appropriately. Our fourth aim focuses on development of new methods for mixed-effects models that not only address this issue but also draw on work in previous project periods to relax the usual normality assumption on random effects and yield an estimate of their density, providing the analyst with a tool for exploring underlying features of the population.
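
For readers unfamiliar with the standard joint model described above, one common way to write it is sketched below. The notation is illustrative and not taken from the application: the symbols Y, Z, b, and gamma, the link function g, and the choice of a generalized linear model for the primary outcome are all assumptions made for this example.

```latex
% Longitudinal submodel (assumed notation): measurement j on subject i at time t_{ij}
% follows a straight line with subject-specific random intercept b_{0i} and slope b_{1i}.
Y_{ij} = b_{0i} + b_{1i}\, t_{ij} + e_{ij}, \qquad
(b_{0i}, b_{1i})^{\top} \sim N(\mu, D), \qquad
e_{ij} \sim N(0, \sigma^{2})

% Outcome submodel (assumed form): the random intercept and slope enter
% as covariates in a model for the primary outcome Z_i.
g\{ E(Z_i \mid b_{0i}, b_{1i}) \} = \gamma_0 + \gamma_1 b_{0i} + \gamma_2 b_{1i}
```

In this notation, the extension in the third aim could be read as replacing the common variance \sigma^{2} with a subject-specific random \sigma_i^{2} that also appears in the outcome submodel, and the fourth aim as relaxing the normality assumption on (b_{0i}, b_{1i}). Both readings are interpretations of the abstract, not formulations stated in it.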