Skip to main content

Because of a lapse in government funding, the information on this website may not be up to date, transactions submitted via the website may not be processed, and the agency may not be able to respond to inquiries until appropriations are enacted.

The NIH Clinical Center (the research hospital of NIH) is open. For more details about its operating status, please visit cc.nih.gov.

Updates regarding government operating status and resumption of normal operations can be found at opm.gov.

An official website of the United States government
Grant Details

Grant Number: 1R03CA252782-01 Interpret this number
Primary Investigator: Li, Yan
Organization: Univ Of Maryland, College Park
Project Title: Improving Population Representativeness of the Inference From Non-Probability Sample Analysis
Fiscal Year: 2020


Abstract

SUMMARY The critical role of population-representativeness for estimating disease incidence and prevalence has been widely accepted in epidemiologic studies. Improving population representativeness of nonprobability samples, such as samples of volunteers in epidemiologic studies or electronic health records, however, has received little attention by biostatisticians or epidemiologists. In this project, we propose two innovative “pseudoweight” construction methods: 1) two-step matching, and 2) calibration, under an adapted exchangeability assumption, for unbiased estimation of disease incidence and prevalence in the target population. The proposed methods, combined with machine learning methods for propensity score estimation, will achieve significant bias reduction, especially when selection into nonprobability samples is driven by complex relationships between the covariates. We will quantify the bias reduced by the proposed “pseudoweights”, numerically and empirically, on the estimation of disease incidence and prevalence in the target population. Monte Carlo simulation studies are designed under varying degrees of departure from the adapted exchangeability assumption to evaluate the bias of the proposed estimates. The robustness of the proposed estimators against varying sample sizes, number of clusters in survey, and complexities of the true propensity score modeling will be investigated in scenarios that differ by levels of non-linearity, non-additivity and correlations between covariates in the true propensity model. Using data from National Institutes of Health and the American Association of Retired Persons (NIH-AARP, a nonprobability cohort sample) data and the US National Health Interview Survey (NHIS, a probability survey sample), the proposed methods will be applied to estimate the prevalence of self-reported diseases and all-cause or all-cancer mortality rates for people aged 50-71 in the US. To test our methods, we will purposely select outcome variables that are available in both the NIH-AARP and the NHIS. Thus, the amount of bias in NIH-AARP estimates corrected by the proposed pseudoweights can be quantified in practice, assuming the weighted NHIS estimate is true. The proposed methods, although motivated by the volunteer-based epidemiological studies, have wide applications outside of epidemiology, such as electronic health records or web surveys. The results from this project can be used by epidemiologists and health policy makers to improve the understanding of the health-related characteristics in the general population. Computer software that implements the proposed methods will be made available for public use.



Publications

Error Notice

The database may currently be offline for maintenance and should be operational soon. If not, we have been notified of this error and will be reviewing it shortly.

We apologize for the inconvenience.
- The DCCPS Team.

Back to Top