5R21CA242940-02
Harvard School of Public Health
Semi-Supervised Algorithms for Risk Assessment with Noisy EHR Data
Large electronic health record (EHR) research data integrated with -omics data from linked biorepositories
have expanded opportunities for precision medicine research. These integrated datasets open opportunities for
developing accurate EHR-based personalized cancer risk and progression prediction models, which can be
easily incorporated into clinical practice and ultimately realize the promise of precision oncology. However,
efficiently and effectively using EHR for cancer research remains challenging due to practical and
methodological obstacles. For example, obtaining precise event time information, such as the time of cancer
recurrence, is a major bottleneck in using EHR data for precision medicine research because it requires
laborious medical record review and is hampered by a lack of documentation. Simple estimates of the event time based on
billing or procedure codes may poorly approximate the true event time. Naive use of such estimated event
times can lead to highly biased estimates due to the approximation error. Such biases pose challenges for
pragmatic trials in which the study endpoint is a time to event captured using EHR data. The overall
goal of this proposal is to fill these methodological gaps in risk assessment for cancer research using EHR
data, which will advance our ability to achieve the promise of precision oncology. Statistical algorithms and
software will be developed to (i) automatically assign event times using longitudinally recorded EHR
information and (ii) perform accurate risk assessment using noisy proxies of event times. The proposed
tools for risk assessment using imperfect EHR data without requiring extensive manual chart review could
greatly improve the utility of EHR for oncology research.
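
The bias that arises from naive use of approximate event times can be illustrated with a small simulation. The sketch below is not the proposal's method; it is a minimal toy example under stated assumptions: true event times are exponential with a known rate, every event is observed (no censoring), and the code-based proxy time lags the true time by an independent exponential delay. The function names, the lag model, and all parameter values are hypothetical choices made for illustration only.

```python
import random


def exponential_rate_mle(times):
    """Maximum-likelihood hazard rate for fully observed exponential times."""
    return len(times) / sum(times)


def simulate(n=20000, true_rate=0.1, mean_lag=3.0, seed=0):
    """Return (rate estimated from true times, rate estimated from proxy times)."""
    rng = random.Random(seed)
    # True event times, e.g. time to recurrence (assumed exponential).
    true_times = [rng.expovariate(true_rate) for _ in range(n)]
    # Assumed error model: the billing-code proxy is recorded after the
    # true event, with an independent exponential lag of mean `mean_lag`.
    proxy_times = [t + rng.expovariate(1.0 / mean_lag) for t in true_times]
    return exponential_rate_mle(true_times), exponential_rate_mle(proxy_times)


rate_from_true, rate_from_proxy = simulate()
print(f"rate estimated from true times:  {rate_from_true:.4f}")
print(f"rate estimated from proxy times: {rate_from_proxy:.4f}")
```

Under these assumptions the mean proxy time is 10 + 3 = 13 rather than 10, so the proxy-based estimate settles near 1/13 (about 0.077) instead of the true rate of 0.1, and collecting more patients does not remove the gap. Real EHR proxies may err in either direction and in patient-dependent ways, which is why a simple lag correction is generally not sufficient.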
Developing and evaluating risk prediction models with panel current status data.
, Wang X.
, Jazić I.
, Peskoe S.
, Zheng Y.
, Cai T.
Biometrics, 2021-06; 77(2), p. 599-609.
Robust and efficient semi-supervised estimation of average treatment effects with application to electronic health records data.
, Ananthakrishnan A.N.
, Cai T.
Biometrics, 2021-06; 77(2), p. 413-423.
sureLDA: A multidisease automated phenotyping method for the electronic health record.
, Zhou D.
, He Z.
, Sun J.
, Castro V.M.
, Gainer V.
, Murphy S.N.
, Hong C.
, Cai T.
Journal of the American Medical Informatics Association : JAMIA, 2020-08-01; 27(8), p. 1235-1243.