Grant Details
Grant Number: 1R21CA269425-01
Primary Investigator: Epstein, Mara
Organization: Univ Of Massachusetts Med Sch Worcester
Project Title: Identifying Recurrent Non-Hodgkin Lymphoma in Electronic Health Data
Fiscal Year: 2022
Abstract
PROJECT SUMMARY/ABSTRACT
Electronic health records (EHRs) represent a rich resource for the efficient study of outcomes for patients
diagnosed with non-Hodgkin lymphoma (NHL). However, cancer recurrence is not required to be reported to
cancer registries, and as a result, innovative algorithms based on EHR data are needed to validly identify this
important patient outcome for population-based studies. This proposal aims to construct algorithms first using a
rule-based approach informed by expert knowledge, and second using a data-driven machine learning
approach to detect recurrence of two common histologic subtypes of NHL: the aggressive diffuse large B-cell
lymphoma (DLBCL) and the more indolent follicular lymphoma (FL). Approximately 20-25% of DLBCL and
25-35% of FL survivors will experience disease recurrence, yet modifiable risk factors for recurrence are largely
unknown. Subtype-specific algorithms will be developed using longitudinally collected EHR data from two large
healthcare systems that serve demographically diverse populations and share a history of conducting
collaborative cancer research. The long-term goal for this work is to apply the validated algorithms to additional
healthcare systems with shared data infrastructure to establish a multi-site study of patients diagnosed with
NHL to identify determinants of lymphoma outcomes and advance this understudied field of research.
The proposed project will include 1,128 DLBCL and 519 FL cases aged 18 years and older at diagnosis (2000-
2018) with follow-up through 2021 from Henry Ford Health System (HFHS; Detroit, MI) and the Meyers Primary Care
Institute/Reliant Medical Group (Worcester, MA). Essential post-diagnosis data including detailed treatment
history, tumor characteristics, and healthcare utilization will be compiled for all study participants, along with
text-based clinical notes and reports. The proposed research aims to: 1) develop and evaluate rule-based
algorithms integrating data from health claims, EHRs, and tumor registries, including specific treatment data
and results from relevant procedures; and 2) adopt a machine learning approach integrated with natural
language processing to improve algorithm performance. We will validate the algorithms for each NHL subtype
against a gold-standard recurrence registry at HFHS and targeted EHR review at both study sites. The positive
predictive value of each algorithm will be calculated.
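
As an illustration of the validation metric named above, the positive predictive value (PPV) is the share of algorithm-flagged patients whose recurrence is confirmed by the gold standard. The sketch below is a minimal example, not the project's implementation: the function name, the patient identifiers, and the counts are hypothetical and chosen only to show the arithmetic.

```python
def positive_predictive_value(flagged, gold_standard):
    """Return PPV = true positives / all patients flagged by the algorithm.

    flagged:       patient IDs the algorithm labels as recurrent (hypothetical input)
    gold_standard: patient IDs with confirmed recurrence (registry or chart review)
    """
    flagged = set(flagged)
    if not flagged:
        # PPV has a zero denominator when nothing is flagged
        raise ValueError("PPV is undefined when the algorithm flags no patients")
    true_positives = flagged & set(gold_standard)
    return len(true_positives) / len(flagged)


# Hypothetical patient IDs, for illustration only
algorithm_flagged = {"p01", "p02", "p03", "p04"}
confirmed_recurrence = {"p01", "p02", "p04", "p07"}

ppv = positive_predictive_value(algorithm_flagged, confirmed_recurrence)
print(f"PPV = {ppv:.2f}")  # 3 of 4 flagged patients are confirmed: 0.75
```

Note that PPV alone does not describe missed recurrences (patient p07 in this toy example), since it only evaluates the patients the algorithm flags.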
By successfully identifying patients with recurrent NHL in real-world electronic health data, we will take the first
step towards identifying factors that increase a patient’s risk of recurrence. This cohort of patients with NHL will
benefit from up to 21 years of clinical follow-up and detailed treatment data collected from standardized
electronic data resources. Accurately capturing disease recurrence in patients with NHL through EHR data will
facilitate the application of these algorithms to additional healthcare systems with shared data infrastructure,
and allow for the efficient conduct of large-scale, population-based studies of critical, yet understudied, patient
outcomes.
Publications
None