Grant Details

Grant Number: 1R21CA269425-01
Primary Investigator: Epstein, Mara
Organization: Univ Of Massachusetts Med Sch Worcester
Project Title: Identifying Recurrent Non-Hodgkin Lymphoma in Electronic Health Data
Fiscal Year: 2022


PROJECT SUMMARY/ABSTRACT
Electronic health records (EHRs) are a rich resource for the efficient study of outcomes in patients diagnosed with non-Hodgkin lymphoma (NHL). However, cancer recurrence is not required to be reported to cancer registries, so innovative algorithms based on EHR data are needed to validly identify this important patient outcome for population-based studies. This proposal aims to construct algorithms, first using a rule-based approach informed by expert knowledge and second using a data-driven machine learning approach, to detect recurrence of two common histologic subtypes of NHL: the aggressive diffuse large B-cell lymphoma (DLBCL) and the more indolent follicular lymphoma (FL). Approximately 20-25% of DLBCL and 25-35% of FL survivors will experience disease recurrence, yet modifiable risk factors for recurrence are largely unknown. Subtype-specific algorithms will be developed using longitudinally collected EHR data from two large healthcare systems that serve demographically diverse populations and share a history of conducting collaborative cancer research. The long-term goal of this work is to apply the validated algorithms to additional healthcare systems with shared data infrastructure, establishing a multi-site study of patients diagnosed with NHL to identify determinants of lymphoma outcomes and advance this understudied field of research. The proposed project will include 1,128 patients with DLBCL and 519 patients with FL, aged 18 years or older at diagnosis (2000-2018), with follow-up through 2021, from Henry Ford Health System (Detroit, MI) and the Meyers Primary Care Institute/Reliant Medical Group (Worcester, MA). Essential post-diagnosis data, including detailed treatment history, tumor characteristics, and healthcare utilization, will be compiled for all study participants, along with text-based clinical notes and reports.
The proposed research aims to: 1) develop and evaluate rule-based algorithms integrating data from health claims, EHRs, and tumor registries, including specific treatment data and results from relevant procedures; and 2) adopt a machine learning approach integrated with natural language processing to improve algorithm performance. We will validate the algorithms for each NHL subtype against a gold-standard recurrence registry at Henry Ford Health System (HFHS) and targeted EHR review at both study sites. The positive predictive value of each algorithm will be calculated. By successfully identifying patients with recurrent NHL in real-world electronic health data, we will take the first step toward identifying factors that increase a patient’s risk of recurrence. This cohort of patients with NHL will benefit from up to 21 years of clinical follow-up and detailed treatment data collected from standardized electronic data resources. Accurately capturing disease recurrence in patients with NHL through EHR data will facilitate the application of these algorithms to additional healthcare systems with shared data infrastructure and allow for the efficient conduct of large-scale, population-based studies of critical, yet understudied, patient outcomes.
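As an illustration of the validation metric named above, the following minimal sketch computes positive predictive value (PPV), the share of algorithm-flagged cases that the reference standard confirms as recurrences. The function name, variable names, and toy data are hypothetical and are not drawn from the project; the abstract does not describe how the calculation will be implemented.

```python
def positive_predictive_value(flagged, confirmed):
    """Return PPV = true positives / (true positives + false positives).

    flagged:   list of bools, True where the algorithm flags a recurrence
    confirmed: list of bools, True where the reference standard
               (e.g., registry or chart review) confirms a recurrence
    """
    if len(flagged) != len(confirmed):
        raise ValueError("flagged and confirmed must have the same length")
    true_pos = sum(1 for f, c in zip(flagged, confirmed) if f and c)
    false_pos = sum(1 for f, c in zip(flagged, confirmed) if f and not c)
    if true_pos + false_pos == 0:
        raise ValueError("PPV is undefined when no cases are flagged")
    return true_pos / (true_pos + false_pos)


# Hypothetical toy data for six patients (not study results).
algorithm_flags = [True, True, False, True, False, True]
reference_standard = [True, False, False, True, True, True]

ppv = positive_predictive_value(algorithm_flags, reference_standard)
print(f"PPV = {ppv:.2f}")  # prints "PPV = 0.75" (3 of 4 flagged cases confirmed)
```

Note that PPV depends only on the flagged cases: the fifth toy patient, a confirmed recurrence the algorithm missed, lowers sensitivity but leaves PPV unchanged.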


