Grant Details
Grant Number: 1U01CA269264-01
Primary Investigator: Banerjee, Imon
Organization: Mayo Clinic Arizona
Project Title: Flexible NLP Toolkit for Automatic Curation of Outcomes for Breast Cancer Patients
Fiscal Year: 2022
Abstract
Breast cancer accounts for the largest share of new cancer cases worldwide (11.7%). Although the prognosis of
breast cancer patients is generally favorable due to early detection and comprehensive treatment,
20%–30% of patients still develop distant metastases, and patients with progressive-stage disease have
a median survival of only two years. Breast cancer is widely recognized as a heterogeneous
disease in terms of both the metastatic capacity of the primary tumor and the time to metastatic
spread. High-quality population-based cancer surveillance data are needed to: (1) describe
cancer burden, patterns, and outcomes; (2) inform cancer prevention, detection, and
control activities; and (3) evaluate interventions on the basis of past and future trends so that
optimal approaches to alleviate burden and suffering from cancer can be adopted. However, the
laborious manual curation process makes population-level surveillance data collection
challenging. Studies have shown that a large percentage of total registry cost is devoted
to labor for data curation, even in low-income countries. In this project, our mission is to build
a flexible NLP toolset that can be executed locally at the institution level and will curate the clinical
and patient-centered outcomes of breast cancer patients by parsing longitudinally acquired clinic
notes and radiology and pathology reports. To test the generalizability of the tools and to
initiate their deployment for data collection, we will partner with both the Georgia SEER registry and the
California state cancer registry and will curate outcome data for breast cancer patients from the past 10 years
at two institutions across the US representing diverse patient populations: Emory University
Hospital (Georgia) and Stanford Medical Center (California). We will leverage the previously
developed tools and technologies and extend them to automatically curate the clinical and
patient-centered outcome data (recurrence date and site of recurrence, treatment administered, and mental
and physical outcomes) from clinic notes and convert them into a structured, queryable
format. The NLP tools will be dockerized and run locally at the hospital registry level for automated
outcome curation. Finally, the NLP-extracted outcomes will be shared with the state cancer registries
for evaluation. From a methodological perspective, the framework and the open-source software
tools developed can be employed for cancer research beyond the scope of our project to curate
outcomes regardless of the problem domain.
Publications
A large language model-based generative natural language processing framework fine-tuned on clinical notes accurately extracts headache frequency from electronic health records.
Authors: Chiang C.C., Luo M., Dumkrieger G., Trivedi S., Chen Y.C., Chao C.J., Schwedt T.J., Sarker A., Banerjee I.
Source: Headache, 2024-03-25.
EPub date: 2024-03-25.
PMID: 38525734
Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer.
Authors: Das A., Tariq A., Batalini F., Dhara B., Banerjee I.
Source: medRxiv: The Preprint Server for Health Sciences, 2024-03-21.
EPub date: 2024-03-21.
PMID: 38562849
Fusion Modeling: Combining Clinical and Imaging Data to Advance Cardiac Care.
Authors: van Assen M., Tariq A., Razavi A.C., Yang C., Banerjee I., De Cecco C.N.
Source: Circulation. Cardiovascular Imaging, 2023 Dec; 16(12), p. e014533.
EPub date: 2023-12-11.
PMID: 38073535
A Large Language Model-Based Generative Natural Language Processing Framework Finetuned on Clinical Notes Accurately Extracts Headache Frequency from Electronic Health Records.
Authors: Chiang C.C., Luo M., Dumkrieger G., Trivedi S., Chen Y.C., Chao C.J., Schwedt T.J., Sarker A., Banerjee I.
Source: medRxiv: The Preprint Server for Health Sciences, 2023-10-03.
EPub date: 2023-10-03.
PMID: 37873417
Evolution of Breast Cancer Recurrence Risk Prediction: A Systematic Review of Statistical and Machine Learning-Based Models.
Authors: El Haji H., Souadka A., Patel B.N., Sbihi N., Ramasamy G., Patel B.K., Ghogho M., Banerjee I.
Source: JCO Clinical Cancer Informatics, 2023 Aug; 7, p. e2300049.
PMID: 37566789
Fusion of imaging and non-imaging data for disease trajectory prediction for coronavirus disease 2019 patients.
Authors: Tariq A., Tang S., Sakhi H., Celi L.A., Newsome J.M., Rubin D.L., Trivedi H., Gichoya J.W., Banerjee I.
Source: Journal of Medical Imaging (Bellingham, Wash.), 2023 May; 10(3), p. 034004.
EPub date: 2023-06-28.
PMID: 37388280
Graph convolutional network-based fusion model to predict risk of hospital acquired infections.
Authors: Tariq A., Lancaster L., Elugunti P., Siebeneck E., Noe K., Borah B., Moriarty J., Banerjee I., Patel B.N.
Source: Journal of the American Medical Informatics Association: JAMIA, 2023-04-07.
EPub date: 2023-04-07.
PMID: 37027831
Predicting 30-day all-cause hospital readmission using multimodal spatiotemporal graph neural networks.
Authors: Tang S., Tariq A., Dunnmon J.A., Sharma U., Elugunti P., Rubin D.L., Patel B.N., Banerjee I.
Source: IEEE Journal of Biomedical and Health Informatics, 2023-01-13; PP.
EPub date: 2023-01-13.
PMID: 37018684