Grant Details

Grant Number: 1R21CA258242-01
Primary Investigator: Yetisgen, Meliha
Organization: University of Washington
Project Title: Extraction of Symptom Burden From Clinical Narratives of Cancer Patients Using Natural Language Processing
Fiscal Year: 2021


Project Summary / Abstract
Cancer patients frequently experience high levels of pain, tiredness, shortness of breath, decreased appetite, nausea, drowsiness, anxiety, and decreased sense of wellbeing, often related to the disease itself, its treatments, or both. This high symptom burden leads to significant impairment of cancer patients’ quality of life and may be associated with impaired survival. Optimal symptom management is required to minimize symptom burden and maximize quality of life for cancer patients throughout the course of their disease. Supportive care in cancer (SCC) teams are multidisciplinary teams that are focused on the prevention and management of the adverse effects of cancer and its treatments across the continuum of the cancer experience from diagnosis through treatment and beyond. These teams typically lack the resources to see all cancer patients and need to prioritize patients with the highest need, often relying on oncology physicians for referral. However, oncology physicians are often more focused on curing cancer than on treating its symptoms. As a result, SCC services are often accessed by chance even when available, often later in the cancer trajectory. To improve recognition of SCC needs and to identify the symptom burden of cancer patients for better management and care, we propose to build natural language processing (NLP) approaches that can automatically extract symptom information from unstructured narratives. The proposed systems will utilize neural nets and build on state-of-the-art information extraction methods. To accomplish our goals, we will create a dataset of clinical notes for a large cohort of prostate cancer and Diffuse Large B Cell Lymphoma (DLBCL) patients treated at Seattle Cancer Care Alliance (SCCA) and Huntsman Cancer Institute (HCI) between January 1, 2015 and January 1, 2020. We focus on these two types of cancer as examples of two very different and prevalent cancer types.
We propose to represent symptom burden documented in clinical narratives with a generalizable frame representation that captures fine-grained details including presence/absence, change-of-state, severity, characteristics, duration, frequency, and anatomy information related to patient symptoms. We will use active learning to create a diverse and representative gold standard annotated with symptom frames to train and test the proposed neural-based NLP approaches. All models and their implementations produced during the execution of this project will be shared with the community as open-source resources. After successful completion of the project, the developed NLP methods will be integrated into the information access methods of SCCA and HCI clinical repositories.


Leveraging natural language processing to augment structured social determinants of health data in the electronic health record.
Authors: Lybarger K., Dobbins N.J., Long R., Singh A., Wedgeworth P., Uzuner Ö., Yetisgen M.
Source: Journal of the American Medical Informatics Association : JAMIA, 2023-07-19; 30(8), p. 1389-1397.
PMID: 37130345

Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework.
Authors: Lybarger K., Ostendorf M., Thompson M., Yetisgen M.
Source: Journal of biomedical informatics, 2021 May; 117, p. 103761.
EPub date: 2021-03-26.
PMID: 33781918