Grant Details
Grant Number: 3R21CA258242-01S1
Primary Investigator: Yetisgen, Meliha
Organization: University Of Washington
Project Title: Extraction of Symptom Burden From Clinical Narratives of Cancer Patients Using Natural Language Processing
Fiscal Year: 2022
Abstract
Project Summary/Abstract
Although cancer in children and adolescents is rare, it is the leading cause of death by disease past infancy
among children in the United States. The US Department of Health defines social determinants of health (SDOH) as “conditions in the
environment that affect health, functioning, and quality of life outcomes and risks.” There is an extensive
literature base linking race, ethnicity, and SDOH to pediatric cancer outcomes. SDOH are commonly queried in
pediatric clinical practice. Very few SDOH data points, such as race and ethnicity, are recorded as discrete data fields;
most are documented in clinical narratives in Electronic Health Records (EHRs), which makes it
difficult to collect SDOH in clinical and research settings to improve patient care and advance clinical research.
We therefore propose to develop novel deep learning-based NLP technologies that can extract detailed SDOH
information from EHRs of pediatric patients for secondary use. Our dataset will include clinical notes of
pediatric patients from two institutions: Seattle Cancer Care Alliance (SCCA) and University of Washington
Medical Center (UWMC). The SCCA cohort will include only pediatric cancer patients. To ensure the
generalizability of extraction approaches across different institutions and patient populations, the UWMC cohort will
include a random sample from the general pediatric population. Our final corpus will include thousands of clinical
notes from hundreds of pediatric patients over a period of ten years (January 1, 2012 to December 31, 2021). We will design a
frame-based event representation schema to capture the salient details of the following categories of SDOH:
(1) health care access and quality, (2) living arrangements, (3) economic stability, (4) housing and hunger
insecurity, (5) prior trauma/loss, (6) education access and quality, (7) patient and family substance use history,
and (8) patient/family mental health. We will use active learning to sample a diverse and representative set of notes
for gold standard annotation. Given this gold standard, our goal is automated extraction of SDOH from
clinical narratives of pediatric patients with deep learning-based NLP approaches. The proposed frame-based
event representation, active learning framework, and NLP architectures will be based on ongoing work
from our ITCR - R21 project titled “Extraction of Symptom Burden from Clinical Narratives of Cancer Patients
using Natural Language Processing” (1 R21 CA258242-01). All models and their implementations produced
during the execution of this project will be shared with the community as open-source resources.
Publications
None. See parent grant details.