Grant Details

Grant Number: 2U24CA248010-06
Primary Investigator: Savova, Guergana
Organization: Boston Children's Hospital
Project Title: Cancer Deep Phenotype Extraction From Electronic Medical Records (RENEWAL)
Fiscal Year: 2025


Abstract

Summary: Cancer patients accumulate a wealth of electronic medical record (EMR) data during the diagnostic, decision-making, treatment, and follow-up processes of their care; most of these data are found in unstructured narrative form that remains dormant for secondary research purposes. Even when patients enroll in clinical trials that gather detailed case report forms, a holistic picture of their cancer journey is the exception, not the rule. Answering seemingly simple questions requires intensive manual review of patient records, a tedious process that can take hours per patient case, limiting researchers’ ability to construct large observational cohorts. With the exponential growth in the quantity of EMR data, it is not tenable for even a very large team of manual curators to evaluate records thoroughly and exhaustively at scale. Understanding the “deep phenotype” of a cancer patient requires a complete picture of both tumor and host. Critical cancer phenotypic variables include morphology, tumor location, extent of invasion, predictive and prognostic biomarkers, treatment exposure history, and response to treatment. Host phenotypic variables include fitness (e.g., performance status and comorbidities), adverse effects of treatment, and non-medical determinants of health (e.g., global distress, financial toxicity, and behavioral habits). Phenotypic profiles are typically constructed from multiple data sources, and temporality is critically important. As many phenotypic variables are available only in EMR free text created over time, the cancer research community needs new, openly available natural language processing (NLP) methods and systems to transform phenotypic detail from EMRs into data for advancing translational research. We have been developing DeepPhe, a platform for turning these rich data into computable longitudinal summaries of cancer diagnostic, prognostic, and treatment information.
Since our last submission in 2019, the field of artificial intelligence has advanced at unprecedented speed, mainly in its text processing subfield, as exemplified by the advent of large language models (LLMs) and then very large language models. In this renewal, we will build on our own and the community’s methodological advances, including the use of LLMs for EMR processing, to deliver a state-of-the-art, comprehensive, modern open-source tool for extracting deep phenotype information and to provide novel visual analytics approaches. Our case studies will demonstrate the utility of our tools and drive the development of a vibrant community of cancer researchers using DeepPhe. Our community development efforts are aligned with the mission of the NCI Cancer Research Data Commons to advance methods of extracting and representing precision medicine phenotypes.


