Grant Details
| Grant Number: | 2U24CA248010-06 |
| Primary Investigator: | Savova, Guergana |
| Organization: | Boston Children's Hospital |
| Project Title: | Cancer Deep Phenotype Extraction From Electronic Medical Records (RENEWAL) |
| Fiscal Year: | 2025 |
Abstract
Summary
Cancer patients accumulate a wealth of electronic medical record (EMR) data during the diagnostic, decision-
making, treatment, and follow-up processes of their care; most of these data are found in unstructured narrative
form that remains dormant for secondary research purposes. Even when patients enroll in clinical trials that
gather detailed case report forms, a holistic picture of their cancer journey is the exception, not the rule.
Answering seemingly simple questions requires intensive manual review of patient records, a tedious process
that can take hours per patient case, limiting researchers’ ability to construct large observational cohorts. With
the exponential growth in the quantity of EMR data, it is not tenable for even a very large team of manual curators
to thoroughly and exhaustively evaluate records at scale.
Understanding the “deep phenotype” of a cancer patient requires a complete picture of both tumor and host.
Critical cancer phenotypic variables include morphology, tumor location, extent of invasion, predictive and
prognostic biomarkers, treatment exposure history, and response to treatment. Host phenotypic variables include
fitness (e.g., performance status and comorbidities), adverse effects of treatment, and non-medical determinants
of health (e.g., global distress, financial toxicity, and behavioral habits). Phenotypic profiles are typically
constructed from multiple data sources, and temporality is critically important. As many phenotypic variables are
available only in EMR free text created over time, the cancer research community needs new, openly available
natural language processing (NLP) methods and systems to transform phenotypic detail from EMRs to data for
advancing translational research. We have been developing DeepPhe, a platform for turning this rich data into
computable longitudinal summaries of cancer diagnostic, prognostic, and treatment information.
Since our last submission in 2019, the field of artificial intelligence has advanced at unprecedented speed,
particularly in its subfield of text processing, as exemplified by the advent of large language models
(LLMs) and then very large language models. In this renewal, we will build on our own and the community's
methodological advances, including the use of LLMs for EMR processing, to deliver a state-of-the-art,
comprehensive, modern open-source tool for extracting deep phenotype information and to provide novel visual
analytics approaches. Our case studies will demonstrate the utility of our tools and drive the development of a vibrant
community of cancer researchers using DeepPhe. Our community development efforts are aligned with the
mission of the NCI Cancer Research Data Commons to advance methods of extracting and representing
precision medicine phenotypes.