Grant Details
| Grant Number: | 2U24CA248010-06 |
| Primary Investigator: | Savova, Guergana |
| Organization: | Boston Children's Hospital |
| Project Title: | Cancer Deep Phenotype Extraction From Electronic Medical Records (RENEWAL) |
| Fiscal Year: | 2025 |
Abstract
Summary
Cancer patients accumulate a wealth of electronic medical record (EMR) data during the diagnostic,
decision-making, treatment, and follow-up processes of their care; most of these data are found in unstructured narrative
form that remains dormant for secondary research purposes. Even when patients enroll in clinical trials that
gather detailed case report forms, a holistic picture of their cancer journey is the exception, not the rule.
Answering seemingly simple questions requires intensive manual review of patient records, a tedious process
that can take hours per patient case, limiting researchers’ ability to construct large observational cohorts. With
the exponential growth in the quantity of EMR data, it is not tenable for even a very large team of manual curators
to thoroughly and exhaustively evaluate records at scale.
Understanding the “deep phenotype” of a cancer patient requires a complete picture of both tumor and host.
Critical cancer phenotypic variables include morphology, tumor location, extent of invasion, predictive and
prognostic biomarkers, treatment exposure history, and response to treatment. Host phenotypic variables include
fitness (e.g., performance status and comorbidities), adverse effects of treatment, and non-medical determinants
of health (e.g., global distress, financial toxicity, and behavioral habits). Phenotypic profiles are typically
constructed from multiple data sources, and temporality is critically important. As many phenotypic variables are
available only in EMR free text created over time, the cancer research community needs new, openly available
natural language processing (NLP) methods and systems to transform phenotypic detail from EMRs to data for
advancing translational research. We have been developing DeepPhe, a platform for turning this rich data into
computable longitudinal summaries of cancer diagnostic, prognostic, and treatment information.
Since our last submission in 2019, the field of artificial intelligence has advanced at unprecedented speed,
particularly in its subfield of text processing, as exemplified by the advent of large language models
(LLMs) and then very large language models. In this renewal, we will build on our own and the community’s methodology
advancements, including the use of LLMs for EMR processing, to deliver a state-of-the-art, comprehensive,
modern open-source tool for extracting deep phenotype information and provide novel visual analytics
approaches. Our case studies will demonstrate the utility of our tools and drive the development of a vibrant
community of cancer researchers using DeepPhe. Our community development efforts are aligned with the
mission of the NCI Cancer Research Data Commons to advance methods of extracting and representing
precision medicine phenotypes.
Publications
Informatics at the Frontier of Cancer Research.
Authors: Noller K., Botsis T., Camara P.G., Ciotti L., Cooper L.A.D., Goecks J., Griffith M., Haas B.J., Ideker T., Karchin R., et al.
Source: Cancer Research, 2025-08-15; 85(16), p. 2967-2986.
PMID: 40600473
Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation.
Authors: Yao J., Perova Z., Mandloi T., Lewis E., Parkinson H., Savova G.
Source: bioRxiv: The Preprint Server for Biology, 2025-01-29.
EPub date: 2025-01-29.
PMID: 39975119
As bleak as it sounds? Analysing trends in oncology clinical trial initiation in the UK from 2010 to 2022.
Authors: VanHelene A.D., Hadfield M.J., Trapani D., Warner J.L., Lythgoe M.P.
Source: BMJ Oncology, 2024; 3(1), p. e000410.
EPub date: 2024-08-14.
PMID: 39886121
DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction.
Authors: Hochheiser H., Finan S., Yuan Z., Durbin E.B., Jeong J.C., Hands I., Rust D., Kavuluru R., Wu X.C., Warner J.L., et al.
Source: medRxiv: The Preprint Server for Health Sciences, 2023-10-26.
EPub date: 2023-10-26.
PMID: 37205575
An End-to-End Natural Language Processing System for Automatically Extracting Radiation Therapy Events From Clinical Texts.
Authors: Bitterman D.S., Goldner E., Finan S., Harris D., Durbin E.B., Hochheiser H., Warner J.L., Mak R.H., Miller T., Savova G.K.
Source: International Journal of Radiation Oncology, Biology, Physics, 2023-09-01; 117(1), p. 262-273.
EPub date: 2023-03-27.
PMID: 36990288
DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction.
Authors: Hochheiser H., Finan S., Yuan Z., Durbin E.B., Jeong J.C., Hands I., Rust D., Kavuluru R., Wu X.C., Warner J.L., et al.
Source: JCO Clinical Cancer Informatics, 2023 Sep; 7, p. e2300156.
PMID: 38113411
Open-source Software Sustainability Models: Initial White Paper From the Informatics Technology for Cancer Research Sustainability and Industry Partnership Working Group.
Authors: Ye Y., Barapatre S., Davis M.K., Elliston K.O., Davatzikos C., Fedorov A., Fillion-Robin J.C., Foster I., Gilbertson J.R., Lasso A., et al.
Source: Journal of Medical Internet Research, 2021-12-02; 23(12), p. e20028.
EPub date: 2021-12-02.
PMID: 34860667
Characterizing the Anticancer Treatment Trajectory and Pattern in Patients Receiving Chemotherapy for Cancer Using Harmonized Observational Databases: Retrospective Study.
Authors: Jeon H., You S.C., Kang S.Y., Seo S.I., Warner J.L., Belenkaya R., Park R.W.
Source: JMIR Medical Informatics, 2021-04-06; 9(4), p. e25035.
EPub date: 2021-04-06.
PMID: 33720842
Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records.
Authors: Savova G.K., Danciu I., Alamudun F., Miller T., Lin C., Bitterman D.S., Tourassi G., Warner J.L.
Source: Cancer Research, 2019-11-01; 79(21), p. 5463-5470.
EPub date: 2019-08-08.
PMID: 31395609