Grant Details

Grant Number: 5UH3CA243120-04
Primary Investigator: Savova, Guergana
Organization: Boston Children's Hospital
Project Title: Natural Language Processing Platform for Cancer Surveillance
Fiscal Year: 2022


PROJECT SUMMARY/ABSTRACT This UG3/UH3 proposal titled “Natural Language Processing Platform for Cancer Surveillance” is in response to Research Area 1 of PAR 16-349, specifically addressing the development of natural language processing (NLP) tools to facilitate automatic/unsupervised/minimally supervised extraction of specific discrete cancer-related data from various types of unstructured electronic medical records (EMRs) related to the activities of cancer registries. It is submitted through a multi-PI mechanism: Prof. Guergana Savova from Boston Children’s Hospital/Harvard Medical School, Dr. Jeremy Warner from Vanderbilt University Medical Center, Prof. Harry Hochheiser from the University of Pittsburgh, and Prof. Eric Durbin from the Kentucky Cancer Registry/University of Kentucky. The current proposal builds on prior work funded by the NCI Informatics Tools for Cancer Research (ITCR) program. We envision building on our work to date to advance methods for information extraction of clinical phenotyping data needed to fuel a new cancer surveillance paradigm that would benefit hospital-based, state-based, and national cancer registries. In this new paradigm, surveillance programs would use the methods to enhance the speed, accuracy, and ease of cancer reporting. The proposed DeepPhe*CR platform could be deployed at local sites or centrally, and could eventually be integrated into existing or new visualization and abstraction tools as needed by the cancer surveillance community. Although there has been some previous work on automatic phenotype extraction from the various streams of data, including the clinical narrative, for specific types of cancer or individual variables for cancer surveillance, the proposed work will be a step towards generalizable information extraction. This generalizability enables extensibility and scalability.
Interoperability is reinforced through the modeling part of the proposed project, which is grounded in the most recent advances in biomedical ontologies, terminologies, community-adopted conventions, and standards. Our planned partnership with three SEER cancer registries provides our decision-making processes with a solid foundation in large-scale cancer surveillance.


An End-to-End Natural Language Processing System for Automatically Extracting Radiation Therapy Events From Clinical Texts.
Authors: Bitterman D.S., Goldner E., Finan S., Harris D., Durbin E.B., Hochheiser H., Warner J.L., Mak R.H., Miller T., Savova G.K.
Source: International journal of radiation oncology, biology, physics, 2023-09-01; 117(1), p. 262-273.
EPub date: 2023-03-27.
PMID: 36990288

Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records.
Authors: Savova G.K., Danciu I., Alamudun F., Miller T., Lin C., Bitterman D.S., Tourassi G., Warner J.L.
Source: Cancer research, 2019-11-01; 79(21), p. 5463-5470.
EPub date: 2019-08-08.
PMID: 31395609
