Grant Details
Grant Number: 5UH3CA243120-05
Primary Investigator: Savova, Guergana
Organization: Boston Children's Hospital
Project Title: Natural Language Processing Platform for Cancer Surveillance
Fiscal Year: 2023
Abstract
PROJECT SUMMARY/ABSTRACT
This UG3/UH3 proposal, titled “Natural Language Processing Platform for Cancer Surveillance”, is in response
to Research Area 1 of PAR 16-349 (https://grants.nih.gov/grants/guide/pa-files/par-16-349.html), specifically
addressing the development of natural language processing (NLP) tools to facilitate
automatic/unsupervised/minimally supervised extraction of specific discrete cancer-related data from various
types of unstructured electronic medical records (EMRs) related to the activities of cancer registries. It is
submitted through a multi-PI mechanism – Prof. Guergana Savova from Boston Children’s Hospital/Harvard
Medical School, Dr. Jeremy Warner from Vanderbilt University Medical Center, Prof. Harry Hochheiser from
the University of Pittsburgh, and Prof. Eric Durbin from the Kentucky Cancer Registry/University of Kentucky.
The current proposal builds on prior work funded by the NCI Informatics Tools for Cancer Research (ITCR)
program (https://itcr.cancer.gov/). We envision building on our work to date to advance methods for
information extraction of clinical phenotyping data needed to fuel a new cancer surveillance paradigm that
would benefit hospital-based, state-based, and national cancer registries. In this new paradigm, surveillance
programs would use the methods to enhance the speed, accuracy, and ease of cancer reporting. The
proposed DeepPhe*CR platform could be deployed at local sites or centrally, and could eventually be
integrated into existing or new visualization and abstraction tools as needed by the cancer surveillance
community. Although there has been some previous work on automatic phenotype extraction from various
streams of data, including the clinical narrative, for specific types of cancer or individual variables for cancer
surveillance, the proposed work will be a step towards generalizable information extraction. This
generalizability enables extensibility and scalability. Interoperability is reinforced through the modeling part of
the proposed project, which is grounded in the most recent advances in biomedical ontologies, terminologies,
and community-adopted conventions and standards. Our planned partnership with three SEER cancer registries
provides our decision-making processes with a solid foundation in large-scale cancer surveillance.
Publications
None