Grant Details

Grant Number: 5U24CA184407-06
Primary Investigator: Savova, Guergana
Organization: Boston Children's Hospital
Project Title: Cancer Deep Phenotype Extraction From Electronic Medical Records
Fiscal Year: 2018

DESCRIPTION (provided by applicant): Precise phenotype information is needed to advance translational cancer research, particularly to unravel the effects of genetic, epigenetic, and other factors on tumor behavior and responsiveness. Examples of phenotypic variables in cancer include: tumor morphology (e.g. histopathologic diagnosis), co-morbid conditions (e.g. associated immune disease), laboratory findings (e.g. gene amplification status), specific tumor behaviors (e.g. metastasis) and response to treatment (e.g. effect of a chemotherapeutic agent on tumor). Current models for correlating EMR data with -omics data largely ignore the clinical text, which remains one of the most important sources of phenotype information for cancer patients. Unlocking the value of clinical text has the potential to enable new insights about cancer initiation, progression, metastasis, and response to treatment. We propose further collaboration of two mature informatics groups with long histories of developing open-source natural language processing (NLP) software (Apache cTAKES, caTIES and ODIE) to extend existing software with new methods for cancer deep phenotyping. Several aims propose investigation of biomedical information extraction where there has been little or no previous work (e.g. clinical genomic entities and causal discourse). Visualization of extracted data, usability of the software, and dissemination are also emphasized. Three oncology projects led by accomplished translational investigators in Breast Cancer, Melanoma, and Ovarian Cancer will drive development of the software. These labs will contribute phenotype variables for extraction, test utility and usability of the software, and provide the setting for an extrinsic evaluation. The proposed research bridges novel methods to automate cancer deep phenotype extraction from clinical text with emerging standards in phenotype knowledge representation and NLP.
This work is highly aligned with recent calls in the scientific literature to advance scalable and robust methods of extracting and representing phenotypes for precision medicine and translational research.

Clinical Natural Language Processing in languages other than English: opportunities and challenges.
Authors: Névéol A., Dalianis H., Velupillai S., Savova G., Zweigenbaum P.
Source: Journal of Biomedical Semantics, 2018-03-30; 9(1), p. 12.
EPub date: 2018-03-30.
PMID: 29602312

Computerized Approach to Creating a Systematic Ontology of Hematology/Oncology Regimens.
Authors: Malty A.M., Jain S.K., Yang P.C., Harvey K., Warner J.L.
Source: JCO Clinical Cancer Informatics, 2018; 2018.
EPub date: 2018-05-11.
PMID: 30238070

DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records.
Authors: Savova G.K., Tseytlin E., Finan S., Castine M., Miller T., Medvedeva O., Harris D., Hochheiser H., Lin C., Chavan G., et al.
Source: Cancer Research, 2017-11-01; 77(21), p. e115-e118.
PMID: 29092954

Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text.
Authors: Gonzalez-Hernandez G., Sarker A., O'Connor K., Savova G.
Source: Yearbook of Medical Informatics, 2017 Aug; 26(1), p. 214-227.
EPub date: 2017-09-11.
PMID: 29063568

Automated annotation and classification of BI-RADS assessment from radiology reports.
Authors: Castro S.M., Tseytlin E., Medvedeva O., Mitchell K., Visweswaran S., Bekhuis T., Jacobson R.S.
Source: Journal of Biomedical Informatics, 2017 May; 69, p. 177-187.
EPub date: 2017-04-18.
PMID: 28428140

Towards Generalizable Entity-Centric Clinical Coreference Resolution.
Authors: Miller T., Dligach D., Bethard S., Lin C., Savova G.
Source: Journal of Biomedical Informatics, 2017-04-21.
EPub date: 2017-04-21.
PMID: 28438706

An information model for computable cancer phenotypes.
Authors: Hochheiser H., Castine M., Harris D., Savova G., Jacobson R.S.
Source: BMC Medical Informatics and Decision Making, 2016-09-15; 16(1), p. 121.
EPub date: 2016-09-15.
PMID: 27629872

Multilayered temporal modeling for the clinical domain.
Authors: Lin C., Dligach D., Miller T.A., Bethard S., Savova G.K.
Source: Journal of the American Medical Informatics Association: JAMIA, 2016 Mar; 23(2), p. 387-95.
PMID: 26521301

Semi-supervised Learning for Phenotyping Tasks.
Authors: Dligach D., Miller T., Savova G.K.
Source: AMIA ... Annual Symposium Proceedings. AMIA Symposium, 2015; 2015, p. 502-11.
PMID: 26958183
