Grant Details
Grant Number: 5R21CA227606-02
Primary Investigator: Kawatkar, Aniket
Organization: Kaiser Foundation Research Institute
Project Title: Development of an Automated Method to Capture Bladder Cancer Recurrence and Progression for Epidemiologic Research
Fiscal Year: 2020
Abstract
Project Summary / Abstract: Bladder cancer is the sixth most common cancer in the United States, and
non-muscle invasive bladder cancer (NMIBC) accounts for 75-80% of all cases. Tumor recurrence and
progression are common among NMIBC patients: over 50% of patients have their tumors recur, most
within the first year, and up to 45% of high-risk tumors progress to muscle-invasive disease within 5 years.
Patients therefore undergo intensive clinical surveillance and treatment, contributing to bladder cancer
being the most expensive cancer to treat on a per-patient basis. Large population-based studies have
been limited in their ability to study tumor recurrence and progression because these key outcomes are
not typically captured in cancer registry or other discretely coded data. To overcome this limitation and
facilitate future epidemiologic and outcomes studies on NMIBC, we propose to develop and validate
automated algorithms using natural language processing (NLP) to capture bladder cancer recurrence
(Aim 1) and progression (Aim 2) from free-text pathology, urology, and imaging notes. We will externally
validate the accuracy of the algorithms for extracting tumor characteristics using a national sample of 575
patients from the Veterans Affairs (VA) healthcare system (Aim 3). NLP is a powerful tool that works by
segmenting notes into units of related text (e.g., sentences) and applying computational methods to
determine meaning and extract data. We will use a novel, internally developed NLP tool that integrates
the best components of several open source NLP packages to efficiently develop, refine, and validate the
proposed algorithms. Kaiser Permanente Southern California (KPSC) is an ideal study setting because of
its large, diverse population, advanced electronic health record, high-quality cancer registry, and complete
capture of care. The initial NLP algorithms will be created based on clinical input and chart reviews of a
sample of medical records. The algorithms will first be developed using diagnostic reports, leveraging
validated cancer registry data on 6,000 patients; the same clinical procedures are used for initial diagnosis
as for recurrence / progression. Then, algorithms will be applied to surveillance reports and iteratively
refined based on false-positive and false-negative results relative to study chart reviews (n=100 for each iteration). The
final algorithms will be compared to an expert reference standard provided by two urologic oncologists and a
pathologist in a sample of 200 patients. Algorithm performance will be assessed by sensitivity, specificity,
positive predictive value, and negative predictive value. The final algorithms will be applied to 4,000
newly diagnosed NMIBC patients aged >18 years from 2008-2017 within KPSC. The frequency of recurrence
and progression will be described, and characteristics of patients with and without the outcomes will be
compared. Successful completion of study aims will produce novel, automated methods that will facilitate
large epidemiologic and outcomes studies, whose results may improve care for NMIBC patients.
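
The four performance measures named in the abstract (sensitivity, specificity, positive predictive value, and negative predictive value) all derive from a 2x2 comparison of algorithm output against a reference standard. The following is a minimal illustrative sketch of that calculation, not the study's actual code: the function name `diagnostic_metrics`, the boolean encoding of outcomes, and the ten example labels are all hypothetical and are not study data.

```python
import math


def diagnostic_metrics(predicted, reference):
    """Compare algorithm flags against reference-standard labels.

    Both arguments are equal-length sequences of booleans, one per patient,
    where True means the outcome (e.g., recurrence) is present.
    This encoding is an assumption made for illustration.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1  # algorithm and reference both positive
        elif pred and not ref:
            fp += 1  # algorithm positive, reference negative
        elif not pred and ref:
            fn += 1  # algorithm negative, reference positive
        else:
            tn += 1  # algorithm and reference both negative

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a margin of the table is empty.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical labels for ten patients (not study data).
algorithm_flags = [True, True, True, True, False, False, False, False, False, False]
reference_labels = [True, True, True, False, True, False, False, False, False, False]

metrics = diagnostic_metrics(algorithm_flags, reference_labels)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives. Positive and negative predictive values depend on how common the outcome is in the sample, so values estimated in one validation sample need not carry over to a population with a different outcome frequency.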
Publications
None