Grant Details
Grant Number: 3U01CA199277-08S2
Primary Investigator: Lacey, James
Organization: Beckman Research Institute/City Of Hope
Project Title: A More Perfect Union: Leveraging Clinically Deployed Models and Cancer Epidemiology Cohort Data to Improve AI/ML Readiness of NIH-Supported Population Sciences Resources
Fiscal Year: 2022
Abstract
PROJECT SUMMARY/ABSTRACT
NIH has invested hundreds of millions of dollars in large-scale prospective observational cohorts. These
studies' diverse and valuable data have been used to generate important discoveries about how lifestyle and
environment affect health and disease. These high-dimensional and multi-modal real-world data can enable
broad research, including new AI/ML applications. Unfortunately, the standard methods cohorts use to store,
manage, analyze, and share their data are not ideal for contemporary AI/ML use. This creates a “readiness
gap” that hinders new AI/ML research. This project proposes an innovative yet feasible approach to close that
gap by improving AI/ML readiness at multiple levels. Our multidisciplinary team includes AI/ML experts at City
of Hope (COH); experienced population scientists from the California Teachers Study (CTS) cohort team; and
cloud computing specialists from the San Diego Supercomputer Center's (SDSC) Sherlock Cloud. The CTS
includes 133,477 female participants who have been followed continuously since 1995. Through surveys and
linkages, the CTS has collected comprehensive exposure and lifestyle data and has identified over 28,000
cancers; over 34,000 deaths; and over 800,000 individual hospitalizations. Based on an AI/ML readiness
framework, we will update the CTS's data & computing architecture; reconfigure data exploration and
aggregation tools and documentation; and use CTS data to test, evaluate, and expand existing, clinically
deployed AI/ML models. First, we will expand the current private CTS data analytics cloud to include a new
scalable computing environment specifically for AI/ML. We will deploy Amazon Web Services (AWS) resources
for AI/ML within our secure CTS enclave and provision GPU-enabled instances running a full suite of scientific
computing and AI/ML packages in Python and Jupyter Notebooks. Second, we will generate embeddings in
the CTS data to reduce the data complexity that is a barrier to AI/ML applications. Embeddings are low-
dimensional latent representations that compress data from multiple modalities into vectors that represent a
compact embedding, or abstracted summary, of a participant's data. Unsupervised learning with an
autoencoder deep neural network will cluster CTS data into phenotype-based subgroups that can be used for
essential AI/ML functions, such as cohort discovery, close-neighbor identification, and imputation. Third, we will
augment clinically deployed risk models at COH (e.g., for readmissions) with CTS data to directly evaluate the
potential for real-world cohort data to improve model performance and the portability of clinical models into
cohort populations. Each of these three initiatives will be documented in interactive tutorial notebooks that will
be FAIR for the research community. This project includes a balanced combination of people, process, and
technology: a new multidisciplinary team of experts from relevant fields; new general-purpose embedding
representations of observational cohort data; and a secure cloud-based infrastructure configured specifically
for new AI/ML projects. Successful completion of this work will close the AI/ML readiness gap for cohort data.
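
As an illustration of two of the embedding-based functions named above (close-neighbor identification and imputation), the following is a minimal sketch and not the project's implementation. It assumes each participant has already been reduced to a short embedding vector; the participant ids, vectors, values, and function names are all hypothetical and are not CTS data. Neighbors are ranked by Euclidean distance, and a missing value is filled with the mean over the nearest participants who have that value observed.

```python
import math


def nearest_neighbors(embeddings, query_id, k=2):
    """Return ids of the k participants closest to query_id in embedding space."""
    query_vec = embeddings[query_id]
    distances = [
        (math.dist(query_vec, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id
    ]
    distances.sort()  # smallest Euclidean distance first
    return [pid for _, pid in distances[:k]]


def impute_from_neighbors(embeddings, values, query_id, k=2):
    """Estimate a missing value as the mean over the k nearest observed neighbors."""
    # Only participants with an observed value can contribute to the estimate.
    candidates = {
        pid: vec
        for pid, vec in embeddings.items()
        if pid == query_id or values.get(pid) is not None
    }
    neighbors = nearest_neighbors(candidates, query_id, k)
    if not neighbors:
        raise ValueError("no neighbors with observed values")
    return sum(values[pid] for pid in neighbors) / len(neighbors)


# Hypothetical 2-dimensional embeddings and one survey variable (made-up numbers).
embeddings = {
    "p1": (0.0, 0.0),
    "p2": (0.1, 0.0),
    "p3": (0.0, 0.2),
    "p4": (5.0, 5.0),
}
values = {"p1": None, "p2": 24.0, "p3": 26.0, "p4": 40.0}

neighbors = nearest_neighbors(embeddings, "p1", k=2)
estimate = impute_from_neighbors(embeddings, values, "p1", k=2)
```

In this toy example the distant participant `p4` does not influence the estimate for `p1`; only the two closest participants do. Real embeddings would have more dimensions, and the distance metric and the value of `k` are design choices the abstract does not specify.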
Publications
None. See parent grant details.