Grant Details

Grant Number: 3U01CA199277-08S2
Primary Investigator: Lacey, James
Organization: Beckman Research Institute/City Of Hope
Project Title: A More Perfect Union: Leveraging Clinically Deployed Models and Cancer Epidemiology Cohort Data to Improve AI/ML Readiness of NIH-Supported Population Sciences Resources
Fiscal Year: 2022


PROJECT SUMMARY/ABSTRACT
NIH has invested hundreds of millions of dollars in large-scale prospective observational cohorts. These studies' diverse and valuable data have been used to generate important discoveries about how lifestyle and environment affect health and disease. These high-dimensional, multi-modal real-world data can enable broad research, including new AI/ML applications. Unfortunately, the standard methods that cohorts use to store, manage, analyze, and share their data are not well suited to contemporary AI/ML use. This creates a “readiness gap” that hinders new AI/ML research. This project proposes an innovative yet feasible approach to close that gap by improving AI/ML readiness at multiple levels. Our multidisciplinary team includes AI/ML experts at City of Hope (COH); experienced population scientists from the California Teachers Study (CTS) cohort team; and cloud computing specialists from the San Diego Supercomputer Center's (SDSC) Sherlock Cloud. The CTS includes 133,477 female participants who have been followed continuously since 1995. Through surveys and linkages, the CTS has collected comprehensive exposure and lifestyle data and has identified over 28,000 cancers, over 34,000 deaths, and over 800,000 individual hospitalizations.
Based on an AI/ML readiness framework, we will update the CTS's data and computing architecture; reconfigure data exploration and aggregation tools and documentation; and use CTS data to test, evaluate, and expand existing, clinically deployed AI/ML models.
First, we will expand the current private CTS data analytics cloud to include a new scalable computing environment specifically for AI/ML. We will deploy Amazon Web Services (AWS) resources for AI/ML within our secure CTS enclave and provision GPU-enabled instances running a full suite of scientific computing and AI/ML packages in Python and Jupyter Notebooks.
Second, we will generate embeddings of the CTS data to reduce the data complexity that is a barrier to AI/ML applications. Embeddings are low-dimensional latent representations that compress data from multiple modalities into compact vectors, each an abstracted summary of a participant's data. We will use unsupervised learning and an autoencoder deep neural network to cluster CTS data into phenotype-based subgroups that can support essential AI/ML functions such as cohort discovery, nearest-neighbor identification, and imputation.
Third, we will augment clinically deployed risk models at COH (e.g., for readmissions) with CTS data to directly evaluate whether real-world cohort data can improve model performance and how portable clinical models are to cohort populations.
Each of these three initiatives will be documented in interactive tutorial notebooks that will be FAIR (findable, accessible, interoperable, and reusable) for the research community. This project combines people, process, and technology: a new multidisciplinary team of experts from relevant fields; new general-purpose embedding representations of observational cohort data; and a secure cloud-based infrastructure configured specifically for new AI/ML projects. Successful completion of this work will close the AI/ML readiness gap for cohort data.
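The embedding, clustering, and neighbor-lookup steps of the second aim can be illustrated with a toy sketch. This is not the project's pipeline. It assumes a tiny synthetic cohort with six standardized numeric features and two hypothetical "phenotypes", a single-bottleneck autoencoder written in plain Python in place of a deep neural network, and k-means as a stand-in for whatever unsupervised clustering method the team actually uses. All function and class names (`make_toy_cohort`, `TinyAutoencoder`, `kmeans`, `nearest_neighbors`) are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def make_toy_cohort(n_per_group=20, seed=0):
    """Hypothetical toy data: two latent phenotypes, six standardized features."""
    rng = random.Random(seed)
    centers = ([1.0, 1.0, 0.5, -1.0, -0.5, 0.0],
               [-1.0, -0.5, 0.0, 1.0, 1.0, 0.5])
    # First n_per_group rows come from phenotype 0, the rest from phenotype 1.
    return [[v + rng.gauss(0, 0.2) for v in c]
            for c in centers for _ in range(n_per_group)]


class TinyAutoencoder:
    """Bottleneck autoencoder: z = tanh(We x + be), x_hat = Wd z + bd."""

    def __init__(self, n_in, n_latent, seed=0):
        rng = random.Random(seed)
        self.We = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_latent)]
        self.be = [0.0] * n_latent
        self.Wd = [[rng.uniform(-0.5, 0.5) for _ in range(n_latent)] for _ in range(n_in)]
        self.bd = [0.0] * n_in

    def encode(self, x):
        # Compress one participant's feature vector into the low-dimensional embedding.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.We, self.be)]

    def decode(self, z):
        # Reconstruct the original features from the embedding.
        return [sum(w * zi for w, zi in zip(row, z)) + b
                for row, b in zip(self.Wd, self.bd)]

    def loss(self, data):
        # Mean squared reconstruction error over all participants.
        return sum(sum((a - b) ** 2 for a, b in zip(self.decode(self.encode(x)), x))
                   for x in data) / len(data)

    def train(self, data, epochs=300, lr=0.02, seed=0):
        # Plain stochastic gradient descent on the squared reconstruction error.
        rng = random.Random(seed)
        order = list(range(len(data)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                x = data[i]
                z = self.encode(x)
                d_out = [2 * (a - b) for a, b in zip(self.decode(z), x)]
                # Backpropagate to the encoder pre-activation before the decoder changes.
                d_pre = [(1 - zj * zj) * sum(d_out[k] * self.Wd[k][j] for k in range(len(d_out)))
                         for j, zj in enumerate(z)]
                for k, dk in enumerate(d_out):
                    for j, zj in enumerate(z):
                        self.Wd[k][j] -= lr * dk * zj
                    self.bd[k] -= lr * dk
                for j, dj in enumerate(d_pre):
                    for m, xm in enumerate(x):
                        self.We[j][m] -= lr * dj * xm
                    self.be[j] -= lr * dj


def kmeans(points, k, iters=20):
    """Group embeddings into k subgroups (k-means as a stand-in clustering method)."""
    # Farthest-point initialisation keeps the result deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nearest_neighbors(embeddings, idx, n=3):
    """Indices of the n participants closest to participant idx in embedding space."""
    others = [j for j in range(len(embeddings)) if j != idx]
    return sorted(others, key=lambda j: math.dist(embeddings[idx], embeddings[j]))[:n]


if __name__ == "__main__":
    cohort = make_toy_cohort()
    ae = TinyAutoencoder(n_in=6, n_latent=2)
    print("reconstruction MSE before training:", round(ae.loss(cohort), 3))
    ae.train(cohort)
    print("reconstruction MSE after training:", round(ae.loss(cohort), 3))
    embeddings = [ae.encode(x) for x in cohort]
    print("subgroup labels:", kmeans(embeddings, k=2))
    print("nearest neighbors of participant 0:", nearest_neighbors(embeddings, 0))
```

Running the script prints the reconstruction error before and after training, one subgroup label per synthetic participant, and the three participants closest to participant 0 in embedding space. The same embeddings could feed cohort discovery or neighbor-based imputation; a real deployment would use a deep learning library on GPU-enabled instances rather than a hand-written gradient loop.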


None. See parent grant details.
