Grant Details

Grant Number: 5U01CA274576-02
Primary Investigator: Long, Qi
Organization: University Of Pennsylvania
Project Title: Robust Privacy Preserving Distributed Analysis Platform for Cancer Research: Addressing Data Bias and Disparities
Fiscal Year: 2024


Abstract

Project Summary: Privacy-preserving distributed analysis has gained increasing interest in the broad biomedical research community in recent years, as it can a) eliminate the need to create, maintain, and secure access to central data repositories, b) minimize the need to disclose protected health information outside the data-owning entity, and c) mitigate many security, proprietary, privacy, and other concerns. As such, it offers great promise for lowering regulatory and other hurdles to collaboration across multiple institutions and for enhancing public trust in biomedical research. Equally important, analysis of health data from multiple institutions across the US would yield more robust and generalizable findings. This is particularly relevant in cancer disparities research, as the sample size for minority groups at any one institution can be very small. However, there remain significant methodological gaps in the current state of the art for privacy-preserving distributed analysis. Most notably, missing data present significant challenges, as they are ubiquitous in biomedical data including, but not limited to, electronic health records (EHR). It is well known that missing data are a major source of bias in EHR. For example, patients from minority groups and those who have less access to private insurance tend to have more missing data in their EHR. Data biased by missingness are known to yield unfair statistical and machine learning models, which in turn can perpetuate and exacerbate health inequities and disparities. There has been no work on principled approaches for properly handling missing data in distributed analysis beyond our recent work. In addition, it is well known that distributed analysis is still at risk of revealing important individual-level information and lacks rigorous guarantees in the sense of differential privacy, the prevailing notion and metric for privacy protection.
To address these significant limitations, we propose three specific aims. In Aim 1, we will refine and develop state-of-the-art imputation methods for handling missing data in distributed analysis and develop advanced functionalities for enhanced privacy protection through differential privacy control and homomorphic encryption. In Aim 2, building on the methods developed in Aim 1, we will develop an open-source and open-access distributed analysis platform that includes a robust system architecture and a user-friendly GUI. In Aim 3, we will assess and validate our distributed analysis platform using real-world use cases in cancer disparities research. With the enhanced privacy protection, our proposed distributed analysis platform will have the potential to further enhance public trust and lower hurdles for collaboration across multiple institutions in cancer research. As such, our platform will enable researchers to use more information and less biased data in cancer research, enhance the validity, robustness, and generalizability of research findings, and offer substantial research benefits in areas including, but not limited to, cancer disparities and informatics practice.



Publications