Grant Details
Grant Number: 1U01CA274576-01A1
Primary Investigator: Long, Qi
Organization: University of Pennsylvania
Project Title: Robust Privacy Preserving Distributed Analysis Platform for Cancer Research: Addressing Data Bias and Disparities
Fiscal Year: 2023
Abstract
Project Summary
Privacy-preserving distributed analysis has gained increasing interest in the broad biomedical research
community in recent years, as it can a) eliminate the need to create, maintain, and secure access to central
data repositories, b) minimize the need to disclose protected health information outside the data-owning entity,
and c) mitigate many security, proprietary, privacy and other concerns. As such, it offers great promise in
lowering regulatory and other hurdles for collaboration across multiple institutions and enhancing public
trust in biomedical research. Equally important, analysis of health data from multiple institutions across the US
would yield more robust and generalizable findings. This is particularly relevant in cancer disparities research
as the sample size for minority groups at any single institution can be very small. However, there remain significant
methodological gaps in the current state of the art for privacy-preserving distributed analysis. Most notably,
missing data present significant challenges, as they are ubiquitous in biomedical data including, but not limited
to, electronic health records (EHR). It is well known that missing data are a major source of bias in EHR. For
example, patients from minority groups and those who have less access to private insurance tend to have
more missing data in their EHR. Data biased by missingness are known to yield unfair statistical and
machine learning models, which in turn can perpetuate and exacerbate health inequities and disparities. There
has been no work on principled approaches for properly handling missing data in distributed analysis beyond
our own recent work. In addition, it is well known that distributed analysis is still at risk of revealing important
individual-level information and lacks rigorous guarantees in the sense of differential privacy, the prevailing
notion and metric for privacy protection. To address these significant limitations, we propose three specific
aims. In Aim 1, we will refine and develop state-of-the-art imputation methods for handling missing data in
distributed analysis and develop advanced functionalities for enhanced privacy protection through differential
privacy control and homomorphic encryption. Building on the methods developed in Aim 1, we will develop an
open-source and open-access distributed analysis platform that includes a robust system architecture and
user-friendly GUI in Aim 2. We will assess and validate our distributed analysis platform using real-world use
cases in cancer disparities research in Aim 3. With the enhanced privacy protection, our proposed distributed
analysis platform will have the potential to further enhance public trust and lower hurdles for collaboration across
multiple institutions in cancer research. As such, our platform will enable researchers to use more
information and less biased data in cancer research, enhance the validity, robustness and generalizability of
research findings, and offer substantial benefits in areas including, but not limited to, cancer disparities
research and informatics practice.
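The summary names differential privacy as a goal but does not specify how the platform applies it. The following is a minimal illustrative sketch of the general idea under stated assumptions, not the project's design. It assumes a single variable bounded to [0, upper], the Laplace mechanism, and an even split of the privacy budget between two statistics. Each site releases only a clipped, noise-perturbed sum and record count, and a coordinator pools these into one estimate of the mean. The names `SiteSummary`, `site_release`, and `pooled_mean` are hypothetical. Missing-data imputation and homomorphic encryption are not shown.

```python
# Hypothetical sketch: illustrates DP-perturbed summary sharing in general;
# it is not the project's platform code or protocol.
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SiteSummary:
    """Aggregates one site releases; individual records never leave the site."""
    noisy_sum: float
    noisy_count: float


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw a Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def site_release(values: Sequence[float], upper: float, epsilon: float,
                 rng: random.Random) -> SiteSummary:
    """Clip values to [0, upper] and release an epsilon-DP noisy sum and count.

    Adding or removing one record changes the clipped sum by at most `upper`
    and the count by at most 1. Half of the budget goes to each statistic, so
    the two releases together satisfy epsilon-DP by sequential composition.
    """
    if epsilon <= 0 or upper <= 0:
        raise ValueError("epsilon and upper must be positive")
    clipped = [min(max(v, 0.0), upper) for v in values]
    half = epsilon / 2.0
    return SiteSummary(
        noisy_sum=sum(clipped) + laplace_noise(upper / half, rng),
        noisy_count=len(clipped) + laplace_noise(1.0 / half, rng),
    )


def pooled_mean(summaries: List[SiteSummary]) -> float:
    """Coordinator step: combine site summaries into one estimate of the mean."""
    total_sum = sum(s.noisy_sum for s in summaries)
    total_count = sum(s.noisy_count for s in summaries)
    # Guard against a tiny or negative noisy count when sites are very small.
    return total_sum / max(total_count, 1.0)


if __name__ == "__main__":
    rng = random.Random(2023)
    # Simulated, hypothetical data for three sites of unequal size.
    sites = [[rng.uniform(0.0, 10.0) for _ in range(n)] for n in (800, 450, 120)]
    summaries = [site_release(v, upper=10.0, epsilon=1.0, rng=rng) for v in sites]
    true_mean = sum(map(sum, sites)) / sum(map(len, sites))
    print(f"pooled DP estimate: {pooled_mean(summaries):.3f}, true mean: {true_mean:.3f}")
```

Clipping bounds each record's influence, which is what allows the noise scale to be fixed in advance. A smaller `epsilon` gives stronger privacy but a noisier pooled estimate. Small sites, such as those with few patients from a minority group, contribute proportionally more noise per record.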
Publications
None