Grant Details

Grant Number: 5R01CA262296-03
Primary Investigator: Cui, Yan
Organization: University Of Tennessee Health Sci Ctr
Project Title: Algorithm-Based Prevention and Reduction of Cancer Health Disparity Arising From Data Inequality
Fiscal Year: 2023


Ethnic minority groups have a long-term cumulative data disadvantage in biomedical research and clinical studies. Statistics have shown that over 90% of the samples in cancer-related GWAS and clinical omics projects were collected from individuals of European ancestry. This severe data disadvantage of the ethnic minority groups is set to produce new health disparities as data-driven, algorithm-based biomedical research and clinical decisions become increasingly common. The new cancer disparity arising from data inequality can potentially impact all ethnic minority groups in all types of cancers where data inequality exists. Thus, its negative impact is not limited to the cancer types or subtypes for which significant ethnic disparities are already evident. The long-term goal of the proposed research is to prevent or reduce the health disparities arising from the data disadvantage of ethnic minority groups. The overall objective of this work is to obtain key knowledge and create open resources to establish a new paradigm for machine learning with multiethnic clinical omics data. Our central hypothesis is that the knowledge learned from data of the majority population can be transferred to improve machine learning performance on the data-disadvantaged ethnic minority groups. Guided by strong preliminary data, we will pursue two specific aims: 1) discover, from cancer clinical omics data and genotype-phenotype data, under what conditions and to what extent the transfer learning scheme improves machine learning model performance on data-disadvantaged ethnic minority groups; and 2) create an open resource system for unbiased multiethnic machine learning to prevent or reduce new health disparities arising from the data disadvantage of ethnic minorities.
The approach is innovative because it represents a substantive departure from the status quo by shifting the paradigm of multiethnic machine learning from mixture learning and independent learning schemes to a transfer learning scheme. The proposed research is significant, because it is expected to establish a new paradigm for unbiased multiethnic machine learning and to provide an open resource system to facilitate the paradigm shift, and thus to prevent or reduce health disparities arising from the data disadvantage of ethnic minorities.


Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective.
Authors: Gao Y., Sharma T., Cui Y.
Source: Annual review of biomedical data science, 2023-08-10; 6, p. 153-171.
EPub date: 2023-04-27.
PMID: 37104653

Clinical time-to-event prediction enhanced by incorporating compatible related outcomes.
Authors: Gao Y., Cui Y.
Source: PLOS digital health, 2022; 1(5).
EPub date: 2022-05-26.
PMID: 35757279

Malignant transformation in human colorectal mucosa as monitored by distribution of laminin, a basement membrane glycoprotein.
Authors: Kellokumpu I., Ekblom P., Scheinin T.M., Andersson L.C.
Source: Acta pathologica, microbiologica, et immunologica Scandinavica. Section A, Pathology, 1985 Sep; 93(5), p. 285-91.
PMID: 4050437