Grant Details

Grant Number: 1U01CA289357-01
Primary Investigator: Krasnitz, Alexander
Organization: Cold Spring Harbor Laboratory
Project Title: Computational Tools for Accurate Inference of Genetic Ancestry From Cancer-Derived Molecular Data
Fiscal Year: 2024


Abstract

Project summary/abstract

For multiple cancer types, epidemiological data show strong correlations between the incidence of the disease, its severity at diagnosis, and its clinical outcome on the one hand, and the ancestral background of the patient on the other. This well-documented phenomenon strongly suggests a link between the biology and genetics of an individual's cancer and that individual's genetic ancestry. Indeed, recent research in cancer genomics, both pan-cancer and cancer type-specific, points to genetic and phenotypic differences between tumors arising in patient populations of differing genetic ancestries, and to the need for more data to power further study in this area.

The purpose of this proposal is to enable such analysis on a much greater scale by making it possible to infer genetic ancestry directly from cancer-derived molecular data, without the need for the patient's cancer-free genotype or self-declared race or ethnicity. Successful completion of this project will unlock vast amounts of such data for ancestry-oriented studies of cancer from two major sources. One is the body of data stored in the Sequence Read Archive and similar massive digital repositories, comprising on the order of 10⁶ cancer-derived molecular profiles. The other is the body of archival tumor tissue held across multiple medical centers, from millions of specimens of which molecular data may be generated.

We will develop software tools for genetic ancestry inference from multiple types of cancer-derived data, namely DNA sequence data from whole exomes, low-coverage whole genomes, and targeted sequencing panels; RNA sequence data; and ATAC-seq and bisulfite-converted sequence data. The tools will infer global genetic ancestry at sub-continental resolution, ancestral admixture, and local ancestry.
These tools will be adaptive, able to optimize their performance for each input cancer-derived molecular profile. This adaptability will be achieved using simulated data that combine the input cancer-derived profile with ancestral backgrounds representing well-defined population groups. As a result, the inference methods will perform consistently, with quantifiable accuracy, across a range of profiling depths and qualities, and will mitigate the effects of cancer-related damage to the genome. An open-source, user-friendly, FAIR-compliant software implementation of these methods will be made available to the research community through several channels, including GitHub, Bioconductor, and Galaxy. Training and community outreach for this software will be provided in collaboration with the ITCR Training Network.
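To make the adaptive, simulation-based idea above more concrete, here is a minimal Python sketch. It is not the project's method. It assumes a toy reference panel of per-population allele frequencies (the `REF` values and population names are hypothetical), a simple Hardy-Weinberg likelihood classifier for global ancestry, and a crude noise model in which a fraction of genotype calls is replaced by a random dosage as a stand-in for sequencing errors and tumor-specific alterations. For a given input profile, it simulates synthetic profiles from each reference population at only the sites that input covers, then picks the site filter (a hypothetical tuning parameter, `thresholds`) with the highest simulated accuracy; that accuracy also serves as a per-sample accuracy estimate.

```python
import math
import random

# Hypothetical toy reference panel: alternate-allele frequency of each SNP
# in each population group (illustrative values, not real data).
REF = {
    "POP_A": [0.9, 0.8, 0.1, 0.5, 0.7, 0.2],
    "POP_B": [0.1, 0.2, 0.9, 0.5, 0.3, 0.8],
    "POP_C": [0.5, 0.9, 0.5, 0.5, 0.1, 0.5],
}
EPS = 1e-3  # keeps log-likelihoods finite when a frequency is 0 or 1


def genotype_loglik(g, p):
    """Log P(dosage g in {0, 1, 2} | allele frequency p) under Hardy-Weinberg."""
    p = min(max(p, EPS), 1 - EPS)
    return math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)


def assign(genotypes, ref):
    """Return the population that maximizes the likelihood of the observed
    dosages. genotypes maps SNP index -> dosage; uncovered SNPs are absent."""
    return max(ref, key=lambda pop: sum(genotype_loglik(g, ref[pop][i])
                                        for i, g in genotypes.items()))


def simulate(freqs, sites, error_rate, rng):
    """Draw a synthetic profile at the covered sites only. With probability
    error_rate a call is replaced by a random dosage (assumed noise model)."""
    profile = {}
    for i in sites:
        if rng.random() < error_rate:
            profile[i] = rng.randint(0, 2)
        else:  # two allele draws -> dosage 0, 1 or 2
            profile[i] = int(rng.random() < freqs[i]) + int(rng.random() < freqs[i])
    return profile


def simulated_accuracy(ref, sites, error_rate, n_per_pop, rng):
    """Fraction of synthetic profiles assigned back to their source population."""
    hits = 0
    for pop, freqs in ref.items():
        for _ in range(n_per_pop):
            hits += assign(simulate(freqs, sites, error_rate, rng), ref) == pop
    return hits / (n_per_pop * len(ref))


def tune_site_filter(ref, covered, error_rate, thresholds, n_per_pop=200, seed=0):
    """For one input profile, choose the minimum between-population frequency
    spread that maximizes simulated accuracy. Returns (threshold, accuracy,
    sites), or None if no threshold keeps any covered site."""
    best = None
    for t in thresholds:
        sites = [i for i in covered
                 if max(f[i] for f in ref.values()) - min(f[i] for f in ref.values()) >= t]
        if not sites:
            continue
        rng = random.Random(seed)  # same seed per candidate for reproducibility
        acc = simulated_accuracy(ref, sites, error_rate, n_per_pop, rng)
        if best is None or acc > best[1]:
            best = (t, acc, sites)
    return best


# Usage: a hypothetical input profile covering SNPs 0-4 with ~10% noisy calls.
covered = [0, 1, 2, 3, 4]
threshold, acc, sites = tune_site_filter(REF, covered, error_rate=0.1,
                                         thresholds=[0.0, 0.3, 0.6])
observed = {0: 2, 1: 2, 2: 0, 3: 1, 4: 2}
call = assign({i: observed[i] for i in sites}, REF)
print(threshold, round(acc, 2), call)
```

Because the simulation is restricted to the sites the input actually covers and uses its assumed error rate, the reported accuracy reflects that particular profile rather than an idealized full-coverage sample; a real tool would simulate far richer effects (read depth, copy-number changes, loss of heterozygosity) than this single noise parameter.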



Publications


None

