Grant Details
Grant Number: 1U01CA289357-01
Primary Investigator: Krasnitz, Alexander
Organization: Cold Spring Harbor Laboratory
Project Title: Computational Tools for Accurate Inference of Genetic Ancestry From Cancer-Derived Molecular Data
Fiscal Year: 2024
Abstract
Project summary/abstract
For multiple cancer types, epidemiological data exhibit strong correlations between, on the one
hand, the incidence of the disease, its severity when diagnosed, and its clinical outcome, and, on
the other hand, the ancestral background of the patient. This well-documented phenomenon
strongly suggests a link between the biology and genetics of cancer in an individual and the
individual's genetic ancestry. Indeed, recent research in cancer genomics, both pan-cancer and
cancer type-specific, points to genetic and phenotypic differences between tumors occurring in
patient populations with differing genetic ancestries, and to the need for more data collection to
power further study in this area. It is the purpose of this proposal to facilitate such data analysis
on a much greater scale, by enabling genetic ancestry inference directly from cancer-derived
molecular data, without the need for the patient's cancer-free genotype or self-declared race or
ethnicity. Successful completion of this project will unlock vast amounts of such data for ancestry-
oriented studies of cancer from two major sources. One is the body of data stored by the
Sequence Read Archive and similar massive digital repositories, on the order of 10^6
cancer-derived molecular profiles. The other is the body of archival tumor tissues across multiple medical
centers, from millions of which molecular data may be generated.
We will develop software tools for genetic ancestry inference from multiple types of
cancer-derived data, namely, DNA sequence data from whole exomes, whole genomes at low
coverage and targeted sequence panels; RNA sequence data; ATAC-seq and bisulfite-converted
sequence data. The tools to be developed will deliver inference of global genetic ancestry at a
sub-continental level of resolution, of ancestral admixtures and of local ancestry. These tools will
be adaptive, endowed with the ability to optimize their performance for each input cancer-derived
molecular profile. This adaptability will be achieved using simulated data, combining the input
cancer-derived profile with ancestral backgrounds representing well-defined population groups.
As a result, these inference methods will perform consistently, and with quantifiable accuracy,
across a range of profiling depths and qualities, and mitigate cancer-related damage to the
genome. An open-source, user-friendly and FAIR-compliant software implementation of these
methods will be made available to the research community through a number of channels,
including GitHub, Bioconductor and Galaxy. Training and community outreach for this software
will be provided in collaboration with the ITCR Training Network.
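
The adaptive step described above, tuning the inference for each input profile by means of simulated data, can be illustrated with a toy sketch. This is not the project's software. Every function name, the k-nearest-neighbour classifier, the coverage mask and the uniform error model are assumptions made purely for illustration. The sketch draws synthetic genotypes from populations of known ancestry, degrades them to match the coverage and an assumed error rate of the input profile, and keeps the classifier setting that recovers the known labels most often.

```python
import random


def simulate_genotype(freqs, rng):
    """Draw a diploid genotype (0, 1 or 2 alternate alleles) at each SNP."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in freqs]


def degrade(genotype, covered, error_rate, rng):
    """Impose the input profile's coverage mask and a crude error model
    (both hypothetical stand-ins for profile-specific depth and quality)."""
    out = []
    for g, is_covered in zip(genotype, covered):
        if not is_covered:
            out.append(None)                   # SNP absent from the input profile
        elif rng.random() < error_rate:
            out.append(rng.choice([0, 1, 2]))  # genotype replaced at random
        else:
            out.append(g)
    return out


def classify(profile, references, k):
    """k-nearest-neighbour vote using squared distance over covered SNPs only."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(profile, ref) if a is not None), label)
        for label, ref in references
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=votes.get)


def tune_k(covered, error_rate, pop_freqs, candidates, n_ref=20, n_sim=30, seed=0):
    """Pick the k with the highest accuracy on synthetic profiles of known ancestry."""
    rng = random.Random(seed)
    references = [
        (pop, simulate_genotype(freqs, rng))
        for pop, freqs in pop_freqs.items()
        for _ in range(n_ref)
    ]
    # One shared synthetic test set, degraded like the input profile.
    synthetic = [
        (pop, degrade(simulate_genotype(freqs, rng), covered, error_rate, rng))
        for pop, freqs in pop_freqs.items()
        for _ in range(n_sim)
    ]
    accuracy = {
        k: sum(classify(prof, references, k) == pop for pop, prof in synthetic)
        / len(synthetic)
        for k in candidates
    }
    best = max(candidates, key=lambda k: accuracy[k])
    return best, accuracy


if __name__ == "__main__":
    # Hypothetical input: 50 SNPs, every other one covered, 5% genotype errors.
    covered = [i % 2 == 0 for i in range(50)]
    pop_freqs = {"A": [0.1] * 50, "B": [0.9] * 50}
    print(tune_k(covered, 0.05, pop_freqs, candidates=[1, 3, 5]))
```

Because the synthetic profiles carry known ancestry labels, the same loop also yields an accuracy estimate specific to the input profile, which is the sense in which accuracy is quantifiable here. A realistic implementation would replace the toy allele frequencies with reference panel genotypes and the uniform error model with one reflecting the actual data type.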
Publications
None