Grant Details
Grant Number: 1R21CA293419-01
Primary Investigator: Bhattacharya, Arjun
Organization: University Of Tx Md Anderson Can Ctr
Project Title: Alternative Splicing and Isoform Expression as Mediators for the Genetic Etiology of Breast Cancer
Fiscal Year: 2024
Abstract
A drawback of genome-wide association studies (GWAS) for breast cancer risk and related phenotypes is their
limited insight into genotype-to-phenotype mechanisms for identified genomic regions. Although integrating
GWAS with functional genomics datasets, like the Genotype-Tissue Expression Project (GTEx) and The Cancer
Genome Atlas (TCGA), has yielded promising results in identifying candidate target genes for many traits [1–3],
these approaches overlook the complexity of alternative splicing and isoform diversity within the transcriptome.
Indeed, recent studies of long-read RNA-sequencing (RNA-seq) data across tissues reveal that as much as
40-60% of the human transcriptome is unannotated [6–8] due to overlooked isoforms. We propose to re-align existing
breast-specific short-read RNA-seq datasets using novel isoform annotations developed from long-read RNA-
seq data. We will then integrate these with existing breast cancer and mammographic density GWAS data to
identify isoform- and splice-site-specific mechanisms underlying genetic associations for breast cancer and
mammographic density phenotypes. We will build on our recent work where we developed and showcased the
promise of isoform-level transcriptome-wide association studies (isoTWAS), an innovative machine learning
framework that integrates genetics, all expressed isoforms of a gene, and phenotypic associations. Specifically,
we will first quantify isoform expression and alternative splicing events in GTEx and TCGA using novel transcript
assemblies from long-read RNA-seq datasets (Aim 1). We will benchmark multiple statistical approaches for
alignment of isoforms by conducting extensive evaluation studies. We will then leverage these newly aligned
isoforms and alternative splicing events in breast tissue to pinpoint isoforms and alternative splicing events likely
to mediate germline genetic associations with breast cancer risk and mammographic density phenotypes (Aim
2). This innovative proposal aligns with the NCI strategic objectives of Understanding the Mechanisms of
Cancer and Detecting and Diagnosing Cancer and addresses a critical challenge in studying the genetic
etiology of breast cancer: prioritizing potential causal biological mechanisms for further follow-up.
Our proposal is unique in that it will re-quantify and integrate multi-tissue, multi-level transcriptomic reference
panels (both short- and long-read RNA-seq) with robust GWAS summary statistics using cutting-edge
computational tools for transcriptomics and a novel integrative framework. By combining publicly available
multi-level 'omic datasets in a systemic genomic epidemiology framework, our work will provide both molecular data
resources and reproducible computational frameworks that can be easily expanded to other tissues and traits.
Specifically, we will build open-source computational pipelines that generate tissue-specific, novel isoform
annotations for short-read RNA-seq alignment and expression quantification. We will also create and maintain a
publicly available portal hosting iso- and splice-QTL summary statistics and predictive models, so that the broader
research community can pursue similar investigations across traits and tissues.
Publications
None