Grant Details

Grant Number: 1R21CA293419-01
Primary Investigator: Bhattacharya, Arjun
Organization: University Of Tx Md Anderson Can Ctr
Project Title: Alternative Splicing and Isoform Expression as Mediators for the Genetic Etiology of Breast Cancer
Fiscal Year: 2024


Abstract

A drawback of genome-wide association studies (GWAS) for breast cancer risk and related phenotypes is their limited insight into the genotype-to-phenotype mechanisms at identified genomic regions. Although integrating GWAS with functional genomics datasets, like the Genotype-Tissue Expression Project (GTEx) and The Cancer Genome Atlas (TCGA), has yielded promising results in identifying candidate target genes for many traits [1–3], these approaches overlook the complexity of alternative splicing and isoform diversity within the transcriptome. Indeed, recent studies of long-read RNA-sequencing (RNA-seq) data across tissues reveal that as much as 40–60% of the human transcriptome is unannotated [6–8] due to overlooked isoforms. We propose to re-align existing breast-specific short-read RNA-seq datasets using novel isoform annotations developed from long-read RNA-seq data. We will then integrate these with existing breast cancer and mammographic density GWAS data to identify isoform- and splice-site-specific mechanisms underlying genetic associations for breast cancer and mammographic density phenotypes. We will build on our recent work, in which we developed and showcased the promise of isoform-level transcriptome-wide association studies (isoTWAS), an innovative machine learning framework that integrates genetics, all expressed isoforms of a gene, and phenotypic associations. Specifically, we will first quantify isoform expression and alternative splicing events in GTEx and TCGA using novel transcript assemblies from long-read RNA-seq datasets (Aim 1). We will benchmark multiple statistical approaches for alignment of isoforms by conducting extensive evaluation studies. We will then leverage these newly aligned isoforms and alternative splicing events in breast tissue to pinpoint isoforms and alternative splicing events likely to mediate germline genetic associations with breast cancer risk and mammographic density phenotypes (Aim 2).
This innovative proposal aligns with the NCI strategic objectives of Understanding the Mechanisms of Cancer and Detecting and Diagnosing Cancer, and addresses a critical challenge in studying the genetic etiology of breast cancer: prioritizing potential causal biological mechanisms for further follow-up. Our proposal is unique in that it will re-quantify and integrate multi-tissue, multi-level transcriptomic reference panels (both short- and long-read RNA-seq) with robust GWAS summary statistics using cutting-edge computational tools for transcriptomics and a novel integrative framework. By combining publicly available multi-level 'omic datasets in a systemic genomic epidemiology framework, our work will provide both molecular data resources and reproducible computational frameworks that can be easily expanded to other tissues and traits. Specifically, we will develop open-source computational pipelines that build tissue-specific, novel isoform annotations for short-read RNA-seq alignment and expression quantification. We will also create and maintain a publicly available portal hosting iso- and splice-QTL summary statistics and predictive models, allowing the broader research community to pursue similar investigations across traits and tissues.



Publications


None

