||3R01CA226802-04S1
||Cincinnati Childrens Hosp Med Ctr
||Leveraging the Cloud for Splicing Discovery
Alternative splicing is among the most important contributors to proteomic diversity in higher-order eukaryotes.
When disrupted in cancer, mis-splicing results in unique mRNA isoforms not observed in healthy cells. Such
cancer-specific splice isoforms represent an untapped reservoir of potential neoantigens for targeted cancer
vaccines and immunotherapies. As a central component of our funded NCI R01 for “Unbiased identification of
spliceosome vulnerabilities across cancer”, we have been extending and leveraging a comprehensive splicing
analysis pipeline to define splicing vulnerabilities across human cancers and healthy tissues. The associated
bioinformatics tools extended through this effort are designed to identify both known and novel cancer
subtypes, nominate key regulatory splicing factors, infer functional splice-isoform impacts and discover
cancer-specific neoantigens that can be exploited by emerging immunotherapies or vaccines. The tools that
yield these discoveries largely consist of distinct components of the large AltAnalyze open-source project,
begun in 2008. While AltAnalyze or its algorithms have been cited in over 400 published research studies,
re-applying this workflow at the scale of TCGA, GTEx and other large RNA-Seq compendiums has historically
required significant computational resources and time to download and re-process hundreds of terabytes of
data in a secure and compliant manner. Emerging cloud-based solutions have the potential to mitigate
these technical challenges through streamlined processing of thousands of pre-processed samples at low cost.
To address these challenges, we propose to:
Aim 1: Streamline and optimize AltAnalyze for the cloud. In this aim, we will decouple and optimize the
primary splicing analysis components of AltAnalyze to enable streamlined supervised and unsupervised
analysis of cancer transcriptomes. AltAnalyze will be packaged as a CWL pipeline, containerized with Docker
and deposited in Dockstore, enabling fast and comprehensive analyses of splicing in the cloud.
Aim 2: Integrate AltAnalyze.cloud in Terra.bio. To enable direct analyses of controlled-access sequence-
level files in TCGA, TARGET, GTEx and other major human RNA-sequencing datasets, we will 1) translate our
CWL workflows to the Workflow Description Language (WDL) and 2) establish a Terra workflow for integrated
splicing analysis using AltAnalyze.cloud. Users will be able to run AltAnalyze.cloud through the Terra web
interface for analysis, progress tracking, provenance and sharing of results. These features will enable
streamlined re-use of user-supplied and controlled-access, NIH-deposited datasets in the cloud.
None. See parent grant details.