Grant Details

Grant Number: 7R21CA182821-03
Primary Investigator: Lindstroem, Sara
Organization: University of Washington
Project Title: Prioritizing Follow-Up of GWAS Loci Using Genetic and Functional Annotation Data
Fiscal Year: 2015


Abstract

DESCRIPTION (provided by applicant): Although genome-wide association studies (GWAS) have identified thousands of disease susceptibility loci, the underlying genetic structure in these regions is not fully studied and it is likely that the index GWAS signal originates from one or more as yet unidentified causal variants. In order to localize potential causal variant(s) for further follow-up experiments, fine-mapping studies in large populations are underway. To date, fine-mapping studies have used standard approaches that fail to account for the full array of information currently available, such as associations with gene expression (eQTLs) and genomic functional annotation. With the advent of large-scale initiatives such as The Encyclopedia of DNA Elements (ENCODE) and The Cancer Genome Atlas (TCGA), it may be possible to add a layer of functional information to fine-mapping studies, enhancing the ability to localize causal variants. We here propose to develop a statistical framework that will incorporate functional and genetic information. We will build variant-specific priors based on cell-specific functional annotation (e.g. transcription start sites, protein coding), associations with tissue-specific gene expression and correlated phenotypes (e.g. mammographic density). We will capitalize on the publicly available ENCODE data to acquire functional annotation for each genetic variant. We will estimate posterior probabilities for each genetic variant based on its derived prior and the evidence for association with the outcome of interest. Such posterior probabilities can then be used to prioritize genetic variants for further follow-up in a laboratory setting. Our proposed method will be flexible in that it will jointly model internal (e.g. sequencing and gene expression data) and external (e.g. ENCODE) sources.
It will also allow for multiple causal loci at each region and jointly assess all loci simultaneously, allowing the method to "borrow" information between the loci. To ensure generalizability, we will conduct extensive simulation studies taking numerous possible scenarios into account. We will apply our method to a multi-ethnic breast cancer targeted sequencing dataset of 2,288 breast cancer cases and 2,323 controls. For all women, we have GWAS and high-depth sequencing data for 12 GWAS-identified breast cancer regions, spanning a total of 5,500 kb. For a subset of these women, we also have mammographic density (n=1,000) and whole-genome expression data (n=250) in both normal and tumor tissue, allowing us to apply our method and jointly model empirical sequencing, gene expression and phenotype data. We have assembled a multi-disciplinary research team with a track record of producing high-profile publications in breast cancer epidemiology, population genetics, fine-mapping, statistical methods and publicly available software packages for the genetics community. Our work has the potential to bridge the gap between initial screening for regions in the genome that are associated with disease and prioritizing specific variants for further functional analysis. Such methods will have important implications for understanding the underlying biology of disease, a major challenge in the post-GWAS era.
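
Illustrative note (not part of the applicant's abstract): the general idea of combining an annotation-derived prior with association evidence can be sketched as below. This is only a minimal sketch under strong assumptions, not the proposed method: it assumes exactly one causal variant per region (the abstract describes allowing several), a logistic prior with made-up weights and intercept, and per-variant log Bayes factors supplied as input. The function names `variant_priors` and `posterior_probabilities` are hypothetical.

```python
import math


def variant_priors(annotations, weights, intercept=-4.0):
    """Map each variant's annotation features to a prior probability.

    Assumption: a logistic model with illustrative weights; a real
    analysis would have to estimate or justify these values.
    """
    priors = []
    for features in annotations:
        score = intercept + sum(w * f for w, f in zip(weights, features))
        priors.append(1.0 / (1.0 + math.exp(-score)))
    return priors


def posterior_probabilities(log_bayes_factors, priors):
    """Posterior probability that each variant is the causal one.

    Assumption: exactly one causal variant in the region, so the
    posterior is proportional to prior times Bayes factor, normalized
    over all variants in the region.
    """
    log_terms = [lbf + math.log(p) for lbf, p in zip(log_bayes_factors, priors)]
    largest = max(log_terms)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(t - largest) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three hypothetical variants with two binary annotations each
# (for example: overlaps a transcription start site, is protein coding).
example_annotations = [[1, 0], [0, 0], [0, 1]]
example_priors = variant_priors(example_annotations, weights=[2.0, 1.0])

# Equal association evidence for all three variants, so the ranking
# is driven entirely by the annotation-based priors.
example_posteriors = posterior_probabilities([3.0, 3.0, 3.0], example_priors)
```

With equal association evidence, the variant carrying the more heavily weighted annotation receives the largest posterior probability, which is the sense in which functional annotation could help prioritize variants for laboratory follow-up.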



Publications


None

