1R01CA179011-01
Bootstrap-Based Testing of Rare Sequence Variants Using Family Data
DESCRIPTION (provided by applicant): We now have a large arsenal of tests for association between disease and rare variants in genomic regions using the genotypes of unrelated individuals. However, only the simplest of them have been extended to family data. Yet case-control tests using related cases are more powerful than tests based only on unrelated cases, particularly for rare variants. The power gain reflects enrichment of affected relatives for rare causal variants. Increased power is critical because most damaging variants occur at very low frequencies in human populations, and huge sample sizes and external biological information will be needed to detect associations with disease. Biologically based contrasts between the multi-locus genotypes of cases and controls are likely to be complex, and simple, flexible methods are needed to infer their null distributions in the presence of correlation among subjects' genotypes. We propose a new way to extend all case-control association tests to all subjects, regardless of their genealogical relationship. The new method, which uses the bootstrap of Efron in a novel way, involves "de-correlating" subjects' correlated genotype data to allow bootstrap resampling, and then "re-correlating" the bootstrapped data to infer the null distribution of the test statistic. Aim 1 will use simulations to validate the new Quasi-bootstrap (QB) method for using family data to identify associations of disease with complex combinations of genotypes. This aim includes: i) assessing the type-1 error and power of QB tests for family data in comparison to: a) the same tests applied to unrelated subjects; and b) closed-form Gaussian-based tests for family data when available; ii) extending the QB method to data containing population structure and cryptic relatedness, for which the correlation matrix between pairs of subjects must be estimated; iii) dealing with missing genotype data. Aim 2 will apply the QB method to cancer family data to evaluate its performance on functional genetic units containing known carcinogenic variants. This includes testing for BRCA1 and BRCA2 association with breast cancer in affected and unaffected subjects from families in the Breast Cancer Family Registry (BCFR) and testing for HOXB13 association with prostate cancer in the International Consortium on Prostate Cancer Genetics (ICPCG). Aim 3 will develop freely available software to implement the QB method for existing multi-locus case-control association tests. This software will include methods for handling missing genotype data for some subjects at some markers. The software will allow users with data from related and unrelated subjects to evaluate associations with disease using any of the existing tests currently available only for unrelated subjects. If validated, the proposed QB method would provide a major addition to our tools for next-generation sequence data by analyzing those individuals most likely to carry causal disease variants, while building on the known strengths of the bootstrap. These include ease of use, robustness, and versatility for a large variety of applications. With the computing resources now routinely available, the proposed method can be implemented quickly and easily.
Narrative: Sequencing the genomes of many people is now cost-effective, and it may help us find the groups of genes that cause chronic diseases such as cancer. However, evidence now suggests that many very rare variants may act in concert to cause such disease, and unraveling the new clues will require evaluating the genomes of diseased individuals from families with multiple cases of the disease. We propose a simple way of applying any of the new tests to such families, which should increase their efficacy.
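The "de-correlate, resample, re-correlate" idea in the description can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the applicants' implementation: it assumes the genotype correlation matrix among subjects is known (for example 0.5 between full sibs under random mating), uses a Cholesky factor to whiten and re-colour the data, and takes a simple case-minus-control mean difference in a per-subject variant count as the test statistic. All function names and parameters (`quasi_bootstrap_pvalue`, `n_boot`, `seed`, and so on) are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def mat_vec(L, z):
    """Return the matrix-vector product L z."""
    return [sum(L[i][k] * z[k] for k in range(len(z))) for i in range(len(z))]


def mean_difference(x, is_case):
    """Assumed test statistic: mean score in cases minus mean in controls."""
    cases = [xi for xi, c in zip(x, is_case) if c]
    controls = [xi for xi, c in zip(x, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def quasi_bootstrap_pvalue(scores, is_case, corr, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for correlated subjects (illustrative)."""
    rng = random.Random(seed)
    n = len(scores)
    mean = sum(scores) / n
    centered = [s - mean for s in scores]
    L = cholesky(corr)
    z = forward_solve(L, centered)            # "de-correlate"
    observed = mean_difference(centered, is_case)
    exceed = 0
    for _ in range(n_boot):
        z_star = [rng.choice(z) for _ in range(n)]   # resample with replacement
        x_star = mat_vec(L, z_star)                  # "re-correlate"
        if abs(mean_difference(x_star, is_case)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


# Toy usage: two sib pairs, one pair affected, correlation 0.5 within pairs.
corr = [[1.0, 0.5, 0.0, 0.0],
        [0.5, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.5],
        [0.0, 0.0, 0.5, 1.0]]
stat, p = quasi_bootstrap_pvalue([2, 1, 0, 0], [True, True, False, False], corr)
print(stat, p)
```

Resampling the whitened values rather than the raw scores is what lets the bootstrap treat the draws as exchangeable; multiplying by the Cholesky factor afterwards restores the assumed family correlation before the statistic is recomputed. Other statistics could be swapped in for `mean_difference` without changing the rest of the sketch.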