5R01CA158113-08
Texas A&M University
Consistent Variable Selection in P>>n Settings
DESCRIPTION (provided by applicant): Molecular signature-guided clinical therapies are critical to advancing the treatment of cancer, and there has been a recent explosion in the number of types of molecular data that can potentially be used to identify the mutations, expression levels, and methylation patterns (and combinations of these effects) that contribute to cancer gene functioning. Vast stores of such data are now publicly available in repositories such as The Cancer Genome Atlas and the International Cancer Genome Consortium, where they await statistical analysis. The central problem in analyzing these data, like finding a needle in a haystack, is identifying the important prognostic factors among huge numbers of non-prognostic factors. The investigators of this project have recently developed a new method that can accomplish this feat. Their approach has been shown to correctly identify the important factors that predict outcomes when there are many more candidate predictors than observations of the outcome, and recent theoretical developments and simulation studies have demonstrated that these results extend to situations in which the number of possible gene expression values far exceeds the number of tissue samples from cancer patients (the P >> n setting). The goals of this project are to extend these methods so that they can be applied to broader classes of patient outcome data; to make them more computationally efficient so that they can be applied routinely to massive genomic datasets; to apply them to existing cancer studies; and to incorporate them into software tools that can be distributed to cancer researchers throughout the world, so that those researchers can more effectively identify genetic mutations that are either associated with cancer functioning or predictive of the success of new or existing cancer therapies.