“Models for interpreting heterogeneous genomic data: from gene expression deconvolution to GWAS variant interpretation”
In this seminar, I will discuss two examples of how probabilistic models are making an impact on the fields of personalized medicine and human genetics. With respect to personalized medicine, I will first outline how high-throughput genomics technologies are being used to discover molecular biomarkers in tumors that are predictive of cancer patient survival. I will demonstrate how contamination of these tumor samples by non-tumor cell types has a huge impact on our ability to identify biomarkers, and show that a novel statistical model is able to both remove this contamination in silico and improve our ability to predict patient survival. Second, I will discuss one of the major challenges in modern human genetics: the identification of gene pathways that underlie complex traits and diseases. Human genetics studies have identified thousands of genetic regions whose variation is associated with disease risk, but these genetic regions can be hundreds of thousands of nucleotides long. Therefore, it has been highly challenging to experimentally locate the precise ‘causal nucleotides’ within these regions that drive disease risk variation. I will illustrate how a novel statistical model is able to prioritize specific nucleotides for experimental testing, as well as discover the global molecular architecture underlying several complex traits and diseases.