Predicting the Deleteriousness of Genomic Variants – Big and Small

Video Link: https://www.youtube.com/watch?v=kDOvaeqY4i0



Duration: 26:55


Martin Kircher (BIH @ Charité / University of Luebeck)
https://simons.berkeley.edu/talks/predicting-deleteriousness-genomic-variants-big-and-small
From Algorithms to Discovery in Genome-Scale Biology and Medicine

Approaches for the identification of disease-causal mutations are widely applied in research and clinical settings, but interpretation and ranking of the resulting variants remain challenging. Combined Annotation Dependent Depletion (CADD, https://cadd.gs.washington.edu/) integrates annotations by contrasting variants that survived purifying selection along the human lineage with simulated mutations to score short sequence variants (SNVs, InDels, multi-nucleotide substitutions). Since its publication (Kircher, Witten et al. Nat Genet. 2014), CADD has been widely adopted by the community, and minor adjustments and fixes have been released, including native support for both the GRCh37 and GRCh38 assemblies (Rentzsch et al. NAR 2019).

Recently, we assessed existing deep neural network (DNN) models for splice effects on the Multiplexed Functional Assay of Splicing using Sort-seq (MFASS) dataset (Cheung et al. Mol Cell. 2019). We selected the two best-performing models, MMSplice and SpliceAI, both based only on genomic sequence, for integration into CADD (Rentzsch et al. Genome Med. 2021). The DNN scores improved CADD's predictions of splice effects. We also noted that, while the DNN scores perform better on splice variants, they fail to account for nonsense and missense effects of the same variants. This suggests that variant prioritization will improve with more domain-specific information and underlines the importance of identifying additional such features, e.g. for regulatory sequences.

With rapid advances in the identification of structural variants (SVs), we decided to apply the general concept of CADD to score them (CADD-SV, https://cadd-sv.bihealth.org/). While methods based on individual mechanistic principles, such as the deletion of coding sequence or the disruption of 3D architecture, were available, a comprehensive tool that uses the broad spectrum of available SV annotations was missing. We show that CADD-SV scores are predictive of pathogenicity and population frequency, and that CADD-SV's ability to prioritize pathogenic variants exceeds that of existing methods like SVScore and AnnotSV (Kleinert & Kircher, Genome Res. 2022). Our results highlight advantages of the CADD approach, such as profiting from a large training data set that covers diverse and rare feature annotations without major ascertainment effects from historic and ongoing variant collections.
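
The contrast at the heart of the abstract, variants that survived purifying selection versus simulated mutations, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic, the single binary annotation `conserved` and its rates are invented, and a plain logistic regression fitted by gradient descent stands in for whatever model CADD actually trains. It is not the CADD implementation.

```python
import math
import random


def make_variants(n, conserved_rate, rng):
    """Toy variants as [bias, conserved]; 'conserved' is a made-up 0/1 annotation."""
    return [[1.0, 1.0 if rng.random() < conserved_rate else 0.0] for _ in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, x):
    """Model probability that a variant looks like a simulated (label 1) mutation."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(features, labels, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = score(weights, x) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


rng = random.Random(0)
# Assumed rates: variants that survived selection rarely hit conserved sites,
# while simulated mutations hit them at the (higher) background rate.
survived = make_variants(500, 0.10, rng)   # label 0
simulated = make_variants(500, 0.40, rng)  # label 1

features = survived + simulated
labels = [0.0] * len(survived) + [1.0] * len(simulated)
weights = train_logistic(features, labels)

print("weight on 'conserved':", round(weights[1], 3))
print("score, conserved site:", round(score(weights, [1.0, 1.0]), 3))
print("score, other site:    ", round(score(weights, [1.0, 0.0]), 3))
```

Because the annotation is depleted among the surviving variants, the fitted model assigns a higher score to variants at conserved sites. No curated list of known pathogenic variants is needed for training, which is the property the abstract credits for avoiding ascertainment effects.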




Other Videos By Simons Institute for the Theory of Computing


2022-07-14 Integrated Information Theory (IIT) and Nuclear Command and Control: Whither Sovereignty?
2022-07-14 Authorship, Technicity, and Contingency
2022-07-13 AI & Humanity on the Ground: Embedding AI into Critical Clinical Decision Making
2022-07-13 Hard Choices in Artificial Intelligence
2022-07-13 Law's Consumers and Platform Users: How Competing Constructions of Humans Legitimize...
2022-07-12 Outward-Facing Science
2022-07-11 Exponentiating Single-Cell Sequencing
2022-07-11 Distinct Gene Programs Underpinning ‘Disease Tolerance’ and ‘Resistance’ Against Infections
2022-07-11 Determining the Molecular Intermediates Between Genotype and Phenotype
2022-07-11 How Genome 3D Organization Regulates Alternative Splicing?
2022-07-11 Predicting the Deleteriousness of Genomic Variants – Big and Small
2022-07-11 Algorithms for Inferring Phenotypes from Ancient DNA
2022-07-11 Mapping Biological Pathways Using Systematic Genetics and Cell Biology
2022-07-11 Computational Approaches to Study Interactions Between Mutagenic Processes and Cellular Processes
2022-07-11 A Tyrosine Kinase Protein Interaction Map Reveals Targetable EGFR Network Oncogenesis in Lung Cancer
2022-07-11 A Binary Quantitative Interaction Mapping Approach: Elucidating Multiprotein Complexes in...
2022-07-11 Long-Range Propagation of Genetic Effects in Molecular Networks
2022-07-11 Using Large-Scale Clinico-Genomics Data for in silico Clinical Trials and Precision Oncology
2022-07-11 A Statistical, Reference-Free Algorithm Subsumes Myriad Problems in Genome Science
2022-07-11 Machine Learning for Single-Cell 3D Epigenomics
2022-07-11 Understanding Molecular Complexity for Precision Medicine



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
From Algorithms to Discovery in Genome-Scale Biology and Medicine
Martin Kircher