Sequence Bioinformatics at Large Scale: Petabase-Scale Sequence Alignment Catalyses Viral Discovery
Rayan Chikhi (Institut Pasteur)
https://simons.berkeley.edu/talks/sequence-bioinformatics-large-scale-petabase-scale-sequence-alignment-catalyses-viral
Computational Challenges in Very Large-Scale 'Omics'
Petabytes of valuable sequencing data reside in public repositories, and their volume doubles every two years. These data contain a wealth of genetic information about viruses that could help us monitor spillovers and anticipate future pandemics. We recently developed a cloud-based bioinformatics infrastructure, named Serratus, to perform petabase-scale sequence alignment. With it, we analyzed all available RNA-seq samples (5.7 million samples, 10 petabytes) and discovered 10x more RNA viruses than were previously known, including a new family of coronaviruses (Edgar et al., Nature, 2022). In this talk, I will present the computational infrastructure and some of the biological analyses.