Cover Your Bases: How To Minimize The Sequencing Coverage In Dna Storage Systems

Published on ● Video Link: https://www.youtube.com/watch?v=aFuiMZ19Oc4



Category:
Guide
Duration: 49:40
68 views
3


Eitan Yaakobi (Technion)
https://simons.berkeley.edu/talks/eitan-yaakobi-technion-2024-03-07
Application-Driven Coding Theory

Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by initiating the study of the DNA coverage depth problem, which aims to reduce the required number of reads to retrieve information from the storage system. Under this framework, our main goal is to understand the effect of error-correcting codes and retrieval algorithms on the required sequencing coverage depth. We establish that the expected number of reads that are required for information retrieval is minimized when the channel follows a uniform distribution. We also derive upper and lower bounds on the probability distribution of this number of required reads and provide a comprehensive upper and lower bound on its expected value. We further prove that for a noiseless channel and uniform distribution, MDS codes are optimal in terms of minimizing the expected number of reads. Additionally, we study the DNA coverage depth problem under the random-access setup, in which the user aims to retrieve just a specific information unit from the entire DNA storage system. We prove that the expected retrieval time is at least k for [n,k] MDS codes as well as for other families of codes. Furthermore, we present explicit code constructions that achieve expected retrieval times below k and evaluate their performance through analytical methods and simulations. Lastly, we provide lower bounds on the maximum expected retrieval time. Our findings offer valuable insights for reducing the cost and latency of DNA storage.




Other Videos By Simons Institute for the Theory of Computing


2024-04-04Graph Neural Networks for Classical and Quantum Channel Coding
2024-04-04Sparsity and Privacy in Distributed Matrix Multiplication
2024-04-04Service Rates Of Mds Codes & Fractional Matchings In Quasi-Uniform Hypergraphs
2024-04-04Combinatorial Problems in Support Recovery for 1-bit Compressed Sensing
2024-04-04Approximate coding computing
2024-04-04Coded matrix computation: numerical stability, partial stragglers and sparse input matrices
2024-04-04Coding For Dna Storage With Nanopore Sequencing
2024-04-04Polar Codes For Ids Channels
2024-04-04Neural Distributed Source Coding
2024-04-04Quantized-Constraint Concatenation And The Covering Radius Of Constrained Systems
2024-04-04Cover Your Bases: How To Minimize The Sequencing Coverage In Dna Storage Systems
2024-04-04Maximally Recoverable Codes
2024-04-04Generalized Staircase Codes With Arbitrary Bit Degree
2024-04-04Low-Density Parity-Check Codes And Spatial Coupling For Quantitative Group Testing
2024-04-04Non-binary LDPC codes for Information Reconciliation in QKD Protocols
2024-04-04Dynamic Denoising For Amp Applied To Sparse Regression Inner Codes With Outer Codes
2024-04-04Interleaved Codes For Cryptography
2024-04-04Mackay-Neal Codes For High-Speed Wireless And Free-Space Optical Links
2024-04-04Algebraic Coding Problems From Quantum Fault-Tolerance
2024-04-04Binary CSS-T codes: characterizations and posets
2024-04-04Toward Efficient Genomic Compression Using Channel Codes For Joint Alignment And Reconstruction



Tags:
Simons Institute
theoretical computer science
UC Berkeley
Computer Science
Theory of Computation
Theory of Computing
Application-Driven Coding Theory
Eitan Yaakobi