Quantifying molecular bias in DNA data storage

Yuan-Jyue Chen, Christopher N. Takahashi, Lee Organick, Callista Bee, Siena Dumas Ang, Patrick Weiss, Bill Peck, Georg Seelig, Luis Ceze and Karin Strauss
Additional contact information
Yuan-Jyue Chen: Microsoft Research
Christopher N. Takahashi: University of Washington
Lee Organick: University of Washington
Callista Bee: University of Washington
Siena Dumas Ang: Microsoft Research
Patrick Weiss: Twist Bioscience
Bill Peck: Twist Bioscience
Georg Seelig: University of Washington
Luis Ceze: University of Washington
Karin Strauss: Microsoft Research

Nature Communications, 2020, vol. 11, issue 1, 1-9

Abstract: DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can cause either wasteful provisioning in sequencing or an excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival system to study the oligonucleotide copy unevenness problem and show that the two paramount sources of bias are the synthesis and amplification (PCR) processes. Based on these findings, we develop a statistical model for each molecular process as well as the overall process. We further use our model to explore the trade-offs between synthesis bias, storage physical density, logical redundancy, and sequencing redundancy, providing insights for engineering efficient, robust DNA data storage systems.
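
The link the abstract draws between uneven copy numbers and missing sequences can be illustrated with a small simulation. This is only a toy sketch and not the statistical model developed in the article: it assumes, purely for illustration, that relative copy numbers are log-normally distributed with log-scale spread `sigma`, and that sequencing draws reads in proportion to copy number. The function name `simulate_dropout` and all parameter values are hypothetical.

```python
import math
import random


def simulate_dropout(n_sequences, sigma, mean_reads_per_sequence, seed=0):
    """Return the fraction of sequences that receive zero reads.

    Assumption (not taken from the article): relative copy numbers are
    log-normal with log-scale standard deviation `sigma`; sigma = 0 means
    every sequence is equally abundant.
    """
    rng = random.Random(seed)
    # Relative abundance of each unique sequence in the pool.
    weights = [math.exp(rng.gauss(0.0, sigma)) for _ in range(n_sequences)]
    n_reads = int(n_sequences * mean_reads_per_sequence)
    # Sequencing modeled as sampling reads with probability
    # proportional to each sequence's relative abundance.
    reads = rng.choices(range(n_sequences), weights=weights, k=n_reads)
    return 1.0 - len(set(reads)) / n_sequences


if __name__ == "__main__":
    # Compare an even pool with a skewed pool at two sequencing depths.
    for sigma in (0.0, 1.5):
        for coverage in (10, 50):
            missing = simulate_dropout(2000, sigma, coverage)
            print(f"sigma={sigma} coverage={coverage}x missing={missing:.4f}")
```

Under these assumptions, a skewed pool loses more sequences than an even pool at the same sequencing depth, and recovering them requires sequencing more deeply. That is the trade-off between wasteful provisioning and missing sequences that the abstract describes.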

Date: 2020

Downloads: (external link)
https://www.nature.com/articles/s41467-020-16958-3 Abstract (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16958-3

Ordering information: This journal article can be ordered from
https://www.nature.com/ncomms/

DOI: 10.1038/s41467-020-16958-3

Nature Communications is currently edited by Nathalie Le Bot, Enda Bergin and Fiona Gillespie

Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:nat:natcom:v:11:y:2020:i:1:d:10.1038_s41467-020-16958-3