Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage

Rasool, Abdur; Qu, Qiang; Wang, Yang; Jiang, Qingshan

Bio-Constrained Codes with Neural Network for Density-Based DNA Data Storage

Abdur Rasool, Qiang Qu, Yang Wang and Qingshan Jiang
Additional contact information
Abdur Rasool: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Qiang Qu: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Yang Wang: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Qingshan Jiang: Shenzhen Key Laboratory for High Performance Data Mining, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Mathematics, 2022, vol. 10, issue 5, 1-21

Abstract: DNA has evolved as a cutting-edge medium for digital information storage due to its extremely high density and durable preservation to accommodate the data explosion. However, the strings of DNA are prone to errors during the hybridization process. In addition, DNA synthesis and sequences come with a cost that depends on the number of nucleotides present. An efficient model to store a large amount of data in a small number of nucleotides is essential, and it must control the hybridization errors among the base pairs. In this paper, a novel computational model is presented to design large DNA libraries of oligonucleotides. It is established by integrating a neural network (NN) with combinatorial biological constraints, including constant GC-content and satisfying Hamming distance and reverse-complement constraints. We develop a simple and efficient implementation of NNs to produce the optimal DNA codes, which opens the door to applying neural networks for DNA-based data storage. Further, the combinatorial bio-constraints are introduced to improve the lower bounds and to avoid the occurrence of errors in the DNA codes. Our goal is to compute large DNA codes in shorter sequences, which should avoid non-specific hybridization errors by satisfying the bio-constrained coding. The proposed model yields a significant improvement in the DNA library by explicitly constructing larger codes than the prior published codes.

Keywords: DNA data storage; bio-constrained codes; neural network; DNA computing (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/5/845/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/5/845/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:5:p:845-:d:765759

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().