DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique
Pinghao Li,
Shuang Wang,
Jihoon Kim,
Hongkai Xiong,
Lucila Ohno-Machado and
Xiaoqian Jiang
PLOS ONE, 2013, vol. 8, issue 11, 1-13
Abstract:
Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/for research purpose.
Date: 2013
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0080377 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 80377&type=printable (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0080377
DOI: 10.1371/journal.pone.0080377
Access Statistics for this article
More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone ().