Heuristics for the run-length encoded Burrows–Wheeler transform alphabet ordering problem
Lily Major (),
Amanda Clare (),
Jacqueline W. Daykin (),
Benjamin Mora () and
Christine Zarges ()
Additional contact information
Lily Major: Aberystwyth University
Amanda Clare: Aberystwyth University
Jacqueline W. Daykin: Aberystwyth University
Benjamin Mora: Swansea University, Bay Campus
Christine Zarges: Aberystwyth University
Journal of Heuristics, 2025, vol. 31, issue 1, No 11, 29 pages
Abstract:
Abstract The Burrows–Wheeler Transform (BWT) is a string transformation technique widely used in areas such as bioinformatics and file compression. Many applications combine a run-length encoding (RLE) with the BWT in a way which preserves the ability to query the compressed data efficiently. However, these methods may not take full advantage of the compressibility of the BWT as they do not modify the alphabet ordering for the sorting step embedded in computing the BWT. Indeed, any such alteration of the alphabet ordering can have a considerable impact on the output of the BWT, in particular on the number of runs. For an alphabet $$\Sigma $$ Σ containing $$\sigma $$ σ characters, the space of all alphabet orderings is of size $$\sigma !$$ σ ! . While for small alphabets an exhaustive investigation is possible, finding the optimal ordering for larger alphabets is not feasible. Therefore, there is a need for a more informed search strategy than brute-force sampling the entire space, which motivates a new heuristic approach. In this paper, we explore the non-trivial cases for the problem of minimizing the size of a run-length encoded BWT (RLBWT) via selecting a new ordering for the alphabet. We show that random sampling of the space of alphabet orderings usually gives sub-optimal orderings for compression and that a local search strategy can provide a large improvement in relatively few steps. We also inspect a selection of initial alphabet orderings, including ASCII, letter appearance, and letter frequency. While this alphabet ordering problem is computationally hard we demonstrate gain in compressibility.
Keywords: Alphabet ordering; Burrows–Wheeler transform; Compression; Local search; Random sampling; Run-length encoding (search for similar items in EconPapers)
Date: 2025
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://link.springer.com/10.1007/s10732-025-09548-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:spr:joheur:v:31:y:2025:i:1:d:10.1007_s10732-025-09548-3
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10732
DOI: 10.1007/s10732-025-09548-3
Access Statistics for this article
Journal of Heuristics is currently edited by Manuel Laguna
More articles in Journal of Heuristics from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().