Minimum entropy approach to word segmentation problems
Bin Wang
Physica A: Statistical Mechanics and its Applications, 2001, vol. 293, issue 3, 583-591
Abstract:
Given a sequence composed of a limited number of characters, we try to “read” it as a “text”. This involves segmenting the sequence into “words”. The difficulty is to distinguish a good segmentation from an enormous number of random ones. Aiming to reveal the nonrandomness of the sequence as strongly as possible, we apply the maximum likelihood method and find a quantity, called the segmentation entropy, that can be used to fulfill this aim. Contrary to common practice, where the maximum entropy principle is applied to obtain good solutions, we choose to minimize the segmentation entropy to obtain a good segmentation. The concept developed in this letter can be used to study noncoding DNA sequences in eukaryotic genomes, e.g., for predicting regulatory elements.
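The following is a minimal sketch of the general idea of choosing, among candidate segmentations, the one with the lowest entropy. It rests on simplifying assumptions that are not taken from the paper. First, "segmentation entropy" is modeled here as the Shannon entropy of the empirical word-frequency distribution, which may differ from the paper's exact definition. Second, candidates are restricted to fixed-length words that differ only in their starting offset. The function names are hypothetical. The sketch does show why minimum entropy and maximum likelihood line up. Under a unigram model with maximum likelihood word probabilities c_w / N, the log-likelihood of the word sequence equals -N·H. For a fixed number of words, the lowest-entropy segmentation is therefore also the most likely one.

```python
import math
from collections import Counter


def word_entropy(words):
    """Shannon entropy (bits per word) of the empirical word-frequency distribution.
    Assumption: used here as a stand-in for the paper's segmentation entropy."""
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def max_log_likelihood(words):
    """Log2-likelihood of the word sequence under a unigram model whose word
    probabilities are the maximum likelihood estimates c_w / N.
    This equals -N * word_entropy(words)."""
    counts = Counter(words)
    n = len(words)
    return sum(c * math.log2(c / n) for c in counts.values())


def fixed_length_segmentation(seq, k, offset):
    """Hypothetical candidate generator: cut seq into consecutive k-character
    words starting at `offset`, dropping incomplete pieces at either end."""
    return [seq[i:i + k] for i in range(offset, len(seq) - k + 1, k)]


def best_offset(seq, k):
    """Return the offset whose segmentation has the lowest word entropy."""
    return min(range(k),
               key=lambda off: word_entropy(fixed_length_segmentation(seq, k, off)))


# Toy "text": three 3-letter words concatenated without separators.
seq = "".join(["ACG", "TTA", "ACG", "GCA", "TTA", "TTA", "GCA", "ACG"])
for off in range(3):
    words = fixed_length_segmentation(seq, 3, off)
    print(off, words, round(word_entropy(words), 3))
print(best_offset(seq, 3))  # → 0
```

In this toy sequence, offset 0 recovers the three original words. The shifted offsets produce words that straddle word boundaries, and those words are more varied, so their entropy is higher. The sketch does not compare arbitrary variable-length segmentations. Without some constraint, this raw quantity is trivially minimized by treating the whole sequence as a single word.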
Date: 2001
Downloads: http://www.sciencedirect.com/science/article/pii/S0378437100005458
Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.
Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:293:y:2001:i:3:p:583-591
DOI: 10.1016/S0378-4371(00)00545-8
Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H. E. Stanley and C. Tsallis
Publisher of Physica A: Statistical Mechanics and its Applications: Elsevier
Bibliographic data for series maintained by Catherine Liu.