Minimum entropy approach to word segmentation problems

Wang, Bin

Physica A: Statistical Mechanics and its Applications, 2001, vol. 293, issue 3, 583-591

Abstract: Given a sequence composed of a limited number of characters, we try to “read” it as a “text”. This involves segmenting the sequence into “words”. The difficulty is to distinguish a good segmentation from the enormous number of random ones. Aiming to reveal the nonrandomness of the sequence as strongly as possible, we apply the maximum likelihood method and find a quantity, called segmentation entropy, that can be used for this purpose. Contrary to the common practice of applying the maximum entropy principle to obtain a good solution, we minimize the segmentation entropy to obtain a good segmentation. The concept developed in this letter can be used to study noncoding DNA sequences in eukaryote genomes, e.g., for the prediction of regulatory elements.

Date: 2001
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0378437100005458
Full text for ScienceDirect subscribers only. The journal offers the option of making the article available online on ScienceDirect for a fee of $3,000.

Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:293:y:2001:i:3:p:583-591

DOI: 10.1016/S0378-4371(00)00545-8

Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H. E. Stanley and C. Tsallis

More articles in Physica A: Statistical Mechanics and its Applications from Elsevier