Dictionary-based methods for information extraction
A. Baronchelli,
E. Caglioti,
V. Loreto and
E. Pizzi
Physica A: Statistical Mechanics and its Applications, 2004, vol. 342, issue 1, 294-300
Abstract:
In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called dictionary of a sequence. Dictionaries are intrinsically interesting and a study of their features can be of great usefulness to investigate the properties of the sequences they have been extracted from e.g. DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.
Keywords: Information extraction; Data compression; Sequence analysis (search for similar items in EconPapers)
Date: 2004
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0378437104004868
Full text for ScienceDirect subscribers only. Journal offers the option of making the article available online on Science direct for a fee of $3,000
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:342:y:2004:i:1:p:294-300
DOI: 10.1016/j.physa.2004.01.072
Access Statistics for this article
Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H.E. Stanley and C. Tsallis
More articles in Physica A: Statistical Mechanics and its Applications from Elsevier
Bibliographic data for series maintained by Catherine Liu ().