Dictionary-based methods for information extraction

Baronchelli, A.; Caglioti, E.; Loreto, V.; Pizzi, E.

Physica A: Statistical Mechanics and its Applications, 2004, vol. 342, issue 1, 294-300

Abstract: In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define, and focus our attention on, the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and studying their features can be very useful for investigating the properties of the sequences from which they were extracted, e.g., DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. Finally, we present some results on self-consistent classification problems.

Keywords: Information extraction; Data compression; Sequence analysis
Date: 2004
Downloads:
http://www.sciencedirect.com/science/article/pii/S0378437104004868
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:phsmap:v:342:y:2004:i:1:p:294-300

DOI: 10.1016/j.physa.2004.01.072

Physica A: Statistical Mechanics and its Applications is currently edited by K. A. Dawson, J. O. Indekeu, H. E. Stanley and C. Tsallis
