Using Weights with a Text Proximity Matrix
Angel R. Martinez,
Edward J. Wegman and
Wendy L. Martinez
Additional contact information
Angel R. Martinez: NAVSEA
Edward J. Wegman: George Mason University, School of Information Technology and Engineering
Wendy L. Martinez: NAVSEA
A chapter in COMPSTAT 2004 — Proceedings in Computational Statistics, 2004, pp 327-337 from Springer
Abstract:
In previous work, we introduced a way of encoding free-form documents called the bigram proximity matrix (BPM). When this encoding was used on a corpus of documents, where each document is tagged with a topic label, results showed that the documents could be classified based on their tagged meaning. In this paper, we investigate methods of weighting the elements of the BPM, analogous to the weighting schemes found in natural language processing. These include logarithmic weights, augmented normalized frequency, inverse document frequency and pointwise mutual information. Results presented in this paper show that some of the weights increased the proportion of correctly classified documents.
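As a rough illustration of the four weighting families named in the abstract, the sketch below counts ordered adjacent word pairs (the nonzero entries of a bigram proximity matrix) and applies common textbook forms of each weight. It is not taken from the chapter: the function names, the sparse dictionary representation, the choice of natural logarithm, and the exact formulas are assumptions, and the authors' variants and text preprocessing may differ.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count ordered adjacent word pairs; keys are (first_word, second_word)."""
    return Counter(zip(tokens, tokens[1:]))


def log_weight(counts):
    """Logarithmic weight (assumed form): 1 + ln(f) for each nonzero count f."""
    return {pair: 1.0 + math.log(f) for pair, f in counts.items()}


def augmented_normalized(counts):
    """Augmented normalized frequency (assumed form): 0.5 + 0.5 * f / max_f."""
    max_f = max(counts.values())
    return {pair: 0.5 + 0.5 * f / max_f for pair, f in counts.items()}


def inverse_document_frequency(corpus_counts):
    """IDF per bigram (assumed form): ln(N / number of documents containing it)."""
    n_docs = len(corpus_counts)
    doc_freq = Counter(pair for counts in corpus_counts for pair in counts)
    return {pair: math.log(n_docs / df) for pair, df in doc_freq.items()}


def pointwise_mutual_information(counts):
    """PMI (assumed form): ln(p(a, b) / (p(a as first word) * p(b as second word)))."""
    total = sum(counts.values())
    first, second = Counter(), Counter()
    for (a, b), f in counts.items():
        first[a] += f
        second[b] += f
    return {
        (a, b): math.log((f / total) / ((first[a] / total) * (second[b] / total)))
        for (a, b), f in counts.items()
    }


# Hypothetical two-document corpus, already tokenized and lowercased.
docs = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
]
per_doc = [bigram_counts(d) for d in docs]
idf = inverse_document_frequency(per_doc)
```

Storing only the nonzero pairs keeps the sketch short; a dense or sparse matrix indexed by a shared vocabulary would be needed before passing documents to a distance-based classifier such as k nearest neighbors.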
Keywords: Bigram proximity matrix; k nearest neighbors classifier; natural language processing
Date: 2004
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-2656-2_26
Ordering information: This item can be ordered from
http://www.springer.com/9783790826562
DOI: 10.1007/978-3-7908-2656-2_26