Text Categorisation Through Dimensionality Reduction Using Wavelet Transform
Jorge Chamorro-Padial and
Rosa Rodríguez-Sánchez
Additional contact information
Jorge Chamorro-Padial: Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR Universidad de Granada, 18071 Granada, Spain
Rosa Rodríguez-Sánchez: Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR Universidad de Granada, 18071 Granada, Spain
Journal of Information & Knowledge Management (JIKM), 2020, vol. 19, issue 04, 1-21
Abstract:
This paper proposes a new method of dimensionality reduction for text classification, based on applying the discrete wavelet transform to the document-term frequency matrix. We analyse the features provided by the wavelet coefficients in the different orientations: (1) The high-energy coefficients in the horizontal orientation correspond to relevant terms in a single document. (2) The high-energy coefficients in the vertical orientation correspond to terms that are relevant for a single document, but not for the others. (3) The high-energy coefficients in the diagonal orientation correspond to terms that are relevant in a document in comparison to other terms. By keeping only the terms whose wavelet coefficients fulfil these three conditions simultaneously, we obtain a reduced vocabulary of the corpus, with fewer dimensions than the original one. To test the reduced vocabulary, we recoded the corpus with it and obtained a statistically relevant level of accuracy for document classification.
Keywords: Text classification; dimensional reduction; wavelet; transform; coefficient wavelet properties; term-document
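The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which wavelet, decomposition level, or threshold they use, nor how coefficients are mapped back to terms. The sketch assumes a single-level 2D Haar transform, assumes that "horizontal" means change between adjacent terms within a document and "vertical" means change between adjacent documents, and keeps both terms covered by a 2x2 block when all three detail sub-bands reach a hypothetical energy threshold. The names `haar_dwt2`, `select_terms`, `threshold`, and the toy data are all illustrative.

```python
def haar_dwt2(matrix):
    """Single-level 2D Haar transform of a matrix with even dimensions.

    Rows are documents, columns are terms. Returns four sub-bands
    (approx, horiz, vert, diag), each half the size of the input.
    The orientation labels are an assumption made for this sketch.
    """
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(matrix), 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, len(matrix[0]), 2):
            a, b = matrix[i][j], matrix[i][j + 1]
            c, d = matrix[i + 1][j], matrix[i + 1][j + 1]
            a_row.append((a + b + c + d) / 2)  # local average
            h_row.append((a - b + c - d) / 2)  # change between adjacent terms
            v_row.append((a + b - c - d) / 2)  # change between adjacent documents
            d_row.append((a - b - c + d) / 2)  # diagonal change
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def select_terms(matrix, vocabulary, threshold):
    """Keep term pairs whose detail coefficients are high-energy in all
    three orientations (hypothetical selection rule for illustration)."""
    _, horiz, vert, diag = haar_dwt2(matrix)
    kept = []
    for j in range(len(vocabulary) // 2):
        # Energy of a sub-band for this term pair: largest squared coefficient
        # over all document pairs.
        energies = [max(row[j] ** 2 for row in band) for band in (horiz, vert, diag)]
        if all(e >= threshold for e in energies):
            kept.extend(vocabulary[2 * j:2 * j + 2])
    return kept


# Toy corpus: 2 documents, 4 terms (made-up frequencies).
vocabulary = ["wavelet", "signal", "data", "method"]
doc_term = [
    [4, 1, 2, 2],
    [0, 1, 2, 2],
]
reduced = select_terms(doc_term, vocabulary, threshold=1.0)
print(reduced)  # ['wavelet', 'signal']
```

In the toy data, the last two terms have identical frequencies in both documents, so all their detail coefficients are zero and they are dropped. A real corpus would need padding to even dimensions and a principled threshold, neither of which the abstract specifies.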
Date: 2020
Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0219649220500392
Access to full text is restricted to subscribers
Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:19:y:2020:i:04:n:s0219649220500392
DOI: 10.1142/S0219649220500392
Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh