Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders
Francisco J. Ribadas-Pena,
Shuyuan Cao and
Víctor M. Darriba Bilbao
Additional contact information
Francisco J. Ribadas-Pena: Department of Computer Science, University of Vigo, Edificio Politécnico, Campus As Lagoas s/n, 32004 Ourense, Spain
Shuyuan Cao: Department of Computer Science, University of Vigo, Edificio Politécnico, Campus As Lagoas s/n, 32004 Ourense, Spain
Víctor M. Darriba Bilbao: Department of Computer Science, University of Vigo, Edificio Politécnico, Campus As Lagoas s/n, 32004 Ourense, Spain
Mathematics, 2022, vol. 10, issue 16, 1-22
Abstract:
In this paper, we introduce a multi-label lazy learning approach for automatic semantic indexing of large document collections whose label vocabularies are complex, structured, and highly inter-correlated. The proposed method is an evolution of the traditional k-Nearest Neighbors algorithm that uses a large autoencoder trained to map the large label space to a reduced-size latent space and to regenerate the predicted labels from this latent space. We evaluated our proposal on a large portion of the MEDLINE biomedical document collection, which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments, we propose and evaluate several document representation approaches and different label autoencoder configurations.
Keywords: autoencoders; multi-label categorization; semantic indexing; nearest neighbors; text categorization; MeSH indexing
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/16/2867/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/16/2867/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:16:p:2867-:d:885735
Mathematics is currently edited by Ms. Emma He
Bibliographic data for series maintained by MDPI Indexing Manager.