Effect of Dimension Size and Window Size on Word Embedding in Classification Tasks
Dávid Držík and Jozef Kapusta
Acta Informatica Pragensia, vol. preprint
Abstract:
Background: Static word embedding models such as Word2Vec and GloVe remain widely used in natural language processing, yet key hyperparameters are often selected heuristically rather than through systematic validation.
Objective: This study provides an extrinsic evaluation of context window size and embedding dimensionality for Word2Vec (CBOW and Skip-gram) and GloVe embeddings in a downstream spam classification task.
Methods: Embeddings were trained on a large external corpus and evaluated using a neural network and several classical machine learning classifiers.
Results: The results show that context window size has a moderate influence on performance, whereas embedding dimensionality has a clearer effect: values below approximately 50 degrade performance, while increases beyond moderate ranges (approximately 100-150) yield diminishing returns. Across all experiments, Word2Vec achieves higher stability and performance than GloVe.
Conclusion: Overall, the findings suggest that robust classification performance can be achieved with moderate embedding dimensionalities and smaller context windows, providing practical guidance for efficient embedding configuration.
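To make the "context window size" hyperparameter concrete, the following minimal Python sketch enumerates the (center, context) training pairs that a Skip-gram-style model would see for a given window. It is an illustration under stated assumptions, not the authors' pipeline: the function name `context_pairs` and the toy sentence are hypothetical, and the window is assumed to be fixed and symmetric, whereas real implementations may shrink the window randomly per token.

```python
def context_pairs(tokens, window):
    """Return (center, context) pairs for a fixed symmetric window.

    Assumption: every token within `window` positions of the center
    counts as context; actual Word2Vec implementations may differ.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge, clipped at sentence start
        hi = min(len(tokens), i + window + 1)   # right edge, clipped at sentence end
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy sentence, not taken from the paper's corpus.
tokens = ["win", "a", "free", "prize", "now"]
for window in (1, 2):
    print(window, len(context_pairs(tokens, window)))
# prints:
# 1 8
# 2 14
```

A wider window produces more pairs per sentence and links more distant words, which is one reason window size can change both training cost and what the embeddings capture.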
Keywords: Word embeddings; Word2Vec; GloVe; Vector dimension; Context window size
Downloads: http://aip.vse.cz/doi/10.18267/j.aip.309.html (text/html, free of charge)
Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:preprint:id:309
Ordering information: This journal article can be ordered from
Redakce Acta Informatica Pragensia, Katedra systémové analýzy, Vysoká škola ekonomická v Praze, nám. W. Churchilla 4, 130 67 Praha 3
http://aip.vse.cz
DOI: 10.18267/j.aip.309
Acta Informatica Pragensia is currently edited by Editorial Office
More articles in Acta Informatica Pragensia from Prague University of Economics and Business
Bibliographic data for series maintained by Stanislav Vojir.