Effect of Dimension Size and Window Size on Word Embedding in Classification Tasks

Dávid Držík and Jozef Kapusta

Acta Informatica Pragensia, vol. preprint

Abstract: Background: Static word embedding models such as Word2Vec and GloVe remain widely used in natural language processing, yet key hyperparameters are often selected heuristically rather than through systematic validation. Objective: This study provides an extrinsic evaluation of context window size and embedding dimensionality for Word2Vec (CBOW and Skip-gram) and GloVe embeddings in a downstream spam classification task. Methods: Embeddings were trained on a large external corpus and evaluated using a neural network and several classical machine learning classifiers. Results: The results show that context window size has a moderate influence on performance, whereas embedding dimensionality has a clearer effect: values below approximately 50 degrade performance, while increases beyond moderate ranges (approximately 100-150) yield diminishing returns. Across all experiments, Word2Vec achieves higher stability and performance than GloVe. Conclusion: Overall, the findings suggest that robust classification performance can be achieved with moderate embedding dimensionalities and smaller context windows, providing practical guidance for efficient embedding configuration.
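Illustration (not from the article): the two hyperparameters named in the abstract can be made concrete with a minimal sketch. The function `skipgram_pairs`, the helper `hyperparameter_grid`, the example sentence, and the grid values below are all hypothetical and chosen only for demonstration; they are not the authors' code or their experimental settings. The sketch uses a fixed symmetric window, whereas real Word2Vec implementations typically also subsample or shrink the window at random.

```python
from itertools import product


def skipgram_pairs(tokens, window):
    """Return (center, context) pairs using a fixed symmetric window.

    Simplified: every token within `window` positions of the center
    word, on either side, counts as context.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def hyperparameter_grid(windows, dimensions):
    """Enumerate every (window, dimension) configuration to evaluate."""
    return [
        {"window": w, "vector_size": d}
        for w, d in product(windows, dimensions)
    ]


# Hypothetical values for illustration only, not the article's settings.
WINDOWS = [2, 5, 10]
DIMENSIONS = [25, 50, 100, 150, 300]

sentence = "win a free prize now by replying to this message".split()
for w in WINDOWS:
    # A larger window yields more training pairs per sentence.
    print(f"window={w}: {len(skipgram_pairs(sentence, w))} pairs")

print(f"{len(hyperparameter_grid(WINDOWS, DIMENSIONS))} configurations")
```

In an extrinsic evaluation of the kind the abstract describes, each configuration in such a grid would be used to train one embedding model, whose vectors are then fed to a downstream classifier and scored on the classification task.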

Keywords: Word embeddings; Word2Vec; GloVe; Vector dimension; Context window size

Downloads: (external link)
http://aip.vse.cz/doi/10.18267/j.aip.309.html (text/html)
free of charge

Persistent link: https://EconPapers.repec.org/RePEc:prg:jnlaip:v:preprint:id:309

Ordering information: This journal article can be ordered from
Redakce Acta Informatica Pragensia, Katedra systémové analýzy, Vysoká škola ekonomická v Praze, nám. W. Churchilla 4, 130 67 Praha 3
http://aip.vse.cz

DOI: 10.18267/j.aip.309

Acta Informatica Pragensia is currently edited by Editorial Office

More articles in Acta Informatica Pragensia from Prague University of Economics and Business
Bibliographic data for series maintained by Stanislav Vojir.

 
Page updated 2026-03-15
Handle: RePEc:prg:jnlaip:v:preprint:id:309