Global Dense Vector Representations for Words or Items Using Shared Parameter Alternating Tweedie Model
Taejoon Kim and Haiyan Wang
Additional contact information
Taejoon Kim: Department of Statistics and Biostatistics, California State University East Bay, Hayward, CA 94542, USA
Haiyan Wang: Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
Mathematics, 2025, vol. 13, issue 4, 1-40
Abstract:
In this article, we present a model for analyzing co-occurrence count data arising in practical fields, such as user–item or item–item data from online shopping platforms and co-occurring word–word pairs in text sequences. Such data contain important information for developing recommender systems or studying the relevance of items or words from non-numerical sources. Unlike traditional regression models, there are no covariate observations. Additionally, the co-occurrence matrix is typically of such high dimension that it cannot fit into a computer’s memory for modeling. We extract numerical data by defining co-occurrence windows with weighted counts on a continuous scale. Positive probability mass is allowed for zero observations. We present the Shared Parameter Alternating Tweedie (SA-Tweedie) model and an algorithm to estimate its parameters. We introduce a learning-rate adjustment, used alongside Fisher scoring in the inner loop, that keeps the algorithm aligned with the optimizing direction. Gradient descent with the Adam update was also considered as an alternative estimation method. Simulation studies showed that our algorithm with Fisher scoring and learning-rate adjustment outperforms the other two methods. We applied SA-Tweedie to English-language Wikipedia dump data to obtain dense vector representations for WordPiece tokens. These embeddings were then used in a Named Entity Recognition (NER) task, where they significantly outperform GloVe, random, and BERT embeddings. A notable strength of SA-Tweedie is that its parameter count and training cost are only a tiny fraction of those for BERT.
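To make the two ingredients of the abstract concrete, here is a minimal sketch, not the authors' implementation: distance-weighted window counts (each context token at distance d contributes 1/d, one common continuous-scale weighting) and an alternating quasi-likelihood gradient step under a Tweedie model with log link and variance function proportional to mu^p. An index 1 < p < 2 corresponds to a compound Poisson–gamma distribution, which places positive probability mass at zero. The function names, the window size, and the choice p = 1.5 are illustrative assumptions, and the plain gradient step stands in for the paper's Fisher scoring with learning-rate adjustment.

```python
# Minimal sketch (illustrative, not the paper's implementation) of:
#  (1) distance-weighted co-occurrence counts inside a sliding window, and
#  (2) one alternating gradient-ascent step on a Tweedie quasi-log-likelihood
#      with log link, mu = exp(U V^T), variance function mu**p, 1 < p < 2.
from collections import defaultdict
import numpy as np

def weighted_cooccurrence(tokens, vocab, window_size=5):
    """Continuous-scale counts: a neighbor at distance d contributes 1/d."""
    counts = defaultdict(float)
    ids = [vocab[t] for t in tokens if t in vocab]
    for i, w in enumerate(ids):
        for d in range(1, window_size + 1):
            if i + d < len(ids):
                counts[(w, ids[i + d])] += 1.0 / d
                counts[(ids[i + d], w)] += 1.0 / d  # symmetric pair
    return counts

def tweedie_step(Y, U, V, p=1.5, lr=1e-3):
    """Update U with V held fixed (one half of an alternating scheme).

    With log link, the derivative of the Tweedie quasi-log-likelihood with
    respect to the linear predictor is y * mu**(1 - p) - mu**(2 - p).  The
    paper uses Fisher scoring with a learning-rate adjustment; a plain
    gradient step is shown here for brevity.
    """
    mu = np.exp(U @ V.T)
    score = Y * mu ** (1.0 - p) - mu ** (2.0 - p)
    return U + lr * (score @ V)

# Toy usage: alternate updates of the two embedding matrices.
rng = np.random.default_rng(0)
Y = rng.poisson(1.0, size=(100, 50)).astype(float)  # stand-in count matrix
U = 0.01 * rng.standard_normal((100, 8))
V = 0.01 * rng.standard_normal((50, 8))
for _ in range(200):
    U = tweedie_step(Y, U, V)    # fit U given V
    V = tweedie_step(Y.T, V, U)  # fit V given U
```

For the shared-parameter structure, the exact Fisher scoring update, and the learning-rate adjustment, see the article via the links below.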
Keywords: NLP; word embedding; Tweedie distribution; high-dimensional co-occurrence matrix; matrix factorization; Adam; recommender systems
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/4/612/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/4/612/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:4:p:612-:d:1590544