EconPapers    
Economics at your fingertips  
 

Basic Co-Occurrence Latent Semantic Vector Space Model

Feng gao Niu
Feng gao Niu: Shanxi University

Journal of Classification, 2019, vol. 36, issue 2, No 6, 277-294

Abstract: The vector representation, which quantifies text, is one of the important components of document clustering and classification. In this paper, a novel Co-occurrence Latent Semantic Vector Space Model (CLSVSM) is presented and the co-occurrence distribution is studied further. The model is built on the Vector Space Model (VSM) and embeds the co-occurrence latent semantics of the documents' keywords into their vector representations. First, experiments were conducted to test the model's performance on documents from the China National Knowledge Infrastructure (CNKI). In the document clustering tests, the Entropy (E), Purity (P) and F1 values of CLSVSM were 20% better than those of VSM, which shows that CLSVSM can improve the accuracy of document clustering while reducing the sparsity of the vectors. Second, which estimator of the latent semantics is best: maximum (MAX), minimum (MIN), average (AVE), or median (MED)? Further experiments were performed to compare the four estimators. The results indicate that MAX and AVE are the preferred methods, while MIN is the worst, which is consistent with the discussion. Some essential questions are discussed at the end, concerning the trends of co-occurrence frequency and the function of co-occurrence intensity and its distribution; these discussions reinforce the model.
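The abstract does not give the model's formulas, so the following Python sketch is only a hypothetical illustration of the general idea it describes: start from a plain VSM keyword-frequency vector, then fill its zero entries with a latent co-occurrence weight reduced by one of the four estimators (MAX, MIN, AVE, MED). The co-occurrence strength used here (the share of documents containing both keywords) and every function name (`cooccurrence_strength`, `clsvsm_like_vector`, `sparsity`) are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, median

# The four estimators named in the abstract, mapped to plain reducers.
# How the paper actually applies them is not stated here; this is an assumption.
ESTIMATORS = {"MAX": max, "MIN": min, "AVE": mean, "MED": median}


def build_vocabulary(docs):
    """Sorted list of all distinct keywords across the corpus."""
    return sorted({kw for doc in docs for kw in doc})


def cooccurrence_strength(docs):
    """Hypothetical strength: fraction of documents containing both keywords."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    n = len(docs)
    return {pair: c / n for pair, c in counts.items()}


def vsm_vector(doc, vocab):
    """Plain VSM: raw keyword frequency for each vocabulary term."""
    tf = Counter(doc)
    return [float(tf[t]) for t in vocab]


def clsvsm_like_vector(doc, vocab, strength, estimator="AVE"):
    """Fill zero VSM entries with an estimated latent co-occurrence weight.

    For each vocabulary term absent from `doc`, collect its co-occurrence
    strengths with the keywords present in `doc` (0.0 if the pair never
    co-occurs) and reduce them with the chosen estimator. Illustrative only.
    """
    agg = ESTIMATORS[estimator]
    present = sorted(set(doc))
    vec = vsm_vector(doc, vocab)
    for i, term in enumerate(vocab):
        if vec[i] == 0.0 and present:
            scores = [strength.get(tuple(sorted((term, k))), 0.0) for k in present]
            vec[i] = float(agg(scores))
    return vec


def sparsity(vec):
    """Share of zero entries in a vector."""
    return sum(1 for x in vec if x == 0.0) / len(vec)


if __name__ == "__main__":
    # Toy keyword lists, invented for this example (not CNKI data).
    docs = [
        ["clustering", "vector", "text"],
        ["vector", "cooccurrence", "semantic"],
        ["text", "semantic", "clustering"],
    ]
    vocab = build_vocabulary(docs)
    strength = cooccurrence_strength(docs)
    print("VSM", sparsity(vsm_vector(docs[0], vocab)))
    for name in ESTIMATORS:
        print(name, sparsity(clsvsm_like_vector(docs[0], vocab, strength, name)))
```

On this toy corpus the first document's VSM vector has two zero entries out of five; MAX and AVE fill both, while MIN and MED leave one at zero because most keyword pairs never co-occur. This only shows how the choice of estimator affects sparsity in the sketch and says nothing about the paper's reported results.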

Keywords: CLSVSM; VSM; Clustering; High-dimensional vector; Co-occurrence; Co-word
Date: 2019

Downloads: (external link)
http://link.springer.com/10.1007/s00357-018-9283-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:jclass:v:36:y:2019:i:2:d:10.1007_s00357-018-9283-9

Ordering information: This journal article can be ordered from
http://www.springer. ... hods/journal/357/PS2

DOI: 10.1007/s00357-018-9283-9


Journal of Classification is currently edited by Douglas Steinley

More articles in Journal of Classification from Springer, The Classification Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:jclass:v:36:y:2019:i:2:d:10.1007_s00357-018-9283-9