Document clustering using the LSI subspace signature model

W.Z. Zhu and R.B. Allen

Journal of the American Society for Information Science and Technology, 2013, vol. 64, issue 4, 844-860

Abstract: We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition (SVD), the model represents terms and documents by the distribution signatures of their statistical contributions across the top‐ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low‐rank approximation of scalable and sparse term‐document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms, such as standard K‐means and self‐organizing maps, compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K‐means compared with a random seeding procedure, which sometimes makes clustering inefficient and ineffective. A two‐stage initialization strategy based on LSISSM significantly reduces the running time of standard K‐means procedures.
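
The abstract does not give the exact definition of a signature, so the following is only a rough sketch of the general idea, not the authors' method. It assumes NumPy, a tiny made-up term-document matrix `X`, and a hypothetical helper `lsi_signatures` that takes a truncated SVD and normalizes each term's and document's absolute, singular-value-weighted loadings on the top `k` latent dimensions into a distribution. Grouping documents by their dominant dimension is likewise an illustrative stand-in for a contribution-ranking seeding step.

```python
import numpy as np

def lsi_signatures(X, k):
    """Illustrative (hypothetical) signatures from a rank-k SVD.

    X is a terms-by-documents count matrix. Each returned row is a
    distribution over the top-k latent dimensions. This normalization
    is an assumption, not the definition used in the paper.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    # Magnitude of each item's contribution to each latent dimension.
    term_w = np.abs(U_k * s_k)
    doc_w = np.abs(V_k * s_k)

    def normalize(W):
        totals = W.sum(axis=1, keepdims=True)
        totals[totals == 0] = 1.0  # leave all-zero rows as zeros
        return W / totals

    return normalize(term_w), normalize(doc_w)

# Toy data (made up): terms 0-2 occur only in documents 0-1,
# terms 3-5 occur only in documents 2-3.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

term_sigs, doc_sigs = lsi_signatures(X, k=2)

# Dominant latent dimension per document. Documents sharing a dominant
# dimension could be used to seed one K-means cluster.
dominant = doc_sigs.argmax(axis=1)
print(dominant)
```

On this toy matrix the two topical blocks are disjoint, so each document's signature concentrates on a single latent dimension. Real collections would give more diffuse signatures.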

Date: 2013

Downloads: (external link)
https://doi.org/10.1002/asi.22623

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:64:y:2013:i:4:p:844-860

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery.

 
Handle: RePEc:bla:jamist:v:64:y:2013:i:4:p:844-860