EconPapers    
Economics at your fingertips  
 

Eigenvalue‐based model selection during latent semantic indexing

Miles Efron

Journal of the American Society for Information Science and Technology, 2005, vol. 56, issue 9, 969-988

Abstract: In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these “null” eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.

Date: 2005
References: Add references at CitEc
Citations:

Downloads: (external link)
https://doi.org/10.1002/asi.20188

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:56:y:2005:i:9:p:969-988

Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890

Access Statistics for this article

More articles in Journal of the American Society for Information Science and Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().

 
Page updated 2025-03-19
Handle: RePEc:bla:jamist:v:56:y:2005:i:9:p:969-988