Evaluation of clustering algorithms for word sense disambiguation
Bartosz Broda and Wojciech Mazur
International Journal of Data Analysis Techniques and Strategies, 2012, vol. 4, issue 3, 219-236
Abstract:
Word sense disambiguation in text is still a difficult problem, as the best supervised methods require laborious and costly preparation of training data. This work evaluates selected clustering algorithms on the task of word sense disambiguation. We used five datasets for two languages (English and Polish). Five clustering algorithms (k-means, k-medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, graph-partitioning-based clustering) and two weighting schemes were tested. The best parameters of the algorithms were chosen using 5 × 2 cross-validation. The BCubed measure was used to evaluate the clusterings. We conclude that, with these settings, agglomerative hierarchical clustering achieves the best results on all the datasets.
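The abstract names BCubed as the evaluation measure but does not define it. The sketch below follows the commonly used item-level definition of BCubed precision and recall, which may differ in detail from the variant used in the paper (the full text is not reproduced here). The function name `bcubed` and the example labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bcubed(clusters, gold):
    """Return (precision, recall, F1) under the standard BCubed definition.

    clusters[i] is the induced cluster id of occurrence i,
    gold[i] is its gold-standard sense label.
    This is an illustrative sketch, not the authors' implementation.
    """
    if len(clusters) != len(gold):
        raise ValueError("clusters and gold must have the same length")
    n = len(clusters)
    if n == 0:
        raise ValueError("at least one item is required")

    cluster_sizes = Counter(clusters)          # items per induced cluster
    gold_sizes = Counter(gold)                 # items per gold sense
    pair_sizes = Counter(zip(clusters, gold))  # items sharing both labels

    # Per-item precision: share of the item's cluster that has its gold sense.
    precision = sum(
        pair_sizes[(c, g)] / cluster_sizes[c] for c, g in zip(clusters, gold)
    ) / n
    # Per-item recall: share of the item's gold sense found in its cluster.
    recall = sum(
        pair_sizes[(c, g)] / gold_sizes[g] for c, g in zip(clusters, gold)
    ) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four occurrences of an ambiguous word, two induced clusters, two gold senses.
p, r, f = bcubed([0, 0, 1, 1], ["a", "a", "a", "b"])
```

In this toy example the second cluster mixes two senses, which lowers precision, while sense "a" is split across both clusters, which lowers recall.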
Keywords: clustering algorithms; word sense disambiguation; WSD; BCubed; senseval; bag of words; English; Polish.
Date: 2012
Downloads: (external link)
http://www.inderscience.com/link.php?id=47817 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:4:y:2012:i:3:p:219-236
More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd