A semantic similarity approach to predicting Library of Congress subject headings for social tags
Kwan Yi
Journal of the American Society for Information Science and Technology, 2010, vol. 61, issue 8, 1658-1672
Abstract:
Social tagging, or collaborative tagging, has become a new trend in the organization, management, and discovery of digital information. The rapid growth of shared information mostly controlled by social tags poses a new challenge for social tag‐based information organization and retrieval. A plausible approach to this challenge is linking social tags to a controlled vocabulary. As an introductory step toward this approach, this study investigates ways of predicting relevant subject headings for resources from the social tags assigned to those resources. The prediction of subject headings was measured by five different similarity measures: tf–idf, cosine‐based similarity (CoS), Jaccard similarity (or Jaccard coefficient; JS), mutual information (MI), and information radius (IRad). The resulting predictions were compared with those made by professionals. The results show that a CoS measure based on the top five social tags was most effective. Including more social tags only degraded performance. The performance of JS is comparable to that of CoS, while tf–idf is comparable with up to 70% less than the best performance. MI and IRad perform worse than the other methods. This study demonstrates the application of similarity measuring techniques to the prediction of correct Library of Congress subject headings.
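To make two of the measures named in the abstract concrete, the following is a minimal sketch of cosine-based similarity and Jaccard similarity between a resource's top social tags and the terms of a candidate subject heading. It is not the paper's implementation: the tag counts, the heading terms, the helper names (`top_k_tags`, `cosine_similarity`, `jaccard_similarity`), and the choice to weight heading terms equally are all assumptions made for illustration.

```python
import math
from collections import Counter


def top_k_tags(tag_counts: Counter, k: int = 5) -> Counter:
    """Keep only the k most frequently assigned tags (hypothetical helper)."""
    return Counter(dict(tag_counts.most_common(k)))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: how often each tag was assigned to one resource.
tag_counts = Counter(
    {"python": 12, "programming": 9, "tutorial": 4, "code": 3, "web": 2, "blog": 1}
)
# Assumed representation of a candidate heading as equally weighted terms.
heading_terms = Counter(["python", "computer", "program", "language"])

top_tags = top_k_tags(tag_counts, k=5)
cos_score = cosine_similarity(top_tags, heading_terms)
jac_score = jaccard_similarity(set(top_tags), set(heading_terms))
print(f"CoS = {cos_score:.4f}, JS = {jac_score:.4f}")
```

In a setup like this, each candidate heading would be scored against the same top-tag vector and the highest-scoring headings taken as predictions. How the study actually represented tags and headings is not stated in the abstract.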
Date: 2010
Downloads: (external link)
https://doi.org/10.1002/asi.21351
Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:61:y:2010:i:8:p:1658-1672
Ordering information: This journal article can be ordered from
https://doi.org/10.1002/(ISSN)1532-2890