Toward effective automated weighted subject indexing: A comparison of different approaches in different environments
Kun Lu,
Jin Mao and
Gang Li
Journal of the Association for Information Science & Technology, 2018, vol. 69, issue 1, 121-133
Abstract:
Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost†effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag†of†words representation shows the best average performance in the full text environment, while cosine with bag†of†words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag†of†words representation generally outperforms the concept†based representation. Further improvement in performance can be obtained by using the learning†to†rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776–1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
Date: 2018
References: Add references at CitEc
Citations:
Downloads: (external link)
https://doi.org/10.1002/asi.23912
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:69:y:2018:i:1:p:121-133
Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=2330-1635
Access Statistics for this article
More articles in Journal of the Association for Information Science & Technology from Association for Information Science & Technology
Bibliographic data for series maintained by Wiley Content Delivery ().