Automatic extraction of document keyphrases for use in digital libraries: Evaluation and applications

Steve Jones and Gordon W. Paynter

Journal of the American Society for Information Science and Technology, 2002, vol. 53, issue 8, 653-677

Abstract: This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well‐known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author‐specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.

Date: 2002

Downloads: (external link)
https://doi.org/10.1002/asi.10068

Persistent link: https://EconPapers.repec.org/RePEc:bla:jamist:v:53:y:2002:i:8:p:653-677

Handle: RePEc:bla:jamist:v:53:y:2002:i:8:p:653-677