Automatic tagging with existing and novel tags
Junhui Wang, Xiaotong Shen, Yiwen Sun and Annie Qu
Biometrika, 2017, vol. 104, issue 2, 273-290
Abstract:
Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.
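Illustrative note: the abstract says novel tags can be identified through their similarity to existing tags, but it does not give the formula. The sketch below shows one simple way such a transfer could work, namely a similarity-weighted average of the scores of existing tags. It is an assumption made for illustration only and is not the tagger defined in the paper; the function name `score_novel_tag`, the example scores, and the similarity values are all hypothetical.

```python
def score_novel_tag(existing_scores, similarities):
    """Score a tag unseen in training as a similarity-weighted average
    of the scores of existing tags (illustrative assumption, not the
    paper's estimator)."""
    if len(existing_scores) != len(similarities):
        raise ValueError("one similarity is needed per existing tag")
    total = sum(similarities)
    if total == 0:
        # No existing tag is similar to the novel tag: nothing to transfer.
        return 0.0
    weighted = sum(s * w for s, w in zip(existing_scores, similarities))
    return weighted / total


# Hypothetical scores a fitted tagger gives three existing tags for one document.
existing_scores = [2.0, -1.0, 0.5]
# Hypothetical similarities between the novel tag and each existing tag.
similarities = [0.5, 0.0, 0.5]

novel_score = score_novel_tag(existing_scores, similarities)
```

Under this assumed rule, a novel tag inherits a high score only when the existing tags it resembles also score highly for the document.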
Keywords: Alternating direction method of multipliers; Large margin; Multi-label classification; Scalability; Social bookmarking system; Text mining
Date: 2017
Downloads:
http://hdl.handle.net/10.1093/biomet/asx016 (application/pdf)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:oup:biomet:v:104:y:2017:i:2:p:273-290.
Ordering information: This journal article can be ordered from
https://academic.oup.com/journals
Biometrika is currently edited by Paul Fearnhead
More articles in Biometrika from Biometrika Trust Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK.
Bibliographic data for series maintained by Oxford University Press.