Empirical study of constructing a knowledge organization system of patent documents using topic modeling
Zhengyin Hu, Shu Fang and Tian Liang
Affiliation (all authors): Chengdu Document and Information Center, Chinese Academy of Sciences
Scientometrics, 2014, vol. 100, issue 3, No 12, 787-799
Abstract:
A knowledge organization system (KOS) can readily reveal the deep knowledge structure of a set of patent documents. Compared with classification code systems, a personalized KOS made up of topics can represent technology information in a more flexible and detailed manner. This paper presents an approach to automatically construct a KOS of patent documents based on term clumping, the Latent Dirichlet Allocation (LDA) model, K-Means clustering and Principal Components Analysis (PCA). Term clumping is used to generate a better bag-of-words for topic modeling, and the LDA model is applied to generate raw topics. Then, by iteratively applying K-Means clustering and PCA to the matrix of documents and topics, we generated new upper topics and computed the relationships between topics to construct the KOS. Finally, documents are mapped to the KOS. The nodes of the KOS are topics, each represented by terms and their weights, and the leaves are patent documents. We evaluated the approach in an empirical study on a set of Large Aperture Optical Elements (LAOE) patent documents and constructed the LAOE KOS. The method discovered deep semantic relationships between the topics and helped better describe the technology themes of LAOE. Based on the KOS, two types of applications were implemented: automatic classification of patent documents and categorical refinement of search results.
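The abstract does not spell out the clustering details, so the following is only a minimal sketch of one plausible reading of the upper-topic step, assuming the raw LDA topics are already available as term-weight dictionaries. It groups raw topics with plain K-Means (using a deterministic farthest-first initialization, which is an assumption of this sketch), takes each cluster centroid as an upper topic, and maps a document to the upper topic that contains its most probable raw topic. The toy topics and the helpers `build_kos` and `map_document` are hypothetical; term clumping, the LDA fit itself and the PCA step are not reproduced.

```python
from typing import Dict, List, Tuple

Topic = Dict[str, float]  # term -> weight, as a topic model might produce

# Hypothetical raw topics (toy data, NOT from the LAOE study).
RAW_TOPICS: List[Topic] = [
    {"polishing": 0.5, "grinding": 0.4, "surface": 0.1},
    {"polishing": 0.4, "grinding": 0.5, "lens": 0.1},
    {"coating": 0.6, "film": 0.3, "laser": 0.1},
    {"coating": 0.5, "film": 0.4, "damage": 0.1},
]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two term-weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points: List[List[float]], k: int) -> List[List[float]]:
    """Farthest-first initialization (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[List[float]], k: int,
           max_iter: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-Means; stops once cluster assignments stop changing."""
    centroids = init_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def build_kos(topics: List[Topic], k: int) -> List[dict]:
    """Hypothetical helper: group raw topics into k upper-topic nodes."""
    vocab = sorted({term for t in topics for term in t})
    vectors = [[t.get(term, 0.0) for term in vocab] for t in topics]
    labels, centroids = kmeans(vectors, k)
    kos = []
    for c in range(k):
        # The centroid's non-zero weights describe the upper topic.
        upper = {term: w for term, w in zip(vocab, centroids[c]) if w > 0}
        children = [i for i, lab in enumerate(labels) if lab == c]
        kos.append({"upper_topic": upper, "children": children})
    return kos


def map_document(doc_topic_probs: List[float], kos: List[dict]) -> Tuple[int, int]:
    """Hypothetical helper: attach a document (a leaf) via its dominant raw topic."""
    best = max(range(len(doc_topic_probs)), key=lambda i: doc_topic_probs[i])
    for u, node in enumerate(kos):
        if best in node["children"]:
            return u, best
    raise ValueError("raw topic not found in KOS")


if __name__ == "__main__":
    kos = build_kos(RAW_TOPICS, k=2)
    for u, node in enumerate(kos):
        print(u, node["children"], sorted(node["upper_topic"]))
    print(map_document([0.1, 0.1, 0.7, 0.1], kos))
```

Plain K-Means stands in here for whatever iterated combination of K-Means and PCA the authors use. In a fuller pipeline the raw topics would come from a fitted LDA model, and further levels of the hierarchy could be built by re-running the grouping on the upper topics themselves.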
Keywords: Topic model; Term clumping; Knowledge organization system; Text clustering; Principal Component Analysis
Date: 2014
Downloads: http://link.springer.com/10.1007/s11192-014-1328-1 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:100:y:2014:i:3:d:10.1007_s11192-014-1328-1
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-014-1328-1
Scientometrics is currently edited by Wolfgang Glänzel
Publisher: Springer, Akadémiai Kiadó
Bibliographic data for the series are maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.