Analysis of the effect of data properties in automated patent classification
Juan Carlos Gomez
Juan Carlos Gomez: Universidad de Guanajuato
Scientometrics, 2019, vol. 121, issue 3, No 2, 1239-1268
Abstract:
Patent classification is a task performed by experts in patent offices around the world, who assign category codes to a patent application based on its technical content. The number of applications is growing steadily, and there is economic interest in developing accurate and fast models to automate the classification task. In this paper, we present a methodology to systematically analyze the effect of three patent data properties and two classification details on the patent classification task: the patent section used for training/testing, the document representation, the patent codes used for training, the use of the hierarchy of categories, and the base classifier. For the analysis, we create a diversity of models by combining different options for these properties. We evaluate the models in detail on standard patent datasets in two languages, English and German, considering three performance metrics, using statistical tests to validate the results, and comparing them with other models in the literature. Our findings indicate that it is important to follow a methodology to properly choose the options for the data properties when building a model for a given goal, weighing classification accuracy against computational efficiency. Some combinations of options build models with good results but at a high computational cost, whilst others build models that produce slightly worse results at a fraction of the training time.
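The abstract describes building a diversity of models by combining options across five dimensions (three data properties and two classification details). The size of such a design space is the product of the number of options per dimension. A minimal sketch of enumerating it in Python follows. The option values below are hypothetical placeholders chosen for illustration; this listing does not say which sections, representations, code sets, hierarchy strategies, or classifiers the paper actually evaluated.

```python
from itertools import product

# Hypothetical option values for illustration only; the abstract names the
# five dimensions but not the specific choices evaluated in the paper.
DIMENSIONS = {
    "section": ["title", "abstract", "claims", "description"],
    "representation": ["bag_of_words", "tfidf"],
    "training_codes": ["main_code_only", "all_codes"],
    "hierarchy": ["flat", "hierarchical"],
    "base_classifier": ["linear_svm", "naive_bayes"],
}


def model_configurations(dimensions):
    """Yield one dict per combination of options (the Cartesian product)."""
    names = list(dimensions)
    for values in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, values))


configs = list(model_configurations(DIMENSIONS))
print(len(configs))  # 4 * 2 * 2 * 2 * 2 = 64 candidate models
print(configs[0])
```

Each configuration would then be trained and scored, so the grid grows multiplicatively with every added option. This is why the abstract stresses weighing accuracy against training time when choosing among combinations.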
Keywords: Patent classification; Hierarchical classification; Multilabel classification; Document representation; Supervised learning; IPC
Date: 2019
Downloads: http://link.springer.com/10.1007/s11192-019-03246-1 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:121:y:2019:i:3:d:10.1007_s11192-019-03246-1
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-019-03246-1
Scientometrics is currently edited by Wolfgang Glänzel
Scientometrics is published by Springer and Akadémiai Kiadó.
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.