A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset

Souza, Cinthia M.; Meireles, Magali R. G.; Almeida, Paulo E. M.

Affiliations:
Cinthia M. Souza: Pontifical Catholic University of Minas Gerais
Magali R. G. Meireles: Pontifical Catholic University of Minas Gerais
Paulo E. M. Almeida: Federal Center for Technological Education of Minas Gerais

Scientometrics, 2021, vol. 126, issue 1, No 6, 135-156

Abstract: Patents are an important source of information for measuring technological advancement in a specific knowledge domain. To facilitate information retrieval in patent datasets, classification systems divide documents into groups by area of knowledge and assign names that describe their content. The growing number of patented inventions creates the need to subdivide these groups. Because the groups belong to a restricted knowledge domain, naming the resulting subcategories can be extremely laborious. This work compares the performance of abstractive and extractive summarization techniques in generating sentences directly associated with the content of patents. The abstractive summarization model consisted of a Seq2Seq architecture and an LSTM network, and it was trained on a dataset of patent titles and abstracts. Validation was performed using the ROUGE set of metrics. The results of the generated model were compared with the sentences produced by an extractive summarization algorithm applied to the task of naming patent groups. The main goal was to help specialists name new patent groups created by clustering systems. The naming experiments were performed on a dataset of patent abstracts. Comparative experiments were conducted using four subgroups from the United States Patent and Trademark Office (USPTO), which uses the Cooperative Patent Classification (CPC) system.
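The abstract states that validation used the ROUGE set of metrics. As a rough illustration only, the sketch below computes ROUGE-N recall, precision and F1 from clipped n-gram overlap between a generated label and a reference label. It assumes plain lowercase whitespace tokenization with no stemming or stopword removal, which may differ from the configuration the authors used; the function names `ngrams` and `rouge_n` and the example label strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Hypothetical ROUGE-N helper: clipped n-gram overlap between two strings.

    Assumes lowercase whitespace tokenization (no stemming, no stopword removal).
    Returns a (recall, precision, f1) tuple.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count, so each n-gram is
    # credited at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


if __name__ == "__main__":
    # Hypothetical generated and reference labels for one patent subgroup.
    generated = "method for encoding video data"
    reference = "video data encoding method and apparatus"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

Run as a script, this prints unigram and bigram scores for the two example labels. The reference ROUGE toolkit also offers options such as stemming and ROUGE-L (longest common subsequence), so scores from this sketch are not directly comparable to those reported in the paper.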

Keywords: Computational intelligence; Knowledge representation; Information systems; Automatic text summarization; Patent datasets
Date: 2021

Downloads:
http://link.springer.com/10.1007/s11192-020-03732-x (abstract, text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:126:y:2021:i:1:d:10.1007_s11192-020-03732-x

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-020-03732-x

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.