A Study of Training Approaches of a Hybrid Summarisation Model Applied to Patent Dataset

Souza, Cinthia M.; Bastos, Daniel S.; Filho, Leonardo A. Souza; Meireles, Magali R. G.

Cinthia M. Souza, Daniel S. Bastos, Leonardo A. Souza Filho and Magali R. G. Meireles
Additional contact information
Cinthia M. Souza: Institute of Mathematical Sciences and Informatics - PUC Minas, R. Walter Ianni, 255, 31.980-110, Belo Horizonte, MG, Brazil
Daniel S. Bastos: Institute of Mathematical Sciences and Informatics - PUC Minas, R. Walter Ianni, 255, 31.980-110, Belo Horizonte, MG, Brazil
Leonardo A. Souza Filho: Institute of Mathematical Sciences and Informatics - PUC Minas, R. Walter Ianni, 255, 31.980-110, Belo Horizonte, MG, Brazil
Magali R. G. Meireles: Institute of Mathematical Sciences and Informatics - PUC Minas, R. Walter Ianni, 255, 31.980-110, Belo Horizonte, MG, Brazil

Journal of Information & Knowledge Management (JIKM), 2023, vol. 22, issue 05, 1-28

Abstract: Patents are recognised as an important source of scientific knowledge. Automatic summarisation of patents can assist in organising patent databases and, consequently, in accessing their contents. The main contribution of this work is a study of training approaches for a hybrid summarisation model that creates concise, single-sentence summaries of patent documents. The experiments were executed on a dataset of more than 80,000 patents made available by the United States Patent and Trademark Office. Comparative experiments were performed between the selected model and seven state-of-the-art models in extractive, abstractive and hybrid text summarisation (HTS). The results showed that the selected approach produces better results than the extractive and HTS models and shows good prospects for extremely concise summaries. It is concluded that the study of different training approaches, coupled with the analysis of the attention word weights in the final results, is an important step in this process, directly impacting the choice of the final summarisation model. In addition, the experiments suggest that removing stop words from the input text did not produce better results, although the attention words extracted by the model without stop words were, in general, better.

Keywords: Computational intelligence; knowledge representation; information systems; automatic text summarisation; patent datasets
Date: 2023

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0219649223500302
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:jikmxx:v:22:y:2023:i:05:n:s0219649223500302

DOI: 10.1142/S0219649223500302

Journal of Information & Knowledge Management (JIKM) is currently edited by Professor Suliman Hawamdeh

More articles in Journal of Information & Knowledge Management (JIKM) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.