PatentSBERTa: A deep NLP based hybrid model for patent distance and classification using augmented SBERT

Hamid Bekamiri, Daniel S. Hain and Roman Jurowetzki

Technological Forecasting and Social Change, 2024, vol. 206, issue C

Abstract: This study presents an efficient approach for using text data to calculate patent-to-patent (p2p) technological similarity and proposes a hybrid framework for leveraging the resulting p2p similarity in applications such as semantic search and automated patent classification. To this end, we create embeddings of patent claims using Sentence-BERT (SBERT). To adapt the general-purpose SBERT model to the patent domain, we fine-tune it with an augmented SBERT approach on in-domain, supervised patent claims data. The study exploits SBERT's efficiency in computing embedding distance measures to map p2p similarity across large sets of patent data. We demonstrate the framework on automated patent classification, using a simple K Nearest Neighbors (KNN) model that predicts a patent's assigned Cooperative Patent Classification (CPC) codes from the class assignments of the K patents with the highest p2p similarity. The results show that p2p similarity captures technological features in terms of CPC overlap, and that the approach is useful for automatic, text-based patent classification. Moreover, the classification framework is simple, and its results are easy for end-users to interpret and evaluate through instance-based explanations. In an out-of-sample validation, the model predicts all assigned CPC classes at the subclass level (663 subclasses) with an F1 score of 66%, outperforming the current state of the art in text-based multi-label patent classification. The study also discusses the applicability of the framework to semantic intellectual property (IP) search, patent landscaping, and technology mapping. Finally, it outlines a future research agenda: leveraging multi-source patent embeddings, evaluating their appropriateness across applications, and improving and validating patent embeddings by creating domain-expert-curated Semantic Textual Similarity (STS) benchmark datasets.
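As a rough illustration of the classification pipeline the abstract describes, the sketch below assumes that patent-claim embeddings have already been produced (for example by a fine-tuned SBERT model) and substitutes toy 3-dimensional vectors for them. It measures p2p similarity as cosine similarity, retrieves the K most similar patents, and predicts CPC subclasses from their label sets. The vote-share rule (`min_share`), the micro-averaged F1, and all patent IDs, vectors, and label assignments are hypothetical choices made for illustration; they are not taken from the paper and are not the authors' exact method.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(query, corpus, k):
    """Return (patent_id, similarity) pairs for the k most similar patents.

    corpus maps patent_id -> embedding vector (assumed precomputed).
    """
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def knn_predict_labels(query, corpus, labels, k=3, min_share=0.5):
    """Predict a set of CPC labels from the labels of the k nearest patents.

    Hypothetical aggregation rule: keep a label if at least `min_share`
    of the k neighbours carry it.
    """
    neighbours = top_k_neighbors(query, corpus, k)
    if not neighbours:
        return set()
    votes = Counter()
    for pid, _sim in neighbours:
        votes.update(set(labels[pid]))
    return {lab for lab, n in votes.items() if n / len(neighbours) >= min_share}


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label predictions (assumed metric)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, hypothetical data: embeddings and CPC subclass assignments.
corpus = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.9, 0.1, 0.0],
    "P3": [0.8, 0.2, 0.1],
    "P4": [0.0, 1.0, 0.0],
    "P5": [0.0, 0.0, 1.0],
}
labels = {
    "P1": {"G06F"},
    "P2": {"G06F", "H04L"},
    "P3": {"G06F"},
    "P4": {"H04L"},
    "P5": {"A61K"},
}
query = [0.95, 0.05, 0.0]           # embedding of a held-out patent
query_truth = {"G06F", "H04L"}      # its (toy) assigned subclasses

predicted = knn_predict_labels(query, corpus, labels, k=3)
print(predicted)                                        # {'G06F'}
print(f"{micro_f1([query_truth], [predicted]):.3f}")    # 0.667
```

Because each prediction is derived from named neighbouring patents, the neighbours returned by `top_k_neighbors` can be shown to a user as the instance-based explanation of why a label was assigned. In a real setting the brute-force scan would typically be replaced by an approximate nearest-neighbour index over the full patent corpus.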

Keywords: Technological distance; Patent classification; Deep NLP; Augmented SBERT; Hybrid model; Model explainability
Date: 2024

Downloads: http://www.sciencedirect.com/science/article/pii/S0040162524003329
Full text for ScienceDirect subscribers only

Persistent link: https://EconPapers.repec.org/RePEc:eee:tefoso:v:206:y:2024:i:c:s0040162524003329

DOI: 10.1016/j.techfore.2024.123536

Technological Forecasting and Social Change is currently edited by Fred Phillips

More articles in Technological Forecasting and Social Change from Elsevier
Bibliographic data for series maintained by Catherine Liu.

Page updated 2025-03-19
Handle: RePEc:eee:tefoso:v:206:y:2024:i:c:s0040162524003329