Research on Patent Information Extraction Based on Deep Learning

Cui, Xiaolei; Qian, Lingfei

Xiaolei Cui and Lingfei Qian
Additional contact information
Xiaolei Cui: College of Economics and Management, Nanjing University of Aeronautics and Astronautics
Lingfei Qian: College of Economics and Management, Nanjing University of Aeronautics and Astronautics

A chapter in AI and Analytics for Public Health, 2022, pp 291-302 from Springer

Abstract: In the era of big data, enterprises are paying increasing attention to managing their internal data. Patents are among the most important technical documents within an enterprise, and storing them in a structured form can improve the accuracy and convenience of patent information retrieval. However, most companies have not established their own domain knowledge base, which leaves them facing large volumes of messy data. To solve this problem, we propose a patent information extraction method based on sequence tagging and semantic matching that extracts entity-relation triples and patent features, providing a basis for the construction of knowledge models. First, we use Python to preprocess patent text from the field of battery technology for new energy vehicles, including data cleaning and word segmentation. We then introduce a character-based pre-trained model and combine it with a bi-directional long short-term memory (BiLSTM) network and a conditional random field (CRF) to extract entity words, relation words, and feature words from 6829 annotated datasets. Because the triples formed by randomly combining entity words and relation words contain noise, we treat each triple as a short text and semantically match it against the patent text. In this step, we again use the pre-trained model combined with a BiLSTM to extract semantic information and remove the noisy triples. In addition, we improve the performance of the model by changing the data tagging scheme. The results show that adding a pre-trained model before the traditional model captures more semantic information and significantly improves model performance. They also indicate that the proposed method is effective and can automatically extract patent information in the field of new energy vehicle battery technology.
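
The candidate-triple step described in the abstract can be illustrated with a minimal sketch. It assumes a BIO tagging scheme with the hypothetical labels ENT (entity word) and REL (relation word); the abstract does not specify the chapter's actual tag set. The helper names decode_bio and candidate_triples are also illustrative, and English tokens are used in place of characters for readability. The sketch only decodes tagged spans and enumerates candidate triples; the semantic matching model that filters out the noisy candidates is not reproduced here.

```python
from itertools import permutations


def decode_bio(tokens, tags, sep=" "):
    """Group tokens into (label, text) spans from BIO tags.

    The label names are assumptions; for character-level Chinese
    input, sep="" would join the characters without spaces.
    """
    spans, label, buf = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if label is not None:
                spans.append((label, sep.join(buf)))
            label, buf = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open span.
            buf.append(token)
        else:
            # An "O" tag or a stray "I-" tag closes any open span.
            if label is not None:
                spans.append((label, sep.join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, sep.join(buf)))
    return spans


def candidate_triples(spans):
    """Combine every ordered entity pair with every relation word."""
    entities = [text for label, text in spans if label == "ENT"]
    relations = [text for label, text in spans if label == "REL"]
    return [
        (head, rel, tail)
        for head, tail in permutations(entities, 2)
        for rel in relations
    ]


# Toy example (not taken from the chapter's data).
tokens = ["lithium", "battery", "contains", "graphite", "anode"]
tags = ["B-ENT", "I-ENT", "B-REL", "B-ENT", "I-ENT"]

spans = decode_bio(tokens, tags)
triples = candidate_triples(spans)
for triple in triples:
    print(triple)
```

In this toy example the enumeration yields both (lithium battery, contains, graphite anode) and its reversed counterpart. Only the first is supported by the sentence, which is the kind of noise the semantic matching step is meant to remove.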

Keywords: Pre-trained model; Patent; Information extraction; Semantic matching; Deep learning
Date: 2022

Persistent link: https://EconPapers.repec.org/RePEc:spr:prbchp:978-3-030-75166-1_21

Ordering information: This item can be ordered from
http://www.springer.com/9783030751661

DOI: 10.1007/978-3-030-75166-1_21

More chapters in Springer Proceedings in Business and Economics from Springer