A novel text representation which enables image classifiers to also simultaneously classify text, applied to name disambiguation
Stephen M. Petrie and
T’Mir D. Julius
Additional contact information
Stephen M. Petrie: Swinburne University of Technology
T’Mir D. Julius: Swinburne University of Technology
Scientometrics, 2024, vol. 129, issue 2, No 1, 719-743
Abstract:
We introduce a novel method for converting text data into abstract image representations, which allows image-based processing techniques (e.g. image classification networks) to be applied to text-based comparison problems. We apply the technique to entity disambiguation of inventor names in US patents, obtaining a list of IDs which identify individual inventors with high accuracy. The method involves converting text from each pairwise comparison between two inventor name records into a 2D RGB (stacked) image representation. We then train an image classification neural network to discriminate between such pairwise comparison images. The trained neural network then labels each pair of records as either matched (same inventor) or non-matched (different inventors), producing highly accurate results. Our new text-to-image representation method could also be used more broadly for other text comparison problems, such as entity disambiguation of academic publications, or for problems that require simultaneous classification of both text and image datasets.
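The abstract does not specify how text is mapped to pixels, so the following is only a minimal sketch of the general idea of stacking two records into one RGB image. The character-pair (bigram) grid, the alphabet, and the names `ALPHABET`, `bigram_cells` and `pair_image` are all assumptions made for illustration, not the encoding used in the article.

```python
# Hypothetical character set; the article's actual encoding is not given in the abstract.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
SIZE = len(ALPHABET)


def bigram_cells(text):
    """Return the set of (row, col) cells for each adjacent character pair in text."""
    chars = [c for c in text.lower() if c in ALPHABET]
    return {(ALPHABET.index(a), ALPHABET.index(b)) for a, b in zip(chars, chars[1:])}


def pair_image(record_a, record_b):
    """Build a SIZE x SIZE grid of (R, G, B) pixels.

    Record A is drawn in the red channel and record B in the green channel;
    the blue channel is left empty in this sketch.
    """
    cells_a = bigram_cells(record_a)
    cells_b = bigram_cells(record_b)
    return [
        [
            (255 if (r, c) in cells_a else 0, 255 if (r, c) in cells_b else 0, 0)
            for c in range(SIZE)
        ]
        for r in range(SIZE)
    ]


# Two example record pairs (made-up names, not data from the article).
same = pair_image("john smith", "John Smith")
diff = pair_image("john smith", "jane smythe")
```

In this sketch, a pair of identical records yields an image whose red and green channels coincide, while differing records leave pixels lit in only one channel. An image classifier trained on labelled pairs could in principle learn to separate such patterns into matched and non-matched.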
Keywords: Entity disambiguation; Text classification; Convolutional neural networks; Simultaneous text and image processing
Date: 2024
Downloads: (external link)
http://link.springer.com/10.1007/s11192-023-04712-7 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:129:y:2024:i:2:d:10.1007_s11192-023-04712-7
Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192
DOI: 10.1007/s11192-023-04712-7
Scientometrics is currently edited by Wolfgang Glänzel
More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.