“FabNER”: information extraction from manufacturing process science domain literature using named entity recognition
Aman Kumar and Binil Starly
Additional contact information
Aman Kumar: North Carolina State University
Binil Starly: North Carolina State University
Journal of Intelligent Manufacturing, 2022, vol. 33, issue 8, No 13, 2393-2407
Abstract:
The number of published manufacturing science digital articles available from scientific journals and the broader web has increased exponentially every year since the 1990s. For a novice engineer or an experienced researcher, assimilating all of this knowledge requires significant synthesis of the existing knowledge space contained within published material in order to find answers to basic and complex queries. Algorithmic approaches through machine learning, and specifically Natural Language Processing (NLP), are lacking for a domain-specific area such as manufacturing. One of the significant challenges in analyzing manufacturing vocabulary is the lack of a named entity recognition (NER) model that enables algorithms to classify the manufacturing corpus of words under various manufacturing semantic categories. This work presents a supervised machine learning approach that categorizes unstructured text from 500K+ manufacturing science related scientific abstracts and labels it under various manufacturing topic categories. A neural network model using a bidirectional long short-term memory network plus a conditional random field (BiLSTM + CRF) is trained to extract information from manufacturing science abstracts. Our classifier achieves an overall F1-score of 88%, which is close to state-of-the-art performance. Two use case examples are presented that demonstrate the value of the developed NER model as a Technical Language Processing (TLP) workflow on manufacturing science documents. The long-term goal is to extract valuable knowledge regarding the connections and relationships between key manufacturing concepts/entities available within millions of manufacturing documents into a structured labeled-property graph data structure that allows for programmatic query and retrieval.
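For NER, the F1-score is commonly computed over whole entity spans rather than individual tokens, so a prediction counts only when both the label and the boundaries match. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes a BIO tagging scheme and uses the hypothetical labels MATERIAL and PROCESS; neither the scheme nor the label names are stated in the abstract.

```python
from typing import List, Set, Tuple

# (entity label, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a span is correct only on an exact match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token sentence; labels are illustrative only.
gold = ["B-MATERIAL", "I-MATERIAL", "O", "B-PROCESS", "O"]
pred = ["B-MATERIAL", "I-MATERIAL", "O", "O", "B-PROCESS"]
score = entity_f1(gold, pred)
```

In this example three of five tokens are tagged correctly, yet only one of the two predicted spans matches a gold span exactly, which is why span-level F1 and token accuracy can differ.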
Keywords: NER; Technical language processing; TLP; Word2Vec; Topic modeling
Date: 2022
Downloads: (external link)
http://link.springer.com/10.1007/s10845-021-01807-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:joinma:v:33:y:2022:i:8:d:10.1007_s10845-021-01807-x
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10845
DOI: 10.1007/s10845-021-01807-x
Journal of Intelligent Manufacturing is currently edited by Andrew Kusiak
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.