Automatic ontology generation from patents using a pre-built library, WordNet and a class-based n-gram model
Zhen Li and Derrick Tate
International Journal of Product Development, 2015, vol. 20, issue 2, 142-172
Abstract:
An ontology is a structured, hierarchical way of describing domain knowledge. Research in ontological engineering has yielded fruitful results, but existing methods share a common drawback: they require significant manual work to generate an ontology, which limits their usefulness in practice. In this paper, we propose a computational model that combines data mining, natural language processing (NLP), WordNet and a novel class-based n-gram model for automatic ontology discovery and recognition from existing patent documents. A pre-built ontology library was constructed by gathering knowledge from engineering textbooks and dictionaries. A data set of engineering patent claims was then split into training (80%) and validation (20%) subsets. In the training process, the pre-built library and WordNet were used to generate the class labels from which the class-based n-gram models were constructed. Holdout validation showed an average accuracy of 87.26% across all validation samples.
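The abstract names two mechanical steps that a short sketch may make more concrete: an 80/20 holdout split and a class-based n-gram model, in which counts are kept over class labels rather than over individual words. The Python below is only an illustration under stated assumptions and is not the authors' implementation. The `WORD_CLASS` lookup, the class names, the sample claims, the bigram order (n = 2) and the add-one smoothing are all hypothetical stand-ins, since the abstract does not specify the actual library contents, the value of n, or any smoothing scheme.

```python
import random
from collections import Counter, defaultdict

# Hypothetical word-to-class lookup standing in for the pre-built library
# and WordNet; the real class labels are not given in the abstract.
WORD_CLASS = {
    "gear": "COMPONENT", "shaft": "COMPONENT", "housing": "COMPONENT",
    "drives": "ACTION", "supports": "ACTION", "encloses": "ACTION",
}
UNKNOWN = "OTHER"
CLASSES = sorted(set(WORD_CLASS.values()) | {UNKNOWN})
START = "<s>"  # sentence-start marker, used only as a conditioning context


def to_classes(tokens):
    """Map each token to its class label, falling back to UNKNOWN."""
    return [WORD_CLASS.get(token, UNKNOWN) for token in tokens]


def holdout_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_class_bigrams(sentences):
    """Count class-to-class transitions over tokenized sentences."""
    transitions = defaultdict(Counter)
    for tokens in sentences:
        classes = [START] + to_classes(tokens)
        for prev, cur in zip(classes, classes[1:]):
            transitions[prev][cur] += 1
    return transitions


def transition_prob(transitions, prev, cur, alpha=1.0):
    """Estimate P(cur class | prev class) with add-alpha smoothing (an assumption)."""
    counts = transitions.get(prev, Counter())
    total = sum(counts.values())
    return (counts[cur] + alpha) / (total + alpha * len(CLASSES))


# Made-up claim fragments, used only to exercise the functions.
claims = [
    "the gear drives the shaft",
    "the housing encloses the gear",
    "the shaft supports the gear",
    "the housing supports the shaft",
    "the gear drives the housing",
] * 2

train, valid = holdout_split([claim.split() for claim in claims])
model = train_class_bigrams(train)
print(len(train), len(valid))
print(transition_prob(model, "COMPONENT", "ACTION"))
```

Counting over classes instead of words keeps the table of transitions small and lets word sequences never seen in training still receive a probability, provided their class sequence was observed. How the paper turns these probabilities into ontology recognition decisions is not described in the abstract, so that step is left out here.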
Keywords: ontological engineering; n-gram language models; natural language processing; NLP; ontology generation; patents; computational models; data mining; automatic ontology discovery; ontology recognition; ontology library.
Date: 2015
Downloads: (external link)
http://www.inderscience.com/link.php?id=68965 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijpdev:v:20:y:2015:i:2:p:142-172
More articles in International Journal of Product Development from Inderscience Enterprises Ltd