EconPapers    
Economics at your fingertips  
 

Automatic semantic knowledge extraction from electronic forms

Haolin Wu, Tim French, Wei Liu and Melinda Hodkiewicz

Journal of Risk and Reliability, 2024, vol. 238, issue 5, 933-944

Abstract: Electronic tabular forms are an intuitive way for organisations to collect, present and store structured information for human readers. Forms use features such as fonts, colours and cell positioning to help readers navigate and find information. Millions of forms, typically in Portable Document Format (PDF), are generated by businesses as part of routine operations. Unlike human readers, machines are not able to directly ‘understand’ the implicit cues contained in the fonts, colours and use of boxes without explicit processing. In this paper, a supervised computer vision model is proposed to decompose the PDF form document into nested microtables. The cells within these microtables are then processed using a customisable rule bank for meaningful table content and semantic relationship extraction. The process is demonstrated on an industry dataset of 37 maintenance procedure documents containing 373 pages and 1016 unique microtables. A web application EMU (Extracting Machine Understandable Semantics from Forms) demonstrates how data captured in tables with different dimensions in procedural forms can be automatically extracted and stored in JavaScript Object Notation (JSON). Identifying and extracting nested tables is a critical fundamental step for future applications to support machine-automated search and extraction of data at scale for both maintenance and other procedural documentation.

Keywords: Knowledge graph; procedures; maintenance; semantic relationship (search for similar items in EconPapers)
Date: 2024
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://journals.sagepub.com/doi/10.1177/1748006X221098272 (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:sae:risrel:v:238:y:2024:i:5:p:933-944

DOI: 10.1177/1748006X221098272

Access Statistics for this article

More articles in Journal of Risk and Reliability
Bibliographic data for series maintained by SAGE Publications ().

 
Page updated 2025-03-19
Handle: RePEc:sae:risrel:v:238:y:2024:i:5:p:933-944