Integrating an incident dataset with a question and answering language model to assist hazard identification: Comparison of an extractive and generative model

Ricketts, Jon; Guo, Weisi; Pelham, Jonathan; Barry, David

Journal of Risk and Reliability, 2025, vol. 239, issue 4, 736-753

Abstract: Robust hazard identification (HAZID) relies upon extensive knowledge of the system being analysed, its technical aspects, and how it will be used operationally. Typically, this knowledge is held by human participants who can draw on their own experience to answer hazard-related questions in natural language. However, several threats to this exist, such as high staff turnover, a poor learning-from-incidents capability or even insufficient Information Technology resources. Incident databases, by contrast, hold vast amounts of hazard information that can be transformed into a source of knowledge. As a mitigation for these issues, this paper presents a Question and Answering (Q&A) Bidirectional Encoder Representations from Transformers (BERT) language model trained upon aviation incidents and a unique Q&A dataset. The model can extract answers to typical HAZID questions from factual incident reports. Alongside this extractive approach, the paper also explores the use of a generative Large Language Model combined with an incident dataset. Both models proved a useful addition to HAZID activities based upon the Structured What If Technique (SWIFT), answering safety-themed questions from a retrieved context of incident reports that semantically matched the query. For the purposes of HAZID, the generative option is suggested to be preferable based upon its ease of implementation, lower resource requirements and quality of responses. Additionally, it is shown that organisations can train and create their own custom models for HAZID purposes. Future work may wish to consider models that can hypothesize scenarios based upon incident reports, building further understanding of the relationships between causes, hazards and consequences.

Keywords: Natural language processing; hazard analysis; information retrieval; incident reporting; safety analysis
Date: 2025

Downloads:
https://journals.sagepub.com/doi/10.1177/1748006X241272831

Persistent link: https://EconPapers.repec.org/RePEc:sae:risrel:v:239:y:2025:i:4:p:736-753

DOI: 10.1177/1748006X241272831
