A Multi-View Framework to Detect Redundant Activity Labels for More Representative Event Logs in Process Mining

Qifan Chen, Yang Lu, Charmaine S. Tam and Simon K. Poon
Additional contact information
Qifan Chen: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Yang Lu: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Charmaine S. Tam: Centre for Translational Data Science and Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia
Simon K. Poon: School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia

Future Internet, 2022, vol. 14, issue 6, 1-23

Abstract: Process mining aims to gain knowledge of business processes by discovering process models from event logs generated by information systems. The insights revealed by process mining rely heavily on the quality of the event logs. Activities extracted from different data sources, or entered as free text within the same system, may carry inconsistent labels. Such inconsistency leads to redundancy in activity labels, that is, labels that differ in syntax but share the same behaviour. Redundant activity labels introduce unnecessary complexity into the event logs. Identifying these labels during data-driven process discovery is difficult and relies heavily on human intervention. Neither existing process discovery algorithms nor event data preprocessing techniques can resolve such redundancy efficiently. In this paper, we propose a multi-view approach that automatically detects redundant activity labels by using not only context-aware features, such as control-flow relations and attribute values, but also semantic features from the event logs. Our evaluation on several publicly available datasets and a real-life case study demonstrates that the approach can efficiently detect redundant activity labels, even those with low occurrence frequencies. The proposed approach can add value to the preprocessing step by generating more representative event logs.
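
Illustrative sketch (not taken from the paper): the abstract does not specify how the views are computed or combined, so the following Python fragment only illustrates the general idea of scoring label pairs from more than one view. It assumes an event log given as a list of traces (lists of activity labels), uses Jaccard overlap of direct predecessors and successors as a stand-in for the control-flow view, and uses plain string similarity from `difflib` as a crude stand-in for the semantic view. The function names, the equal weights, and the example log are all hypothetical, and the attribute-value view is omitted.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def context_sets(log):
    """Collect, per activity label, its direct predecessors and successors."""
    pred, succ = defaultdict(set), defaultdict(set)
    for trace in log:
        # Artificial start/end markers give boundary activities a context too.
        padded = ["<start>"] + list(trace) + ["<end>"]
        for prev, cur, nxt in zip(padded, padded[1:], padded[2:]):
            pred[cur].add(prev)
            succ[cur].add(nxt)
    return pred, succ


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundancy_scores(log, w_context=0.5, w_text=0.5):
    """Score every label pair by a weighted sum of two views (assumed weights)."""
    pred, succ = context_sets(log)
    scores = {}
    for a, b in combinations(sorted(pred), 2):
        # Control-flow view: do the two labels occur in similar positions?
        context = (jaccard(pred[a], pred[b]) + jaccard(succ[a], succ[b])) / 2
        # Textual view: character-level similarity, a rough proxy for semantics.
        text = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        scores[(a, b)] = w_context * context + w_text * text
    return scores


# Hypothetical toy log: "check credit" and "credit check" play the same role.
example_log = [
    ["register", "check credit", "approve"],
    ["register", "credit check", "approve"],
    ["register", "check credit", "reject"],
]

if __name__ == "__main__":
    ranked = sorted(redundancy_scores(example_log).items(), key=lambda kv: -kv[1])
    for pair, score in ranked[:3]:
        print(pair, round(score, 3))
```

Pairs with high scores would be candidates for manual review or merging; a realistic implementation would replace the string-similarity view with a proper semantic model and would need a threshold tuned to the data.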

Keywords: process mining; activity label; process event log; data quality
JEL-codes: O3
Date: 2022
Downloads: (external link)
https://www.mdpi.com/1999-5903/14/6/181/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/6/181/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:6:p:181-:d:835459

Future Internet is currently edited by Ms. Grace You

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jftint:v:14:y:2022:i:6:p:181-:d:835459