
Clustering and automatic labelling within time series of categorical observations—with an application to marine log messages

Emanuele Gramuglia, Geir Storvik and Morten Stakkeland

Journal of the Royal Statistical Society Series C, 2021, vol. 70, issue 3, 714-732

Abstract: System logs or log files containing textual messages with associated time stamps are generated by many technologies and systems. The clustering technique proposed in this paper provides a tool to discover and identify patterns or macrolevel events in this data. The motivating application is logs generated by frequency converters in the propulsion system on a ship, while the general setting is fault identification and classification in complex industrial systems. The paper introduces an offline approach for dividing a time series of log messages into a series of discrete segments of random lengths. These segments are clustered into a limited set of states. A state is assumed to correspond to a specific operation or condition of the system, and can be a fault mode or a normal operation. Each of the states can be associated with a specific, limited set of messages, where messages appear in a random or semi‐structured order within the segments. These structures are in general not defined a priori. We propose a Bayesian hierarchical model where the states are characterised both by the temporal frequency and the type of messages within each segment. An algorithm for inference based on reversible jump MCMC is proposed. The performance of the method is assessed by both simulations and operational data.
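
The data structure described in the abstract can be illustrated with a small forward simulation. This is only a sketch under stated assumptions, not the authors' model or their reversible jump MCMC algorithm: the state names, message types, rates, probabilities, the exponential segment lengths and the Poisson message arrivals are all invented here for illustration. It generates a time series of log messages split into segments of random length, where each segment belongs to one state and each state has its own message frequency and its own limited set of message types.

```python
import random

# Hypothetical states (assumption, not from the paper): each state has a
# message rate (messages per time unit) and a distribution over message types.
STATES = {
    "normal": {"rate": 0.5, "probs": {"INFO_START": 0.6, "INFO_STOP": 0.4}},
    "fault": {"rate": 5.0, "probs": {"OVERCURRENT": 0.5, "TRIP": 0.3, "INFO_STOP": 0.2}},
}


def simulate_log(n_segments, mean_length=10.0, seed=0):
    """Simulate segments of random length and the messages emitted within them.

    Returns (segments, messages), where segments is a list of
    (start, end, state) tuples and messages is a list of (time, type) tuples.
    """
    rng = random.Random(seed)
    names = list(STATES)
    segments, messages = [], []
    t_start = 0.0
    for _ in range(n_segments):
        # Assumption: states are drawn uniformly and independently per segment.
        state = rng.choice(names)
        # Assumption: segment lengths are exponentially distributed.
        t_end = t_start + rng.expovariate(1.0 / mean_length)
        spec = STATES[state]
        types = list(spec["probs"])
        weights = list(spec["probs"].values())
        # Assumption: messages arrive as a Poisson process with the state's rate.
        t = t_start + rng.expovariate(spec["rate"])
        while t < t_end:
            # Message types appear in random order within the segment.
            messages.append((t, rng.choices(types, weights=weights)[0]))
            t += rng.expovariate(spec["rate"])
        segments.append((t_start, t_end, state))
        t_start = t_end
    return segments, messages


if __name__ == "__main__":
    segs, msgs = simulate_log(n_segments=4, seed=1)
    for start, end, state in segs:
        count = sum(1 for t, _ in msgs if start <= t < end)
        print(f"{state:>6}: [{start:7.2f}, {end:7.2f}) {count} messages")
```

The inference problem addressed in the paper runs in the opposite direction: only the time-stamped messages are observed, and the segment boundaries, the number of states and the state assigned to each segment have to be recovered.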

Date: 2021

Journal of the Royal Statistical Society Series C is currently edited by R. Chandler and P. W. F. Smith

Handle: RePEc:bla:jorssc:v:70:y:2021:i:3:p:714-732