Railway Fault Text Clustering Method Using an Improved Dirichlet Multinomial Mixture Model
Ni Yang, Youpeng Zhang and Naeem Jan
Mathematical Problems in Engineering, 2022, vol. 2022, 1-12
Abstract:
Railway signal equipment fault data (RSEFD) pose one of the challenges of in-depth traffic big data analysis throughout the life cycle of intelligent transportation. In daily operation and maintenance, the railway electrical maintenance department records equipment malfunction information in natural language. The data are highly specialized, consist of short texts, have unbalanced categories, and are inefficient to analyze and process manually. Effectively mining the information contained in these fault texts is important for supporting on-site operation and maintenance. Therefore, we propose a railway fault text clustering method using an improved Dirichlet multinomial mixture model, called ICH-GSDMM. In this method, first, a railway signal terminology thesaurus is established to overcome inaccurate segmentation of RSEFD. Second, the traditional chi-square statistic is improved to overcome the learning difficulties caused by the imbalance of RSEFD. Finally, the Gibbs sampling algorithm for the Dirichlet multinomial mixture model (GSDMM) is modified using the improved chi-square statistical method (ICH) to overcome the symmetry problem of the word Dirichlet prior parameters in the traditional GSDMM. Quantitative experimental results show that, compared with the traditional GSDMM model and the GSDMM model based on chi-square statistics (CH-GSDMM), the internal evaluation index of clustering performance of the GSDMM model based on improved chi-square statistics (ICH-GSDMM) is greatly improved, and its external evaluation indices are also the best, with the exception of the external index NMI on data set DS2. At the same time, the diagnostic accuracy for a select few categories in RSEFD is considerably improved, demonstrating the method's efficacy.
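As an illustration of the kind of sampler the abstract refers to, the following is a minimal sketch of collapsed Gibbs sampling for a Dirichlet multinomial mixture that accepts a per-word (asymmetric) Dirichlet prior. It is not the authors' implementation: the function name `gsdmm`, the `word_prior` argument, the default hyperparameters, and the toy tokens are all assumptions made for this example, and the abstract does not state how ICH scores are converted into prior weights, so that step is left to the caller.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, word_prior=None, iters=15, seed=0):
    """Cluster tokenized short texts with a Dirichlet multinomial mixture.

    docs: list of token lists.
    beta: default (symmetric) word prior.
    word_prior: optional dict word -> positive prior weight (hypothetical
        hook for an asymmetric prior; words not listed fall back to beta).
    Returns one cluster label per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    b = {w: (word_prior or {}).get(w, beta) for w in vocab}
    b_sum = sum(b.values())

    m = [0] * K                          # documents per cluster
    n = [0] * K                          # tokens per cluster
    nw = [Counter() for _ in range(K)]   # per-cluster word counts
    counts = [Counter(d) for d in docs]

    # Random initial assignment.
    z = []
    for d, c in zip(docs, counts):
        k = rng.randrange(K)
        z.append(k)
        m[k] += 1
        n[k] += len(d)
        nw[k].update(c)

    for _ in range(iters):
        for i, (d, c) in enumerate(zip(docs, counts)):
            # Remove document i from its current cluster.
            k = z[i]
            m[k] -= 1
            n[k] -= len(d)
            nw[k].subtract(c)

            # Log of the unnormalized conditional for each cluster.
            logp = []
            for k2 in range(K):
                lp = math.log(m[k2] + alpha)
                for w, cnt in c.items():
                    for j in range(cnt):
                        lp += math.log(nw[k2][w] + b[w] + j)
                for j in range(len(d)):
                    lp -= math.log(n[k2] + b_sum + j)
                logp.append(lp)

            # Sample a new cluster in a numerically stable way.
            mx = max(logp)
            weights = [math.exp(x - mx) for x in logp]
            k = rng.choices(range(K), weights=weights)[0]
            z[i] = k
            m[k] += 1
            n[k] += len(d)
            nw[k].update(c)
    return z


# Toy usage with made-up tokens; real input would be segmented fault records.
docs = [
    ["track", "circuit", "red"],
    ["track", "circuit", "fault"],
    ["switch", "machine", "jam"],
    ["switch", "machine", "fault"],
]
labels = gsdmm(docs, K=4, word_prior={"track": 0.5, "switch": 0.5}, seed=1)
```

Leaving `word_prior` as `None` gives the symmetric prior of the traditional GSDMM; passing a dictionary of weights is one plausible way to plug in term scores such as chi-square values, under the assumptions stated above.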
Date: 2022
Downloads: (external link)
http://downloads.hindawi.com/journals/mpe/2022/7882396.pdf (application/pdf)
http://downloads.hindawi.com/journals/mpe/2022/7882396.xml (application/xml)
Persistent link: https://EconPapers.repec.org/RePEc:hin:jnlmpe:7882396
DOI: 10.1155/2022/7882396
More articles in Mathematical Problems in Engineering from Hindawi