ICRSSD: Identification and Classification for Railway Structured Sensitive Data
Yage Jin,
Hongming Chen,
Rui Ma,
Yanhua Wu and
Qingxin Li
Additional contact information
Yage Jin: School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
Hongming Chen: School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
Rui Ma: School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
Yanhua Wu: The Center of National Railway Intelligent Transportation System Engineering and Technology, Beijing 100081, China
Qingxin Li: The Center of National Railway Intelligent Transportation System Engineering and Technology, Beijing 100081, China
Future Internet, 2025, vol. 17, issue 7, 1-24
Abstract:
The rapid growth of the railway industry has led to the accumulation of large volumes of structured data, making data security a critical component of reliable railway system operations. However, existing methods for identifying and classifying sensitive data often suffer from overly coarse identification granularity and insufficient flexibility in classification. To address these issues, we propose ICRSSD, a two-stage identification and classification method tailored to the railway domain. The identification stage obtains the sensitivity of every attribute. We first divide structured data into canonical data and semi-canonical data, a finer granularity that improves identification accuracy. For canonical data, we use information entropy to calculate an initial sensitivity and then update the attribute sensitivities through cluster analysis and association rule mining. For semi-canonical data, we calculate attribute sensitivity using a combination of regular expressions and keyword lists. In the classification stage, to further enhance accuracy, we adopt a dynamic, multi-granularity classification strategy that considers the relative sensitivity of attributes across different scenarios and assigns each attribute to one of three levels based on the sensitivity values obtained in the identification stage. Additionally, we design a rule base specifically for the identification and classification of sensitive data in the railway domain. This rule base enables effective data identification and classification and also supports expiry management of sensitive attribute labels. To make regular expression generation more efficient, we develop an auxiliary tool based on large language models and a well-designed prompt framework. We conducted experiments on a real-world dataset from the railway domain. The results demonstrate that ICRSSD significantly improves the accuracy and adaptability of sensitive data identification and classification in the railway domain.
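The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the normalized-entropy score, the match-fraction score for semi-canonical data, the 1/3 and 2/3 relative thresholds, and all function names, patterns, and keywords below are assumptions chosen for illustration. The cluster analysis, association rule mining, and rule base steps are omitted.

```python
import math
import re
from collections import Counter


def entropy_sensitivity(values):
    """Initial sensitivity of a canonical column as normalized Shannon entropy.

    Assumption: columns whose values are closer to unique score higher.
    The result lies in [0, 1]; the paper's exact formula may differ.
    """
    n = len(values)
    if n <= 1:
        return 0.0
    counts = Counter(values)
    # H = sum p * log2(1/p), with p = c / n for each distinct value
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(n)  # log2(n) is the maximum entropy for n rows


def pattern_sensitivity(values, patterns, keywords):
    """Sensitivity of a semi-canonical column: the fraction of values that
    match any regular expression or contain any keyword (case-insensitive).

    Assumption: the match fraction stands in for the paper's scoring rule.
    """
    if not values:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    lowered = [k.lower() for k in keywords]
    hits = 0
    for value in values:
        text = str(value)
        if any(rx.search(text) for rx in compiled) or any(
            k in text.lower() for k in lowered
        ):
            hits += 1
    return hits / len(values)


def classify_levels(scores):
    """Assign each attribute a level 1..3 relative to the most sensitive
    attribute in the same scenario (hypothetical thresholds 1/3 and 2/3)."""
    top = max(scores.values(), default=0.0)
    levels = {}
    for attr, score in scores.items():
        ratio = score / top if top > 0 else 0.0
        levels[attr] = 3 if ratio >= 2 / 3 else 2 if ratio >= 1 / 3 else 1
    return levels


if __name__ == "__main__":
    # Hypothetical scenario: two canonical columns and one free-text column.
    scores = {
        "id_no": entropy_sensitivity(["a1", "b2", "c3", "d4"]),
        "train_type": entropy_sensitivity(["G", "G", "G", "G"]),
        "remark": pattern_sensitivity(
            ["call 13800138000", "seat 12A", "passenger Li"],
            patterns=[r"\b1\d{10}\b"],  # illustrative 11-digit phone pattern
            keywords=["passenger"],     # illustrative keyword list
        ),
    }
    print(classify_levels(scores))
```

Because the levels are computed relative to the highest score in the given set of attributes, the same attribute can land in different levels in different scenarios, which is one simple way to read the "relative sensitivity" described in the abstract.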
Keywords: structured data; clustering analysis; association rule mining; rule base
JEL-codes: O3
Date: 2025
Downloads:
https://www.mdpi.com/1999-5903/17/7/294/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/7/294/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:7:p:294-:d:1691596
Future Internet is currently edited by Ms. Grace You
Bibliographic data for series maintained by MDPI Indexing Manager.