Stochastic Neighbourhood Components Analysis
Graham Laidler,
Lucy E. Morgan,
Nicos G. Pavlidis and
Barry L. Nelson
Additional contact information
Graham Laidler: STOR-i Centre for Doctoral Training, Lancaster University, Lancaster LA1 4YW, United Kingdom
Lucy E. Morgan: Department of Management Science, Lancaster University, Lancaster LA1 4YW, United Kingdom
Nicos G. Pavlidis: Department of Management Science, Lancaster University, Lancaster LA1 4YW, United Kingdom
Barry L. Nelson: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208
INFORMS Journal on Data Science, 2025, vol. 4, issue 3, 248-264
Abstract:
Distance metric learning is a fundamental task in data mining and is known to enhance the performance of various distance-based algorithms. In this paper, we consider stochastic training data in which repeated feature vectors can belong to different classes, a scenario in which existing methods of metric learning are known to struggle. This type of data is common in stochastic simulations, where multidimensional, recurrent system states are subject to inherent randomness. Classification models on such high-resolution simulation-generated data play a critical role in real-time decision making across diverse applications. This paper presents and implements a stochastic version of the popular neighbourhood components analysis. We demonstrate its behaviour on stochastic data using simulation models and reveal its advantages when used for nearest neighbour classification. Meanwhile, the assumptions of stochastic labelling and repeated feature vectors extend to data from various domains, suggesting that the method can attain broad impact. For example, beyond its applications to system control and decision making with digital twin simulation, it may enhance the analysis of data from sensor networks, recommender systems, and crowdsourced platforms, where stochasticity and recurring feature patterns are typical.
Keywords: distance metric learning; stochastic data; discrete-event simulation; simulation analytics; nearest neighbours
Date: 2025
Downloads: http://dx.doi.org/10.1287/ijds.2023.0018 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:orijds:v:4:y:2025:i:3:p:248-264