Variational inference for semiparametric Bayesian novelty detection in large datasets
Luca Benedetti,
Eric Boniardi,
Leonardo Chiani,
Jacopo Ghirri,
Marta Mastropietro,
Andrea Cappozzo and
Francesco Denti
Additional contact information
Luca Benedetti: Politecnico di Milano
Eric Boniardi: Politecnico di Milano
Leonardo Chiani: Politecnico di Milano
Jacopo Ghirri: Politecnico di Milano
Marta Mastropietro: Politecnico di Milano
Andrea Cappozzo: Politecnico di Milano
Francesco Denti: Università Cattolica del Sacro Cuore
Advances in Data Analysis and Classification, 2024, vol. 18, issue 3, No 7, 703 pages
Abstract:
After being trained on a fully-labeled training set, where the observations are grouped into a certain number of known classes, novelty detection methods aim to classify the instances of an unlabeled test set while allowing for the presence of previously unseen classes. These models are valuable in many areas, ranging from social network and food adulteration analyses to biology, where an evolving population may be present. In this paper, we focus on a two-stage Bayesian semiparametric novelty detector, also known as Brand, recently introduced in the literature. Leveraging a model-based mixture representation, Brand allows clustering the test observations into known training terms or a single novelty term. Furthermore, the novelty term is modeled with a Dirichlet Process mixture model to flexibly capture any departure from the known patterns. Brand was originally estimated using MCMC schemes, which are prohibitively costly when applied to high-dimensional data. To scale up Brand's applicability to large datasets, we propose a variational Bayes approach, providing an efficient algorithm for posterior approximation. We demonstrate a significant gain in efficiency and excellent classification performance with thorough simulation studies. Finally, to showcase its applicability, we perform a novelty detection analysis using the openly-available Statlog dataset, a large collection of satellite imaging spectra, to search for novel soil types.
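As a rough illustration of the mixture idea described in the abstract (test observations assigned either to one of the known training terms or to a single novelty term), the following minimal sketch may help. It is not the authors' Brand model or their variational algorithm: it assumes one-dimensional Gaussian known classes with fixed, hypothetical parameters, and it replaces the Dirichlet Process mixture for the novelty term with a single broad Gaussian as a stand-in. All names and values (`KNOWN`, `NOVELTY`, `NOVELTY_WEIGHT`, `classify`) are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, known, novelty, novelty_weight):
    """Posterior membership probabilities for one test point.

    known: dict mapping class label -> (weight, mean, sd), where the
        weights are relative weights among the known classes.
    novelty: (mean, sd) of a broad Gaussian used here as a simple
        stand-in for the novelty term (an assumption of this sketch,
        not a Dirichlet Process mixture).
    novelty_weight: prior probability that a point is a novelty.
    """
    scores = {
        label: (1.0 - novelty_weight) * w * normal_pdf(x, m, s)
        for label, (w, m, s) in known.items()
    }
    scores["novelty"] = novelty_weight * normal_pdf(x, novelty[0], novelty[1])
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def classify(x, known, novelty, novelty_weight):
    """Return the label with the highest membership probability."""
    probs = membership_probabilities(x, known, novelty, novelty_weight)
    return max(probs, key=probs.get)


# Hypothetical parameters, as if learned from a labeled training set.
KNOWN = {"A": (0.5, 0.0, 1.0), "B": (0.5, 5.0, 1.0)}
NOVELTY = (0.0, 20.0)   # broad stand-in density for unseen classes
NOVELTY_WEIGHT = 0.1    # assumed prior probability of novelty

if __name__ == "__main__":
    for point in (0.2, 5.1, 15.0):
        print(point, classify(point, KNOWN, NOVELTY, NOVELTY_WEIGHT))
```

In this toy setting, points near a known class mean are assigned to that class, while points far from every known class fall to the novelty term because the broad density dominates there. In the paper, by contrast, the novelty term is itself a flexible mixture and the posterior is approximated with variational Bayes rather than computed from fixed parameters.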
Keywords: Novelty detection; Dirichlet process; Variational inference; Large datasets; Nested mixtures; Bayesian modeling; 62H30; 68T09
Date: 2024
Downloads: http://link.springer.com/10.1007/s11634-023-00569-z (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:18:y:2024:i:3:d:10.1007_s11634-023-00569-z
Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2
DOI: 10.1007/s11634-023-00569-z
Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs
More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.