Computing minimum-volume enclosing ellipsoids for large datasets
Samuel Rosa and
Radoslav Harman
Computational Statistics & Data Analysis, 2022, vol. 171, issue C
Abstract:
An algorithm is provided for calculating the minimum-volume enclosing ellipsoid (MVEE) for a large dataset stored in a separate database, for which the existing algorithms run out of memory or become prohibitively slow. The focus is on tall datasets, i.e., those consisting of huge numbers of data points of moderate dimensionality. The proposed Big Index Batching algorithm works in an optimization-deletion-adaptation cycle that consists of: using an existing algorithm by applying it on a smaller batch of data; pruning the vector of the indices of the data points by removing the points that are guaranteed to not lie on the boundary of the MVEE; and efficiently adapting the choice of the batch. The algorithm is provably convergent, and simple to describe and implement. The reading of tall data from the database is very time consuming, therefore the amount of reading during an MVEE computation should be as small as possible. It is shown on examples that Big Index Batching tends to find the MVEE after reading all data points just two or three times. As a consequence, the proposed algorithm usually converges to the MVEE reasonably fast. Its usefulness in robust statistics and anomaly detection is demonstrated by finding the potential outliers in a large dataset by using so-called ellipsoidal trimming.
Keywords: Minimum-volume enclosing ellipsoid; D-optimal design; Algorithm; Tall data (search for similar items in EconPapers)
Date: 2022
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947322000329
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:171:y:2022:i:c:s0167947322000329
DOI: 10.1016/j.csda.2022.107452
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().