EconPapers    
Economics at your fingertips  
 

Fast and fully-automated histograms for large-scale data sets

Valentina Zelaya Mendizábal, Marc Boullé and Fabrice Rossi

Computational Statistics & Data Analysis, 2023, vol. 180, issue C

Abstract: G-Enum histograms are a new fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length principle (MDL) to derive two different model selection criteria. Several proven theoretical results about these criteria give insights about their asymptotic behaviour and are used to speed up their optimisation. These insights, combined to a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated with reference to other fully automated methods in the literature, both on synthetic and large real-world data sets.

Keywords: Density estimation; Histograms; Model selection; Minimum description length (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947322002481
Full text for ScienceDirect subscribers only.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322002481

DOI: 10.1016/j.csda.2022.107668

Access Statistics for this article

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().

 
Page updated 2025-03-19
Handle: RePEc:eee:csdana:v:180:y:2023:i:c:s0167947322002481