Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets

Belmiro P.M. Duarte, Anthony C. Atkinson and Nuno M.C. Oliveira

LSE Research Online Documents on Economics from London School of Economics and Political Science, LSE Library

Abstract: This paper addresses the challenge of subsampling large datasets, aiming to generate a smaller dataset that retains a significant portion of the original information. To achieve this objective, we present a subsampling algorithm that integrates hierarchical data partitioning with a specialized tool tailored to identify the most informative observations within a dataset for a specified underlying linear model, not necessarily first-order, relating responses and inputs. The hierarchical data partitioning procedure systematically and incrementally aggregates information from smaller-sized samples into new samples. Simultaneously, our selection tool employs Semidefinite Programming for numerical optimization to maximize the information content of the chosen observations. We validate the effectiveness of our algorithm through extensive testing, using both benchmark and real-world datasets. The real-world dataset is related to the physicochemical characterization of white variants of Portuguese Vinho Verde. Our results are highly promising, demonstrating the algorithm's capability to efficiently identify and select the most informative observations while keeping computational requirements at a manageable level.
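The two ingredients named in the abstract, hierarchical partitioning and selection of the most informative observations for a linear model, can be illustrated with a minimal sketch. It is not the authors' implementation: a greedy D-optimality rule stands in for the Semidefinite Programming step, and pairwise merging of the selected subsets is only one assumed way to aggregate information across levels. The function names `greedy_d_optimal` and `hierarchical_subsample`, the `ridge` regularizer, and the `block_size` parameter are all hypothetical.

```python
import numpy as np

def greedy_d_optimal(X, k, ridge=1e-6):
    """Greedily pick k rows of the model matrix X to increase
    log det(X_s^T X_s + ridge * I). A stand-in for the SDP step."""
    n, p = X.shape
    k = min(k, n)
    M_inv = np.eye(p) / ridge              # inverse of the regularized information matrix
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # Determinant lemma: adding row x scales det by 1 + x^T M^{-1} x
        gains = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[~available] = -np.inf
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of the inverse
        v = M_inv @ X[i]
        M_inv -= np.outer(v, v) / (1.0 + X[i] @ v)
    return np.array(chosen)

def hierarchical_subsample(X, k, block_size):
    """Return indices of k rows chosen by selecting within blocks,
    then repeatedly merging pairs of selections and re-selecting."""
    n = X.shape[0]
    # Level 0: partition the row indices into consecutive blocks
    groups = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    groups = [g[greedy_d_optimal(X[g], k)] for g in groups]
    # Higher levels: aggregate selected subsets pairwise (assumed scheme)
    while len(groups) > 1:
        merged = []
        for a in range(0, len(groups), 2):
            g = np.concatenate(groups[a:a + 2])
            merged.append(g[greedy_d_optimal(X[g], k)])
        groups = merged
    return groups[0]
```

For a model that is not first-order, `X` would hold the expanded model terms (for example squares and cross-products of the inputs) rather than the raw inputs; a call such as `hierarchical_subsample(X, k=20, block_size=100)` then returns the row indices of the retained observations.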

Keywords: hierarchical data partitioning; information-theoretic criteria; large datasets; semidefinite programming; subsampling
JEL-codes: C1
Date: 2024-02-15

Published in Chemometrics and Intelligent Laboratory Systems, volume 245, 15 February 2024. ISSN: 0169-7439

Downloads:
http://eprints.lse.ac.uk/121641/ Open access version. (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:ehl:lserod:121641

Page updated 2025-03-31
Handle: RePEc:ehl:lserod:121641