Dissimilar Batch Decompositions of Random Datasets

Ghurumuruhan Ganesan
Additional contact information
Ghurumuruhan Ganesan: IISER Bhopal

Sankhya A: The Indian Journal of Statistics, 2025, vol. 87, issue 1, No 2, 64 pages

Abstract: For better learning, large datasets are often split into small batches and fed sequentially to the predictive model. In this paper, we study such batch decompositions from a probabilistic perspective. We assume that data points (possibly corrupted) are drawn independently from a given space and define a concept of similarity between two data points. We then consider decompositions that restrict the amount of similarity within each batch and obtain high probability bounds for the minimum size. We demonstrate an inherent tradeoff between relaxing the similarity constraint and the overall size and also use martingale methods to obtain bounds for the maximum size of data subsets with a given similarity.
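
Illustration (not taken from the paper): the tradeoff described in the abstract can be seen in a toy model. The sketch below assumes that data points are drawn independently and uniformly from a finite set, that two points are "similar" exactly when they are equal, that a batch is admissible when it holds at most `cap` copies of any value, and that "size" means the number of batches. All of these are assumptions made for the example, as are the names `decompose`, `min_batches`, and `cap`; the paper's own definitions may differ.

```python
import random
from collections import Counter, defaultdict


def decompose(points, cap=1):
    """Split points into batches holding at most `cap` copies of any value.

    Assumption for this toy model: two points are "similar" iff they are equal.
    """
    if cap < 1:
        raise ValueError("cap must be at least 1")
    seen = Counter()              # copies of each value placed so far
    batches = defaultdict(list)   # batch index -> list of points
    for p in points:
        # The j-th copy (0-based) of a value goes to batch j // cap,
        # so no batch ever receives more than `cap` copies of it.
        batches[seen[p] // cap].append(p)
        seen[p] += 1
    # Batch indices are contiguous, starting at 0.
    return [batches[i] for i in range(len(batches))]


def min_batches(points, cap=1):
    """Lower bound on the number of batches: the most frequent value
    alone needs ceil(count / cap) batches."""
    if not points:
        return 0
    top = max(Counter(points).values())
    return -(-top // cap)  # ceiling division


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    data = [rng.randrange(20) for _ in range(200)]
    for cap in (1, 2, 4):
        batches = decompose(data, cap)
        # Columns: cap, batches used, lower bound
        print(cap, len(batches), min_batches(data, cap))
```

In this toy model the greedy assignment always meets the lower bound, so the number of batches is governed entirely by the most frequent value, and raising `cap` (relaxing the similarity constraint) reduces the number of batches needed.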

Keywords: Random datasets; Corrupted entries; Dissimilar batch decompositions; Martingale method; Primary 60K35; 60J10
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s13171-024-00366-6 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:sankha:v:87:y:2025:i:1:d:10.1007_s13171-024-00366-6

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/13171

DOI: 10.1007/s13171-024-00366-6

Sankhya A: The Indian Journal of Statistics is currently edited by Dipak Dey

More articles in Sankhya A: The Indian Journal of Statistics from Springer, Indian Statistical Institute
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-02
Handle: RePEc:spr:sankha:v:87:y:2025:i:1:d:10.1007_s13171-024-00366-6