Generalized Data Thinning Using Sufficient Statistics

Ameer Dharamshi, Anna Neufeld, Keshav Motwani, Lucy L. Gao, Daniela Witten and Jacob Bien

Journal of the American Statistical Association, 2025, vol. 120, issue 549, 511-523

Abstract: Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be thinned into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^{K} X^{(k)}$. These independent random variables can then be used for various model validation and inference tasks, including in contexts where traditional sample splitting fails. In this article, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
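As a concrete illustration of the additive thinning that the abstract attributes to the earlier paper, the sketch below handles only the Poisson case. It relies on a standard fact: if X is Poisson with mean lambda and each of the X units is assigned independently to fold k with probability eps_k, the fold counts are independent Poisson variables with means eps_k * lambda, and they sum to X. The function name `thin_poisson` and its parameters are hypothetical, chosen for this example; this is not the authors' implementation and does not cover the generalized procedure of the article.

```python
import random
from collections import Counter

def thin_poisson(x, eps, rng=None):
    """Split a non-negative count x into len(eps) parts that sum to x.

    Assumption (not checked here): x is a draw from a Poisson distribution.
    Under that assumption the returned parts are independent Poisson counts
    with means eps[k] * lambda. The name and signature are illustrative only.
    """
    if x < 0:
        raise ValueError("x must be a non-negative integer")
    if min(eps) <= 0 or abs(sum(eps) - 1.0) > 1e-9:
        raise ValueError("eps must be positive and sum to 1")
    rng = rng or random.Random()
    # Assign each of the x units to one fold; this is a multinomial draw.
    labels = rng.choices(range(len(eps)), weights=eps, k=x)
    counts = Counter(labels)
    return [counts[k] for k in range(len(eps))]

# Usage: thin one observed count into three folds.
parts = thin_poisson(17, [0.5, 0.3, 0.2], random.Random(0))
print(parts, sum(parts))  # the parts always sum back to 17
```

The reconstruction function here is the sum; the article's generalization replaces the sum with any known function that recovers X exactly.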

Date: 2025

Downloads: (external link)
http://hdl.handle.net/10.1080/01621459.2024.2353948 (text/html)
Access to full text is restricted to subscribers.


Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:120:y:2025:i:549:p:511-523


DOI: 10.1080/01621459.2024.2353948


Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson


 
Page updated 2025-05-02
Handle: RePEc:taf:jnlasa:v:120:y:2025:i:549:p:511-523