Generalized Data Thinning Using Sufficient Statistics
Ameer Dharamshi, Anna Neufeld, Keshav Motwani, Lucy L. Gao, Daniela Witten and Jacob Bien
Journal of the American Statistical Association, 2025, vol. 120, issue 549, 511-523
Abstract:
Our goal is to develop a general strategy to decompose a random variable $X$ into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, $X$ can be thinned into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^{K} X^{(k)}$. These independent random variables can then be used for various model validation and inference tasks, including in contexts where traditional sample splitting fails. In this article, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct $X$. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
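As an illustration of the additive case described in the abstract, the following minimal sketch thins a Poisson count into two parts that sum back to the original. It relies on the classical binomial-thinning property of the Poisson distribution and is an assumed example, not code from the article or its supplementary materials. The function name `thin_poisson` and the parameters `lam` and `eps` are hypothetical choices made for this sketch.

```python
import numpy as np

def thin_poisson(x, eps, rng):
    """Split Poisson counts x into two parts (x1, x2) with x1 + x2 == x.

    Assumption for this sketch: x ~ Poisson(lam). Drawing
    x1 | x ~ Binomial(x, eps) and setting x2 = x - x1 then gives
    x1 ~ Poisson(eps * lam) and x2 ~ Poisson((1 - eps) * lam),
    with x1 and x2 independent (classical Poisson thinning).
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # conditional draw given the observed count
    x2 = x - x1                # remainder, so the sum reconstructs x exactly
    return x1, x2

rng = np.random.default_rng(0)
lam, eps = 5.0, 0.3            # hypothetical values chosen for illustration
x = rng.poisson(lam, size=200_000)
x1, x2 = thin_poisson(x, eps, rng)

# Empirical checks: exact reconstruction, expected means, near-zero correlation.
print(np.array_equal(x1 + x2, x))
print(x1.mean(), x2.mean())
print(np.corrcoef(x1, x2)[0, 1])
```

Here the known function that reconstructs $X$ is the sum; the generalization described in the abstract allows other reconstruction functions in place of the sum.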
Date: 2025
Downloads: http://hdl.handle.net/10.1080/01621459.2024.2353948 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:jnlasa:v:120:y:2025:i:549:p:511-523
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/UASA20
DOI: 10.1080/01621459.2024.2353948
Journal of the American Statistical Association is currently edited by Xuming He, Jun Liu, Joseph Ibrahim and Alyson Wilson
Bibliographic data for series maintained by Chris Longhurst.