Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach

Gui, George; Kim, Seungwoo

Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach

George Gui and Seungwoo Kim

Abstract: Pre-experiment stratification, or blocking, is a well-established technique for designing more efficient experiments and increasing the precision of the experimental estimates. However, when researchers have access to many covariates at the experiment design stage, they often face challenges in effectively selecting or weighting covariates when creating their strata. This paper proposes a Generative Stratification procedure that leverages Large Language Models (LLMs) to synthesize high-dimensional covariate data to improve experimental design. We demonstrate the value of this approach by applying it to a set of experiments and find that our method would have reduced the variance of the treatment effect estimate by 10%-50% compared to simple randomization in our empirical applications. When combined with other standard stratification methods, it can be used to further improve the efficiency. Our results demonstrate that LLM-based simulation is a practical and easy-to-implement way to improve experimental design in covariate-rich settings.

Date: 2025-09
New Economics Papers: this item is included in nep-ain, nep-big, nep-cmp, nep-dcm and nep-exp
References: View references in EconPapers View complete reference list from CitEc
Citations:

Downloads: (external link)
http://arxiv.org/pdf/2509.25709 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2509.25709

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().