A Monte Carlo simulation study of sample size requirements for the Graded Response Model
Tatsuya Ikeda
PLOS ONE, 2026, vol. 21, issue 4, 1-15
Abstract:
Background: The graded response model (GRM) is widely used in psychometrics to analyze ordinal response data. Despite its growing role in scale development and validation, sample size recommendations, such as those in the COSMIN guidelines (e.g., n ≥ 1000), are often based on expert consensus rather than empirical validation. Moreover, the extent to which the number of items (J) and the number of response categories (K) contribute to parameter estimation accuracy remains insufficiently explored.
Methods: We conducted a Monte Carlo simulation to examine how three design factors influence the estimation accuracy of the latent trait parameter (θ) and the item discrimination parameter (a) under the GRM: sample size (n = 500–1500), number of items (J = 5–50), and number of response categories (K = 4–7). For each condition, we generated a large population dataset from predefined distributions for θ, a, and b, and then randomly drew samples of size n for estimation. The GRM was fitted with the EM algorithm. Estimation accuracy was evaluated using root mean squared error (RMSE), FPC-corrected RMSE, and the Pearson correlation between true and estimated θ values.
Results: The RMSE of the discrimination parameter a decreased as sample size (n) and number of items (J) increased, while the effect of K was negligible. In contrast, the RMSE of θ was driven primarily by J, with only minor effects of n and K. Notably, the Pearson correlation between true and estimated θ values exceeded r = .98 in every condition, indicating high ordinal fidelity even with small samples. Increasing J beyond approximately 30 yielded diminishing returns in RMSE reduction.
Conclusions: Our findings suggest that sample size recommendations for the GRM should be tailored to the measurement goal. For accurate estimation of θ, a sufficiently large number of items (e.g., J ≥ 30) can compensate for smaller samples (n ≈ 500), whereas precise estimation of a requires larger samples (n ≥ 1000). The impact of increasing K was limited, indicating that additional response categories do not necessarily enhance parameter recovery. These results provide empirically grounded guidance for efficient, purpose-specific measurement designs in GRM applications.
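The data-generating side of the design described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it simulates responses from a cumulative-logistic GRM with assumed distributions for θ, a, and b, and computes RMSE and the Pearson correlation against a crude standardized sum-score proxy for θ (the study itself estimates parameters by fitting the GRM with the EM algorithm). The specific distributions and design values (n = 500, J = 30, K = 5) are assumptions chosen to mirror one plausible study cell.

```python
import numpy as np

rng = np.random.default_rng(0)

def grm_category_probs(theta, a, b):
    """GRM category probabilities for one item.

    P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))); category
    probabilities are differences of adjacent cumulative curves.
    theta: (n,), a: scalar discrimination, b: (K-1,) sorted thresholds.
    """
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))  # (n, K-1)
    upper = np.hstack([np.ones((theta.size, 1)), cum])
    lower = np.hstack([cum, np.zeros((theta.size, 1))])
    return upper - lower  # (n, K), rows sum to 1

def simulate_responses(theta, a_vec, b_mat):
    """Draw one ordinal response per person-item pair by inverse-CDF sampling."""
    n, J = theta.size, a_vec.size
    X = np.empty((n, J), dtype=int)
    for j in range(J):
        p = grm_category_probs(theta, a_vec[j], b_mat[j])
        u = rng.random(n)
        cat = (p.cumsum(axis=1) < u[:, None]).sum(axis=1)
        X[:, j] = np.minimum(cat, p.shape[1] - 1)  # guard against float round-off
    return X

# Hypothetical design values for one simulation cell (assumptions)
n, J, K = 500, 30, 5
theta = rng.standard_normal(n)                       # latent trait ~ N(0, 1)
a_true = rng.uniform(0.8, 2.0, J)                    # discriminations
b_true = np.sort(rng.normal(0.0, 1.0, (J, K - 1)), axis=1)  # ordered thresholds

X = simulate_responses(theta, a_true, b_true)

# Evaluation metrics from the abstract, applied to a sum-score proxy for
# theta; in the study, theta_hat comes from the EM-fitted GRM instead.
s = X.sum(axis=1)
theta_hat = (s - s.mean()) / s.std()
rmse = np.sqrt(np.mean((theta_hat - theta) ** 2))
r = np.corrcoef(theta, theta_hat)[0, 1]
# The paper also reports an FPC-corrected RMSE; one plausible reading is a
# finite population correction factor sqrt((N - n) / (N - 1)) applied when
# drawing n cases from a finite population of size N (assumption).
```

With many items (here J = 30), even the naive sum-score proxy correlates strongly with the true θ, which is consistent with the abstract's observation that rank-order recovery of θ is robust across conditions.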
Date: 2026
Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0347684 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 47684&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0347684
DOI: 10.1371/journal.pone.0347684