Comparing Weighted RMSD, Weighted MD, Infit, and Outfit Item Fit Statistics Under Uniform Differential Item Functioning

Robitzsch, Alexander

Comparing Weighted RMSD, Weighted MD, Infit, and Outfit Item Fit Statistics Under Uniform Differential Item Functioning

Alexander Robitzsch ()
Additional contact information
Alexander Robitzsch: IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany

Mathematics, 2025, vol. 13, issue 23, 1-27

Abstract: In educational large-scale assessment studies, uniform differential item functioning (DIF) across countries often challenges the application of a common item response model, such as the two-parameter logistic (2PL) model, to all participating countries. DIF occurs when certain items provide systematic advantages or disadvantages to specific groups, potentially biasing ability estimates and secondary analyses. Identifying misfitting items caused by DIF is therefore essential, and several item fit statistics have been proposed in the literature for this purpose. This article investigates the performance of four commonly used item fit statistics under uniform DIF: the weighted root mean square deviation (RMSD), the weighted mean deviation (MD), the infit, and the outfit statistics. Analytical approximations were derived to relate the uniform DIF effect size to these item fit statistics, and the theoretical findings were confirmed through a comprehensive simulation study. The results indicate that distribution-weighted RMSD and MD statistics are less sensitive to DIF in very easy or very difficult items, whereas difficulty-weighted RMSD and MD exhibit consistent detection performance across all item difficulty levels. However, the sampling variance of the difficulty-weighted statistics is notably higher for items with extreme difficulty. Infit and outfit statistics were largely ineffective in detecting DIF in items of moderate difficulty, with sensitivity limited to very easy or very difficult items. To illustrate the practical application of these statistics, they were computed for the PISA 2006 reading study, and the distribution of the statistics across participating countries was descriptively examined. The findings guide selecting appropriate item fit statistics in large-scale assessments and highlight the strengths and limitations of different approaches under uniform DIF conditions.

Keywords: item response model; 2PL model; differential item functioning; RMSD; MD; infit; outfit; item fit (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/23/3752/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/23/3752/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:23:p:3752-:d:1800970

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().