Accounting for Statistical Artifacts in Item Bias Research

Shepard, Lorrie; Camilli, Gregory; Williams, David M.

Accounting for Statistical Artifacts in Item Bias Research

Lorrie Shepard, Gregory Camilli and David M. Williams

Journal of Educational and Behavioral Statistics, 1984, vol. 9, issue 2, 93-128

Abstract: Theoretically preferred IRT bias detection procedures were applied to both a mathematics achievement and vocabulary test. The data were from black and white seniors on the High School and Beyond data files. To account for statistical artifacts, each analysis was repeated on randomly equivalent samples of blacks and whites ( n â€™s = 1,500). Furthermore, to establish a baseline for judging bias indices that might be attributable only to sampling fluctuations, bias analyses were conducted comparing randomly selected groups of whites. To assess the effect of mean group differences on the appearance of bias, pseudo-ethnic groups were created, that is, samples of whites were selected to simulate the average black-white difference. The validity and sensitivity of the IRT bias indices was supported by several findings. A relatively large number of items (10 of 29) on the math test were found to be consistently biased; they were replicated in parallel analyses. The bias indices were substantially smaller in white-white analyses. Furthermore, the indices (with the possible exception of Ï‡ 2 ) did not find bias in the pseudo-ethnic comparison. The pattern of between-study correlations showed high consistency for parallel ethnic analyses where bias was plausibly present. Also, the indices met the discriminant validity testâ€”the correlations were low between conditions where bias should not be present. For the math test, where a substantial number of items appeared biased, the results were interpretable. Verbal math problems were systematically more difficult for blacks. Overall, the sums-of-squares statistics (weighted by the inverse of the variance errors) were judged to be the best indices for quantifying ICC differences between groups. Not only were these statistics the most consistent in detecting bias in the ethnic comparisons, but they also intercorrelated the least in situations of no bias.

Keywords: Item bias; test bias; IRT (latent trait) applications (search for similar items in EconPapers)
Date: 1984
References: Add references at CitEc
Citations:

Downloads: (external link)
https://journals.sagepub.com/doi/10.3102/10769986009002093 (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:sae:jedbes:v:9:y:1984:i:2:p:93-128

DOI: 10.3102/10769986009002093

Access Statistics for this article

More articles in Journal of Educational and Behavioral Statistics
Bibliographic data for series maintained by SAGE Publications ().