High-Stakes Testing Case Study: A Latent Variable Approach for Assessing Measurement and Prediction Invariance
Steven Andrew Culpepper,
Herman Aguinis,
Justin L. Kern and
Roger Millsap
Additional contact information
Steven Andrew Culpepper: University of Illinois at Urbana-Champaign
Herman Aguinis: George Washington University
Justin L. Kern: University of Illinois at Urbana-Champaign
Roger Millsap: Arizona State University
Psychometrika, 2019, vol. 84, issue 1, No 14, 285-309
Abstract:
The existence of differences in prediction systems involving test scores across demographic groups continues to be a thorny and unresolved scientific, professional, and societal concern. Our case study uses a two-stage least squares (2SLS) estimator to jointly assess measurement invariance and prediction invariance in high-stakes testing. So, we examined differences across groups based on latent as opposed to observed scores with data for 176 colleges and universities from The College Board. Results showed that evidence regarding measurement invariance was rejected for the SAT mathematics (SAT-M) subtest at the 0.01 level for 74.5% and 29.9% of cohorts for Black versus White and Hispanic versus White comparisons, respectively. Also, on average, Black students with the same standing on a common factor had observed SAT-M scores that were nearly a third of a standard deviation lower than for comparable Whites. We also found evidence that group differences in SAT-M measurement intercepts may partly explain the well-known finding of observed differences in prediction intercepts. Additionally, results provided evidence that nearly a quarter of the statistically significant observed intercept differences were not statistically significant at the 0.05 level once predictor measurement error was accounted for using the 2SLS procedure. Our joint measurement and prediction invariance approach based on latent scores opens the door to a new high-stakes testing research agenda whose goal is to not simply assess whether observed group-based differences exist and the size and direction of such differences. Rather, the goal of this research agenda is to assess the causal chain starting with underlying theoretical mechanisms (e.g., contextual factors, differences in latent predictor scores) that affect the size and direction of any observed differences.
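The abstract's point that 2SLS can account for predictor measurement error can be illustrated with a small simulation. The sketch below is not the authors' procedure or data: it assumes a single latent score measured by two hypothetical noisy indicators (`x1`, `x2`), uses the second indicator as an instrument for the first, and compares the resulting slope with ordinary least squares. All names, sample sizes, and parameter values are illustrative assumptions.

```python
import numpy as np


def ols(y, x):
    """Ordinary least squares of y on an intercept and x; returns [intercept, slope]."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def two_stage_least_squares(y, x, z):
    """2SLS with one error-prone regressor x and one instrument z.

    Returns [intercept, slope].
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])  # instruments (with intercept)
    X = np.column_stack([np.ones(n), x])  # regressors (with intercept)
    # Stage 1: project the regressors onto the instrument space.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Stage 2: regress the outcome on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


# Simulated data (assumed values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 50_000
latent = rng.normal(size=n)                    # unobserved true score
x1 = latent + rng.normal(scale=0.7, size=n)    # observed predictor with error
x2 = latent + rng.normal(scale=0.7, size=n)    # second indicator, used as instrument
y = 0.5 + 1.0 * latent + rng.normal(scale=0.5, size=n)  # outcome, true slope 1.0

b_ols = ols(y, x1)
b_2sls = two_stage_least_squares(y, x1, x2)

print("OLS  intercept, slope:", b_ols)
print("2SLS intercept, slope:", b_2sls)
```

Under these assumptions the OLS slope is attenuated toward zero because `x1` contains measurement error, while the 2SLS slope is close to the value used to generate the data. The instrument works here only because the errors in `x1` and `x2` were simulated as independent of each other and of the outcome error.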
Keywords: measurement invariance; prediction invariance; instrumental variables; high-stakes testing
Date: 2019
Downloads:
http://link.springer.com/10.1007/s11336-018-9649-2 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:psycho:v:84:y:2019:i:1:d:10.1007_s11336-018-9649-2
Ordering information: This journal article can be ordered from
http://www.springer. ... gy/journal/11336/PS2
DOI: 10.1007/s11336-018-9649-2
Psychometrika is currently edited by Irini Moustaki
More articles in Psychometrika from Springer, The Psychometric Society
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.