Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research in preemployment testing. Journal of Applied Psychology, 95(4), 648–680. https://doi.org/10.1037/a0018714
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Barrash, J., Stillman, A., Anderson, S. W., Uc, E. Y., Dawson, J. D., & Rizzo, M. (2010). Prediction of driving ability with neuropsychological tests: Demographic adjustments diminish accuracy. Journal of the International Neuropsychological Society, 16(4), 679–686. https://doi.org/10.1017/S1355617710000470
Bauer, D. J., Belzak, W. C. M., & Cole, V. T. (2020). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 43–55. https://doi.org/10.1080/10705511.2019.1642754
Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25(6), 673–690. https://doi.org/10.1037/met0000253
Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since Bias in Mental Testing. School Psychology Quarterly, 14(3), 208–238. https://doi.org/10.1037/h0089007
Burlew, A. K., Peteet, B. J., McCuistian, C., & Miller-Roenigk, B. D. (2019). Best practices for researching diverse groups. American Journal of Orthopsychiatry, 89(3), 354–368. https://doi.org/10.1037/ort0000350
Camilli, G. (2013). Ongoing issues in test fairness. Educational Research and Evaluation, 19(2–3), 104–120. https://doi.org/10.1080/13803611.2013.767602
Chalmers, P. (2020). mirt: Multidimensional item response theory. https://CRAN.R-project.org/package=mirt
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834
Cheng, Y., Shao, C., & Lathrop, Q. N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educational and Psychological Measurement, 76(1), 43–63. https://doi.org/10.1177/0013164415576187
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. https://doi.org/10.1207/s15328007sem0902_5
Clark, M. J., & Grandy, J. (1984). Sex differences in the academic performance of Scholastic Aptitude Test takers (College Board Report No. 84-8). College Board Publications.
Cole, N. S. (1981). Bias in testing. American Psychologist, 36(10), 1067–1077. https://doi.org/10.1037/0003-066X.36.10.1067
Cole, V., Gottfredson, N., & Giordano, M. (2018). aMNLFA: Automated fitting of moderated nonlinear factor analysis through the Mplus program. https://CRAN.R-project.org/package=aMNLFA
Committee on the General Aptitude Test Battery, Commission on Behavioral and Social Sciences and Education, & National Research Council. (1989). Fairness in employment testing: Validity generalization, minority issues, and the general aptitude test battery. National Academies Press.
Counsell, A., Cribbie, R. A., & Flora, D. B. (2020). Evaluating equivalence testing methods for measurement invariance. Multivariate Behavioral Research, 55(2), 312–328. https://doi.org/10.1080/00273171.2019.1633617
Curran, P. J., Howard, A. L., Bainter, S. A., Lane, S. T., & McGinley, J. S. (2014). The separation of between-person and within-person components of individual change over time: A latent curve model with structured residuals. Journal of Consulting and Clinical Psychology, 82(5), 879–894. https://doi.org/10.1037/a0035297
Dorans, N. J. (2017). Contributions to the quantitative assessment of item, test, and score fairness. In R. E. Bennett & M. von Davier (Eds.), Advancing human assessment (pp. 201–230). Springer, Cham.
Dueber, D. (2019). dmacs: Measurement nonequivalence effect size calculator. https://github.com/ddueber/dmacs
Epskamp, S. (2022). semPlot: Path diagrams and visual analysis of various SEM packages’ output. https://github.com/SachaEpskamp/semPlot
Fernández, A. L., & Abe, J. (2018). Bias in cross-cultural neuropsychological testing: Problems and possible solutions. Culture and Brain, 6(1), 1–35. https://doi.org/10.1007/s40167-017-0050-2
Fletcher, R. R., Nakeshimana, A., & Olubeko, O. (2021). Addressing fairness, bias, and appropriate use of artificial intelligence and machine learning in global health. Frontiers in Artificial Intelligence, 3, 116. https://doi.org/10.3389/frai.2020.561802
Gipps, C., & Stobart, G. (2009). Fairness in assessment. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 105–118). Springer Netherlands. https://doi.org/10.1007/978-1-4020-9964-9_6
Gonzalez, O., & Pelham, W. E. (2021). When does differential item functioning matter for screening? A method for empirical evaluation. Assessment, 28(2), 446–456. https://doi.org/10.1177/1073191120913618
Gottfredson, L. S. (1994). The science and politics of race-norming. American Psychologist, 49(11), 955–963. https://doi.org/10.1037/0003-066X.49.11.955
Gottfredson, N. C., Cole, V. T., Giordano, M. L., Bauer, D. J., Hussong, A. M., & Ennett, S. T. (2019). Simplifying the implementation of modern scale scoring methods with an automated R package: Automated moderated nonlinear factor analysis (aMNLFA). Addictive Behaviors, 94, 65–73. https://doi.org/10.1016/j.addbeh.2018.10.031
Gunn, H. J., Grimm, K. J., & Edwards, M. C. (2020). Evaluation of six effect size measures of measurement non-invariance for continuous outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 503–514. https://doi.org/10.1080/10705511.2019.1689507
Hagquist, C. (2019). Explaining differential item functioning focusing on the crucial role of external information – an example from the measurement of adolescent mental health. BMC Medical Research Methodology, 19(1), 185. https://doi.org/10.1186/s12874-019-0828-3
Hagquist, C., & Andrich, D. (2017). Recent advances in analysis of differential item functioning in health research using the Rasch model. Health and Quality of Life Outcomes, 15(1), 181. https://doi.org/10.1186/s12955-017-0755-0
Hall, G. C. N., Bansal, A., & Lopez, I. R. (1999). Ethnicity and psychopathology: A meta-analytic review of 31 years of comparative MMPI/MMPI-2 research. Psychological Assessment, 11(2), 186–197. https://doi.org/10.1037/1040-3590.11.2.186
Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31(12), 1481–1496. https://doi.org/10.1037/pas0000731
Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A quantitative perspective. American Psychologist, 61(8), 845–859. https://doi.org/10.1037/0003-066X.61.8.845
Jensen, A. R. (1980). Précis of Bias in Mental Testing. Behavioral and Brain Sciences, 3(3), 325–333. https://doi.org/10.1017/S0140525X00005161
Jonson, J. L., & Geisinger, K. F. (2022). Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards. American Educational Research Association.
Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23(4), 708–728. https://doi.org/10.1037/met0000152
Kuncel, N. R., & Hezlett, S. A. (2010). Fact and fiction in cognitive ability testing for admissions and hiring decisions. Current Directions in Psychological Science, 19(6), 339–345. https://doi.org/10.1177/0963721410389459
Lai, M. H. C. (2021). Adjusting for measurement noninvariance with alignment in growth modeling. Multivariate Behavioral Research, 1–18. https://doi.org/10.1080/00273171.2021.1941730
Lee, K., Bull, R., & Ho, R. M. H. (2013). Developmental changes in executive functioning. Child Development, 84(6), 1933–1953. https://doi.org/10.1111/cdev.12096
Little, T. D. (2013). Longitudinal structural equation modeling. The Guilford Press.
Little, T. D., Preacher, K. J., Selig, J. P., & Card, N. A. (2007). New developments in latent variable panel analyses of longitudinal data. International Journal of Behavioral Development, 31(4), 357–365. https://doi.org/10.1177/0165025407077757
Little, T. D., Slegers, D. W., & Card, N. A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation Modeling: A Multidisciplinary Journal, 13(1), 59–72. https://doi.org/10.1207/s15328007sem1301_3
Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. https://doi.org/10.1037/met0000075
Manly, J. J. (2005). Advantages and disadvantages of separate norms for African Americans. The Clinical Neuropsychologist, 19(2), 270–275. https://doi.org/10.1080/13854040590945346
Manly, J. J., & Echemendia, R. J. (2007). Race-specific norms: Using the model of hypertension to understand issues of race, culture, and education in neuropsychology. Archives of Clinical Neuropsychology, 22(3), 319–325. https://doi.org/10.1016/j.acn.2007.01.006
McClelland, D. C. (1973). Testing for competence rather than for “intelligence.” American Psychologist, 28(1), 1–14. https://doi.org/10.1037/h0034092
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966
Melikyan, Z. A., Agranovich, A. V., & Puente, A. E. (2019). Fairness in psychological testing. In G. Goldstein, D. N. Allen, & J. DeLuca (Eds.), Handbook of psychological assessment (4th ed., pp. 551–572). Academic Press. https://doi.org/10.1016/B978-0-12-802203-0.00018-3
Millsap, R. E. (2011). Statistical approaches to measurement invariance. Taylor & Francis.
Muthén, L. K., & Muthén, B. O. (2019). Mplus version 8.4. Muthén & Muthén.
Newsom, J. T. (2015). Longitudinal structural equation modeling: A comprehensive introduction. Routledge.
Nye, C. D., Bradburn, J., Olenick, J., Bialko, C., & Drasgow, F. (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122
Oberski, D. L. (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22(1), 45–60. https://doi.org/10.1093/pan/mpt014
Oberski, D. L., Vermunt, J. K., & Moors, G. B. D. (2015). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Political Analysis, 23(4), 550–563. https://doi.org/10.1093/pan/mpv020
Paulus, J. K., & Kent, D. M. (2020). Predictably unequal: Understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. npj Digital Medicine, 3(1), 99. https://doi.org/10.1038/s41746-020-0304-9
Petersen, I. T. (2025). petersenlab: A collection of R functions by the Petersen Lab. https://doi.org/10.32614/CRAN.package.petersenlab
Petersen, I. T., Choe, D. E., & LeBeau, B. (2020). Studying a moving target in development: The challenge and opportunity of heterotypic continuity. Developmental Review, 58, 100935. https://doi.org/10.1016/j.dr.2020.100935
Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004
Raykov, T., Marcoulides, G. A., Harrison, M., & Zhang, M. (2020). On the dependability of a popular procedure for studying measurement invariance: A cause for concern? Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 649–656. https://doi.org/10.1080/10705511.2019.1610409
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In C. R. Reynolds, R. A. Altmann, & D. N. Allen (Eds.), Mastering modern psychological testing: Theory and methods (pp. 573–613). Springer International Publishing. https://doi.org/10.1007/978-3-030-59455-8_15
Reynolds, C. R., & Suzuki, L. A. (2012). Bias in psychological assessment: An empirical review and recommendations. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology, Vol. 10: Assessment psychology, Part 1: Assessment issues (2nd ed., pp. 82–113).
Robitzsch, A. (2019). mnlfa: Moderated nonlinear factor analysis. https://CRAN.R-project.org/package=mnlfa
Rosseel, Y., Jorgensen, T. D., & Rockwood, N. (2022). lavaan: Latent variable analysis. https://lavaan.ugent.be
Sackett, P. R., Borneman, M. J., & Connelly, B. S. (2008). High stakes testing in higher education and employment: Appraising the evidence for validity and fairness. American Psychologist, 63(4), 215–227. https://doi.org/10.1037/0003-066X.63.4.215
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education. American Psychologist, 56(4), 302–318. https://doi.org/10.1037/0003-066X.56.4.302
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49(11), 929–954. https://doi.org/10.1037/0003-066X.49.11.929
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings. American Psychologist, 36(10), 1128–1137. https://doi.org/10.1037/0003-066X.36.10.1128
Silverberg, N. D., & Millis, S. R. (2009). Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity. Journal of the International Neuropsychological Society, 15(1), 94–102. https://doi.org/10.1017/S1355617708090139
Thorndike, R. L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8(2), 63–70. https://doi.org/10.1111/j.1745-3984.1971.tb00907.x
Van De Schoot, R., Kluytmans, A., Tummers, L., Lugtig, P., Hox, J., & Muthén, B. (2013). Facing off with Scylla and Charybdis: A comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Frontiers in Psychology, 4, 770. https://doi.org/10.3389/fpsyg.2013.00770
Van De Schoot, R., Schmidt, P., De Beuckelaer, A., Lek, K., & Zondervan-Zwijnenburg, M. (2015). Editorial: Measurement invariance. Frontiers in Psychology, 6, 1064. https://doi.org/10.3389/fpsyg.2015.01064
Wang, T., Merkle, E. C., & Zeileis, A. (2014). Score-based tests of measurement invariance: Use in practice. Frontiers in Psychology, 5, 438. https://doi.org/10.3389/fpsyg.2014.00438
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713–731. https://doi.org/10.1177/0013164409332228
Youngstrom, E. A., & Van Meter, A. (2016). Empirically supported assessment of children and adolescents. Clinical Psychology: Science and Practice, 23(4), 327–347. https://doi.org/10.1111/cpsp.12172
Zieky, M. J. (2006). Fairness review in assessment. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 359–376). Routledge. https://doi.org/10.4324/9780203874776.ch16
Zieky, M. J. (2013). Fairness review in assessment. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology (pp. 293–302). American Psychological Association. https://doi.org/10.1037/14047-017