I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text could be improved, please let me know. The best ways to provide feedback are through GitHub or hypothes.is annotations:

Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Principles-Psychological-Assessment

Adding an annotation using hypothes.is. To add an annotation, select some text and then click the symbol on the pop-up menu. To see the annotations of others, click the symbol in the upper right-hand corner of the page.

References

Achenbach, T. M. (2001). What are norms and why do we need valid ones? Clinical Psychology: Science and Practice, 8(4), 446–450. https://doi.org/10.1093/clipsy.8.4.446

Ackerman, P. L. (2013). Assessment of intellectual functioning in adults. In K. F. Geisinger, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol 2: Testing and assessment in clinical and counseling psychology (pp. 119–132). American Psychological Association.

Ægisdóttir, S., White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, G., & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341–382. https://doi.org/10.1177/0011000005285875

Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research in preemployment testing. Journal of Applied Psychology, 95(4), 648–680. https://doi.org/10.1037/a0018714

Aguinis, H., Edwards, J. R., & Bradley, K. J. (2017). Improving our understanding of moderation and mediation in strategic management research. Organizational Research Methods, 20(4), 665–685. https://doi.org/10.1177/1094428115627498

Ahuvia, I. L., Schleider, J. L., Kneeland, E. T., Moser, J. S., & Schroder, H. S. (2024). Depression self-labeling in U.S. college students: Associations with perceived control and coping strategies. Journal of Affective Disorders, 351, 202–210. https://doi.org/10.1016/j.jad.2024.01.229

Aitken, M., Plamondon, A., Krzeczkowski, J., Kil, H., & Andrade, B. F. (2024). Systematic integration of multi-informant externalizing ratings in clinical settings. Research on Child and Adolescent Psychopathology, 52, 635–644. https://doi.org/10.1007/s10802-023-01119-z

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., Dervieux, C., & Woodhull, G. (2025). Quarto (Version 1.8) [Computer software]. https://doi.org/10.5281/zenodo.5960048

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

American Psychological Association. (2017). Ethical principles of psychologists and code of conduct.

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.).

American Psychological Association Office of Ethnic Minority Affairs. (1993). Guidelines for providers of psychological services to ethnic, linguistic, and culturally diverse populations. American Psychologist, 48(1), 45–48. https://doi.org/10.1037/0003-066X.48.1.45

Antony, M. M., & Rowa, K. (2005). Evidence-based assessment of anxiety disorders in adults. Psychological Assessment, 17(3), 256–266. https://doi.org/10.1037/1040-3590.17.3.256

Arnett, A., Pennington, B., Willcutt, E., Dmitrieva, J., Byrne, B., Samuelsson, S., & Olson, R. (2012). A cross-lagged model of the development of ADHD inattention symptoms and rapid naming speed. Journal of Abnormal Child Psychology, 40(8), 1313–1326. https://doi.org/10.1007/s10802-012-9644-5

Arvey, R. D., Bouchard, T. J., Carroll, J. B., Cattell, R. B., Cohen, D. B., Dawis, R. V., Detterman, D. K., Dunnette, M., Eysenck, H., Feldman, J. M., Fleishman, E. A., Gilmore, G. C., Gordon, R. A., Gottfredson, L. S., Greene, R. L., Haier, R. J., Hardin, G., Hogan, R., Horn, J. M., … Willerman, L. (1994). Mainstream science on intelligence. Wall Street Journal, 13(1), 18–25.

Atanasov, P., Witkowski, J., Ungar, L., Mellers, B., & Tetlock, P. (2020). Small steps to accuracy: Incremental belief updaters are better forecasters. Organizational Behavior and Human Decision Processes, 160, 19–35. https://doi.org/10.1016/j.obhdp.2020.02.001

Austin, P. C., & Steyerberg, E. W. (2014). Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Statistics in Medicine, 33(3), 517–535. https://doi.org/10.1002/sim.5941

Avugos, S., Köppen, J., Czienskowski, U., Raab, M., & Bar-Eli, M. (2013). The “hot hand” reconsidered: A meta-analytic approach. Psychology of Sport and Exercise, 14(1), 21–27. https://doi.org/10.1016/j.psychsport.2012.07.005

Baird, C., & Wagner, D. (2000). The relative validity of actuarial- and consensus-based risk assessment systems. Children and Youth Services Review, 22(11), 839–871. https://doi.org/10.1016/S0190-7409(00)00122-5

Bakeman, R., & Goodman, S. H. (2020). Interobserver reliability in clinical research: Current issues and discussion of how to establish best practices. Journal of Abnormal Psychology, 129(1), 5–13. https://doi.org/10.1037/abn0000487

Ballesteros-Pérez, P., González-Cruz, M. C., & Mora-Melià, D. (2018). Explaining the Bayes’ theorem graphically. Proceedings of the International Technology, Education and Development Conference.

Baltes, P. B. (1968). Longitudinal and cross-sectional sequences in the study of age and generation effects. Human Development, 11(3), 145–171. http://www.jstor.org/stable/26761719

Bandalos, D. L. (2018). Measurement theory and applications for the social sciences. Guilford Publications.

Bar-Eli, M., Avugos, S., & Raab, M. (2006). Twenty years of “hot hand” research: Review and critique. Psychology of Sport and Exercise, 7(6), 525–553. https://doi.org/10.1016/j.psychsport.2006.03.001

Baron-Cohen, S. (2002). The extreme male brain theory of autism. Trends in Cognitive Sciences, 6(6), 248–254. https://doi.org/10.1016/S1364-6613(02)01904-6

Baron-Cohen, S. (2010). Empathizing, systemizing, and the extreme male brain theory of autism. In I. Savic (Ed.), Progress in brain research (Vol. 186, pp. 167–175). Elsevier.

Barrash, J., Stillman, A., Anderson, S. W., Uc, E. Y., Dawson, J. D., & Rizzo, M. (2010). Prediction of driving ability with neuropsychological tests: Demographic adjustments diminish accuracy. Journal of the International Neuropsychological Society, 16(4), 679–686. https://doi.org/10.1017/S1355617710000470

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2022). lme4: Linear mixed-effects models using Eigen and S4. https://github.com/lme4/lme4/

Bauer, D. J., Belzak, W. C. M., & Cole, V. T. (2020). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 43–55. https://doi.org/10.1080/10705511.2019.1642754

Bauer, D. J., Howard, A. L., Baldasaro, R. E., Curran, P. J., Hussong, A. M., Chassin, L., & Zucker, R. A. (2013). A trifactor model for integrating ratings across multiple informants. Psychological Methods, 18(4), 475–493. https://doi.org/10.1037/a0032475

BBC. (1973). Monty Python’s flying circus: S3E38 - A book at bedtime. https://osf.io/gc79d

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. Routledge.

Beltz, A. M., Wright, A. G. C., Sprague, B. N., & Molenaar, P. C. M. (2016). Bridging the nomothetic and idiographic approaches to the analysis of clinical data. Assessment, 23(4), 447–458. https://doi.org/10.1177/1073191116648209

Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25(6), 673–690. https://doi.org/10.1037/met0000253

Benjamin, L. T. (2005). A history of clinical psychology as a profession in America (and a glimpse of its future). Annual Review of Clinical Psychology, 1, 1–30. https://doi.org/10.1146/annurev.clinpsy.1.102803.143758

Bennett, C. M., Miller, M. B., & Wolford, G. L. (2009). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. NeuroImage, 47, S125. https://doi.org/10.1016/S1053-8119(09)71202-9

Bennett, C. M., Miller, M. B., & Wolford, G. L. (2010). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. Journal of Serendipitous and Unexpected Results, 1, 1–5. https://teenspecies.github.io/pdfs/NeuralCorrelates.pdf

Benning, S. D., Bachrach, R. L., Smith, E. A., Freeman, A. J., & Wright, A. G. C. (2019). The registration continuum in clinical science: A guide toward transparent practices. Journal of Abnormal Psychology, 128(6), 528–540. https://doi.org/10.1037/abn0000451

Bensch, D., Maaß, U., Greiff, S., Horstmann, K. T., & Ziegler, M. (2019). The nature of faking: A homogeneous and predictable construct? Psychological Assessment, 31(4), 532–544. https://doi.org/10.1037/pas0000619

Berry, D., & Willoughby, M. T. (2017). On the practical interpretability of cross-lagged panel models: Rethinking a developmental workhorse. Child Development, 88(4), 1186–1206. https://doi.org/10.1111/cdev.12660

Bersoff, D. N., DeMatteo, D., & Foster, E. E. (2012). Assessment and testing. In S. J. Knapp (Ed.), APA handbook of ethics in psychology, Vol 2: Practice, teaching, and research (pp. 45–74). American Psychological Association.

Bickel, J. E., & Kim, S. D. (2008). Verification of The Weather Channel probability of precipitation forecasts. Monthly Weather Review, 136(12), 4867–4881. https://doi.org/10.1175/2008MWR2547.1

Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307–310. https://doi.org/10.1016/S0140-6736(86)90837-8

Bland, J. M., & Altman, D. G. (1999). Measuring agreement in method comparison studies. Statistical Methods in Medical Research, 8(2), 135–160. https://doi.org/10.1177/096228029900800204

Blashfield, R. K., Keeley, J. W., Flanagan, E. H., & Miles, S. R. (2014). The cycle of classification: DSM-I through DSM-5. Annual Review of Clinical Psychology, 10(1), 25–51. https://doi.org/10.1146/annurev-clinpsy-032813-153639

Blumberg, M. S. (2013). Homology, correspondence, and continuity across development: The case of sleep. Developmental Psychobiology, 55(1), 92–100. https://doi.org/10.1002/dev.21024

Bocskocsky, A., Ezekowitz, J., & Stein, C. (2014). The hot hand: A new approach to an old “fallacy.” MIT Sloan Sports Analytics Conference.

Bolger, F., & Önkal-Atay, D. (2004). The effects of feedback on judgmental interval predictions. International Journal of Forecasting, 20(1), 29–39. https://doi.org/10.1016/S0169-2070(03)00009-8

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605–634. https://doi.org/10.1146/annurev.psych.53.100901.135239

Bollen, K. A., & Bauldry, S. (2011). Three Cs in measurement models: Causal indicators, composite indicators, and covariates. Psychological Methods, 16(3), 265–284. https://doi.org/10.1037/a0024448

Bollen, K. A., & Diamantopoulos, A. (2017). In defense of causal-formative indicators: A minority report. Psychological Methods, 22(3), 581–596. https://doi.org/10.1037/met0000056

Bollen, K. A., & Lennox, R. D. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305–314. https://doi.org/10.1037/0033-2909.110.2.305

Boring, E. G. (1923). Intelligence as the tests test it. New Republic, 36, 35–37.

Bornstein, R. F. (2011). Toward a process-focused model of test score validity: Improving psychological assessment in science and practice. Psychological Assessment, 23(2), 532–544. https://doi.org/10.1037/a0022402

Borsboom, D. (2003). Conceptual issues in psychological measurement. Universiteit van Amsterdam.

Box, G. E. P. (1979). Robustness in the strategy of scientific model building. In R. L. Launer & G. N. Wilkinson (Eds.), Robustness in statistics. Academic Press.

Brennan, R. L. (1992). Generalizability theory. Educational Measurement: Issues and Practice, 11(4), 27–34. https://doi.org/10.1111/j.1745-3992.1992.tb00260.x

Brennan, R. L. (2001). Generalizability theory. Springer New York. https://books.google.com/books?id=nbHbBwAAQBAJ

Brickman, A. M., Cabo, R., & Manly, J. J. (2006). Ethical issues in cross-cultural neuropsychology. Applied Neuropsychology, 13(2), 91–100. https://doi.org/10.1207/s15324826an1302_4

Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since bias in mental testing. School Psychology Quarterly, 14(3), 208–238. https://doi.org/10.1037/h0089007

Buchanan, T. (2002). Online assessment: Desirable or dangerous? Professional Psychology: Research and Practice, 33(2), 148–154. https://doi.org/10.1037/0735-7028.33.2.148

Burchett, D., & Ben-Porath, Y. S. (2019). Methodological considerations for developing and evaluating response bias indicators. Psychological Assessment, 31(12), 1497–1511. https://doi.org/10.1037/pas0000680

Burisch, M. (1984). Approaches to personality inventory construction: A comparison of merits. American Psychologist, 39, 214–227. https://doi.org/10.1037/0003-066X.39.3.214

Bürkner, P.-C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. https://doi.org/10.18637/jss.v100.i05

Burlew, A. K., Peteet, B. J., McCuistian, C., & Miller-Roenigk, B. D. (2019). Best practices for researching diverse groups. American Journal of Orthopsychiatry, 89(3), 354–368. https://doi.org/10.1037/ort0000350

Buros Center for Testing. (2021). The twenty-first mental measurements yearbook. Buros Center for Testing.

Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93(3), 549–562. https://doi.org/10.1037/0033-2909.93.3.549

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14(3), 253–262. https://doi.org/10.1037/1040-3590.14.3.253

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013a). Confidence and precision increase with high statistical power. Nature Reviews Neuroscience, 14(8), 585. https://doi.org/10.1038/nrn3475-c4

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013b). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475

Byrd, D. A., Rivera Mindt, M. M., Clark, U. S., Clarke, Y., Thames, A. D., Gammada, E. Z., & Manly, J. J. (2021). Creating an antiracist psychology by addressing professional complicity in psychological assessment. Psychological Assessment, 33(3), 279–285. https://doi.org/10.1037/pas0000993

Calamia, M. (2019). Practical considerations for evaluating reliability in ambulatory assessment studies. Psychological Assessment, 31(3), 285–291. https://doi.org/10.1037/pas0000599

Camilli, G. (2013). Ongoing issues in test fairness. Educational Research and Evaluation, 19(2–3), 104–120. https://doi.org/10.1080/13803611.2013.767602

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105. https://doi.org/10.1037/h0046016

Campbell, L., Vasquez, M., Behnke, S., & Kinscherff, R. (2010). APA ethics code commentary and case illustrations. American Psychological Association.

Carlson, S. M., & Zelazo, P. D. (2014). Minnesota executive function scale. Test manual. Reflection Sciences, LLC.

Carpenter, R. W., Wycoff, A. M., & Trull, T. J. (2016). Ambulatory assessment: New adventures in characterizing dynamic processes. Assessment, 23(4), 414–424. https://doi.org/10.1177/1073191116632341

Cashel, M. L. (2002). Child and adolescent psychological assessment: Current clinical practices and the impact of managed care. Professional Psychology: Research and Practice, 33(5), 446–453. https://doi.org/10.1037/0735-7028.33.5.446

Caspi, A., Houts, R. M., Ambler, A., Danese, A., Elliott, M. L., Hariri, A., Harrington, H., Hogan, S., Poulton, R., Ramrakha, S., Rasmussen, L. J. H., Reuben, A., Richmond-Rakerd, L., Sugden, K., Wertz, J., Williams, B. S., & Moffitt, T. E. (2020). Longitudinal assessment of mental health disorders and comorbidities across 4 decades among participants in the Dunedin Birth Cohort Study. JAMA Network Open, 3(4), e203221. https://doi.org/10.1001/jamanetworkopen.2020.3221

Caspi, A., Houts, R. M., Belsky, D. W., Goldman-Mellor, S. J., Harrington, H., Israel, S., Meier, M. H., Ramrakha, S., Shalev, I., Poulton, R., & Moffitt, T. E. (2014). The p factor: One general psychopathology factor in the structure of psychiatric disorders? Clinical Psychological Science, 2(2), 119–137. https://doi.org/10.1177/2167702613497473

Caspi, A., & Shiner, R. L. (2006). Personality development. In N. Eisenberg, W. Damon, & R. M. Lerner (Eds.), Handbook of child psychology (6th ed., Vol. 3, pp. 300–365). John Wiley & Sons, Inc.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276. https://doi.org/10.1207/s15327906mbr0102_10

Chalmers, P. (2020). mirt: Multidimensional item response theory. https://CRAN.R-project.org/package=mirt

Chalmers, P. (2021). mirtCAT: Computerized adaptive testing with multidimensional item response theory. https://CRAN.R-project.org/package=mirtCAT

Chandler, J., Sisso, I., & Shapiro, D. (2020). Participant carelessness and fraud: Consequences for clinical research and potential solutions. Journal of Abnormal Psychology, 129(1), 49–55. https://doi.org/10.1037/abn0000479

Charba, J. P., & Klein, W. H. (1980). Skill in precipitation forecasting in the National Weather Service. Bulletin of the American Meteorological Society, 61(12), 1546–1555. https://doi.org/10.1175/1520-0477(1980)061<1546:SIPFIT>2.0.CO;2

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834

Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95(5), 1005–1018. https://doi.org/10.1037/a0013193

Chen, F. R., & Jaffee, S. R. (2015). The heterogeneity in the development of homotypic and heterotypic antisocial behavior. Journal of Developmental and Life-Course Criminology, 1(3), 269–288. https://doi.org/10.1007/s40865-015-0012-3

Chen, Y., Prudêncio, R. B. C., Diethe, T., & Flach, P. (2019). β³-IRT: A new item response model and its applications. arXiv:1903.04016. https://arxiv.org/abs/1903.04016

Cheng, Y., Shao, C., & Lathrop, Q. N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educational and Psychological Measurement, 76(1), 43–63. https://doi.org/10.1177/0013164415576187

Cheung, G. W., Cooper-Thomas, H. D., Lau, R. S., & Wang, L. C. (2024). Reporting reliability, convergent and discriminant validity with structural equation modeling: A review and best-practice recommendations. Asia Pacific Journal of Management, 41(2), 745–783. https://doi.org/10.1007/s10490-023-09871-y

Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. https://doi.org/10.1207/s15328007sem0902_5

Childs, D. Z., Hindle, B. J., & Warren, P. H. (2021). APS 240: Data analysis and statistics with R. https://dzchilds.github.io/stats-for-bio/

Choca, J. P., & Rossini, E. D. (2018). Assessment using the Rorschach inkblot test. American Psychological Association.

Cicchetti, D., & Rogosch, F. A. (2002). A developmental psychopathology perspective on adolescence. Journal of Consulting and Clinical Psychology, 70(1), 6–20. https://doi.org/10.1037/0022-006X.70.1.6

Civelek, M. E. (2018). Essentials of structural equation modeling. Zea E-Books.

Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319. https://doi.org/10.1037/1040-3590.7.3.309

Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427. https://doi.org/10.1037/pas0000626

Clark, M. J., & Grandy, J. (1984). Sex differences in the academic performance of Scholastic Aptitude Test takers: College Board report no. 84-8. College Board Publications.

Clark, S. J., & Desharnais, R. A. (1998). Honest answers to embarrassing questions: Detecting cheating in the randomized response model. Psychological Methods, 3(2), 160–168. https://doi.org/10.1037/1082-989X.3.2.160

Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256

Cohen, Z. D., & DeRubeis, R. J. (2018). Treatment selection in depression. Annual Review of Clinical Psychology, 14(1), 209–236. https://doi.org/10.1146/annurev-clinpsy-050817-084746

Cole, N. S. (1981). Bias in testing. American Psychologist, 36(10), 1067–1077. https://doi.org/10.1037/0003-066X.36.10.1067

Cole, V., Gottfredson, N., & Giordano, M. (2018). aMNLFA: Automated fitting of moderated nonlinear factor analysis through the Mplus program. https://CRAN.R-project.org/package=aMNLFA

Committee on the General Aptitude Test Battery, Commission on Behavioral and Social Sciences and Education, & National Research Council. (1989). Fairness in employment testing: Validity generalization, minority issues, and the General Aptitude Test Battery. National Academies Press.

Conradt, E., Crowell, S. E., & Cicchetti, D. (2021). Using development and psychopathology principles to inform the Research Domain Criteria (RDoC) framework. Development and Psychopathology, 33(5), 1521–1525. https://doi.org/10.1017/S0954579421000985

Cooper, L. D., & Balsis, S. (2009). When less is more: How fewer diagnostic criteria can indicate greater severity. Psychological Assessment, 21(3), 285–293. https://doi.org/10.1037/a0016698

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104. https://doi.org/10.1037/0021-9010.78.1.98

Costa Jr., P. T., McCrae, R. R., & Löckenhoff, C. E. (2019). Personality across the life span. Annual Review of Psychology, 70(1), 423–448. https://doi.org/10.1146/annurev-psych-010418-103244

Counsell, A., Cribbie, R. A., & Flora, D. B. (2020). Evaluating equivalence testing methods for measurement invariance. Multivariate Behavioral Research, 55(2), 312–328. https://doi.org/10.1080/00273171.2019.1633617

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957

Curran, P. J., & Hancock, G. R. (2020). Quantitude: "S2E15: Ethics in quantitative research". https://quantitudepod.org/s2e15-ethics-in-quantitative-research/

Curran, P. J., Howard, A. L., Bainter, S. A., Lane, S. T., & McGinley, J. S. (2014). The separation of between-person and within-person components of individual change over time: A latent curve model with structured residuals. Journal of Consulting and Clinical Psychology, 82(5), 879–894. https://doi.org/10.1037/a0035297

Dana, J., & Thomas, R. (2006). In defense of clinical judgment … and mechanical prediction. Journal of Behavioral Decision Making, 19(5), 413–428. https://doi.org/10.1002/bdm.537

Dana, R. H. (1998). Multicultural assessment of personality and psychopathology in the United States: Still art, not yet science, and controversial. European Journal of Psychological Assessment, 14(1), 62–70. https://doi.org/10.1027/1015-5759.14.1.62

Datta, D. (2018). blandr: Bland-Altman method comparison. https://github.com/deepankardatta/blandr/

Daugherty, J. C., Puente, A. E., Fasfous, A. F., Hidalgo-Ruzzante, N., & Pérez-Garcia, M. (2017). Diagnostic mistakes of culturally diverse individuals when using North American neuropsychological tests. Applied Neuropsychology: Adult, 24(1), 16–22. https://doi.org/10.1080/23279095.2015.1036992

Davison, G. C., Vogel, R. S., & Coffman, S. G. (1997). Think-aloud approaches to cognitive assessment and the articulated thoughts in simulated situations paradigm. Journal of Consulting and Clinical Psychology, 65(6), 950–958. https://doi.org/10.1037/0022-006X.65.6.950

Dawes, R. M. (1986). Representative thinking in clinical judgment. Clinical Psychology Review, 6, 425–441. https://doi.org/10.1016/0272-7358(86)90030-9

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674. https://doi.org/10.1126/science.2648573

Dell’Armo, K., & Tassé, M. J. (2025). How intellectual disability may bias psychologists’ clinical impressions: An examination of diagnostic overshadowing. Psychological Assessment, 37(4), 161–171. https://doi.org/10.1037/pas0001367

DeRubeis, R. J., Cohen, Z. D., Forand, N. R., Fournier, J. C., Gelfand, L. A., & Lorenzo-Luaces, L. (2014). The personalized advantage index: Translating research on prediction into individualized treatment recommendations. A demonstration. PLoS ONE, 9(1), e83875. https://doi.org/10.1371/journal.pone.0083875

Diamantopoulos, A., Riefler, P., & Roth, K. P. (2008). Advancing formative measurement models. Journal of Business Research, 61(12), 1203–1218. https://doi.org/10.1016/j.jbusres.2008.01.009

Dien, J. (2012). Applying principal components analysis to event-related potentials: A tutorial. Developmental Neuropsychology, 37(6), 497–517. https://doi.org/10.1080/87565641.2012.697503

Digitale, J. C., Martin, J. N., & Glymour, M. M. (2022). Tutorial on directed acyclic graphs. Journal of Clinical Epidemiology, 142, 264–267. https://doi.org/10.1016/j.jclinepi.2021.08.001

Dinno, A. (2014). Gently clarifying the application of Horn’s parallel analysis to principal component analysis versus factor analysis. http://archives.pdx.edu/ds/psu/10527

Dombrowski, S. C., McGill, R. J., & Morgan, G. B. (2021). Monte Carlo modeling of contemporary intelligence test (IQ) factor structure: Implications for IQ assessment, interpretation, and theory. Assessment, 28(3), 977–993. https://doi.org/10.1177/1073191119869828

Dorans, N. J. (2017). Contributions to the quantitative assessment of item, test, and score fairness. In R. E. Bennett & M. von Davier (Eds.), Advancing human assessment (pp. 201–230). Springer, Cham.

Draheim, C., Mashburn, C. A., Martin, J. D., & Engle, R. W. (2019). Reaction time in differential and developmental research: A review and commentary on the problems and alternatives. Psychological Bulletin, 145(5), 508–535. https://doi.org/10.1037/bul0000192

Dubois, J., & Adolphs, R. (2016). Building a science of individual differences from fMRI. Trends in Cognitive Sciences, 20(6), 425–443. https://doi.org/10.1016/j.tics.2016.03.014

Dueber, D. (2019). dmacs: Measurement nonequivalence effect size calculator. https://github.com/ddueber/dmacs

Dumenci, L. (2024). Principles of psychological assessment, with applied examples in R. By Isaac T. Petersen, Chapman and Hall/CRC, 2024, ISBN: 9781032413068 https://www.routledge.com/Principles-of-psychological-assessment-with-applied-examples-in-R/Petersen/p/book/9781032413068. Biometrics, 80(4). https://doi.org/10.1093/biomtc/ujae133

Duncan, G. J., Engel, M., Claessens, A., & Dowsett, C. J. (2014). Replication and robustness in developmental research. Developmental Psychology, 50(11), 2417–2425. https://doi.org/10.1037/a0037996

Dunkley, D. M., Segal, Z. V., & Blankstein, K. R. (2019). Cognitive assessment: Issues and methods. In K. S. Dobson & D. J. A. Dozois (Eds.), Handbook of cognitive-behavioral therapies (4th ed., pp. 85–119). Guilford Press.

Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399–412. https://doi.org/10.1111/bjop.12046

Dunning, D., Heath, C., & Suls, J. M. (2004). Flawed self-assessment: Implications for health, education, and the workplace. Psychological Science in the Public Interest, 5, 69–106. https://doi.org/10.1111/j.1529-1006.2004.00018.x

Durbin, C. E., Wilson, S., & MacDonald, A. W., III. (2022). Integrating development into the Research Domain Criteria (RDoC) framework: Introduction to the special section. Journal of Psychopathology and Clinical Science, 131(6), 535–541. https://doi.org/10.1037/abn0000767

Dwyer, D. B., Falkai, P., & Koutsouleris, N. (2018). Machine learning approaches for clinical psychology and psychiatry. Annual Review of Clinical Psychology, 14(1), 91–118. https://doi.org/10.1146/annurev-clinpsy-032816-045037

Eaton, W. W. (1980). The sociology of mental disorders. Praeger.

Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 249–267). Cambridge University Press.

Edwards, J. R. (2011). The fallacy of formative measurement. Organizational Research Methods, 14(2), 370–388. https://doi.org/10.1177/1094428110378369

Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5(2), 155–174. https://doi.org/10.1037/1082-989X.5.2.155

Edwards, L. M., Burkard, A. W., Adams, H. A., & Newcomb, S. A. (2017). A mixed-method study of psychologists’ use of multicultural assessment. Professional Psychology: Research and Practice, 48(2), 131–138. https://doi.org/10.1037/pro0000095

Einstein, A. (1934). On the method of theoretical physics. Philosophy of Science, 1(2), 163–169. https://doi.org/10.1086/286316

Ellard, K. K., Fairholme, C. P., Boisseau, C. L., Farchione, T. J., & Barlow, D. H. (2010). Unified protocol for the transdiagnostic treatment of emotional disorders: Protocol development and initial outcome data. Cognitive and Behavioral Practice, 17(1), 88–101. https://doi.org/10.1016/j.cbpra.2009.06.002

Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8, 341–349. https://doi.org/10.1037/1040-3590.8.4.341

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists (Vol. 4). Lawrence Erlbaum Associates.

Epskamp, S. (2022). semPlot: Path diagrams and visual analysis of various SEM packages’ output. https://github.com/SachaEpskamp/semPlot

Evans, S. C., & Shaughnessy, S. (2024). Emotion regulation as central to psychopathology across childhood and adolescence: A commentary on Nobakht et al. (2023). Journal of Child Psychology and Psychiatry, 65(3), 354–357. https://doi.org/10.1111/jcpp.13910

Executive Board of the American Anthropological Association. (1998). AAA statement on race. American Anthropologist, 100(3), 712–713. https://doi.org/10.1525/aa.1998.100.3.712

Exner, J. E. (1974). The Rorschach: A comprehensive system. John Wiley & Sons.

Exner, J. E., & Erdberg, S. P. (2005). The Rorschach, a comprehensive system: Advanced interpretation (3rd ed., Vol. 2). John Wiley & Sons, Inc.

Fadus, M. C., Ginsburg, K. R., Sobowale, K., Halliday-Boykins, C. A., Bryant, B. E., Gray, K. M., & Squeglia, L. M. (2020). Unconscious bias and the diagnosis of disruptive behavior disorders and ADHD in African American and Hispanic youth. Academic Psychiatry, 44(1), 95–102. https://doi.org/10.1007/s40596-019-01127-6

Falotico, R., & Quatto, P. (2010). On avoiding paradoxes in assessing inter-rater agreement. Italian Journal of Applied Statistics, 22, 151–160.

Faraone, S. V., & Tsuang, M. T. (1994). Measuring diagnostic accuracy in the absence of a “gold standard.” American Journal of Psychiatry, 151, 650–657. https://doi.org/10.1176/ajp.151.5.650

Farrington, D. P., & Loeber, R. (1989). Relative improvement over chance (RIOC) and phi as measures of predictive efficiency and strength of association in 2×2 tables. Journal of Quantitative Criminology, 5(3), 201–213. https://doi.org/10.1007/BF01062737

Farris, C., Treat, T. A., Viken, R. J., & McFall, R. M. (2008). Perceptual mechanisms that characterize gender differences in decoding women’s sexual intent. Psychological Science, 19(4), 348–354. https://doi.org/10.1111/j.1467-9280.2008.02092.x

Farris, C., Viken, R. J., Treat, T. A., & McFall, R. M. (2006). Heterosocial perceptual organization: Application of the choice model to sexual coercion. Psychological Science, 17(10), 869–875. https://doi.org/10.1111/j.1467-9280.2006.01796.x

Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. https://doi.org/10.3758/brm.41.4.1149

Fernández, A. L., & Abe, J. (2018). Bias in cross-cultural neuropsychological testing: Problems and possible solutions. Culture and Brain, 6(1), 1–35. https://doi.org/10.1007/s40167-017-0050-2

Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. SAGE Publications.

Fiske, D. W., & Campbell, D. T. (1992). Citations do not solve problems. Psychological Bulletin, 112(3), 393–395. https://doi.org/10.1037/0033-2909.112.3.393

Fleck, M. S., Samei, E., & Mitroff, S. R. (2010). Generalized “satisfaction of search”: Adverse influences on dual-target search accuracy. Journal of Experimental Psychology: Applied, 16(1), 60–71. https://doi.org/10.1037/a0018629

Fletcher, R. R., Nakeshimana, A., & Olubeko, O. (2021). Addressing fairness, bias, and appropriate use of artificial intelligence and machine learning in global health. Frontiers in Artificial Intelligence, 3(116). https://doi.org/10.3389/frai.2020.561802

Flora, D. B. (2020). Your coefficient alpha is probably wrong, but which coefficient omega is right? A tutorial on using R to obtain better reliability estimates. Advances in Methods and Practices in Psychological Science, 3(4), 484–501. https://doi.org/10.1177/2515245920951747

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286–299. https://doi.org/10.1037/1040-3590.7.3.286

Fok, C. C. T., & Henry, D. (2015). Increasing the sensitivity of measures to change. Prevention Science, 16(7), 978–986. https://doi.org/10.1007/s11121-015-0545-z

Fontaine, N. M. G., & Petersen, I. T. (2017). Developmental trajectories of psychopathology: An overview of approaches and applications. In L. Centifanti & D. Williams (Eds.), The Wiley handbook of developmental psychopathology (pp. 5–28). Wiley-Blackwell.

Forbey, J. D., & Ben-Porath, Y. S. (2007). Computerized adaptive personality testing: A review and illustration with the MMPI-2 computerized adaptive version. Psychological Assessment, 19(1), 14–24. https://doi.org/10.1037/1040-3590.19.1.14

Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39–50. https://doi.org/10.2307/3151312

Fox, J., Weisberg, S., & Price, B. (2022). car: Companion to applied regression. https://CRAN.R-project.org/package=car

Frank, L. K. (1939). Projective methods for the study of personality. Journal of Psychology, 8, 389–413. https://doi.org/10.1080/00223980.1939.9917671

Frazier, T. W., Georgiades, S., Bishop, S. L., & Hardan, A. Y. (2014). Behavioral and cognitive characteristics of females and males with autism in the Simons Simplex Collection. Journal of the American Academy of Child & Adolescent Psychiatry, 53(3), 329–340.e3. https://doi.org/10.1016/j.jaac.2013.12.004

Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology.

Freud, S. (1911). Psycho-analytic notes on an autobiographical account of a case of paranoia (dementia paranoides). In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud: The case of Schreber, papers on technique and other works, Vol. 12 (1911–1913) (pp. 1–82).

Fried, E. I. (2022). Studying mental health problems as systems, not syndromes. Current Directions in Psychological Science, 31(6), 500–508. https://doi.org/10.1177/09637214221114089

Furr, R. M. (2017). Psychometrics: An introduction. SAGE Publications.

Furr, R. M., & Heuckeroth, S. (2019). The “quantifying construct validity” procedure: Its role, value, interpretations, and computation. Assessment, 26(4), 555–566. https://doi.org/10.1177/1073191118820638

Galatzer-Levy, I. R., & Bryant, R. A. (2013). 636,120 ways to have posttraumatic stress disorder. Perspectives on Psychological Science, 8(6), 651–662. https://doi.org/10.1177/1745691613504115

Galatzer-Levy, I. R., & Onnela, J.-P. (2023). Machine learning and the digital measurement of psychological health. Annual Review of Clinical Psychology, 19, 133–154. https://doi.org/10.1146/annurev-clinpsy-080921-073212

Gambrill, E. (2014). The diagnostic and statistical manual of mental disorders as a major form of dehumanization in the modern world. Research on Social Work Practice, 24(1), 13–36. https://doi.org/10.1177/1049731513499411

Gandrud, C. (2020). Reproducible research with R and RStudio (3rd ed.). CRC Press. https://www.routledge.com/Reproducible-Research-with-R-and-RStudio/Gandrud/p/book/9780367143985

Garb, H. N. (1997). Race bias, social class bias, and gender bias in clinical judgment. Clinical Psychology: Science and Practice, 4(2), 99–120. https://doi.org/10.1111/j.1468-2850.1997.tb00104.x

Garb, H. N. (2005). Clinical judgment and decision making. Annual Review of Clinical Psychology, 1, 67–89. https://doi.org/10.1146/annurev.clinpsy.1.102803.143810

Garb, H. N. (2007). Computer-administered interviews and rating scales. Psychological Assessment, 19(1), 4–13. https://doi.org/10.1037/1040-3590.19.1.4

Garb, H. N., & Wood, J. M. (2019). Methodological advances in statistical prediction. Psychological Assessment, 31(12), 1456–1466. https://doi.org/10.1037/pas0000673

Garb, H. N., Wood, J. M., Lilienfeld, S. O., & Nezworski, M. T. (2005). Roots of the Rorschach controversy. Clinical Psychology Review, 25(1), 97–118. https://doi.org/10.1016/j.cpr.2004.09.002

Garber, J., & Weersing, V. R. (2010). Comorbidity of anxiety and depression in youth: Implications for treatment and prevention. Clinical Psychology: Science and Practice, 17(4), 293–306. https://doi.org/10.1111/j.1468-2850.2010.01221.x

Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. https://doi.org/10.1037/a0032138

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University.

Gibbons, R. D., Weiss, D. J., Frank, E., & Kupfer, D. (2016). Computerized adaptive diagnosis and testing of mental health disorders. Annual Review of Clinical Psychology, 12(1), 83–104. https://doi.org/10.1146/annurev-clinpsy-021815-093634

Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17(3), 295–314. https://doi.org/10.1016/0010-0285(85)90010-6

Gipps, C., & Stobart, G. (2009). Fairness in assessment. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 105–118). Springer Netherlands. https://doi.org/10.1007/978-1-4020-9964-9_6

Girard, J. M., & Cohn, J. F. (2016). A primer on observational measurement. Assessment, 23(4), 404–413. https://doi.org/10.1177/1073191116635807

Gneiting, T., & Walz, E.-M. (2021). Receiver operating characteristic (ROC) movies, universal ROC (UROC) curves, and coefficient of predictive ability (CPA). Machine Learning. https://doi.org/10.1007/s10994-021-06114-3

Gonzalez, O., & Pelham, W. E. (2021). When does differential item functioning matter for screening? A method for empirical evaluation. Assessment, 28(2), 446–456. https://doi.org/10.1177/1073191120913618

Goodwin, L. D., & Leech, N. L. (2006). Understanding correlation: Factors that affect the size of r. The Journal of Experimental Education, 74(3), 249–266. https://doi.org/10.3200/JEXE.74.3.249-266

Gottfredson, L. S. (1994). The science and politics of race-norming. American Psychologist, 49(11), 955–963. https://doi.org/10.1037/0003-066X.49.11.955

Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24(1), 13–23.

Gottfredson, N. C., Cole, V. T., Giordano, M. L., Bauer, D. J., Hussong, A. M., & Ennett, S. T. (2019). Simplifying the implementation of modern scale scoring methods with an automated R package: Automated moderated nonlinear factor analysis (aMNLFA). Addictive Behaviors, 94, 65–73. https://doi.org/10.1016/j.addbeh.2018.10.031

Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944. https://doi.org/10.1177/0013164406288165

Graham, J. R., Veltri, C. O. C., & Lee, T. T. C. (2022). MMPI instruments: Assessing personality and psychopathology (6th ed.). Oxford University Press.

Graham, J., Olchowski, A., & Gilreath, T. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8(3), 206–213. https://doi.org/10.1007/s11121-007-0070-9

Granziol, U., Brancaccio, A., Pizziconi, G., Spangaro, M., Gentili, F., Bosia, M., Gregori, E., Luperini, C., Pavan, C., Santarelli, V., Cavallaro, R., Cremonese, C., Favaro, A., Rossi, A., Vidotto, G., & Spoto, A. (2022). On the implementation of computerized adaptive observations for psychological assessment. Assessment, 29(2), 225–241. https://doi.org/10.1177/1073191120960215

Green, S. B., & Yang, Y. (2015). Evaluation of dimensionality in the assessment of internal consistency reliability: Coefficient alpha and omega coefficients. Educational Measurement: Issues and Practice, 34(4), 14–20. https://doi.org/10.1111/emip.12100

Greenberg, D. M., Warrier, V., Allison, C., & Baron-Cohen, S. (2018). Testing the empathizing–systemizing theory of sex differences and the extreme male brain theory of autism in half a million people. Proceedings of the National Academy of Sciences, 115(48), 12152–12157. https://doi.org/10.1073/pnas.1811032115

Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology, Public Policy, and Law, 2(2), 293–323. https://doi.org/10.1037/1076-8971.2.2.293

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30. https://doi.org/10.1037/1040-3590.12.1.19

Gunn, H. J., Grimm, K. J., & Edwards, M. C. (2020). Evaluation of six effect size measures of measurement non-invariance for continuous outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 503–514. https://doi.org/10.1080/10705511.2019.1689507

Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29–48. https://doi.org/10.1348/000711006X126600

Gwet, K. L. (2021a). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters, Vol. 1: Analysis of categorical ratings (5th ed.). AgreeStat Analytics.

Gwet, K. L. (2021b). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters, Vol. 2: Analysis of quantitative ratings (5th ed.). AgreeStat Analytics.

Hagquist, C. (2019). Explaining differential item functioning focusing on the crucial role of external information – an example from the measurement of adolescent mental health. BMC Medical Research Methodology, 19(1), 185. https://doi.org/10.1186/s12874-019-0828-3

Hagquist, C., & Andrich, D. (2017). Recent advances in analysis of differential item functioning in health research using the Rasch model. Health and Quality of Life Outcomes, 15(1), 181. https://doi.org/10.1186/s12955-017-0755-0

Hall, G. C. N., Bansal, A., & Lopez, I. R. (1999). Ethnicity and psychopathology: A meta-analytic review of 31 years of comparative MMPI/MMPI-2 research. Psychological Assessment, 11(2), 186–197. https://doi.org/10.1037/1040-3590.11.2.186

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. P. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102–116. https://doi.org/10.1037/a0038889

Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31(12), 1481–1496. https://doi.org/10.1037/pas0000731

Hancock, G. R., & French, B. F. (2013). Power analysis in structural equation modeling. In Structural equation modeling: A second course (2nd ed., pp. 117–159). IAP Information Age Publishing.

Hardin, A. M., Chang, J. C.-J., Fuller, M. A., & Torkzadeh, G. (2011). Formative measurement and academic research: In search of measurement theory. Educational and Psychological Measurement, 71(2), 281–305. https://doi.org/10.1177/0013164410370208

Harrell, F. (2015). Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Springer.

Harrell, Jr., F. E. (2021). rms: Regression modeling strategies. https://CRAN.R-project.org/package=rms

Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. But…. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629

Hayes, S. C., Nelson, R. O., & Jarrett, R. B. (1987). The treatment utility of assessment: A functional approach to evaluating assessment quality. American Psychologist, 42, 963–974. https://doi.org/10.1037/0003-066X.42.11.963

Haynes, S. N. (2001). Clinical applications of analogue behavioral observation: Dimensions of psychometric evaluation. Psychological Assessment, 13(1), 73–85. https://doi.org/10.1037/1040-3590.13.1.73

Haynes, S. N., & Yoshioka, D. T. (2007). Clinical assessment applications of ambulatory biosensors. Psychological Assessment, 19(1), 44–57. https://doi.org/10.1037/1040-3590.19.1.44

Hays, P. A. (2016). Addressing cultural complexities in practice: Assessment, diagnosis, and therapy. American Psychological Association.

Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166–1186. https://doi.org/10.3758/s13428-017-0935-1

Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A quantitative perspective. American Psychologist, 61(8), 845–859. https://doi.org/10.1037/0003-066X.61.8.845

Helms, J. E., Jernigan, M., & Mascher, J. (2005). The meaning of race in psychology and how to change it: A methodological perspective. American Psychologist, 60(1), 27–36. https://doi.org/10.1037/0003-066X.60.1.27

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29. https://doi.org/10.1038/466029a

Henseler, J. (2021). Composite-based structural equation modeling: Analyzing latent and emergent variables. Guilford Publications.

Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115–135. https://doi.org/10.1007/s11747-014-0403-8

Hertzog, C., & Nesselroade, J. R. (2003). Assessing psychological change in adulthood: An overview of methodological issues. Psychology and Aging, 18(4), 639–657. https://doi.org/10.1037/0882-7974.18.4.639

Himmelstein, P. H., Woods, W. C., & Wright, A. G. C. (2019). A comparison of signal- and event-contingent ambulatory assessment of interpersonal behavior and affect in social situations. Psychological Assessment, 31(7), 952–960. https://doi.org/10.1037/pas0000718

Hinshaw, S. P., & Nigg, J. T. (1999). Behavior rating scales in the assessment of disruptive behavior problems in childhood. In D. Shaffer, C. P. Lucas, & J. E. Richters (Eds.), Diagnostic assessment in child and adolescent psychopathology (pp. 91–126). The Guilford Press.

Hoch, S. J. (1985). Counterfactual reasoning and accuracy in predicting personal events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11(4), 719–731. https://doi.org/10.1037/0278-7393.11.1-4.719

Holmlund, T. B., Foltz, P. W., Cohen, A. S., Johansen, H. D., Sigurdsen, R., Fugelli, P., Bergsager, D., Cheng, J., Bernstein, J., Rosenfeld, E., & Elvevåg, B. (2019). Moving psychological assessment out of the controlled laboratory setting: Practical challenges. Psychological Assessment, 31(3), 292–303. https://doi.org/10.1037/pas0000647

Hough, S. E. (2016). Predicting the unpredictable: The tumultuous science of earthquake prediction. Princeton University Press.

Hove, D. ten, Jorgensen, T. D., & Ark, L. A. van der. (2022). Interrater reliability for multilevel data: A generalizability theory approach. Psychological Methods, 27(4), 650–666. https://doi.org/10.1037/met0000391

Howell, R. D., Breivik, E., & Wilcox, J. B. (2007). Reconsidering formative measurement. Psychological Methods, 12(2), 205–218. https://doi.org/10.1037/1082-989X.12.2.205

Hsiao, Y.-Y., Kwok, O.-M., & Lai, M. H. C. (2018). Evaluation of two methods for modeling measurement errors when testing interaction effects with observed composite scores. Educational and Psychological Measurement, 78(2), 181–202. https://doi.org/10.1177/0013164416679877

Huebner, A., & Lucht, M. (2019). Generalizability theory in R. Practical Assessment, Research & Evaluation, 24(5), 2. https://doi.org/10.7275/5065-gc10

Hunsley, J., Lee, C. M., Wood, J. M., & Taylor, W. (2015). Controversial and questionable assessment techniques. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (2nd ed., pp. 42–82). The Guilford Press.

Hunsley, J., & Mash, E. J. (2007). Evidence-based assessment. Annual Review of Clinical Psychology, 3, 29–51. https://doi.org/10.1146/annurev.clinpsy.3.022806.091419

Hurlburt, R. T. (1997). Randomly sampling thinking in the natural environment. Journal of Consulting and Clinical Psychology, 65(6), 941–949. https://doi.org/10.1037/0022-006X.65.6.941

Hussong, A. M., Bauer, D. J., Giordano, M. L., & Curran, P. J. (2020). Harmonizing altered measures in integrative data analysis: A methods analogue study. Behavior Research Methods. https://doi.org/10.3758/s13428-020-01472-7

Hussong, A. M., Curran, P. J., & Bauer, D. J. (2013). Integrative data analysis in clinical psychology research. Annual Review of Clinical Psychology, 9(1), 61–89. https://doi.org/10.1146/annurev-clinpsy-050212-185522

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice (2nd ed.). OTexts.

Jaccard, J., & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117(2), 348–357. https://doi.org/10.1037/0033-2909.117.2.348

Jacinto, S. B., Lewis, C. C., Braga, J. N., & Scott, K. (2018). A conceptual model for generating and validating in-session clinical judgments. Psychotherapy Research, 28(1), 91–105. https://doi.org/10.1080/10503307.2016.1169329

Jensen, A. R. (1980). Précis of bias in mental testing. Behavioral and Brain Sciences, 3(3), 325–333. https://doi.org/10.1017/S0140525X00005161

Jiang, Z. (2018). Using the linear mixed-effect model framework to estimate generalizability variance components in R. Methodology, 14(3), 133–142. https://doi.org/10.1027/1614-2241/a000149

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953

Johnson, J. E. V., & Bruce, A. C. (2001). Calibration of subjective probability judgments in a naturalistic setting. Organizational Behavior and Human Decision Processes, 85(2), 265–290. https://doi.org/10.1006/obhd.2000.2949

Johnson, P. E. (2022). rockchalk: Regression estimation and presentation. https://CRAN.R-project.org/package=rockchalk

Jonson, J. L., & Geisinger, K. F. (2022). Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards. American Educational Research Association.

Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23(4), 708–728. https://doi.org/10.1037/met0000152

Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2021). semTools: Useful tools for structural equation modeling. https://github.com/simsem/semTools/wiki

Kagan, J. (1969). The three faces of continuity in human development. In D. A. Goslin (Ed.), Handbook of socialization theory and research (pp. 983–1002). Rand McNally.

Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus, and Giroux.

Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151. https://doi.org/10.1177/001316446002000116

Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35(4), 401–415. https://doi.org/10.1007/BF02291817

Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575

Kaiser, H. F., & Rice, J. (1974). Little jiffy, mark IV. Educational and Psychological Measurement, 34(1), 111–117. https://doi.org/10.1177/001316447403400115

Karch, J. D. (2025). lavaangui: A web-based graphical interface for specifying lavaan models by drawing path diagrams. Structural Equation Modeling: A Multidisciplinary Journal, 32(6), 1077–1088. https://doi.org/10.1080/10705511.2024.2420678

Kazdin, A. E. (1995). Preparing and evaluating research reports. Psychological Assessment, 7(3), 228–237. https://doi.org/10.1037/1040-3590.7.3.228

Kelley, K. (2020). MBESS: The MBESS R package. http://nd.edu/~kkelley/site/MBESS.html

Kelley, K., & Pornprasertmanit, S. (2016). Confidence intervals for population reliability coefficients: Evaluation of methods, recommendations, and software for composite measures. Psychological Methods, 21(1), 69–92. https://doi.org/10.1037/a0040086

Kenny, D. A. (1979). Correlation and causality. Wiley.

Keren, G. (1987). Facing uncertainty in the game of bridge: A calibration study. Organizational Behavior and Human Decision Processes, 39(1), 98–114. https://doi.org/10.1016/0749-5978(87)90047-1

Kessler, R. C., Bossarte, R. M., Luedtke, A., Zaslavsky, A. M., & Zubizarreta, J. R. (2020). Suicide prediction models: A critical review of recent research with recommendations for the way forward. Molecular Psychiatry, 25(1), 168–179. https://doi.org/10.1038/s41380-019-0531-0

Kievit, R. A., Brandmaier, A. M., Ziegler, G., Harmelen, A.-L. van, Mooij, S. M. M. de, Moutoussis, M., Goodyer, I., Bullmore, E., Jones, P. B., Fonagy, P., Lindenberger, U., & Dolan, R. J. (2018). Developmental cognitive neuroscience using latent change score models: A tutorial and applications. Developmental Cognitive Neuroscience, 33, 99–117. https://doi.org/10.1016/j.dcn.2017.11.007

Kievit, R., Frankenhuis, W., Waldorp, L., & Borsboom, D. (2013). Simpson’s paradox in psychological science: A practical guide. Frontiers in Psychology, 4(513). https://doi.org/10.3389/fpsyg.2013.00513

Klein, D. F., & Cleary, T. A. (1969). Platonic true scores: Further comment. Psychological Bulletin, 71(4), 278–280. https://doi.org/10.1037/h0026852

Kline, R. B. (2023). Principles and practice of structural equation modeling (5th ed.). Guilford Publications.

Kline, R. B. (2024). How to evaluate local fit (residuals) in large structural equation models. International Journal of Psychology, 59(6), 1293–1306. https://doi.org/10.1002/ijop.13252

Koehler, D. J., Brenner, L., & Griffin, D. (2002). The calibration of expert judgment: Heuristics and biases beyond the laboratory. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment. Cambridge University Press.

Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107–118. https://doi.org/10.1037/0278-7393.6.2.107

Korotitsch, W. J., & Nelson-Gray, R. O. (1999). An overview of self-monitoring research in assessment and treatment. Psychological Assessment, 11(4), 415–425. https://doi.org/10.1037/1040-3590.11.4.415

Kotov, R., Krueger, R. F., Watson, D., Achenbach, T. M., Althoff, R. R., Bagby, R. M., Brown, T. A., Carpenter, W. T., Caspi, A., Clark, L. A., Eaton, N. R., Forbes, M. K., Forbush, K. T., Goldberg, D., Hasin, D., Hyman, S. E., Ivanova, M. Y., Lynam, D. R., Markon, K., … Zimmerman, M. (2017). The hierarchical taxonomy of psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454–477. https://doi.org/10.1037/abn0000258

Kotov, R., Krueger, R. F., Watson, D., Cicero, D. C., Conway, C. C., DeYoung, C. G., Eaton, N. R., Forbes, M. K., Hallquist, M. N., Latzman, R. D., Mullins-Sweatt, S. N., Ruggero, C. J., Simms, L. J., Waldman, I. D., Waszczuk, M. A., & Wright, A. G. C. (2021). The hierarchical taxonomy of psychopathology (HiTOP): A quantitative nosology based on consensus of evidence. Annual Review of Clinical Psychology, 17(1), 83–108. https://doi.org/10.1146/annurev-clinpsy-081219-093304

Kozak, M. J., & Cuthbert, B. N. (2016). The NIMH research domain criteria initiative: Background, issues, and pragmatics. Psychophysiology, 53(3), 286–297. https://doi.org/10.1111/psyp.12518

Kriegman, L. S., & Kriegman, G. (1965). The PaTE report: A new psychodynamic and therapeutic evaluative procedure. The Psychiatric Quarterly, 39(1), 646–674. https://doi.org/10.1007/BF01569493

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567. https://doi.org/10.1146/annurev.psych.50.1.537

Krueger, R. F., Nichol, P. E., Hicks, B. M., Markon, K. E., Patrick, C. J., Iacono, W. G., & McGue, M. (2004). Using latent trait modeling to conceptualize an alcohol problems continuum. Psychological Assessment, 16(2), 107–119. https://doi.org/10.1037/1040-3590.16.2.107

Kuhn, M. (2022). caret: Classification and regression training. https://github.com/topepo/caret/

Kuncel, N. R., & Hezlett, S. A. (2010). Fact and fiction in cognitive ability testing for admissions and hiring decisions. Current Directions in Psychological Science, 19(6), 339–345. https://doi.org/10.1177/0963721410389459

Kundu, S., Aulchenko, Y. S., & Janssens, A. C. J. W. (2020). PredictABEL: Assessment of risk prediction models. https://CRAN.R-project.org/package=PredictABEL

Lai, M. H. C. (2021). Adjusting for measurement noninvariance with alignment in growth modeling. Multivariate Behavioral Research, 1–18. https://doi.org/10.1080/00273171.2021.1941730

Lakens, D. (2024). When and how to deviate from a preregistration. Collabra: Psychology, 10(1). https://doi.org/10.1525/collabra.117094

Larson, M. J., & Carbine, K. A. (2017). Sample size calculations in human electrophysiology (EEG and ERP) studies: A systematic review and recommendations for increased rigor. International Journal of Psychophysiology, 111, 33–41. https://doi.org/10.1016/j.ijpsycho.2016.06.015

Lee, K., Bull, R., & Ho, R. M. H. (2013). Developmental changes in executive functioning. Child Development, 84(6), 1933–1953. https://doi.org/10.1111/cdev.12096

Lee Meeuw Kjoe, P. R., Agelink van Rentergem, J. A., Vermeulen, I. E., & Schagen, S. B. (2021). How to correct for computer experience in online cognitive testing? Assessment, 28(5), 1247–1255. https://doi.org/10.1177/1073191120911098

Lee, S., & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25(3), 313–334. https://doi.org/10.1207/s15327906mbr2503_4

Lek, K. M., & Van De Schoot, R. (2018). A comparison of the single, conditional and person-specific standard error of measurement: What do they measure and when to use them? Frontiers in Applied Mathematics and Statistics, 4(40). https://doi.org/10.3389/fams.2018.00040

Lele, S. R., Keim, J. L., & Solymos, P. (2019). ResourceSelection: Resource selection (probability) functions for use-availability data. https://github.com/psolymos/ResourceSelection

Leong, F. T. L., & Kalibatseva, Z. (2016). Threats to cultural validity in clinical diagnosis and assessment: Illustrated with the case of Asian Americans. In N. Zane, G. Bernal, & F. T. L. Leong (Eds.), Evidence-based psychological practice with ethnic minorities: Culturally informed research and clinical strategies (pp. 57–74). American Psychological Association.

Lewis-Fernández, R., Aggarwal, N. K., Bäärnhielm, S., Rohlof, H., Kirmayer, L. J., Weiss, M. G., Jadhav, S., Hinton, L., Alarcón, R. D., Bhugra, D., Groen, S., Dijk, R. van, Qureshi, A., Collazos, F., Rousseau, C., Caballero, L., Ramos, M., & Lu, F. (2014). Culture and psychiatric evaluation: Operationalizing cultural formulation for DSM-5. Psychiatry: Interpersonal and Biological Processes, 77(2), 130–154. https://doi.org/10.1521/psyc.2014.77.2.130

Lilienfeld, S. O. (2007). Psychological treatments that cause harm. Perspectives on Psychological Science, 2(1), 53–70. https://doi.org/10.1111/j.1745-6916.2007.00029.x

Lilienfeld, S. O. (2017). Psychology’s replication crisis and the grant culture: Righting the ship. Perspectives on Psychological Science, 12(4), 660–664. https://doi.org/10.1177/1745691616687745

Lilienfeld, S. O., Sauvigne, K., Lynn, S. J., Latzman, R. D., Cautin, R., & Waldman, I. D. (2015). Fifty psychological and psychiatric terms to avoid: A list of inaccurate, misleading, misused, ambiguous, and logically confused words and phrases. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01100

Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27–66. https://doi.org/10.1111/1529-1006.002

Lindhiem, O., Petersen, I. T., Mentch, L. K., & Youngstrom, E. A. (2020). The importance of calibration in clinical psychology. Assessment, 27(4), 840–854. https://doi.org/10.1177/1073191117752055

Lindhiem, O., Yu, L., Grasso, D. J., Kolko, D. J., & Youngstrom, E. A. (2015). Adapting the posterior probability of diagnosis index to enhance evidence-based screening: An application to ADHD in primary care. Assessment, 22(2), 198–207. https://doi.org/10.1177/1073191114540748

Lindzey, G. (1952). Thematic apperception test: Interpretive assumptions and related empirical evidence. Psychological Bulletin, 49, 1–25. https://doi.org/10.1037/h0062363

Little, T. D. (2013). Longitudinal structural equation modeling. The Guilford Press.

Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9(2), 151–173. https://doi.org/10.1207/S15328007SEM0902_1

Little, T. D., Preacher, K. J., Selig, J. P., & Card, N. A. (2007). New developments in latent variable panel analyses of longitudinal data. International Journal of Behavioral Development, 31(4), 357–365. https://doi.org/10.1177/0165025407077757

Little, T. D., Rhemtulla, M., Gibson, K., & Schoemann, A. M. (2013). Why the items versus parcels controversy needn’t be one. Psychological Methods, 18(3), 285–300. https://doi.org/10.1037/a0033266

Little, T. D., Slegers, D. W., & Card, N. A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation Modeling, 13(1), 59–72. https://doi.org/10.1207/s15328007sem1301_3

Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. https://doi.org/10.1037/met0000075

Lobbestael, J., Leurgans, M., & Arntz, A. (2011). Inter-rater reliability of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID I) and Axis II Disorders (SCID II). Clinical Psychology & Psychotherapy, 18(1), 75–79. https://doi.org/10.1002/cpp.693

Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694. https://doi.org/10.2466/pr0.1957.3.3.635

Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585. https://doi.org/10.1126/science.aal3618

Lubke, G. H., McArtor, D. B., Boomsma, D. I., & Bartels, M. (2018). Genetic and environmental contributions to the development of childhood aggression. Developmental Psychology, 54(1), 39–50. https://doi.org/10.1037/dev0000403

Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139

Lupien, S. J., Sasseville, M., François, N., Giguère, C. E., Boissonneault, J., Plusquellec, P., Godbout, R., Xiong, L., Potvin, S., Kouassi, E., & Lesage, A. (2017). The DSM5/RDoC debate on the future of mental health research: Implication for studies on human stress and presentation of the signature bank. Stress, 20(1), 2–18. https://doi.org/10.1080/10253890.2017.1286324

Lutz, W., Schwartz, B., & Delgadillo, J. (2022). Measurement-based and data-informed psychological therapy. Annual Review of Clinical Psychology, 18(1), 71–98. https://doi.org/10.1146/annurev-clinpsy-071720-014821

Lysell, H., Dahlin, M., Viktorin, A., Ljungberg, E., D’Onofrio, B. M., Dickman, P., & Runeson, B. (2018). Maternal suicide – register based study of all suicides occurring after delivery in Sweden 1974–2009. PLOS ONE, 13(1), e0190133. https://doi.org/10.1371/journal.pone.0190133

MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51(1), 201–226. https://doi.org/10.1146/annurev.psych.51.1.201

Makridakis, S., Hogarth, R. M., & Gaba, A. (2009). Forecasting and uncertainty in the economic and business world. International Journal of Forecasting, 25(4), 794–812. https://doi.org/10.1016/j.ijforecast.2009.05.012

Manly, J. J. (2005). Advantages and disadvantages of separate norms for African Americans. The Clinical Neuropsychologist, 19(2), 270–275. https://doi.org/10.1080/13854040590945346

Manly, J. J., & Echemendia, R. J. (2007). Race-specific norms: Using the model of hypertension to understand issues of race, culture, and education in neuropsychology. Archives of Clinical Neuropsychology, 22(3), 319–325. https://doi.org/10.1016/j.acn.2007.01.006

Markon, K. E. (2019). Bifactor and hierarchical models: Specification, inference, and interpretation. Annual Review of Clinical Psychology, 15(1), 51–69. https://doi.org/10.1146/annurev-clinpsy-050718-095522

Markon, K. E., Chmielewski, M., & Miller, C. J. (2011). The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review. Psychological Bulletin, 137(5), 856–879. https://doi.org/10.1037/a0023678

Markus, K. A. (2018). Three conceptual impediments to developing scale theory for formative scales. Methodology, 14(4), 156–164. https://doi.org/10.1027/1614-2241/a000154

Marsh, H. W., Morin, A. J. S., Parker, P. D., & Kaur, G. (2014). Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annual Review of Clinical Psychology, 10(1), 85–110. https://doi.org/10.1146/annurev-clinpsy-032813-153700

Masche, J. G., & Dulmen, M. H. M. van. (2004). Advances in disentangling age, cohort, and time effects: No quadrature of the circle, but a help. Developmental Review, 24(3), 322–342. https://doi.org/10.1016/j.dr.2004.04.002

Massey, C., & Thaler, R. H. (2013). The loser’s curse: Decision making and market efficiency in the National Football League draft. Management Science, 59(7), 1479–1495. https://doi.org/10.1287/mnsc.1120.1657

Matthews, M., Abdullah, S., Murnane, E., Voida, S., Choudhury, T., Gay, G., & Frank, E. (2016). Development and evaluation of a smartphone-based measure of social rhythms for bipolar disorder. Assessment, 23(4), 472–483. https://doi.org/10.1177/1073191116656794

McArdle, J. J., & Grimm, K. J. (2011). An empirical example of change analysis by linking longitudinal item response data from multiple tests. In A. A. von Davier (Ed.), Statistical models for test equating, scaling, and linking (pp. 71–88). Springer Science & Business Media.

McArdle, J. J., Grimm, K. J., Hamagami, F., Bowles, R. P., & Meredith, W. (2009). Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods, 14(2), 126–149. https://doi.org/10.1037/a0015857

McClelland, D. C. (1973). Testing for competence rather than for “intelligence.” American Psychologist, 28, 1–14. https://doi.org/10.1037/h0034092

McClelland, D. C. (1994). The knowledge-testing-educational complex strikes back. American Psychologist, 49(1), 66–69. https://doi.org/10.1037/0003-066X.49.1.66

McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376–390. https://doi.org/10.1037/0033-2909.114.2.376

McFall, R. M. (1991). Manifesto for a science of clinical psychology. The Clinical Psychologist, 44(6), 75–91.

McFall, R. M. (2000). Elaborate reflections on a simple manifesto. Applied & Preventive Psychology, 9(1), 5–21. https://doi.org/10.1016/s0962-1849(05)80035-6

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. https://doi.org/10.1037/1082-989X.1.1.30

McNally, R. J. (2021). Network analysis of psychopathology: Controversies and challenges. Annual Review of Clinical Psychology, 17(1), 31–53. https://doi.org/10.1146/annurev-clinpsy-081219-092850

McNeish, D. (2018). Thanks coefficient alpha, we’ll take it from here. Psychological Methods, 23(3), 412–433. https://doi.org/10.1037/met0000144

McNeish, D. (2026). How do psychologists determine whether a measurement scale is good? A quarter-century of scale validation with Hu & Bentler (1999). Annual Review of Psychology, 77, 8.1–8.25. https://doi.org/10.1146/annurev-psych-121924-104021

McNeish, D., & Wolf, M. G. (2023). Dynamic fit index cutoffs for confirmatory factor analysis models. Psychological Methods, 28(1), 61–88. https://doi.org/10.1037/met0000425

McNiel, D. E., & Binder, R. L. (1995). Correlates of accuracy in the assessment of psychiatric inpatients’ risk of violence. American Journal of Psychiatry, 152(6), 901–906. https://doi.org/10.1176/ajp.152.6.901

Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966

Meehl, P. E. (1957). When shall we use our heads instead of the formula? Journal of Counseling Psychology, 4(4), 268–273. https://doi.org/10.1037/h0047554

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006x.46.4.806

Meehl, P. E. (1986). Causes and effects of my disturbing little book. Journal of Personality Assessment, 50(3), 370–375. https://doi.org/10.1207/s15327752jpa5003_6

Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52(3), 194–216. https://doi.org/10.1037/h0048070

Melikyan, Z. A., Agranovich, A. V., & Puente, A. E. (2019). Fairness in psychological testing. In G. Goldstein, D. N. Allen, & J. DeLuca (Eds.), Handbook of psychological assessment (fourth edition) (pp. 551–572). Academic Press. https://doi.org/10.1016/B978-0-12-802203-0.00018-3

Metz, C. E., Goodenough, D. J., & Rossmann, K. (1973). Evaluation of receiver operating characteristic curve data in terms of information theory, with applications in radiography. Radiology, 109(2), 297–303. https://doi.org/10.1148/109.2.297

Meyer, G. J., Erard, R. E., Erdberg, P., Mihura, J. L., & Viglione, D. J. (2011). Rorschach Performance Assessment System: Administration, coding, interpretation, and technical manual. Rorschach Performance Assessment Systems LLC.

Miller, G. A., Elbert, T., Sutton, B. P., & Heller, W. (2007). Innovative clinical assessment technologies: Challenges and opportunities in neuroimaging. Psychological Assessment, 19(1), 58–73. https://doi.org/10.1037/1040-3590.19.1.58

Miller, G. A., Rockstroh, B. S., Hamilton, H. K., & Yee, C. M. (2016). Psychophysiology as a core strategy in RDoC. Psychophysiology, 53(3), 410–414. https://doi.org/10.1111/psyp.12581

Miller, J. B., & Sanjurjo, A. (2014). A cold shower for the hot hand fallacy. Innocenzo Gasparini Institute for Economic Research. https://repec.unibocconi.it/igier/igi/wp/2014/518.pdf

Miller, J. L., Vaillancourt, T., & Boyle, M. H. (2009). Examining the heterotypic continuity of aggression using teacher reports: Results from a national Canadian study. Social Development, 18(1), 164–180. https://doi.org/10.1111/j.1467-9507.2008.00480.x

Millsap, R. E. (2011). Statistical approaches to measurement invariance. Taylor & Francis.

Moeller, J. (2015). A word on standardization in longitudinal studies: Don’t. Frontiers in Psychology, 6(1389), 1–4. https://doi.org/10.3389/fpsyg.2015.01389

Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100(4), 674–701. https://doi.org/10.1037/0033-295X.100.4.674

Moffitt, T. E. (2006a). A review of research on the taxonomy of life-course persistent versus adolescence-limited antisocial behavior. Taking Stock: The Status of Criminological Theory, 15, 277–312.

Moffitt, T. E. (2006b). Life-course-persistent versus adolescence-limited antisocial behavior. In D. Cicchetti & D. J. Cohen (Eds.), Developmental psychopathology, vol 3: Risk, disorder, and adaptation (2nd ed.) (pp. 570–598). John Wiley & Sons Inc.

Moore, C. T. (2016). gtheory: Apply generalizability theory with R. http://EvaluationDashboard.com

Morgan, C. D., & Murray, H. A. (1935). A method for investigating fantasies: The thematic apperception test. Archives of Neurology & Psychiatry, 34(2), 289–306. https://doi.org/10.1001/archneurpsyc.1935.02250200049005

Morley, S. K., Brito, T. V., & Welling, D. T. (2018). Measures of model performance based on the log accuracy ratio. Space Weather, 16(1), 69–88. https://doi.org/10.1002/2017SW001669

Mullins-Sweatt, S. N., & Widiger, T. A. (2009). Clinical utility and DSM-V. Psychological Assessment, 21(3), 302–312. https://doi.org/10.1037/a0016607

Murphy, A. H., & Winkler, R. L. (1984). Probability forecasting in meteorology. Journal of the American Statistical Association, 79(387), 489–500. https://doi.org/10.2307/2288395

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling: A Multidisciplinary Journal, 9(4), 599–620. https://doi.org/10.1207/s15328007sem0904_8

Muthén, L. K., & Muthén, B. O. (2019). Mplus version 8.4. Muthén & Muthén.

Myers, K., & Winters, N. C. (2002). Ten-year review of rating scales. I: Overview of scale functioning, psychometric properties, and selection. Journal of the American Academy of Child & Adolescent Psychiatry, 41(2), 114–122. https://doi.org/10.1097/00004583-200202000-00004

Nagy, T. F. (2011). Essential ethics for psychologists: A primer for understanding and mastering core issues. American Psychological Association.

Nelson-Gray, R. O. (2003). Treatment utility of psychological assessment. Psychological Assessment, 15(4), 521–531. https://doi.org/10.1037/1040-3590.15.4.521

Newsom, J. T. (2015). Longitudinal structural equation modeling: A comprehensive introduction. Routledge.

Ng, J. C. K., & Chan, W. (2020). Latent moderation analysis: A factor score approach. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 629–648. https://doi.org/10.1080/10705511.2019.1664304

Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259. https://doi.org/10.1037/0033-295x.84.3.231

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

Nye, C. D., Bradburn, J., Olenick, J., Bialko, C., & Drasgow, F. (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122

Oberski, D. L. (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22(1), 45–60. https://doi.org/10.1093/pan/mpt014

Oberski, D. L., Vermunt, J. K., & Moors, G. B. D. (2015). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Political Analysis, 23(4), 550–563. https://doi.org/10.1093/pan/mpv020

Okazaki, S., & Sue, S. (1995). Methodological issues in assessment research with ethnic minorities. Psychological Assessment, 7(3), 367–375. https://doi.org/10.1037/1040-3590.7.3.367

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). https://doi.org/10.1126/science.aac4716

Orth, U., Clark, D. A., Donnellan, M. B., & Robins, R. W. (2021). Testing prospective effects in longitudinal research: Comparing seven competing cross-lagged models. Journal of Personality and Social Psychology, 120(4), 1013–1034. https://doi.org/10.1037/pspp0000358

Oskamp, S. (1965). Overconfidence in case-study judgments. Journal of Consulting Psychology, 29(3), 261–265. https://doi.org/10.1037/h0022125

Park, D. C., & Bischof, G. N. (2013). The aging mind: Neuroplasticity in response to cognitive training. Dialogues in Clinical Neuroscience, 15(1), 109–119. https://doi.org/10.31887/DCNS.2013.15.1/dpark

Partnow, S. (2021). The midrange theory: Basketball’s evolution in the age of analytics. Triumph Books.

Patrick, C. J., Iacono, W. G., & Venables, N. C. (2019). Incorporating neurophysiological measures into clinical assessments: Fundamental challenges and a strategy for addressing them. Psychological Assessment, 31(7), 952–960. https://doi.org/10.1037/pas0000713

Patterson, G. R. (1993). Orderly change in a stable world: The antisocial trait as a chimera. Journal of Consulting and Clinical Psychology, 61(6), 911–919. https://doi.org/10.1037/0022-006X.61.6.911

Paulus, J. K., & Kent, D. M. (2020). Predictably unequal: Understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. npj Digital Medicine, 3(1), 99. https://doi.org/10.1038/s41746-020-0304-9

Pearl, J. (2013). Linear models: A useful “microscope” for causal analysis. Journal of Causal Inference, 1(1), 155–170. https://doi.org/10.1515/jci-2013-0003

Peters, G.-J. (2014). The alpha and the omega of scale reliability and validity: Why and how to abandon Cronbach’s alpha and the route towards more comprehensive assessment of scale quality. European Health Psychologist, 16(2), 56–69.

Petersen, I. T. (2024a). Assessing externalizing behaviors in school-aged children: Implications for school and community providers. https://doi.org/10.17077/rep.006639

Petersen, I. T. (2024b). Reexamining developmental continuity and discontinuity in the 21st century: Better aligning behaviors, functions, and mechanisms. Developmental Psychology, 60(11), 1992–2007. https://doi.org/10.1037/dev0001657

Petersen, I. T. (2025). petersenlab: A collection of R functions by the Petersen Lab. https://doi.org/10.32614/CRAN.package.petersenlab

Petersen, I. T. (2026). Fantasy football analytics: Statistics, prediction, and empiricism using R. University of Iowa Libraries. https://isaactpetersen.github.io/Fantasy-Football-Analytics-Textbook

Petersen, I. T., Apfelbaum, K. S., & McMurray, B. (2024). Adapting open science and pre-registration to longitudinal research. Infant and Child Development, 33(1), e2315. https://doi.org/10.1002/icd.2315

Petersen, I. T., Bates, J. E., D’Onofrio, B. M., Coyne, C. A., Lansford, J. E., Dodge, K. A., Pettit, G. S., & Van Hulle, C. A. (2013). Language ability predicts the development of behavior problems in children. Journal of Abnormal Psychology, 122(2), 542–557. https://doi.org/10.1037/a0031963

Petersen, I. T., Bates, J. E., Dodge, K. A., Lansford, J. E., & Pettit, G. S. (2015). Describing and predicting developmental profiles of externalizing problems from childhood to adulthood. Development and Psychopathology, 27(3), 791–818. https://doi.org/10.1017/S0954579414000789

Petersen, I. T., Bates, J. E., McQuillan, M. E., Hoyniak, C. P., Staples, A. D., Rudasill, K. M., Molfese, D. L., & Molfese, V. J. (2021). Heterotypic continuity of inhibitory control in early childhood: Evidence from four widely used measures. Developmental Psychology, 57(11), 1755–1771. https://doi.org/10.1037/dev0001025

Petersen, I. T., Choe, D. E., & LeBeau, B. (2020). Studying a moving target in development: The challenge and opportunity of heterotypic continuity. Developmental Review, 58, 100935. https://doi.org/10.1016/j.dr.2020.100935

Petersen, I. T., Demko, Z., Doebler, P., Sabel, L., Oleson, J. J., & Krueger, R. F. (in press). How often is “often”? Improving assessment of the externalizing spectrum using absolute frequency. Psychological Assessment. https://doi.org/10.1037/pas0001441

Petersen, I. T., Demko, Z., Lee, W.-C., & Oleson, J. J. (in press). Studying development of psychopathology using changing measures to account for heterotypic continuity. JAACAP Open. https://doi.org/10.1016/j.jaacop.2025.10.008

Petersen, I. T., Hoyniak, C. P., McQuillan, M. E., Bates, J. E., & Staples, A. D. (2016). Measuring the development of inhibitory control: The challenge of heterotypic continuity. Developmental Review, 40, 25–71. https://doi.org/10.1016/j.dr.2016.02.001

Petersen, I. T., & LeBeau, B. (2021). Language ability in the development of externalizing behavior problems in childhood. Journal of Educational Psychology, 113(1), 68–85. https://doi.org/10.1037/edu0000461

Petersen, I. T., & LeBeau, B. (2022). Creating a developmental scale to chart the development of psychopathology with different informants and measures across time. Journal of Psychopathology and Clinical Science, 131(6), 611–625. https://doi.org/10.1037/abn0000649

Petersen, I. T., LeBeau, B., & Choe, D. E. (2021). Creating a developmental scale to account for heterotypic continuity in development: A simulation study. Child Development, 92(1), e1–e19. https://doi.org/10.1111/cdev.13433

Petersen, I. T., Lindhiem, O., LeBeau, B., Bates, J. E., Pettit, G. S., Lansford, J. E., & Dodge, K. A. (2018). Development of internalizing problems from adolescence to emerging adulthood: Accounting for heterotypic continuity with vertical scaling. Developmental Psychology, 54(3), 586–599. https://doi.org/10.1037/dev0000449

Petscher, Y., Justice, L. M., & Hogan, T. (2018). Modeling the early language trajectory of language development when the measures change and its relation to poor reading comprehension. Child Development, 89(6), 2136–2156. https://doi.org/10.1111/cdev.12880

Piasecki, T. M., Hufford, M. R., Solhan, M., & Trull, T. J. (2007). Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment, 19(1), 25–43. https://doi.org/10.1037/1040-3590.19.1.25

Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63(1), 539–569. https://doi.org/10.1146/annurev-psych-120710-100452

Pornprasertmanit, S., Miller, P., Schoemann, A., & Jorgensen, T. D. (2021). simsem: SIMulated structural equation modeling. http://www.simsem.org

Posner, K., Brown, G. K., Stanley, B., Brent, D. A., Yershova, K. V., Oquendo, M. A., Currier, G. W., Melvin, G. A., Greenhill, L., Shen, S., & Mann, J. J. (2011). The Columbia–Suicide Severity Rating Scale: Initial validity and internal consistency findings from three multisite studies with adolescents and adults. American Journal of Psychiatry, 168(12), 1266–1277. https://doi.org/10.1176/appi.ajp.2011.10111704

Putnam, S. P., Rothbart, M. K., & Gartstein, M. A. (2008). Homotypic and heterotypic continuity of fine-grained temperament during infancy, toddlerhood, and early childhood. Infant & Child Development, 17(4), 387–405. https://doi.org/10.1002/ICD.582

Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004

R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Raiche, G., & Magis, D. (2020). nFactors: Parallel analysis and other non graphical solutions to the Cattell scree test. https://CRAN.R-project.org/package=nFactors

Raugh, I. M., Chapman, H. C., Bartolomeo, L. A., Gonzalez, C., & Strauss, G. P. (2019). A comprehensive review of psychophysiological applications for ecological momentary assessment in psychiatric populations. Psychological Assessment, 31(3), 304–317. https://doi.org/10.1037/pas0000651

Raykov, T. (2001). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25(1), 69–76. https://doi.org/10.1177/01466216010251005

Raykov, T., & Marcoulides, G. A. (2001). Can there be infinitely many models equivalent to a given covariance structure model? Structural Equation Modeling: A Multidisciplinary Journal, 8(1), 142–149. https://doi.org/10.1207/S15328007SEM0801_8

Raykov, T., & Marcoulides, G. A. (2019). Thanks coefficient alpha, we still need you! Educational and Psychological Measurement, 79(1), 200–210. https://doi.org/10.1177/0013164417725127

Raykov, T., Marcoulides, G. A., Harrison, M., & Zhang, M. (2020). On the dependability of a popular procedure for studying measurement invariance: A cause for concern? Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 649–656. https://doi.org/10.1080/10705511.2019.1610409

Reise, S. P., & Waller, N. G. (2009). Item response theory and clinical measurement. Annual Review of Clinical Psychology, 5(1), 27–48. https://doi.org/10.1146/annurev.clinpsy.032408.153553

Revelle, W. (2022). psych: Procedures for psychological, psychometric, and personality research. https://personality-project.org/r/psych/

Revelle, W., & Condon, D. M. (2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31(12), 1395–1411. https://doi.org/10.1037/pas0000754

Revelle, W., & Rocklin, T. (1979). Very simple structure: An alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4), 403–414. https://doi.org/10.1207/s15327906mbr1404_2

Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In C. R. Reynolds, R. A. Altmann, & D. N. Allen (Eds.), Mastering modern psychological testing: Theory and methods (pp. 573–613). Springer International Publishing. https://doi.org/10.1007/978-3-030-59455-8_15

Reynolds, C. R., & Suzuki, L. A. (2012). Bias in psychological assessment: An empirical review and recommendations. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology, Vol. 10: Assessment psychology, Part 1: Assessment issues (2nd ed., pp. 82–113).

Rhemtulla, M., & Savalei, V. (2025). Estimated factor scores are not true factor scores. Multivariate Behavioral Research, 60(3), 598–619. https://doi.org/10.1080/00273171.2024.2444943

Rice, M. E., Harris, G. T., & Lang, C. (2013). Validation of and revision to the VRAG and SORAG: The Violence Risk Appraisal Guide—Revised (VRAG-R). Psychological Assessment, 25(3), 951–965. https://doi.org/10.1037/a0032878

Ridley, C. R., Hill, C. L., & Wiese, D. L. (2001). Ethics in multicultural assessment: A model of reasoned application. In D. L. Wiese (Ed.), Handbook of multicultural assessment: Clinical, psychological, and educational applications (p. 29).

Ridley, C. R., Li, L. C., & Hill, C. L. (1998). Multicultural assessment: Reexamination, reconceptualization, and practical application. The Counseling Psychologist, 26(6), 827–910. https://doi.org/10.1177/0011000098266001

Rigdon, E. E. (2010). Polychoric correlation coefficient. In N. J. Salkind (Ed.), Encyclopedia of research design. SAGE Publications. https://doi.org/10.4135/9781412961288

Rigdon, E. E., Becker, J.-M., & Sarstedt, M. (2019a). Factor indeterminacy as metrological uncertainty: Implications for advancing psychological measurement. Multivariate Behavioral Research, 54(3), 429–443. https://doi.org/10.1080/00273171.2018.1535420

Rigdon, E. E., Becker, J.-M., & Sarstedt, M. (2019b). Parceling cannot reduce factor indeterminacy in factor analysis: A research note. Psychometrika, 84(3), 772–780. https://doi.org/10.1007/s11336-019-09677-2

Rivera Mindt, M., Byrd, D., Saez, P., & Manly, J. (2010). Increasing culturally competent neuropsychological services for ethnic minority populations: A call to action. The Clinical Neuropsychologist, 24(3), 429–453. https://doi.org/10.1080/13854040903058960

Roberts, A. C., Yeap, Y. W., Seah, H. S., Chan, E., Soh, C.-K., & Christopoulos, G. I. (2019). Assessing the suitability of virtual reality for psychological testing. Psychological Assessment, 31(3), 318–328. https://doi.org/10.1037/pas0000663

Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2021). pROC: Display and analyze ROC curves. http://expasy.org/tools/pROC/

Robitzsch, A. (2019). mnlfa: Moderated nonlinear factor analysis. https://CRAN.R-project.org/package=mnlfa

Rodebaugh, T. L., Scullin, R. B., Langer, J. K., Dixon, D. J., Huppert, J. D., Bernstein, A., Zvielli, A., & Lenze, E. J. (2016). Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias. Journal of Abnormal Psychology, 125(6), 840–851. https://doi.org/10.1037/abn0000184

Roemer, E., Schuberth, F., & Henseler, J. (2021). HTMT2–an improved criterion for assessing discriminant validity in structural equation modeling. Industrial Management & Data Systems, 121(12), 2637–2650. https://doi.org/10.1108/IMDS-02-2021-0082

Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20(4), 335–343. https://doi.org/10.1111/j.1745-3984.1983.tb00211.x

Rönkkö, M., & Cho, E. (2020). An updated guideline for assessing discriminant validity. Organizational Research Methods, 1094428120968614. https://doi.org/10.1177/1094428120968614

Rosseel, Y., Jorgensen, T. D., & Rockwood, N. (2022). lavaan: Latent variable analysis. https://lavaan.ugent.be

Royal, K. (2016). “Face validity” is not a legitimate type of validity evidence! The American Journal of Surgery, 212(5), 1026–1027. https://doi.org/10.1016/j.amjsurg.2016.02.018

Ruiz, M. A., Drake, E. B., Glass, A., Marcotte, D., & Gorp, W. G. van. (2002). Trying to beat the system: Misuse of the internet to assist in avoiding the detection of psychological symptom dissimulation. Professional Psychology: Research and Practice, 33(3), 294–299. https://doi.org/10.1037/0735-7028.33.3.294

Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in an exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24(2), 282–292. https://doi.org/10.1037/a0025697

Rush, A. J., First, M. B., & Blacker, D. (2009). Handbook of psychiatric measures. American Psychiatric Publishing.

Rushton, J. P., Brainerd, C. J., & Pressley, M. (1983). Behavioral development and construct validity: The principle of aggregation. Psychological Bulletin, 94(1), 18–38. https://doi.org/10.1037/0033-2909.94.1.18

Russo, J. E., & Schoemaker, P. J. (1992). Managing overconfidence. Sloan Management Review, 33(2), 7.

Sackett, P. R., Borneman, M. J., & Connelly, B. S. (2008). High stakes testing in higher education and employment: Appraising the evidence for validity and fairness. American Psychologist, 63, 215–227. https://doi.org/10.1037/0003-066X.63.4.215

Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education. American Psychologist, 56, 302–318. https://doi.org/10.1037/0003-066X.56.4.302

Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49(11), 929–954. https://doi.org/10.1037/0003-066X.49.11.929

Sarstedt, M., Adler, S. J., Ringle, C. M., Cho, G., Diamantopoulos, A., Hwang, H., & Liengaard, B. D. (2024). Same model, same data, but different outcomes: Evaluating the impact of method choices in structural equation modeling. Journal of Product Innovation Management, 41(6), 1100–1117. https://doi.org/10.1111/jpim.12738

Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). Jerome M. Sattler, Publisher, Inc.

Sayal, K., Wyatt, L., Partlett, C., Ewart, C., Bhardwaj, A., Dubicka, B., Marshall, T., Gledhill, J., Lang, A., Sprange, K., Thomson, L., Moody, S., Holt, G., Bould, H., Upton, C., Keane, M., Cox, E., James, M., & Montgomery, A. (2025). The clinical and cost effectiveness of a STAndardised DIagnostic Assessment for children and adolescents with emotional difficulties: The STADIA multi-centre randomised controlled trial. Journal of Child Psychology and Psychiatry, 66(6), 805–820. https://doi.org/10.1111/jcpp.14090

Schaefer, J. D., Caspi, A., Belsky, D. W., Harrington, H., Houts, R., Horwood, L. J., Hussong, A., Ramrakha, S., Poulton, R., & Moffitt, T. E. (2017). Enduring mental health: Prevalence and prediction. Journal of Abnormal Psychology, 126(2), 212–224. https://doi.org/10.1037/abn0000232

Schaie, K. W. (1965). A general model for the study of developmental problems. Psychological Bulletin, 64(2), 92–107. https://doi.org/10.1037/h0022371

Schaie, K. W. (2005). Developmental influences on adult intelligence: The Seattle longitudinal study. Oxford University Press.

Schaie, K. W., & Baltes, P. B. (1975). On sequential strategies in developmental research. Human Development, 18(5), 384–390. https://doi.org/10.1159/000271498

Schamberger, T., Schuberth, F., & Henseler, J. (2023). Confirmatory composite analysis in human development research. International Journal of Behavioral Development, 47(1), 89–100. https://doi.org/10.1177/01650254221117506

Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings. American Psychologist, 36(10), 1128–1137. https://doi.org/10.1037/0003-066X.36.10.1128

Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1(2), 199–223. https://doi.org/10.1037/1082-989X.1.2.199

Schneider, W. J. (2021). simstandard: Generate standardized data. https://github.com/wjschne/simstandard

Schreiber, J. B., Nora, A., Stage, F. K., Barlow, E. A., & King, J. (2006). Reporting structural equation modeling and confirmatory factor analysis results: A review. Journal of Educational Research, 99(6), 323–338. https://doi.org/10.3200/JOER.99.6.323-338

Schuberth, F. (2023). The Henseler-Ogasawara specification of composites in structural equation modeling: A tutorial. Psychological Methods, 28(4), 843–859. https://doi.org/10.1037/met0000432

Schulenberg, J. E., & Maslowsky, J. (2009). Taking substance use and development seriously: Developmentally distal and proximal influences on adolescent drug use. Monographs of the Society for Research in Child Development, 74(3), 121–130. https://doi.org/10.1111/j.1540-5834.2009.00544.x

Schulenberg, J. E., Patrick, M. E., Maslowsky, J., & Maggs, J. L. (2014). The epidemiology and etiology of adolescent substance use in developmental perspective. In M. Lewis & K. D. Rudolph (Eds.), Handbook of developmental psychopathology (pp. 601–620). Springer US.

Schulenberg, J. E., & Zarrett, N. R. (2006). Mental health during emerging adulthood: Continuity and discontinuity in courses, causes, and functions. In Emerging adults in America: Coming of age in the 21st century (pp. 135–172). American Psychological Association.

Sechrest, L. (1963). Incremental validity: A recommendation. Educational and Psychological Measurement, 23(1), 153–158. https://doi.org/10.1177/001316446302300113

Sechrest, L., Stickle, T. R., & Stewart, M. (1998). The role of assessment in clinical psychology. In A. Bellack, M. Hersen, & C. R. Reynolds (Eds.), Comprehensive clinical psychology, Vol. 4: Assessment. Pergamon.

Sellbom, M. (2019). The MMPI-2-Restructured Form (MMPI-2-RF): Assessment of personality and psychopathology in the twenty-first century. Annual Review of Clinical Psychology, 15(1), 149–177. https://doi.org/10.1146/annurev-clinpsy-050718-095701

Sellbom, M., & Tellegen, A. (2019). Factor analysis in psychological assessment research: Common pitfalls and recommendations. Psychological Assessment, 31(12), 1428–1441. https://doi.org/10.1037/pas0000623

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.

Sharp, K. L., Williams, A. J., Rhyner, K. T., & Ilardi, S. S. (2013). The clinical interview. In K. F. Geisinger, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 2: Testing and assessment in clinical and counseling psychology (pp. 103–117). American Psychological Association.

Shavelson, R. J., Webb, N. M., & Rowley, G. L. (1989). Generalizability theory. American Psychologist, 44(6), 922–932. https://doi.org/10.1037/0003-066X.44.6.922

Shiffman, S., Stone, A. A., & Hufford, M. R. (2008). Ecological momentary assessment. Annual Review of Clinical Psychology, 4, 1–32. https://doi.org/10.1146/annurev.clinpsy.3.022806.091415

Shin, H. J., Rabe-Hesketh, S., & Wilson, M. (2019). Trifactor models for multiple-ratings data. Multivariate Behavioral Research, 54(3), 360–381. https://doi.org/10.1080/00273171.2018.1530091

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420

Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69(1), 487–510. https://doi.org/10.1146/annurev-psych-122216-011845

Sijtsma, K. (2008). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120. https://doi.org/10.1007/s11336-008-9101-0

Silver, N. (2012). The signal and the noise: Why so many predictions fail–but some don’t. Penguin.

Silverberg, N. D., & Millis, S. R. (2009). Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity. Journal of the International Neuropsychological Society, 15(1), 94–102. https://doi.org/10.1017/S1355617708090139

Simms, L. J., Zelazny, K., Williams, T. F., & Bernstein, L. (2019). Does the number of response options matter? Psychometric perspectives using personality questionnaire data. Psychological Assessment, 31(4), 557–566. https://doi.org/10.1037/pas0000648

Skala, D. (2008). Overconfidence in psychology and finance–an interdisciplinary literature review. Bank i Kredyt, 4, 33–50.

Slack, M. K., & Draugalis, J. R. (2001). Establishing the internal and external validity of experimental studies. American Journal of Health-System Pharmacy, 58(22), 2173–2181. https://doi.org/10.1093/ajhp/58.22.2173

Smedley, A., & Smedley, B. D. (2005). Race as biology is fiction, racism as a social problem is real: Anthropological and historical perspectives on the social construction of race. American Psychologist, 60(1), 16–26. https://doi.org/10.1037/0003-066X.60.1.16

Smith, G. T., Atkinson, E. A., Davis, H. A., Riley, E. N., & Oltmanns, J. R. (2020). The general factor of psychopathology. Annual Review of Clinical Psychology, 16(1), 75–98. https://doi.org/10.1146/annurev-clinpsy-071119-115848

Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short-form development. Psychological Assessment, 12(1), 102–111. https://doi.org/10.1037/1040-3590.12.1.102

Sobell, L. C., & Sobell, M. B. (2008). Timeline followback (TLFB). In A. J. Rush Jr., M. B. First, & D. Blacker (Eds.), Handbook of psychiatric measures (2nd ed., pp. 466–468). American Psychiatric Publishing.

Sommers-Flanagan, J., & Sommers-Flanagan, R. (2016). Clinical interviewing. Wiley.

Somoza, E., Soutullo-Esperon, L., & Mossman, D. (1989). Evaluation and optimization of diagnostic tests using receiver operating characteristic analysis and information theory. International Journal of Bio-Medical Computing, 24(3), 153–189. https://doi.org/10.1016/0020-7101(89)90029-9

Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. https://doi.org/10.3758/bf03207704

Stanton, K., McDonnell, C. G., Hayden, E. P., & Watson, D. (2020). Transdiagnostic approaches to psychopathology measurement: Recommendations for measure selection, data analysis, and participant recruitment. Journal of Abnormal Psychology, 129(1), 21–28. https://doi.org/10.1037/abn0000464

Staples, A. D., Bates, J. E., Petersen, I. T., McQuillan, M. E., & Hoyniak, C. (2019). Measuring sleep in young children and their mothers: Identifying actigraphic sleep composites. International Journal of Behavioral Development, 43(3), 278–285. https://doi.org/10.1177/0165025419830236

Sternberg, R. J., Grigorenko, E. L., & Kidd, K. K. (2005). Intelligence, race, and genetics. American Psychologist, 60(1), 46–59. https://doi.org/10.1037/0003-066x.60.1.46

Stevens, R. J., & Poppe, K. K. (2020). Validation of clinical prediction models: What does the “calibration slope” really measure? Journal of Clinical Epidemiology, 118, 93–99. https://doi.org/10.1016/j.jclinepi.2019.09.016

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680. https://doi.org/10.1126/science.103.2684.677

Steyerberg, E. W., & Vergouwe, Y. (2014). Towards better clinical prediction models: Seven steps for development and an ABCD for validation. European Heart Journal, 35(29), 1925–1931. https://doi.org/10.1093/eurheartj/ehu207

Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M. J., & Kattan, M. W. (2010). Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology, 21(1), 128–138. https://doi.org/10.1097/EDE.0b013e3181c30fb2

Stone, A. A., Schneider, S., & Smyth, J. M. (2023). Evaluation of pressing issues in ecological momentary assessment. Annual Review of Clinical Psychology, 19(1), 107–131. https://doi.org/10.1146/annurev-clinpsy-080921-083128

Strauss, M. E., & Smith, G. T. (2009). Construct validity: Advances in theory and methodology. Annual Review of Clinical Psychology, 5(1), 1–25. https://doi.org/10.1146/annurev.clinpsy.032408.153639

Sullivan, H. S. (1970). The psychiatric interview. Norton.

Summerfeldt, L. J., Kloosterman, P. H., & Antony, M. M. (2010). Structured and semistructured diagnostic interviews. In M. M. Antony & D. H. Barlow (Eds.), Handbook of assessment and treatment planning for psychological disorders (2nd ed., pp. 95–137). Guilford Press.

Suzuki, L. A., Onoue, M. A., & Hill, J. S. (2013). Clinical assessment: A multicultural perspective. In K. F. Geisinger, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 2: Testing and assessment in clinical and counseling psychology (pp. 193–212). American Psychological Association.

Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1–26. https://doi.org/10.1111/1529-1006.001

Tackett, J. L., Brandes, C. M., King, K. M., & Markon, K. E. (2019). Psychology’s replication crisis and clinical psychological science. Annual Review of Clinical Psychology, 15(1), 579–604. https://doi.org/10.1146/annurev-clinpsy-050718-095710

Tackett, J. L., Brandes, C. M., & Reardon, K. W. (2019). Leveraging the open science framework in clinical psychological assessment research. Psychological Assessment, 31(12), 1386–1394. https://doi.org/10.1037/pas0000583

Tackett, J. L., Lang, J. W. B., Markon, K. E., & Herzhoff, K. (2019). A correlated traits, correlated methods model for thin-slice child personality assessment. Psychological Assessment, 31(4), 545–556. https://doi.org/10.1037/pas0000635

Tervalon, M., & Murray-Garcia, J. (1998). Cultural humility versus cultural competence: A critical distinction in defining physician training outcomes in multicultural education. Journal of Health Care for the Poor and Underserved, 9(2), 117–125.

Tetlock, P. E. (2017). Expert political judgment: How good is it? How can we know? (New ed.). Princeton University Press.

Textor, J., van der Zander, B., & Ankan, A. (2021). dagitty: Graphical analysis of structural causal models. https://CRAN.R-project.org/package=dagitty

Textor, J., van der Zander, B., Gilthorpe, M. S., Liśkiewicz, M., & Ellison, G. T. (2017). Robust causal inference using directed acyclic graphs: The R package “dagitty”. International Journal of Epidemiology, 45(6), 1887–1894. https://doi.org/10.1093/ije/dyw341

Thomas, M. L. (2019). Advances in applications of item response theory to clinical assessment. Psychological Assessment, 31(12), 1442–1455. https://doi.org/10.1037/pas0000597

Thorndike, R. L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8(2), 63–70. https://doi.org/10.1111/j.1745-3984.1971.tb00907.x

Tiego, J., Martin, E. A., DeYoung, C. G., Hagan, K., Cooper, S. E., Pasion, R., Satchell, L., Shackman, A. J., Bellgrove, M. A., Fornito, A., Abend, R., Goulter, N., Eaton, N. R., Kaczkurkin, A. N., & and, R. N. (2023). Precision behavioral phenotyping as a strategy for uncovering the biological correlates of psychopathology. Nature Mental Health, 1, 304–315. https://doi.org/10.1038/s44220-023-00057-5

Timmerman, M. E., De Bildt, A., & Urban, J. (in press). The GRoNC: Guidelines for reporting on norm-referenced and criterion-referenced scores. Assessment. https://doi.org/10.1177/10731911251371395

Tofallis, C. (2015). A better measure of relative prediction accuracy for model selection and model estimation. Journal of the Operational Research Society, 66(8), 1352–1362. https://doi.org/10.1057/jors.2014.103

Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for educational achievement tests. Applied Measurement in Education, 20(2), 227–253. https://doi.org/10.1080/08957340701301207

Toomey, R. B., Syvertsen, A. K., & Shramko, M. (2018). Transgender adolescent suicide behavior. Pediatrics, 142(4). https://doi.org/10.1542/peds.2017-4218

Trafimow, D. (2015). A defense against the alleged unreliability of difference scores. Cogent Mathematics, 2(1), 1064626. https://doi.org/10.1080/23311835.2015.1064626

Trafimow, D., Hyman, M. R., & Kostyk, A. (2025). Enhancing predictive power by unamalgamating multi-item scales. Psychological Methods, 30(5), 1043–1055. https://doi.org/10.1037/met0000599

Treat, T. A., McFall, R. M., Viken, R. J., Kruschke, J. K., Nosofsky, R. M., & Wang, S. S. (2007). Clinical cognitive science: Applying quantitative models of cognitive processing to examine cognitive aspects of psychopathology. In R. W. J. Neufeld (Ed.), Advances in clinical cognitive science: Formal modeling of processes and symptoms (pp. 179–205). American Psychological Association.

Treat, T. A., & Viken, R. J. (2023). Measuring test performance with signal detection theory techniques. In H. Cooper, M. N. Coutanche, L. M. McMullen, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology: Foundations, planning, measures, and psychometrics (2nd ed., Vol. 1, pp. 837–858). American Psychological Association.

Treiblmaier, H., Bentler, P. M., & Mair, P. (2011). Formative constructs implemented via common factors. Structural Equation Modeling: A Multidisciplinary Journal, 18(1), 1–17. https://doi.org/10.1080/10705511.2011.532693

Trull, T. J., & Ebner-Priemer, U. (2013). Ambulatory assessment. Annual Review of Clinical Psychology, 9, 151–176. https://doi.org/10.1146/annurev-clinpsy-050212-185510

Trull, T. J., & Ebner-Priemer, U. W. (2020). Ambulatory assessment in psychopathology research: A review of recommended reporting guidelines and current practices. Journal of Abnormal Psychology, 129(1), 56–63. https://doi.org/10.1037/abn0000473

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124

Ursenbach, J., O’Connell, M. E., Neiser, J., Tierney, M. C., Morgan, D., Kosteniuk, J., & Spiteri, R. J. (2019). Scoring algorithms for a computer-based cognitive screening tool: An illustrative example of overfitting machine learning approaches and the impact on estimates of classification accuracy. Psychological Assessment, 31(11), 1377–1382. https://doi.org/10.1037/pas0000764

Van De Schoot, R., Kluytmans, A., Tummers, L., Lugtig, P., Hox, J., & Muthén, B. (2013). Facing off with Scylla and Charybdis: A comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Frontiers in Psychology, 4(770). https://doi.org/10.3389/fpsyg.2013.00770

Van De Schoot, R., Schmidt, P., De Beuckelaer, A., Lek, K., & Zondervan-Zwijnenburg, M. (2015). Editorial: Measurement invariance. Frontiers in Psychology, 6(1064). https://doi.org/10.3389/fpsyg.2015.01064

van der Nest, G., Lima Passos, V., Candel, M. J. J. M., & van Breukelen, G. J. P. (2020). An overview of mixture modelling for latent evolutions in longitudinal data: Modelling approaches, fit statistics and software. Advances in Life Course Research, 43, 100323. https://doi.org/10.1016/j.alcr.2019.100323

Vaz, S., Falkmer, T., Passmore, A. E., Parsons, R., & Andreou, P. (2013). The case for using the repeatability coefficient when calculating test–retest reliability. PLOS ONE, 8(9), e73990. https://doi.org/10.1371/journal.pone.0073990

Vispoel, W. P., Hong, H., & Lee, H. (2023). Benefits of doing generalizability theory analyses within structural equation modeling frameworks: Illustrations using the Rosenberg self-esteem scale. Structural Equation Modeling: A Multidisciplinary Journal, 1–17. https://doi.org/10.1080/10705511.2023.2187734

Vispoel, W. P., Lee, H., & Hong, H. (2024). Analyzing multivariate generalizability theory designs within structural equation modeling frameworks. Structural Equation Modeling: A Multidisciplinary Journal, 31(3), 552–570. https://doi.org/10.1080/10705511.2023.2222913

Vispoel, W. P., Lee, H., Xu, G., & Hong, H. (2022). Integrating bifactor models into a generalizability theory based structural equation modeling framework. The Journal of Experimental Education, 1–21. https://doi.org/10.1080/00220973.2022.2092833

Vispoel, W. P., Morris, C. A., & Kilinc, M. (2018). Applications of generalizability theory and their relations to classical test theory and structural equation modeling. Psychological Methods, 23(1), 1–26. https://doi.org/10.1037/met0000107

Vispoel, W. P., Morris, C. A., & Kilinc, M. (2019). Using generalizability theory with continuous latent response variables. Psychological Methods, 24(2), 153–178. https://doi.org/10.1037/met0000177

Voorhees, C. M., Brady, M. K., Calantone, R., & Ramirez, E. (2016). Discriminant validity testing in marketing: An analysis, causes for concern, and proposed remedies. Journal of the Academy of Marketing Science, 44(1), 119–134. https://doi.org/10.1007/s11747-015-0455-4

Wainer, H. (1976). Estimating coefficients in linear models: It don’t make no nevermind. Psychological Bulletin, 83(2), 213–217. https://doi.org/10.1037/0033-2909.83.2.213

Wakschlag, L. S., Tolan, P. H., & Leventhal, B. L. (2010). Research review: “Ain’t misbehavin”: Towards a developmentally-specified nosology for preschool disruptive behavior. Journal of Child Psychology and Psychiatry, 51(1), 3–22. https://doi.org/10.1111/j.1469-7610.2009.02184.x

Wang, S., Jiao, H., & Zhang, L. (2013). Validation of longitudinal achievement constructs of vertically scaled computerised adaptive tests: A multiple-indicator, latent-growth modelling approach. International Journal of Quantitative Research in Education, 1(4), 383–407. https://doi.org/10.1504/IJQRE.2013.058307

Wang, T., Merkle, E. C., & Zeileis, A. (2014). Score-based tests of measurement invariance: Use in practice. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00438

Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713–731. https://doi.org/10.1177/0013164409332228

Wang, Y. A., & Rhemtulla, M. (2021). Power analysis for parameter estimation in structural equation modeling: A discussion and tutorial. Advances in Methods and Practices in Psychological Science, 4(1), 1–17.

Watkins, C. E., Campbell, V. L., Nieberding, R., & Hallmark, R. (1995). Contemporary practice of psychological assessment by clinical psychologists. Professional Psychology: Research and Practice, 26(1), 54–60. https://doi.org/10.1037/0735-7028.26.1.54

Webb, N. M., & Shavelson, R. J. (2005). Generalizability theory: Overview. In B. S. Everitt & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science (Vol. 2, pp. 717–719). John Wiley & Sons, Ltd.

Weems, C. F. (2008). Developmental trajectories of childhood anxiety: Identifying continuity and change in anxious emotion. Developmental Review, 28(4), 488–502. https://doi.org/10.1016/j.dr.2008.01.001

Wei, T., & Simko, V. (2021). R package “corrplot”: Visualization of a correlation matrix. https://github.com/taiyun/corrplot

Weintraub, S., Bauer, P. J., Zelazo, P. D., Wallner-Allen, K., Dikmen, S. S., Heaton, R. K., Tulsky, D. S., Slotkin, J., Blitz, D. L., Carlozzi, N. E., Havlik, R. J., Beaumont, J. L., Mungas, D., Manly, J. J., Borosh, B. G., Nowinski, C. J., & Gershon, R. C. (2013). I. NIH toolbox cognition battery (CB): Introduction and pediatric data. Monographs of the Society for Research in Child Development, 78(4), 1–15. https://doi.org/10.1111/mono.12031

Weiss, B., & Garber, J. (2003). Developmental differences in the phenomenology of depression. Development and Psychopathology, 15(2), 403–430. https://doi.org/10.1017/S0954579403000221

Whitbourne, S. K. (2019). Longitudinal, cross-sectional, and sequential designs in lifespan developmental psychology. Oxford University Press.

Wicherts, J. M., & Dolan, C. V. (2010). Measurement invariance in confirmatory factor analysis: An illustration using IQ test performance of minorities. Educational Measurement: Issues and Practice, 29(3), 39–47. https://doi.org/10.1111/j.1745-3992.2010.00182.x

Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://doi.org/10.3389/fpsyg.2016.01832

Wickham, H. (2021). tidyverse: Easily install and load the tidyverse. https://CRAN.R-project.org/package=tidyverse

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Widaman, K. F. (2018). On common factor and principal component representations of data: Implications for theory and for confirmatory replications. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 829–847. https://doi.org/10.1080/10705511.2018.1478730

Widiger, T. A. (2002). Personality disorders. In M. M. Antony & D. H. Barlow (Eds.), Handbook of assessment and treatment planning for psychological disorders (pp. 453–480). Guilford Publications.

Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Addison-Wesley.

Willett, W. (2012). Correction for the effects of measurement error. In W. Willett (Ed.), Nutritional epidemiology (3rd ed., pp. 287–304). Oxford University Press.

Williams, A. J., Botanov, Y., Kilshaw, R. E., Wong, R. E., & Sakaluk, J. K. (2021). Potentially harmful therapies: A meta-scientific review of evidential value. Clinical Psychology: Science and Practice, 28(1), 5–18. https://doi.org/10.1111/cpsp.12331

Wood, J. M., Garb, H. N., Lilienfeld, S. O., & Nezworski, M. T. (2002). Clinical assessment. Annual Review of Psychology, 53(1), 519–543. https://doi.org/10.1146/annurev.psych.53.100901.135136

Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001). Problems with the norms of the Comprehensive System for the Rorschach: Methodological and conceptual considerations. Clinical Psychology: Science and Practice, 8(3), 397–402. https://doi.org/10.1093/clipsy.8.3.397

Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996a). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7(1), 3–10. https://doi.org/10.1111/j.1467-9280.1996.tb00658.x

Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996b). Thinking critically about the Comprehensive System for the Rorschach: A reply to Exner. Psychological Science, 7(1), 14–17. https://doi.org/10.1111/j.1467-9280.1996.tb00660.x

Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (2001). The misperception of psychopathology: Problems with the norms of the Comprehensive System for the Rorschach. Clinical Psychology: Science and Practice, 8(3), 350–373. https://doi.org/10.1093/clipsy.8.3.350

Woody, M. L., & Gibb, B. E. (2015). Integrating NIMH Research Domain Criteria (RDoC) into depression research. Current Opinion in Psychology, 4, 6–12. https://doi.org/10.1016/j.copsyc.2015.01.004

Wright, A. G. C., Gates, K. M., Arizmendi, C., Lane, S. T., Woods, W. C., & Edershile, E. A. (2019). Focusing personality assessment on the person: Modeling general, shared, and person specific processes in personality and psychopathology. Psychological Assessment, 31(4), 502–515. https://doi.org/10.1037/pas0000617

Wright, A. G. C., & Woods, W. C. (2020). Personalized models of psychopathology. Annual Review of Clinical Psychology, 16(1), 49–74. https://doi.org/10.1146/annurev-clinpsy-102419-125032

Wright, A. G. C., & Zimmermann, J. (2019). Applied ambulatory assessment: Integrating idiographic and nomothetic principles of measurement. Psychological Assessment, 31(12), 1467–1480. https://doi.org/10.1037/pas0000685

Yang, Y., & Land, K. C. (2013). Age-period-cohort analysis: New models, methods, and empirical applications. Taylor & Francis.

Youngstrom, E. A., Halverson, T. F., Youngstrom, J. K., Lindhiem, O., & Findling, R. L. (2018). Evidence-based assessment from simple clinical judgments to statistical learning: Evaluating a range of options using pediatric bipolar disorder as a diagnostic challenge. Clinical Psychological Science, 6(2), 243–265. https://doi.org/10.1177/2167702617741845

Youngstrom, E. A., & Van Meter, A. (2016). Empirically supported assessment of children and adolescents. Clinical Psychology: Science and Practice, 23(4), 327–347. https://doi.org/10.1111/cpsp.12172

Youngstrom, E. A., Van Meter, A., Frazier, T. W., Hunsley, J., Prinstein, M. J., Ong, M.-L., & Youngstrom, J. K. (2017). Evidence-based assessment as an integrative model for applying psychological science to guide the voyage of treatment. Clinical Psychology: Science and Practice, 24(4), 331–363. https://doi.org/10.1111/cpsp.12207

Yu, X., Schuberth, F., & Henseler, J. (2023). Specifying composites in structural equation modeling: A refinement of the Henseler-Ogasawara specification. Statistical Analysis and Data Mining: The ASA Data Science Journal, 16(4), 348–357. https://doi.org/10.1002/sam.11608

Yudell, M., Roberts, D., DeSalle, R., & Tishkoff, S. (2016). Taking race out of human genetics. Science, 351(6273), 564–565. https://doi.org/10.1126/science.aac4951

Zhang, J., & Mueller, S. T. (2005). A note on ROC analysis and non-parametric estimate of sensitivity. Psychometrika, 70(1), 203–212. https://doi.org/10.1007/s11336-003-1119-8

Zhang, X., & Savalei, V. (2024). An overview of alternative formats to the Likert format: A comment on Wilson et al. (2022). Psychological Methods, 29(3), 606–612. https://doi.org/10.1037/met0000631

Zieky, M. J. (2006). Fairness review in assessment. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 359–376). Routledge. https://doi.org/10.4324/9780203874776.ch16

Zieky, M. J. (2013). Fairness review in assessment. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology (pp. 293–302). American Psychological Association. https://doi.org/10.1037/14047-017

Zuckerman, M. (1990). Some dubious premises in research and theory on racial differences: Scientific, social, and ethical issues. American Psychologist, 45(12), 1297–1303. https://doi.org/10.1037/0003-066X.45.12.1297
