I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are via GitHub or hypothes.is annotations.

Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Principles-Psychological-Assessment

Adding an annotation using hypothes.is: To add an annotation, select some text and then click the annotation symbol on the pop-up menu. To see the annotations of others, click the annotation symbol in the upper right-hand corner of the page.

Chapter 1 Introduction

1.1 About This Book

First, let us discuss what this book is not. This book is not a guide on how to assess each psychological construct or disorder. This book is also not a comparative summary of the psychometrics of different measures. There already exist many resources that summarize and compare the reliability and validity of measures in psychology (Buros Center for Testing, 2021). Instead, this book is about the principles of psychological assessment.

This book was originally written for a graduate-level course on psychological assessment. The chapters provide an overview of topics that could each fill their own course and textbook, such as structural equation modeling, item response theory, generalizability theory, factor analysis, prediction, cognitive assessment, and psychophysiological assessment. The book gives readers an overview of the breadth of the field of assessment and of various assessment approaches. As a consequence, it does not cover any one assessment device or method in great depth.

The goal of this book is to help researchers and clinicians learn to think critically about assessments so they can better develop, evaluate, administer, score, integrate, and interpret psychological assessments. Learning important principles of assessment will put you in a better position to learn any assessment device and to develop better ones. This book applies a scientific perspective to the principles of psychological assessment. The assessments used in a given situation, whether in research or practice, should be supported by the strongest available science, or they should be used cautiously while undergoing development and study. In addition to discussing principles, the book provides analysis scripts in the software R (R Core Team, 2022) so that you can apply the principles discussed. Some of the chapters include analysis exercises (at the end of the chapter) for you to test your comprehension.

1.2 Why R?

R is free, open source, open platform, and widely used. Unlike proprietary software used for data analysis, R is not a black box. You can examine the code for any function or computation you perform. You can even modify and improve these functions by changing the code, and you can create your own functions. R also has advanced capabilities for data wrangling and has many packages available for advanced statistical analysis and graphing. In addition, there are strong resources available for creating your analyses in R so they are reproducible by others (Gandrud, 2020).
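
For example, because R is not a black box, you can print the source code of most functions simply by typing the function's name without parentheses. The brief sketch below uses only base R; the meanCenter() function is an illustrative example of writing your own function, not a function from any package:

Code
# Print the source code of a base R function
sd

# Verify by hand what the function computes
x <- c(4, 8, 6, 5, 3)
sqrt(sum((x - mean(x))^2) / (length(x) - 1))
sd(x)

# Define and use your own function
meanCenter <- function(x) x - mean(x, na.rm = TRUE)
meanCenter(x)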

1.3 About the Author

Regarding my background, I am a licensed clinical psychologist. My research examines how children develop behavior problems. I am also a trained clinician, and I supervise clinicians in training in assessment and therapy, particularly the assessment and treatment of children’s disruptive behavior. Given my expertise, many of the examples in the book deal with topics in clinical psychology, but many of the assessment principles discussed are relevant to all areas of psychology (and to science more broadly) and are often overlooked in research and practice. As a card-carrying clinical scientist, my perspective is that the scientific epistemology is the strongest approach to knowledge and that assessment should be guided first and foremost by the epistemology of science, regardless of whether one is doing research or practice.

1.4 What Is Assessment?

Assessment is the gathering of information about a person, group, setting, or context. In psychological assessment, we are interested in gathering information about people’s psychological functioning, including their thoughts, emotions, and behaviors. Psychological assessment can also consider biological and physiological processes that are linked to people’s thoughts, emotions, and behaviors. Many assessment approaches can be used to assess people’s thoughts, emotions, and behaviors, including self-report questionnaires, questionnaires reported by others (e.g., spouse, parent, teacher, or friend), interviews, observations, biopsychological assessments (e.g., cortisol, heart rate, brain imaging), performance-based assessments, archival approaches (e.g., chart review), and combinations of these.

1.5 Why Should We Care About Assessment (and Science)?

In research, assessments are conducted to advance knowledge, such as improved prediction or understanding. For example, in my research, I use assessments to understand what processes influence children’s development of disruptive behavior. In society, assessments are conducted to improve decision-making. For instance, assessments are conducted to determine whether to hire a job candidate or promote an employee. In a clinical context, assessments are conducted to improve treatment and the client’s outcomes. As an example, assessments are conducted to determine which treatment would be most effective for a person suffering from depression. Assessments can be valuable for understanding current functioning as well as for making predictions. To address these goals, we need to have confidence that our assessment devices yield accurate answers for these purposes and for the individuals being assessed. Science is crucial for knowing how much (or how little) confidence we can have in a given assessment for a given purpose and population. Effective treatment often depends on accurate assessment. Thus, knowing how to conduct and critically evaluate science will make you more effective at selecting, administering, and interpreting assessments.

Decisions resulting from assessments can have important life-altering consequences. High-stakes decisions based on assessments include decisions about whether a person is hospitalized, whether a child is removed from their abusive home, whether a person is deemed competent to stand trial, whether a prisoner is released on parole, and whether an applicant is admitted to graduate school. These important assessment-related decisions should be made using the best available science.

The problem is that there has been a proliferation of pseudoscience in assessment and treatment. There are widely used psychological assessments and treatments that we know are inaccurate, that do not work, or, in some cases, that are harmful. Lists of harmful psychological treatments (e.g., Lilienfeld, 2007) and inaccurate assessments (e.g., Hunsley et al., 2015) have been published, but these treatments and assessments are still used by professional providers to this day. Using such techniques in practice violates the aphorism, “First, do no harm.” This would be inconceivable in other applied sciences, such as chemistry, engineering, and medicine. For instance, the prescription of a particular medication for a particular purpose requires approval by the U.S. Food and Drug Administration (FDA). Psychological assessments and treatments do not have the same level of oversight.

The gap between what we know based on science and what is implemented in practice (the science–practice gap) motivated McFall’s (1991) “Manifesto for a Science of Clinical Psychology,” which he later expanded (McFall, 2000). The Manifesto has one cardinal principle and four corollaries:

Cardinal Principle: Scientific clinical psychology is the only legitimate and acceptable form of clinical psychology.

First Corollary: Psychological services should not be administered to the public (except under strict experimental control) until they have satisfied these four minimal criteria:

  1. The exact nature of the service must be described clearly.
  2. The claimed benefits of the service must be stated explicitly.
  3. These claimed benefits must be validated scientifically.
  4. Possible negative side effects that might outweigh any benefits must be ruled out empirically.

Second Corollary: The primary and overriding objective of doctoral training programs in clinical psychology must be to produce the most competent clinical scientists possible.

Third Corollary: A scientific epistemology differentiates science from pseudoscience.

Fourth Corollary: The most caring and humane psychological services are those that have been shown empirically to be the most effective, efficient, and safe.

The Manifesto orients you to the scientific perspective from which we will be examining psychological assessment techniques in this book.

1.5.1 Assessment and the Replication Crisis in Science

Assessment is also crucial to advancing knowledge in research, as summarized in the maxim, “What we know depends on how we know it.” Findings from studies depend on the methods used to obtain them; thus, everything we know comes down to methods.

Many domains of science, particularly social science, have struggled with a replication crisis, such that a large proportion of findings fail to replicate when independent investigators attempt to replicate the original findings (Duncan et al., 2014; Freese & Peterson, 2017; Larson & Carbine, 2017; Lilienfeld, 2017; Open Science Collaboration, 2015; Shrout & Rodgers, 2018; Tackett, Brandes, King, et al., 2019). There is considerable speculation about what factors account for the replication crisis. For instance, one possible factor is researcher degrees of freedom, which are unacknowledged choices in how researchers prepare, analyze, and report their data that can lead to detecting significance in the absence of real effects (Loken & Gelman, 2017). This is similar to Gelman and Loken’s (2013) description of research as the garden of forking paths, where different decisions along the way can lead to different outcomes (see Figure 1.1). A second possibility for the replication crisis is that some replication studies have had limited statistical power (e.g., insufficiently large sample sizes). A third possibility is publication bias, such that researchers tend to publish only significant findings, which is known as the file-drawer effect. A fourth possibility is that researchers may engage in ethically questionable research practices, such as multiple testing and selective reporting.
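
To make the concern about multiple testing and selective reporting concrete, below is a minimal simulation sketch (purely illustrative; the sample size, number of outcomes, and random seed are arbitrary choices, not values from the cited studies). Each simulated study tests 20 outcomes that, in truth, are unrelated to the predictor; if researchers report whichever test happens to reach p < .05, a majority of studies yield at least one “significant” finding despite there being no real effects:

Code
set.seed(52242)

nStudies <- 1000  # number of simulated studies
nOutcomes <- 20   # number of outcomes tested per study
n <- 50           # sample size per study

anySignificant <- replicate(nStudies, {
  predictor <- rnorm(n)
  pValues <- replicate(nOutcomes, {
    outcome <- rnorm(n)  # outcome is unrelated to the predictor
    cor.test(predictor, outcome)$p.value
  })
  any(pValues < .05)
})

# Proportion of studies with at least one false-positive "significant" result
mean(anySignificant)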

However, difficulties with replication could exist even if researchers have the best of intentions, engage in ethical research practices, and are transparent about all of the methods they used and decisions they made. The replication crisis could owe, in part, to noisy (imprecise and inaccurate) measures. The field has paid insufficient attention to measurement unreliability as a key culprit in the replication crisis. As Loken & Gelman (2017) demonstrated, measurement error is traditionally thought to weaken (attenuate) the association between measures. But when researchers use noisy measures with small or moderate samples and select what to publish based on statistical significance, measurement error can make an association appear stronger than it actually is. This is what Loken & Gelman (2017) describe as the statistical significance filter: in a study with noisy measures and a small or moderate sample size, statistically significant estimates are likely to overestimate the actual effect size; the “true” underlying effects could be small or nonexistent. The statistical significance filter exists because, with a small sample size, standard errors are larger, so an effect must be large to be detected as statistically significant. That is, when researchers publish a statistically significant effect from a small or moderate sample with noisy measures, the published effect size will necessarily be large enough to have been detected (and likely larger than the true effect). This exaggeration diminishes as the sample size increases. So, the goal should be to use less noisy measures with larger sample sizes. And, as discussed in Chapter 13 on ethical considerations in psychological assessment, pre-registration can help control researcher degrees of freedom.
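
Below is a minimal simulation sketch of the statistical significance filter (the true effect size, reliability, sample size, and seed are arbitrary values chosen for illustration, not values from Loken & Gelman, 2017). A small true association is measured with unreliable (noisy) measures in small samples; averaged across all studies, the observed correlations are attenuated, but averaged across only the statistically significant studies, the estimates are much larger than the true effect:

Code
set.seed(52242)

nStudies <- 10000
n <- 30              # small sample size per study
trueEffect <- .15    # true correlation between the latent variables
reliability <- .60   # reliability of each noisy observed measure

observedEffects <- replicate(nStudies, {
  latentX <- rnorm(n)
  latentY <- trueEffect * latentX + rnorm(n, sd = sqrt(1 - trueEffect^2))

  # Add measurement error so each observed score has the specified reliability
  observedX <- sqrt(reliability) * latentX + rnorm(n, sd = sqrt(1 - reliability))
  observedY <- sqrt(reliability) * latentY + rnorm(n, sd = sqrt(1 - reliability))

  cor(observedX, observedY)
})

# Critical correlation for two-tailed significance at alpha = .05 with df = n - 2
criticalT <- qt(.975, df = n - 2)
criticalR <- criticalT / sqrt(criticalT^2 + n - 2)

# All estimates: attenuated relative to the true effect
mean(observedEffects)

# "Published" (statistically significant) estimates: much larger, on average,
# than the true effect
mean(observedEffects[abs(observedEffects) > criticalR])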

The lack of replicability of findings has the potential to negatively impact the people we study through misinformed assessment, treatment, and policy decisions. Therefore, it is crucial to use assessments with strong psychometric properties and/or to develop better assessments. Psychometrics refers to the reliability and validity of measures. These concepts are described in greater detail in Chapters 4 and 5, but for now, think of reliability as the consistency of measurement and validity as the accuracy of measurement.

1.5.2 Science Versus Pseudoscience in Assessment

Science is the best system of epistemology we have to pursue truth. Science is a process, not a set of facts. It helps us overcome blind spots. The system is revisionary and self-correcting. Science is the epistemology that is the least susceptible to error due to authority, belief, intuition, bias, preference, etc. Clients are in a vulnerable position and deserve to receive services consistent with the strongest available evidence. By providing a client a service, you are implicitly making a claim and prediction. As a psychologist, you are claiming to have expert knowledge and competence. You are making a prediction that the client will improve because of your services. Ethically, you should be making these predictions based on science and a risk-benefit analysis. It is also important to make sure the client knows when services are unproven so they can provide fully informed consent. Otherwise, because of your position as a psychologist, they may believe that you are using an evidence-based approach when you are not.

We will be examining psychological assessment from a scientific perspective. Here are characteristics of science that distinguish it from pseudoscience:

  1. Risky hypotheses are posed that are falsifiable. The hypotheses can be shown to be wrong.
  2. Findings can be replicated independently by different research groups and different methods. Evidence converges across studies and methods.
  3. Potential alternative explanations for findings are specified and examined empirically (with data).
  4. Steps are taken to guard against the undue influence of personal beliefs and biases.
  5. The strength of claims reflects the strength of evidence. Findings and the ability to make judgments or predictions are not overstated. For instance, it is important to present the degree of uncertainty from assessments with error bars or confidence intervals (a brief example appears after this list).
  6. Scientifically supported measurement strategies are used based on their psychometrics, including reliability and validity.
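
For instance, here is a minimal sketch of how one might present the uncertainty around an individual’s observed score using a 95% confidence interval based on the standard error of measurement (the observed score, reliability, and standard deviation below are arbitrary illustrative values):

Code
observedScore <- 105  # individual's observed score (illustrative)
reliability <- .90    # assumed reliability of the measure
sdScores <- 15        # standard deviation of scores in the reference sample

# Standard error of measurement
sem <- sdScores * sqrt(1 - reliability)

# 95% confidence interval around the observed score
observedScore + c(-1, 1) * qnorm(.975) * sem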

Science does not progress without advances in measurement, including

  • more efficient measurement (see Chapters 8 and 21)
  • more precise measurement (i.e., reliability; see Chapter 4)
  • more accurate measurement (i.e., validity; see Chapter 5)
  • more sophisticated modeling (see Chapter 24)
  • more sophisticated biopsychological (e.g., cognitive neuroscience) techniques, as opposed to self-report and neuropsychological techniques (see Chapter 20)
  • considerations of cultural and individual diversity (see Chapter 25)
  • ethical considerations (see Chapter 13)

These considerations serve as the focus of this book.

1.6 Prerequisites

This book was written in R Markdown using the bookdown package (Xie, 2022a) in R (R Core Team, 2022). The bookdown package was built on top of the rmarkdown (Allaire et al., 2022) and knitr (Xie, 2022b) packages (Xie, 2015).

Applied examples in R are provided throughout the book. Each chapter that has R examples has a section on “Getting Started,” which provides the code to load relevant libraries, load data files, simulate data, add missing data (for realism), perform calculations, and more. The data files used for the examples are available on the Open Science Framework (OSF): https://osf.io/3pwza.
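
As a preview, a “Getting Started” section typically resembles the minimal sketch below (the commented-out package and file name are hypothetical placeholders shown for illustration, not the actual packages and data files used in later chapters):

Code
# Load relevant libraries (each chapter lists the specific packages it uses)
# library("petersenlab")

# Load a data file (hypothetical file name)
# mydata <- read.csv("exampleData.csv")

# Simulate data
set.seed(52242)
simulatedData <- data.frame(
  x = rnorm(100),
  y = rnorm(100))

# Add missing data (for realism)
simulatedData$x[sample(1:100, size = 10)] <- NA

# Perform calculations
summary(simulatedData)
cor(simulatedData$x, simulatedData$y, use = "pairwise.complete.obs")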

Most of the R packages used in this book can be installed from the Comprehensive R Archive Network (CRAN) using the following command:

Code
install.packages("INSERT_PACKAGE_NAME_HERE")

Several of the packages are hosted on GitHub repositories, including uroc (Gneiting & Walz, 2021) and dmacs (Dueber, 2019).

You can install the uroc and dmacs packages using the following code:

Code
install.packages("remotes")
remotes::install_github("evwalz/uroc")
remotes::install_github("ddueber/dmacs")

Many of the R functions used in this book are available from the petersenlab package (Petersen, 2024b): https://github.com/DevPsyLab/petersenlab. You can install the latest stable release of the petersenlab package (Petersen, 2024b) from CRAN using the following code:

Code
install.packages("petersenlab")

You can install the latest development version of the petersenlab package (Petersen, 2024b) from GitHub using the following code:

Code
install.packages("remotes")
remotes::install_github("DevPsyLab/petersenlab")

The code that generates this book is located on GitHub: https://github.com/isaactpetersen/Principles-Psychological-Assessment.

References

Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J., Chang, W., & Iannone, R. (2022). rmarkdown: Dynamic documents for R. https://CRAN.R-project.org/package=rmarkdown
Buros Center for Testing. (2021). The twenty-first mental measurements yearbook. Buros Center for Testing.
Dueber, D. (2019). dmacs: Measurement nonequivalence effect size calculator. https://github.com/ddueber/dmacs
Duncan, G. J., Engel, M., Claessens, A., & Dowsett, C. J. (2014). Replication and robustness in developmental research. Developmental Psychology, 50(11), 2417–2425. https://doi.org/10.1037/a0037996
Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology.
Gandrud, C. (2020). Reproducible research with R and R studio (3rd ed.). CRC Press. https://www.routledge.com/Reproducible-Research-with-R-and-RStudio/Gandrud/p/book/9780367143985
Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University.
Gneiting, T., & Walz, E.-M. (2021). Receiver operating characteristic (ROC) movies, universal ROC (UROC) curves, and coefficient of predictive ability (CPA). Machine Learning. https://doi.org/10.1007/s10994-021-06114-3
Hunsley, J., Lee, C. M., Wood, J. M., & Taylor, W. (2015). Controversial and questionable assessment techniques. In S. O. Lilienfeld, S. J. Lynn, & J. M. Lohr (Eds.), Science and pseudoscience in clinical psychology (2nd ed., pp. 42–82). The Guilford Press.
Larson, M. J., & Carbine, K. A. (2017). Sample size calculations in human electrophysiology (EEG and ERP) studies: A systematic review and recommendations for increased rigor. International Journal of Psychophysiology, 111, 33–41. https://doi.org/10.1016/j.ijpsycho.2016.06.015
Lilienfeld, S. O. (2007). Psychological treatments that cause harm. Perspectives on Psychological Science, 2(1), 53–70. https://doi.org/10.1111/j.1745-6916.2007.00029.x
Lilienfeld, S. O. (2017). Psychology’s replication crisis and the grant culture: Righting the ship. Perspectives on Psychological Science, 12(4), 660–664. https://doi.org/10.1177/1745691616687745
Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585. https://doi.org/10.1126/science.aal3618
McFall, R. M. (1991). Manifesto for a science of clinical psychology. The Clinical Psychologist, 44(6), 75–91.
McFall, R. M. (2000). Elaborate reflections on a simple manifesto. Applied & Preventive Psychology, 9(1), 5–21. https://doi.org/10.1016/s0962-1849(05)80035-6
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). https://doi.org/10.1126/science.aac4716
Petersen, I. T. (2024b). petersenlab: A collection of R functions by the Petersen Lab. https://doi.org/10.5281/zenodo.7602890
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69(1), 487–510. https://doi.org/10.1146/annurev-psych-122216-011845
Tackett, J. L., Brandes, C. M., King, K. M., & Markon, K. E. (2019). Psychology’s replication crisis and clinical psychological science. Annual Review of Clinical Psychology, 15(1), 579–604. https://doi.org/10.1146/annurev-clinpsy-050718-095710
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). Chapman; Hall/CRC.
Xie, Y. (2022a). bookdown: Authoring books and technical documents with R Markdown. https://CRAN.R-project.org/package=bookdown
Xie, Y. (2022b). knitr: A general-purpose package for dynamic report generation in R. https://yihui.org/knitr/

Feedback

Please consider providing feedback about this textbook, so that I can make it as helpful as possible. You can provide feedback at the following link: https://forms.gle/95iW4p47cuaphTek6