I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.

Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Principles-Psychological-Assessment

Adding an annotation using hypothes.is. To add an annotation, select some text and then click the symbol on the pop-up menu. To see the annotations of others, click the symbol in the upper right-hand corner of the page.

Chapter 22 Behavioral Assessment

22.1 Overview of Behavioral Assessment

In other chapters, we discuss psychophysiological assessment, cognitive assessment, and other assessment techniques. Psychology is often concerned with understanding behavior, and one way to assess behavior is to observe people. Observing people allows us to see the dynamics of behavior unfold in relation to other processes. Observation may also help circumvent examinees’ response biases, such as social desirability.

Behavioral observation provides a temporal stream of behavior that shows which behaviors precede other behaviors. An overview of observational assessment is provided by Girard & Cohn (2016). When conducting observations, it is important to consider the context and the environment.

22.1.1 Approaches

Examples of behavioral assessment include naturalistic behavioral observation, analogue (structured) observation, self-monitoring/diary, behavior ratings (e.g., informant report), and self-report. Self-report is essentially retrospective self-monitoring. I published a practice brief for the assessment of externalizing behavior in school-aged children that integrates these different methods of behavioral assessment (Petersen, 2024a).

Naturalistic behavioral observation involves observing a person in their natural context or habitat. For instance, it might involve observing a person in their home, at school, at work, or in their everyday interpersonal interactions. The observer is like a “fly on the wall” in that they want to exert as little effect as possible on the person’s behavior. The observation occurs within normal situational contexts and under naturally occurring reinforcements and consequences. Behavioral assessment originated with naturalistic observation.

Analogue observation, also called structured observation, is the observation of a person in a contrived environment (Haynes, 2001). The goal of analogue observation is for the contrived situation to be analogous to the situations that the client is likely to encounter in their natural environment. For instance, the examiner might set up similar conditions in a lab or therapy room. Analogue observations can differ in the degree to which the observation is structured, but they are typically more structured than naturalistic observation. Analogue observations might involve role-plays, family or marital interaction tasks, and compliance (do/don’t) tasks. As an example, the observer might observe role-played scenarios in which the person engages in one or more simulated social interactions.

22.2 Contexts for Observing

There are multiple contexts in which we can observe people. Here are just a few examples:

Classrooms
Homes
In the community; for example, you could take a client out into the world to see how they interact with others
Group homes
Psychiatric hospitals

Token economies for people with schizophrenia were based on behavioral assessment in psychiatric hospitals. A token economy is a contingency management system based on operant conditioning. Operant conditioning occurs when the frequency of a behavior is influenced by the outcomes of the behavior. In a token economy, a secondary reinforcer, such as a token or symbol, is given to the client for performing the target behavior. The tokens can later be exchanged for back-up reinforcers. For instance, a token may be given to the client for making their bed. Back-up reinforcers include material reinforcers such as food, money, and cigarettes; services such as having your room cleaned; and privileges such as getting to make a phone call. Token economies have been shown to change people’s behavior, especially the negative symptoms of schizophrenia. However, token economies are not widely used because of political issues.
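The earn-and-exchange logic of a token economy can be sketched as a small ledger. This is a minimal Python illustration, not a clinical protocol. The class name `TokenEconomy`, the behaviors, the token values, and the prices are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEconomy:
    """Hypothetical token-economy ledger: earn tokens, exchange them for back-up reinforcers."""

    earn_rules: dict  # target behavior -> tokens earned (the secondary reinforcer)
    prices: dict  # back-up reinforcer -> token cost
    balance: int = 0
    history: list = field(default_factory=list)

    def record_behavior(self, behavior: str) -> int:
        """Deliver tokens when a target behavior occurs; other behaviors earn none."""
        tokens = self.earn_rules.get(behavior, 0)
        self.balance += tokens
        self.history.append(("earn", behavior, tokens))
        return tokens

    def exchange(self, reinforcer: str) -> None:
        """Trade tokens for a back-up reinforcer if the balance covers its cost."""
        cost = self.prices[reinforcer]
        if cost > self.balance:
            raise ValueError(f"{reinforcer!r} costs {cost} tokens; balance is {self.balance}")
        self.balance -= cost
        self.history.append(("exchange", reinforcer, -cost))


ward = TokenEconomy(
    earn_rules={"made bed": 2, "attended group": 3},
    prices={"phone call": 4, "snack": 2},
)
ward.record_behavior("made bed")        # +2 tokens
ward.record_behavior("attended group")  # +3 tokens
ward.exchange("phone call")             # -4 tokens
print(ward.balance)  # 1
```

The history list is a running record of which target behaviors occurred and when tokens were spent, so the same ledger could also serve as simple tracking data.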

22.3 Costs of Behavioral Observation

Behavioral observations are costly in terms of personnel and time. The cost limits the amount of observation time that is possible. This raises questions about how typical and representative the participant’s behavior during the observation period was of their everyday life. The limited observation time therefore raises concerns about reliability and external validity, that is, the generalizability of the person’s behavior in the observed situation to their typical functioning.

A gold standard for naturalistic observation would be to observe someone in their natural context across multiple days, times, situations, and contexts, with more than one rater. This provides a more robust sample of the person’s behavior. However, doing so is costly and is therefore uncommon in practice.

22.4 Dependent Variable

After (or during) the observation, we need to turn the observation into useful information. To do that, we create a dependent variable using behavioral coding. Behavioral coding involves observing behavior and quantifying it, that is, turning behavior into numbers that can be analyzed to make inferences. During observation, you can measure the frequency of a behavior (how many times it occurs), which is a useful index of individual differences. But you may also want to know the stream of behavior, that is, when each behavior occurred. To capture this, you might code behavior at regular intervals (e.g., every 6 seconds). Coding systems differ in the granularity with which they treat time. In event-based coding, the coder codes each occurrence of a discrete behavior, that is, a behavior with discrete start and stop points. For example, an event-based coding system may record when a person starts and stops talking. Event-based coding allows assessing the number, duration, and sequence of behaviors with high temporal precision.
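The sketch below shows what event-based codes make possible. It assumes each coded event is stored as a hypothetical `(behavior, onset, offset)` tuple in seconds. From these it derives the frequency, total duration, and sequence of behaviors, plus lag-1 transition counts (which behavior immediately follows which). The data and the function name `summarize_events` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event-based codes: (behavior, onset_s, offset_s)
events = [
    ("talk", 0.0, 4.5),
    ("laugh", 4.5, 6.0),
    ("talk", 8.0, 15.0),
    ("look_away", 15.0, 17.5),
    ("talk", 20.0, 22.0),
]


def summarize_events(events):
    """Frequency, total duration, onset-ordered sequence, and lag-1 transitions."""
    frequency = defaultdict(int)
    duration = defaultdict(float)
    for behavior, onset, offset in events:
        if offset < onset:
            raise ValueError(f"{behavior!r} ends before it starts")
        frequency[behavior] += 1
        duration[behavior] += offset - onset
    # Order by onset so the sequence reflects the temporal stream of behavior
    sequence = [b for b, _, _ in sorted(events, key=lambda e: e[1])]
    # Count which behavior immediately follows which
    transitions = Counter(zip(sequence, sequence[1:]))
    return dict(frequency), dict(duration), sequence, transitions


freq, dur, seq, trans = summarize_events(events)
print(freq)  # {'talk': 3, 'laugh': 1, 'look_away': 1}
print(dur)   # {'talk': 13.5, 'laugh': 1.5, 'look_away': 2.5}
print(seq)   # ['talk', 'laugh', 'talk', 'look_away', 'talk']
```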

In interval-based coding, the coder codes during specified spans of time that are called intervals. Intervals can be of any length. Shorter intervals provide more temporal precision but are more burdensome to code. Longer intervals provide less temporal precision but are less burdensome to code. An example of interval-based coding would be coding the extent of positive affect that a person showed during each 10-second interval.
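One way to see the precision trade-off is to convert the same event records into interval codes. The sketch below uses a partial-interval rule: an interval is coded 1 if the behavior occurs at any point within it. This is only one of several interval-coding rules. The text’s own example instead rates the extent of positive affect in each interval. The events and the function name are hypothetical.

```python
import math

# Hypothetical onsets/offsets (seconds) of talking in a 24-second session
talk_events = [(0.0, 4.5), (8.0, 15.0), (20.0, 22.0)]


def partial_interval_codes(events, session_length, interval):
    """Code each interval 1 if the behavior occurs at any point in it, else 0."""
    n_intervals = math.ceil(session_length / interval)
    codes = [0] * n_intervals
    for onset, offset in events:
        for i in range(n_intervals):
            start, end = i * interval, (i + 1) * interval
            if onset < end and offset > start:  # event overlaps [start, end)
                codes[i] = 1
    return codes


for length in (6, 2):
    codes = partial_interval_codes(talk_events, 24, length)
    print(length, codes, sum(codes) / len(codes))
# 6 [1, 1, 1, 1] 1.0
# 2 [1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0] 0.6666666666666666
```

The person actually talked for 13.5 of the 24 seconds (about 0.56 of the session). With 6-second intervals, every interval is coded as containing talking. With 2-second intervals, the coded proportion drops to about 0.67. In this example, the longer intervals distort the picture more.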

Coding can focus on small units of behavior or on what the behavior means. Sign-based coding focuses on small units of behavior, such as movements. Message-based coding focuses on the meaning of the behavior, as interpreted through theory. For example, message-based coding might attempt to code examinees’ emotional states. Sign-based coding requires less inference, is less subjective, and has higher inter-rater reliability than message-based coding. On the other hand, message-based coding is potentially more theoretically interesting than sign-based coding.

22.5 Functional Behavioral Assessment/Analysis

One goal of behavioral assessment is to determine the function of a problem behavior. Every behavior has a function, reason, or purpose (or several). If you can determine the function(s) of a behavior, you may be able to help the client learn more adaptive ways of accomplishing their goals. To determine the function(s) of behavior, clinicians commonly conduct functional behavioral assessments (FBAs), especially in schools. An FBA involves identifying patterns in the sequences of events surrounding the problem behavior, noting the following:

The time
The context
What led up to the problem behavior (antecedents)
What problem behavior occurred
What happened after the problem behavior (consequences)
How the situation resolved

An FBA is based on the antecedent-behavior-consequence (A-B-C) model. The idea of the A-B-C model is that every behavior is a function of its antecedents and consequences, so the goal is to identify common patterns of antecedents and consequences. Examples of antecedents include insufficient sleep, a change in routine, and an ineffective command. Examples of consequences include receiving attention and getting to play without having to pick up the toys. By identifying the patterns of antecedents and consequences, we can generate hypotheses about the function of the behavior. Possible functions of a given problem behavior include:

- getting attention
- getting out of doing something the person does not want to do (escape or avoidance)
- getting access to a preferred activity or object
- obtaining sensory stimulation (e.g., rocking)
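Once antecedents, behaviors, and consequences are logged, finding common patterns is a counting problem. The sketch below tallies antecedent-consequence pairs around one problem behavior from a hypothetical A-B-C log. The entries and the helper `common_patterns` are illustrative assumptions. The tally only generates hypotheses about function; it does not confirm them.

```python
from collections import Counter

# Hypothetical A-B-C log: (antecedent, behavior, consequence)
abc_log = [
    ("asked to clean up", "tantrum", "task removed"),
    ("asked to clean up", "tantrum", "task removed"),
    ("parent on phone", "tantrum", "parent attention"),
    ("asked to clean up", "tantrum", "task removed"),
    ("change in routine", "tantrum", "parent attention"),
    ("asked to clean up", "complied", "praise"),
]


def common_patterns(log, behavior, top=3):
    """Most frequent antecedent-consequence pairs around one behavior."""
    pairs = Counter((a, c) for a, b, c in log if b == behavior)
    return pairs.most_common(top)  # ties keep first-encountered order


for (antecedent, consequence), n in common_patterns(abc_log, "tantrum"):
    print(f"{n}x: {antecedent} -> tantrum -> {consequence}")
# 3x: asked to clean up -> tantrum -> task removed
# 1x: parent on phone -> tantrum -> parent attention
# 1x: change in routine -> tantrum -> parent attention
```

In this made-up log, the most frequent pattern is a cleanup request followed by removal of the task. That pattern would be consistent with an escape/avoidance hypothesis, which could then be tested in treatment.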

Treatment can then be designed to change the antecedents and consequences so that the client learns more adaptive ways of behaving. One approach to case conceptualization (case formulation) is the “6 Ps”: presenting problems, protective factors, predisposing factors, precipitating factors, perpetuating factors, and prognosis. Treatment would address the presenting problems by targeting the perpetuating (maintaining) factors and by helping the client identify the triggering (precipitating) factors and how to handle them. Treatment would also build on the client’s strengths (protective factors) and consider their vulnerabilities (predisposing factors), with the intensity of treatment matched to the prognosis.

22.6 Mental Status Exam

A mental status exam is a structured assessment of a client’s cognition, affect, and other aspects of mental functioning. A mental status exam is often used in psychiatric hospitals to quickly “size up” a patient. Mental status exams are also useful in outpatient settings. For clinicians in training, it can be useful to get in the habit of performing mental status exams.

22.7 Reliability

It is important to consider the reliability of behavioral observations. From a generalizability perspective, we aim to generalize behavioral observations across raters, contexts, and time points. Thus, inter-rater reliability and test–retest reliability are especially relevant. Averaging ratings across multiple raters can yield scores that are more reliable than those of any individual rater. This holds only if the raters show adequate inter-rater reliability, that is, if they are rating the same construct.
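The benefit of averaging across raters can be quantified with the Spearman-Brown prophecy formula. It predicts the reliability of the mean of k raters from the reliability of a single rater, assuming the raters are parallel (equally reliable, with uncorrelated errors). This is a standard psychometric formula rather than a method specified in this chapter. The single-rater reliability of 0.60 is hypothetical.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Predicted reliability of the mean of n_raters parallel raters."""
    r = single_rater_reliability
    if not 0 <= r <= 1 or n_raters < 1:
        raise ValueError("need 0 <= reliability <= 1 and n_raters >= 1")
    return n_raters * r / (1 + (n_raters - 1) * r)


for k in (1, 2, 4):
    print(k, round(spearman_brown(0.60, k), 3))
# 1 0.6
# 2 0.75
# 4 0.857
```

If a single rater’s reliability is 0, the formula returns 0 for any number of raters. This mirrors the caveat in the text: averaging helps only when the raters agree to some degree.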

22.8 Validity

Validity is also an important consideration for behavioral observations. In general, little attention has been paid to the validity of observational measures, particularly those used in clinical practice. Content validity is a particular concern: when coding, the examiner makes choices about what to code, so they are not simply “writing down the real thing”.

22.8.1 Reactivity

Another concern with observational measures is response biases, such as faking good, faking bad, and impression management. For instance, people may act differently because they are being observed, which is a form of reactivity. Reactivity is a change in behavior that occurs merely because the behavior is being observed or measured. People could behave better or worse than they typically do; observation tends to push behavior toward the extremes.

22.8.2 Coding

Regarding coders and observers, it is ideal to provide them with immediate corrective feedback during training. It is also ideal for them to be blind to the hypotheses so they are not biased in favor of a particular outcome. To prevent coder drift, it can be helpful to periodically compare coders’ ratings to criterion ratings that are accepted as gold-standard ratings. Coder drift occurs when coders’ ratings change over time. You can evaluate intra-observer reliability by having coders rate the same (video-recorded) behavior at two different time points. To establish inter-rater reliability, you can have two or more raters code at least a subset of the participants.
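For categorical codes, one widely used index of inter-rater reliability is Cohen’s kappa. It corrects the observed proportion of agreement (p_o) for the agreement expected by chance from each coder’s base rates (p_e): kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal implementation for two coders, and the on-task/off-task codes are hypothetical. Other indices, such as intraclass correlations for continuous ratings, may be more appropriate depending on the coding system.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders' categorical codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    # Observed agreement: proportion of units given the same code
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement from each coder's marginal base rates
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten 10-second intervals of the same video
coder_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
coder_2 = ["on", "on", "off", "off", "off", "on", "on", "on", "on", "on"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.52
```

The coders agree on 8 of 10 intervals (p_o = 0.80). However, both code “on” in 70% of intervals, so chance agreement is already 0.58, and kappa is only about 0.52.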

22.9 Forms of Measurement

There are several forms of measurement that behavioral assessments can take. One form of measurement from observation is notes, which yield qualitative data. Rating scales often use Likert-type response formats. Likert scales assess the observer’s impression of the examinee’s behavior, which is a simpler way of getting the same information as many approaches to video coding. Rating scales can be completed in the context of, for example, a home observation, parent report, teacher report, or school observation.

It can be helpful to examine correlations between these different methods and contexts. Having multiple forms of measurement is great for research but is difficult in clinical practice because of the tremendous expense. It can also be difficult to observe what you want to observe. For example, aggression is a relatively low-base-rate behavior, so the client may not engage in aggressive behavior during the observation window. There are also questions about the incremental validity of naturalistic observational assessments, because simpler measures (including analogue assessments) may capture the same information.
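The low-base-rate problem can be made concrete with a simple probability model. Assume, purely for illustration, that occurrences of a behavior follow a Poisson process with a constant rate. Under that model, the probability of seeing at least one occurrence in a window of length t is 1 - exp(-rate × t). Real behavior is often clustered in time, which violates this assumption. The rate of 0.1 episodes per hour is hypothetical.

```python
import math


def prob_at_least_one(rate_per_hour: float, window_hours: float) -> float:
    """P(at least one occurrence) in the window under a constant-rate Poisson model."""
    return 1 - math.exp(-rate_per_hour * window_hours)


def hours_needed(rate_per_hour: float, target_prob: float) -> float:
    """Observation time needed for a target_prob chance of seeing the behavior."""
    return -math.log(1 - target_prob) / rate_per_hour


rate = 0.1  # hypothetical: one aggressive episode per 10 hours, on average
print(round(prob_at_least_one(rate, 1), 3))  # 0.095
print(round(hours_needed(rate, 0.9), 1))     # 23.0
```

Under these assumptions, a 1-hour observation would catch at least one episode only about 10% of the time. Roughly 23 hours of observation would be needed for a 90% chance.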

22.10 Analogue (Structured) Observational Assessments

Analogue or structured assessments use tasks that evoke individual differences in the behavior of interest. For instance, in our lab we have forbidden toys on a shelf, and we see how the children and parents handle that situation in three phases. First, they are in the room together; next, the parent is occupied on the phone; finally, the parent leaves the room and the child is alone. In the clinic, we have a parent tell their child to clean up the toys so we can observe how the parent gives the command and how the child responds. To study marital conflict, researchers may have the couple list topics they disagree about and then discuss one of those topics to evoke an argument in the lab. To study stress and anxiety, or to conduct exposure in the clinic, the clinician or researcher may have the client or participant give a public speech in a simulated session.

A goal of analogue assessment is for the tasks to have content validity. That is, the tasks should be relevant to the target construct. They should also be representative of the situations the person is likely to face in their natural environment, sampling all facets that affect the behavior problem in that environment.

Compared to naturalistic observation, analogue assessment can elicit the target behavior more frequently through the use of contrived situations or stimuli. For instance, noncompliance might occur more often during a compliance task in the lab than when observing a child in their natural environment. Therefore, the frequency or rate of the behavior may not generalize to the natural context. However, the functional relations between the behavior and other events, such as its antecedents and consequences, may generalize. Thus, analogue assessments can help identify potential intervention targets.

Analogue assessment is most useful for assessing behaviors that are difficult to observe or rare in the natural environment but that occur frequently in the analogue environment. For example, analogue assessment is particularly useful for assessing socially sensitive behavior problems such as verbal aggression, panic, and oppositional behavior.

Analogue assessment can be sensitive to change and can therefore be a clinically useful measure of treatment gains or failure. A client who cannot demonstrate a skill in the clinic may also be unable to demonstrate it in the real world. However, demonstrating a skill in session does not necessarily mean the client will be able to use it in the real world.

In terms of reliability and validity, analogue (structured) assessments tend to show greater reliability than naturalistic observations. Analogue assessments also tend to show greater internal validity than naturalistic observations. That is, analogue assessments are more likely to reflect causal factors because the examiner has more control over when and how the stimuli are presented. However, analogue assessments show weaker ecological validity than naturalistic observations and therefore tend to show weaker external validity. That is, findings from analogue assessments generalize less well than those from naturalistic observations, because analogue assessments may not reflect the person’s real-world functioning as closely. Nevertheless, when done well, analogue assessments tend to show greater clinical utility than naturalistic observations in terms of incremental validity and cost-effectiveness.

22.11 Self-Monitoring

Self-monitoring involves observing and recording one’s own behaviors, thoughts, emotions, bodily sensations, events, etc. Given the expense of conducting behavioral observations, you can train people to be their own observers, which is more portable. Phone apps can now be used to track many different processes, such as mood, sleep, and nutrition. These are examples of electronic diaries. An overview of research on self-monitoring and electronic diaries is provided by Korotitsch & Nelson-Gray (1999) and Piasecki et al. (2007).

Self-monitoring provides a great deal of information at a low cost. There are several benefits of self-monitoring. Self-monitoring can help characterize the problem and the presumed causal or maintaining factors. Self-monitoring can also contribute to greater objective self-awareness. In addition, self-monitoring can be used to track treatment progress. Furthermore, self-monitoring can be helpful for assessing covert behaviors that are hidden to outside observers, such as obsessions, paranoid thoughts, and sexual problems.

Self-monitoring is a component of most empirically supported treatments. In addition, the extent to which clients complete self-monitoring is predictive of treatment effectiveness. For example, in parent management training, we have parents track (tally) the frequency of their child’s compliance, noncompliance, and aggression each day between sessions as homework. Self-monitoring tracks treatment progress more objectively than retrospective reports or satisfaction ratings, because clients record the specific behavior the moment it occurs. We modify treatment in response to the information gained from tracking. We also become more aware of the situations and contexts in which the behavior occurs. This makes the behavior more predictable, so the client can better anticipate, prepare for, and control the problem.
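A parent’s daily tallies can be summarized each week to track treatment progress. The sketch below totals a hypothetical one-week tracking sheet and computes a compliance rate. The rate is defined here, as an assumption rather than a standard from the text, as compliance divided by compliance plus noncompliance.

```python
# Hypothetical one-week tracking sheet: (day, compliance, noncompliance, aggression)
week = [
    ("Mon", 4, 3, 1),
    ("Tue", 5, 2, 0),
    ("Wed", 3, 4, 2),
    ("Thu", 6, 2, 0),
    ("Fri", 5, 3, 1),
    ("Sat", 7, 1, 0),
    ("Sun", 6, 2, 0),
]


def weekly_summary(sheet):
    """Total each tallied behavior and compute a compliance rate for the week."""
    compliance = sum(day[1] for day in sheet)
    noncompliance = sum(day[2] for day in sheet)
    aggression = sum(day[3] for day in sheet)
    commands = compliance + noncompliance  # commands with a tallied response
    rate = compliance / commands if commands else float("nan")
    return {"compliance": compliance, "noncompliance": noncompliance,
            "aggression": aggression, "compliance_rate": rate}


summary = weekly_summary(week)
print(summary["compliance"], summary["noncompliance"], summary["aggression"])  # 36 17 4
print(round(summary["compliance_rate"], 2))  # 0.68
```

Comparing these weekly summaries across sessions gives a simple record of whether the target behaviors are changing.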

Based on principles of measurement-based care, assessment does not just occur at the outset of treatment; it should continue throughout treatment. Self-monitoring also helps parents become more aware of the actual frequency of the problem behavior. For instance, they might have over- (or under-)estimated how often a behavior occurs. Thus, self-monitoring can change people’s perceptions of the behavior. Also, some children misbehave less at home or school because of reactivity: they know they are being observed. Self-monitoring can therefore help kick-start behavioral improvement.

Behavior often changes in response to self-monitoring; typically, the client’s behavior improves. Self-monitoring might increase the client’s awareness of how often a problem behavior is happening. It might also increase the client’s perceived predictability and control of the behavior. Self-monitoring is now considered an empirically supported treatment for a number of conditions, including academic and conduct problems, addictive behaviors, anxiety and depression, disordered eating, physical health problems, psychosis, and insomnia. That is, assessment can be a form of intervention!

However, self-monitoring poses several challenges. One challenge is how to evaluate its reliability. Another challenge is how to handle a client who fills out the whole tracking sheet retrospectively, all at once, right before the session; this is called “backfilling”. In addition, self-monitoring data may be inaccurate or distorted because the respondent has not been trained to observe the phenomenon of interest. To counter this, in parent management training, we explicitly define what counts as an instance of compliance, noncompliance, and aggression. Another challenge is respondent burden, because self-monitoring requires the client’s ongoing compliance. To handle noncompliance with self-monitoring, it can be helpful to “get procedural”: discuss with the client when they will track, where they will put the tracking sheet, etc. It can also be helpful to troubleshoot in advance by helping the client anticipate obstacles and plan how to address them. For instance, if the client thinks they might forget to track, have them set a timer. Self-monitoring has lower reliability and validity than behavioral observation. It is still useful and widely used, however, because it is cheap, it yields a lot of information, and it can lead to behavioral improvement.
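Electronic diaries can record when each entry was made, which makes backfilling easier to detect than with paper tracking sheets. The sketch below flags entries logged long after the event they describe. The field names, dates, and the 24-hour threshold are hypothetical choices. A flagged entry is a prompt for discussion with the client, not proof of backfilling.

```python
from datetime import datetime, timedelta

# Hypothetical electronic-diary entries: when the behavior occurred vs. when it was logged
entries = [
    {"event_at": datetime(2024, 3, 4, 17, 30), "entered_at": datetime(2024, 3, 4, 17, 35)},
    {"event_at": datetime(2024, 3, 5, 8, 0), "entered_at": datetime(2024, 3, 5, 20, 0)},
    {"event_at": datetime(2024, 3, 6, 18, 0), "entered_at": datetime(2024, 3, 11, 9, 45)},
]


def flag_possible_backfill(entries, max_lag=timedelta(hours=24)):
    """Return entries logged more than max_lag after the event they describe."""
    return [e for e in entries if e["entered_at"] - e["event_at"] > max_lag]


flagged = flag_possible_backfill(entries)
print(len(flagged))  # 1 (the entry logged more than 4 days after the event)
```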

22.12 Behavior Rating Scales

Behavior rating scales include self- or informant ratings on a checklist of behaviors assessing particular construct(s). An example assessment system that uses behavior rating scales is the Achenbach System of Empirically Based Assessment (ASEBA) that includes the Child Behavior Checklist (CBCL), Teacher’s Report Form (TRF), Adult Self-Report, etc. The ASEBA is an empirically based assessment of psychopathology, rather than a DSM-focused assessment.

Behavior rating scales are widely used in research and clinical settings. They are relatively cheap and easy to administer and score. However, item response theory analyses have revealed problems with the Likert-scale anchors of rating scales. For example, people assign different meanings to levels such as “never”, “sometimes”, and “a lot”. It is often better to have the rater report a numeric amount. For example: “How many times during the last week did your child physically hit someone else?”

Behavior rating scales are typically retrospective: the rater has to reflect back and estimate the frequency of behaviors and of the person’s responses to the environment. Retrospective ratings can have limited validity because of inaccurate recall.

22.13 Assessment of Therapeutic Process

Other behavioral assessments include assessments of the therapeutic process. For example, a therapy session can be video-recorded and then coded. For a therapist, the therapeutic context is the naturalistic setting, so coding a recorded session is a form of naturalistic observation of the therapist.

An example of assessment of the therapeutic process could include assessing the clinician’s adherence and competence. Assessing adherence would involve coding whether the therapist is following the treatment protocol and manual, that is, the degree to which the clinician shows treatment fidelity. Assessing the clinician’s competence would involve coding how skilled the therapist is in providing the treatment. In most treatments, including cognitive behavioral therapy, coding clinician competence would not involve moment-to-moment coding. Instead, the coder would examine the sequence of interactions between client and therapist.

22.14 Conclusion

Examples of behavioral assessment include naturalistic behavioral observation, analogue (structured) observation, self-monitoring/diary, behavior ratings (e.g., informant report), and self-report. A gold standard for naturalistic observation is to observe people in their natural context across multiple days, times, situations, and contexts, with more than one rater, to obtain a more robust sample of the person’s behavior. Compared to naturalistic observation, analogue assessments tend to show greater reliability and internal validity but lower external validity. One useful way to identify the function of a given behavior is to find patterns in sequences of behavior using the antecedent-behavior-consequence (A-B-C) model, as part of a functional behavioral assessment. Another useful approach is self-monitoring, which provides a lot of information at low cost and is therefore a component of most empirically supported treatments.

22.15 Suggested Readings

Girard & Cohn (2016)

References

Girard, J. M., & Cohn, J. F. (2016). A primer on observational measurement. Assessment, 23(4), 404–413. https://doi.org/10.1177/1073191116635807

Haynes, S. N. (2001). Clinical applications of analogue behavioral observation: Dimensions of psychometric evaluation. Psychological Assessment, 13(1), 73–85. https://doi.org/10.1037/1040-3590.13.1.73

Korotitsch, W. J., & Nelson-Gray, R. O. (1999). An overview of self-monitoring research in assessment and treatment. Psychological Assessment, 11(4), 415–425. https://doi.org/10.1037/1040-3590.11.4.415

Petersen, I. T. (2024a). Assessing externalizing behaviors in school-aged children: Implications for school and community providers. https://doi.org/10.17077/rep.006639

Piasecki, T. M., Hufford, M. R., Solhan, M., & Trull, T. J. (2007). Assessing clients in their natural environments with electronic diaries: Rationale, benefits, limitations, and barriers. Psychological Assessment, 19(1), 25–43. https://doi.org/10.1037/1040-3590.19.1.25

Feedback

Please consider providing feedback about this textbook, so that I can make it as helpful as possible. You can provide feedback at the following link: https://forms.gle/95iW4p47cuaphTek6