Chapter 16 Test Bias

16.1 Overview of Bias

There are multiple definitions of the term “bias” depending on the context. In general, bias is a systematic error (Reynolds & Suzuki, 2012). Mean error is an example of systematic error and is sometimes called bias. Cognitive biases are systematic errors in thinking, including confirmation bias and hindsight bias. Method biases are a form of systematic error in which the method of measurement influences a person’s score in ways that are not due to the person’s level on the construct. Method biases include response biases or response styles, such as acquiescence and social desirability bias. Attentional bias refers to the tendency to process some types of stimuli more than others.

Sometimes bias is used to refer in particular to systematic error (in measurement, prediction, etc.) as a function of group membership, where test bias refers to the same score having different meaning for different groups. Under this definition, a test is unbiased if a given test score has the same meaning regardless of group membership. For example, a test is biased if there is differential validity of test scores for groups (e.g., age, education, culture, race, sex). Test bias would exist, for instance, if a test is a less valid predictor for racial minorities or linguistic minorities. Test bias would also exist if, for instance, scores on the Scholastic Aptitude Test (SAT) under-estimated women’s grades in college.

Although there are some known instances of test bias, as described in Section 16.3, research has not produced much empirical evidence of test bias (Brown et al., 1999; Hall et al., 1999; Jensen, 1980; Kuncel & Hezlett, 2010; Reynolds et al., 2021; Reynolds & Suzuki, 2012; Sackett et al., 2008; Sackett & Wilk, 1994), though some item-level bias is not uncommon. Moreover, where test bias has been observed, it is often small and unclear, and it does not always generalize (N. S. Cole, 1981). However, just because there is not much empirical evidence of test bias does not mean that test bias does not exist. Moreover, just because a test does not show bias does not mean that it should be used. Furthermore, just because a test does not show bias does not mean that there are not race-, social class-, and gender-related biases in clinical judgment during the assessment process.

It is also worth pointing out that group differences in scores do not necessarily indicate bias. Group differences in scores could reflect true group differences in the construct. For instance, women have better verbal abilities, on average, compared to men. So, if women’s scores on a verbal ability test are higher on average than men’s scores, this would not be sufficient evidence for bias.

There are two broad categories of test bias:

  1. predictive bias
  2. test structure bias

Predictive bias refers to differences between groups in the relation between the test and criterion. As with all criterion-related validity tests, the findings depend on the strength and quality of the criterion. Test structure bias refers to differences in the internal test characteristics across groups.

16.2 Ways to Investigate/Detect Test Bias

16.2.1 Predictive Bias

Predictive bias exists when differences emerge between groups in terms of predictive validity with respect to a criterion. It is assessed by examining the regression line for the association between the test score and the criterion (e.g., job performance) in each group. For instance, consider a 2x2 confusion matrix used for the standard prediction problem. A confusion matrix for whom to select for a job is depicted in Figure 16.1.

Figure 16.1: 2x2 Confusion Matrix for Job Selection. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

We can also visualize the confusion matrix in terms of a scatterplot of the test scores (i.e., predicted job performance) and the “truth” scores (i.e., actual job performance), as depicted in Figure 16.2. The predictor (test score) is on the x-axis. The criterion (job performance) is on the y-axis. The quadrants reflect the cutoffs (i.e., thresholds) imposed from the 2x2 confusion matrix. The vertical line reflects the cutoff for selecting someone for a job. The horizontal line reflects the cutoff for good job performance (i.e., people who should have been selected for the job).

Figure 16.2: 2x2 Confusion Matrix for Job Selection in the Form of a Graph With Predicted Performance on the x-Axis and Actual Job Performance on the y-Axis. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

The data points in the top right quadrant are true positives: people who the test predicted would do a good job and who did a good job. The data points in the bottom left quadrant are true negatives: people who the test predicted would do a poor job and who would have done a poor job. The data points in the bottom right quadrant are false positives: people who the test predicted would do a good job and who did a poor job. The data points in the top left quadrant are false negatives: people who the test predicted would do a poor job and who would have done a good job.

Figure 16.3 depicts a strong predictor. The best-fit regression line has a steep slope: there are many data points that are true positives and true negatives, and relatively few that are false positives and false negatives.

Figure 16.3: Example of a Strong Predictor. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

Figure 16.4 depicts a poor predictor. The best-fit regression line has a shallow slope: there are about as many data points in the false cells (false positives and false negatives) as in the true cells (true positives and true negatives). In general, the steeper the slope, the better the predictor.

Figure 16.4: Example of a Poor Predictor. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

We can evaluate predictive bias using a best-fit regression line between the predictor and criterion for each group.

16.2.1.1 Types of Predictive Bias

There are three types of predictive bias:

  1. Different slopes
  2. Different intercepts
  3. Different intercepts and slopes

The slope of the regression line is the steepness of the line. The intercept of the regression line is the y-value of the point where the line crosses the y-axis (i.e., when \(x = 0\)). If a measure shows predictive test bias, the groups’ regression lines differ in slopes, intercepts, or both.
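
To make this concrete, predictive bias can be probed with moderated multiple regression. Below is a minimal sketch in R using simulated data; the variable names (test, performance, group) are hypothetical:

```r
# Minimal sketch: probing predictive bias with moderated multiple regression.
# All data are simulated; variable names are hypothetical.
set.seed(1)
dat <- data.frame(
  test  = rnorm(200),
  group = rep(c("majority", "minority"), each = 100)
)
dat$performance <- 0.5 * dat$test + rnorm(200)  # criterion (e.g., job performance)

fit_common <- lm(performance ~ test, data = dat)          # one line for everyone
fit_int    <- lm(performance ~ test + group, data = dat)  # different intercepts
fit_slope  <- lm(performance ~ test * group, data = dat)  # different intercepts and slopes

anova(fit_common, fit_int, fit_slope)  # sequential tests of intercept and slope differences
summary(fit_slope)                     # the test:group term indexes slope differences
```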

16.2.1.1.1 Different Slopes

Predictive bias in terms of different slopes exists when there are differences in the slope of the regression line between minority and majority groups. The slope describes the direction and steepness of the regression line. The slope of a regression line is the amount of change in \(y\) for every unit change in \(x\) (i.e., rise over run). Differing slopes indicate differential predictive validity, in which the test is a more effective predictor of performance in one group over the other. Different slopes predictive bias is depicted in Figure 16.5. In the figure, the predictor performs well in the majority group. However, the slope is close to zero in the minority group, indicating that there is no association between the predictor and the criterion for the minority group.

Figure 16.5: Test Bias: Different Slopes. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

Different slopes can especially occur if we develop our measure and criterion based on the normative majority group. Research has not found much empirical evidence of different slopes across groups. However, samples often do not have enough power to detect differing slopes (Aguinis et al., 2010). Theoretically, to fix biases related to different slopes, you should find another measure that is more predictive for the minority group. If the predictor is a strong predictor in both groups but shows slight differences in slope, within-group norming could be used.

16.2.1.1.2 Different Intercepts

Predictive bias in terms of different intercepts exists when there are differences in the intercept of the regression line between minority and majority groups. The \(y\)-intercept is the value of \(y\) at the point where the line crosses the \(y\)-axis (i.e., when \(x = 0\)). When the groups’ regression lines have similar slopes, intercept differences suggest that the measure systematically under- or over-estimates group performance relative to the person’s ability. The same test score leads to systematically different predictions for the majority and minority groups. In other words, minority group members get different test scores than majority group members with the same ability. Different intercepts predictive bias is depicted in Figure 16.6.

Figure 16.6: Test Bias: Different Intercepts. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

A higher intercept (relative to the other group) indicates that the measure under-estimates a person’s ability (at that test score)—i.e., the person’s job performance is better than what the test score would suggest. A lower intercept (relative to the other group) indicates that the measure over-estimates a person’s ability (at that test score)—i.e., the person’s job performance is worse than what the test score would suggest. Figure 16.6 indicates that the measure systematically under-estimates the job performance of the minority group.

Performance among members of a minority group could be under- or over-estimated. For example, historically, women’s grades in math and engineering classes tended to be under-estimated by the Scholastic Aptitude Test [SAT; M. J. Clark & Grandy (1984)]. However, where intercept differences have been observed, measures often show small over-estimation of school and job performance among minority groups (Reynolds & Suzuki, 2012). For example, women’s physical strength and endurance are over-estimated based on physical ability tests (Sackett & Wilk, 1994). In addition, over-estimation of African Americans’ and Hispanics’ school and job performance has been observed based on cognitive ability tests (N. S. Cole, 1981; Reynolds & Suzuki, 2012; Sackett et al., 2008; Sackett & Wilk, 1994). At the same time, the Black–White difference in job performance is less than the Black–White difference in test performance.

The over-prediction of lower-scoring groups is likely mostly an artifact of measurement error (L. S. Gottfredson, 1994). The over-estimation of African Americans’ and Hispanics’ school and job performance may be due to measurement error in the tests. Moreover, test scores explain only a portion of the variation in job performance. Black people are far less disadvantaged on the noncognitive determinants of job performance than on the cognitive ones. Nevertheless, the over-estimation that has often been observed is an average effect—performance is not over-estimated for all individuals in the groups even if there is an average over-estimation effect. In addition, simulation findings indicate that lower intercepts (i.e., over-estimation) among minority groups compared to majority groups could be observed if there are different slopes but not different intercepts in the population, because different slopes are likely to go undetected due to low power (Aguinis et al., 2010). That is, if a test shows weaker validity for a minority group than for the majority group, it could appear as different intercepts that favor the minority group when, in fact, it reflects shallower slopes for the minority group that go undetected.

Predictive biases in intercepts could especially occur if we develop tests based on the majority group and the items assess constructs other than the construct of interest, ones that are systematically biased in favor of the majority group or against the minority group. Arguments about reduced power to detect differences are less relevant for intercepts and means than for slopes.

To correct for a bias in intercepts, we could add bonus points to the scores for the minority group to correct for the amount of the systematic error, and to result in the same regression line. But if the minority group is over-predicted (as has often been the case where intercept differences have been observed), we would not want to use score adjustment to lower the minority group’s scores.

16.2.1.1.3 Different Intercepts and Slopes

Predictive bias in terms of different intercepts and slopes exists when there are differences in the intercept and slope of the regression line between minority and majority groups. In cases of different intercepts and slopes, there is both differential validity (because the regression lines have different slopes), as well as varying under- and over-estimation of groups’ performance at particular scores. Different intercepts and slopes predictive bias is depicted in Figure 16.7.

Figure 16.7: Test Bias: Different Intercepts and Slopes. TP = true positive; TN = true negative; FP = false positive; FN = false negative.

In instances of different intercepts and slopes predictive bias, a measure can simultaneously over-estimate and under-estimate a person’s ability at different test scores. For instance, a measure can under-estimate a person’s ability at higher test scores and can over-estimate a person’s ability at lower test scores.

Different intercepts and slopes across groups is possibly more realistic than just different intercepts or just different slopes. However, different intercepts and slopes predictive bias is more complicated to study, represent, and resolve. It is difficult to examine because of its complexity, and it is not easy to fix. Currently, there is no established statistical correction for different intercepts and slopes predictive bias; we would need to use a different measure or measures for each group.

16.2.2 Test Structure Bias

In addition to predictive bias, another type of test bias is test structure bias. Test structure bias involves differences in internal test characteristics across groups. Examining test structure bias involves examining the internal components of a test (e.g., its items), rather than just the total score, as is done when examining predictive bias. Test structure bias can be identified empirically or based on theory/judgment.

Empirically, test structure bias can be examined in multiple ways.

16.2.2.1 Empirical Approaches to Identification

16.2.2.1.1 Item \(\times\) Group tests (ANOVA)

Item \(\times\) Group tests in analysis of variance (ANOVA) examine whether the difference between groups on the overall score matches the differences between groups on smaller item sets. Item \(\times\) Group tests are used to rule out the possibility that items are operating in different ways in different groups. If the items operate in different ways in different groups, they do not have the same meaning across groups. For example, if we are going to use a measure for multiple groups, we would expect its items to operate similarly across groups. So, if women show higher scores on a depression measure compared to men, we would also expect women to show similar elevations on each item (e.g., sleep loss).
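
As a sketch of such a test, one could fit a mixed-design ANOVA with Group as a between-subjects factor and Item as a within-subjects (repeated) factor; a significant Group \(\times\) Item interaction suggests that items operate differently across groups. The data below are simulated and the names are hypothetical:

```r
# Minimal sketch of an Item x Group test via mixed-design ANOVA.
# Simulated long-format data: one row per person-item combination.
set.seed(2)
long <- expand.grid(id = factor(1:100), item = factor(paste0("item", 1:10)))
long$group <- ifelse(as.integer(long$id) <= 50, "men", "women")
long$response <- rnorm(nrow(long)) + ifelse(long$group == "women", 0.3, 0)

# Item is a repeated (within-person) factor; group is between-person.
# A significant group:item interaction suggests that items operate
# differently across groups (here, by construction, it should not).
fit <- aov(response ~ group * item + Error(id/item), data = long)
summary(fit)
```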

16.2.2.1.2 Item Response Theory

Using item response theory, we can examine differential item functioning (DIF). Evidence of DIF indicates that there are differences between groups in terms of the discrimination and/or difficulty/severity of items. Differences between groups in terms of the item characteristic curve (which combines the item’s discrimination and severity) would be evidence against construct validity invariance between the groups and would provide evidence of bias. DIF examines the stretching and compression of the measurement scale across groups. As an example, consider the item “bites others” in relation to externalizing problems. The item would be expected to show weaker discrimination and higher severity in adults compared to children. DIF is discussed in Section 16.9.
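
As a sketch of a DIF analysis under stated assumptions (the mirt R package is available and the items are dichotomous), one could fit a constrained multiple-group 2PL model and then test each item’s parameters. The simulation generates two groups with identical item parameters, so the DIF tests should come back null:

```r
# Minimal sketch of DIF testing with the mirt package (assumed installed).
library(mirt)

# Simulate 2PL responses for two groups with identical item parameters.
set.seed(3)
a <- matrix(rlnorm(10, 0.2, 0.3))  # discriminations
d <- matrix(rnorm(10))             # intercepts (related to difficulty/severity)
resp <- rbind(simdata(a, d, 500, itemtype = "2PL"),
              simdata(a, d, 500, itemtype = "2PL"))
grp <- rep(c("reference", "focal"), each = 500)

# Fit a multiple-group model with item parameters constrained equal,
# freeing the focal group's latent mean and variance.
mod <- multipleGroup(resp, 1, group = grp,
                     invariance = c("slopes", "intercepts",
                                    "free_means", "free_var"))

# Test each item for DIF in discrimination (a1) and difficulty (d).
DIF(mod, which.par = c("a1", "d"), scheme = "drop")
```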

16.2.2.1.3 Confirmatory Factor Analysis

Confirmatory factor analysis allows tests of measurement invariance (also called factorial invariance). Measurement invariance examines whether the factor structure of the underlying latent variables in the test is consistent across groups. It also examines whether the manifestation of the construct differs between groups. Measurement invariance is discussed in Section 16.10.

Even if you find the same slope and intercepts across groups in a prediction model, the measure would still be assessing different constructs across groups if the measure has a different factor structure between the groups. A different factor structure across groups is depicted in Figure 16.8.

Different Factor Structure Across Groups.

Figure 16.8: Different Factor Structure Across Groups.

An example of a different factor structure across groups is the differentiation of executive functions from two factors to three factors (inhibition, working memory, cognitive flexibility) across childhood (Lee et al., 2013).

There are different degrees of measurement invariance (for a review, see Putnick & Bornstein, 2016):

  • Configural invariance: same number of factors in each group, and which indicators load on which factors are the same in each group (i.e., the same pattern of significant loadings in each group).
  • Metric (“weak factorial”) invariance: items have the same factor loadings (discrimination) in each group.
  • Scalar (“strong factorial”) invariance: items have the same intercepts (difficulty/severity) in each group.
  • Residual (“strict factorial”) invariance: items have the same residual/unique variances in each group.
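
A minimal sketch of testing these successive levels using the lavaan package (assumed installed) and its built-in HolzingerSwineford1939 example dataset; each model adds constraints, and likelihood-ratio tests compare adjacent levels:

```r
# Minimal sketch of measurement invariance testing with lavaan (assumed installed).
library(lavaan)

model <- "visual =~ x1 + x2 + x3"  # one factor, three indicators

fit_configural <- cfa(model, data = HolzingerSwineford1939, group = "school")
fit_metric     <- cfa(model, data = HolzingerSwineford1939, group = "school",
                      group.equal = "loadings")
fit_scalar     <- cfa(model, data = HolzingerSwineford1939, group = "school",
                      group.equal = c("loadings", "intercepts"))
fit_residual   <- cfa(model, data = HolzingerSwineford1939, group = "school",
                      group.equal = c("loadings", "intercepts", "residuals"))

# Chi-square difference tests between adjacent levels of invariance
lavTestLRT(fit_configural, fit_metric, fit_scalar, fit_residual)
```
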
16.2.2.1.4 Structural Equation Modeling

Structural equation modeling extends a confirmatory factor analysis (CFA) model to incorporate prediction. Structural equation modeling allows examining differences in the underlying structure along with differences in prediction in the same model.

16.2.2.1.5 Signal Detection Theory

Signal detection theory provides a dynamic way to examine bias. It allows examining the overall bias in selection systems, including accuracy and errors at various cutoffs (sensitivity, specificity, positive predictive value, and negative predictive value), as well as accuracy across all possible cutoffs (the area under the receiver operating characteristic curve). Even if predictive validity is similar between groups, the types of errors made across groups might differ. It is important to decide which types of errors to emphasize depending on the fairness goals, and to examine sensitivity/specificity when adjusting cutoffs.
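
A base-R sketch of these quantities, using simulated data (all names hypothetical): accuracy indices at a given cutoff, plus the area under the ROC curve computed across all cutoffs:

```r
# Minimal sketch: signal detection indices at one cutoff, plus AUC.
# All data are simulated; names are hypothetical.
set.seed(5)
score <- rnorm(300)
truth <- rbinom(300, 1, plogis(score))  # 1 = criterion condition present

classify <- function(score, truth, cutoff) {
  pred <- as.integer(score >= cutoff)
  tp <- sum(pred == 1 & truth == 1); fn <- sum(pred == 0 & truth == 1)
  tn <- sum(pred == 0 & truth == 0); fp <- sum(pred == 1 & truth == 0)
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp),
    ppv = tp / (tp + fp), npv = tn / (tn + fn))
}

# AUC = probability that a random positive case outscores a random negative
# case (the Wilcoxon/Mann-Whitney formulation of the ROC area).
auc <- function(score, truth) {
  pos <- score[truth == 1]; neg <- score[truth == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

classify(score, truth, cutoff = 0)  # error profile at one cutoff
auc(score, truth)                   # accuracy across all cutoffs
```

Computing these indices separately by group shows whether groups with similar overall validity nonetheless incur different error profiles.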

16.2.2.1.6 Empirical Evidence of Test Structure Bias

It is not uncommon to find items that show differences across groups in severity (intercepts) and/or discrimination (factor loadings). However, cross-group differences in item functioning tend to be small and not consistent across studies, suggesting that some of the differences may reflect Type I errors that result from sampling error and multiple testing. That said, some instances of cross-group differences in item parameters could reflect test structure bias that is real and important to address.

16.2.2.2 Theoretical/Judgmental Approaches to Identification

16.2.2.2.1 Facial Validity Bias

Facial validity bias considers the extent to which an average person thinks that an item is biased—i.e., that the item has differing validity between minority and majority groups. If so, the item should be reconsidered. Does an item disfavor certain groups? Is the language specific to a particular group? Is it offensive to some people? This type of judgment moves into the realm of whether or not an item should be used.

16.2.2.2.2 Content Validity Bias

Content validity bias is determined by judgments of construct experts who look for items that do not do an adequate job of assessing the construct across groups. A construct may include some content facets in one group but different content facets in another group, as depicted in Figure 16.9.

Figure 16.9: Different Content Facets in a Given Construct for Two Groups.

Examples include information questions and vocabulary questions on the Wechsler Adult Intelligence Scale. If an item is linguistically complicated, grammatically complex or convoluted, or includes a double negative, it may be less valid or predictive for rural populations and for those with less education.

Also, stereotype threat may contribute to content validity bias. Stereotype threat occurs when people are or feel at risk of conforming themselves to stereotypes about their social group, thus leading them to show poorer performance in ways that are consistent with the stereotype. Stereotype threat may partially explain why some women may perform more poorly on some math items than some men.

Another example of content validity bias is when the same measure is used to assess a construct across ages even though the construct shows heterotypic continuity. Heterotypic continuity occurs when a construct changes in its behavioral manifestation with development (Petersen et al., 2020). That is, the same construct may look different at different points in development. An example of a construct that shows heterotypic continuity is externalizing problems. In early childhood, externalizing problems often manifest in overt forms, including physical aggression (e.g., biting) and temper tantrums. By contrast, in adolescence and adulthood, externalizing problems more often manifest in covert ways, including relational aggression and substance use. Content validity and facial validity bias judgments are often related, but not always.

16.3 Examples of Bias

As described in the overview in Section 16.1, there is not much empirical evidence of test bias (Brown et al., 1999; Hall et al., 1999; Jensen, 1980; Kuncel & Hezlett, 2010; Reynolds et al., 2021; Reynolds & Suzuki, 2012; Sackett et al., 2008; Sackett & Wilk, 1994). That said, some item-level bias is not uncommon. One instance of test bias is that, historically, women’s grades in math and engineering classes tended to be under-estimated by the Scholastic Aptitude Test [SAT; M. J. Clark & Grandy (1984)]. Fernández & Abe (2018) review the evidence on other instances of test and item bias. For instance, test bias can occur if a subgroup is less familiar with the language, the stimulus material, or the response procedures, or if they have different response styles. In addition to test bias, there are known patterns of bias in clinical judgment, as described in Section 25.3.11.

16.4 Test Fairness

There is interest in examining more than just the accuracy of measures. It is also important to examine the errors being made and differentiate the weight or value of different kinds of errors (and correct decisions). Consider an example of an unbiased test, as depicted in Figure 16.10, adapted from L. S. Gottfredson (1994). Although the example is of a White group and a Black group, we could substitute any two groups into the example (e.g., males versus females).

Figure 16.10: Potential Unfairness in Testing. The ovals represent the distributions of individuals’ performance both on a test and a job performance criterion. TP = true positive; TN = true negative; FP = false positive; FN = false negative. (Adapted from L. S. Gottfredson (1994), Figure 1, p. 958. Gottfredson, L. S. (1994). The science and politics of race-norming. American Psychologist, 49(11), 955–963. https://doi.org/10.1037/0003-066X.49.11.955)

The example is of an unbiased test between White and Black job applicants. There are no differences between the two groups in terms of slope. If we drew a regression line, the line would go through the centroid of both ovals. Thus, the measure is equally predictive in both groups even though the Black group failed the test at a higher rate than the White group. Moreover, there is no difference between the groups in terms of intercept. Thus, the performance of one group is not over-estimated relative to the performance of the other group. (For comparison, Group X in the figure shows what a different intercept would look like.) In sum, there is no predictive validity bias between the two groups. But just because the test predicts just as well in both groups does not mean that the selection procedures are fair.

Although the test is unbiased, there are differences in the quality of prediction: there are more false negatives in the Black group compared to the White group. This gives the White group an advantage and the Black group additional disadvantages. If the measure showed the same quality of prediction, we would say the test is fair. The point of the example is that just because a test is unbiased does not mean that the test is fair.

There are two kinds of errors: false negatives and false positives. Each error type has very different implications. False negatives would be when the test predicts that an applicant would perform poorly and we do not give them the job even though they would have performed well. False negatives have a negative effect on the applicant. And, in this example, there are more false negatives in the Black group. By contrast, false positives would be when we predict that an applicant would do well, and we give them the job but they perform poorly. False positives are a benefit to the applicant but have a negative effect on the employer. In this example, there are more false positives in the White group, which is an undeserved benefit based on the selection ratio; therefore, the White group benefits.

In sum, equal accuracy of prediction (i.e., equal total number of errors) does not necessarily mean the test is fair; we must examine the types of errors. Merely ensuring accuracy does not ensure fairness!

16.4.1 Adverse Impact

Adverse impact is defined as rejecting members of one group at a higher rate than another group. Adverse impact is different from test validity. According to federal guidelines, adverse impact is present if the selection rate of one group is less than four-fifths (80%) the selection rate of the group with the highest selection rate.
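
A minimal sketch of checking the four-fifths rule, with hypothetical counts:

```r
# Four-fifths (80%) rule sketch; the counts below are hypothetical.
selected   <- c(groupA = 50, groupB = 20)
applicants <- c(groupA = 100, groupB = 80)

rates <- selected / applicants         # selection rate per group
impact_ratio <- rates / max(rates)     # relative to the highest selection rate
impact_ratio                           # groupB: 0.25 / 0.50 = 0.50
impact_ratio < 0.80                    # TRUE flags potential adverse impact
```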

There is much more evidence of adverse impact than test bias. Indeed, disparate impact of tests on personnel selection across groups is the norm rather than the exception, even when using valid tests that are unbiased, which in part reflects group-related differences in job-related skills (L. S. Gottfredson, 1994). Examples of adverse impact include:

  • physical ability tests, which produce substantial adverse impact against women (despite over-estimation of women’s performance),
  • cognitive ability tests, which produce substantial adverse impact against some ethnic minority groups, especially Black and Hispanic people (despite over-estimation of Black and Hispanic people’s performance), even though cognitive ability tests tend to be among the strongest predictors of job performance (Sackett et al., 2008; Schmidt & Hunter, 1981), and
  • personality tests, which produce higher estimates of dominance among men than women; it is unclear whether this has predictive bias.

16.4.2 Bias Versus Fairness

Whether a measure is accurate or shows test bias is a scientific question. By contrast, whether a test is fair and thus should be used for a given purpose is not just a scientific question; it is also an ethical question. It involves the consideration of the potential consequences of testing in terms of social values and consequential validity.

16.4.3 Operationalizing Fairness

There are many perspectives to what should be considered when evaluating test fairness (American Educational Research Association et al., 2014; Camilli, 2013; Committee on the General Aptitude Test Battery et al., 1989; Dorans, 2017; Fletcher et al., 2021; Gipps & Stobart, 2009; Helms, 2006; Jonson & Geisinger, 2022; Melikyan et al., 2019; Sackett et al., 2008; Thorndike, 1971; Zieky, 2006, 2013). As described in Fletcher et al. (2021), there are three primary ways of operationalizing fairness:

  1. Equal outcomes: the selection rate is the same across groups.
  2. Equal opportunity: the sensitivity (true positive rate; 1 \(-\) false negative rate) is the same across groups.
  3. Equal odds: the sensitivity is the same across groups and the specificity (true negative rate; 1 \(-\) false positive rate) is the same across groups.
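
These three definitions map onto simple quantities (selection rate, sensitivity, specificity) computed per group. A minimal sketch with simulated data (all names hypothetical):

```r
# Minimal sketch: the quantities behind the three fairness definitions,
# computed separately by group. All data are simulated.
set.seed(7)
dat <- data.frame(
  group    = rep(c("men", "women"), each = 150),
  selected = rbinom(300, 1, 0.4),   # 1 = selected for the job
  success  = rbinom(300, 1, 0.5)    # 1 = strong job performance (criterion)
)

fairness_metrics <- function(d) {
  c(selection_rate = mean(d$selected),                      # equal outcomes
    sensitivity    = mean(d$selected[d$success == 1]),      # equal opportunity
    specificity    = mean(1 - d$selected[d$success == 0]))  # + sensitivity = equal odds
}

sapply(split(dat, dat$group), fairness_metrics)
```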

For example, the job selection procedure shows equal outcomes if the proportion of men selected is equal to the proportion of women selected. The job selection procedure shows equal opportunity if, among those who show strong job performance, the proportion of classification errors (false negatives) is the same for men and women. Receiver operating characteristic (ROC) curves are depicted for two groups in Figure 16.11. A cutoff that represents equal opportunity is depicted with a horizontal line (i.e., the same sensitivity) in Figure 16.11. The job selection procedure shows equal odds if (a), among those who show strong job performance, the proportion of classification errors (false negatives) is the same for men and women, and (b), among those who show poor job performance, the proportion of classification errors (false positives) is the same for men and women. A cutoff that represents equal odds is depicted where the ROC curve for Group A intersects with the ROC curve from Group B in Figure 16.11. The equal odds approach to fairness is consistent with a National Academy of Sciences committee on fairness (Committee on the General Aptitude Test Battery et al., 1989; L. S. Gottfredson, 1994). Approaches to operationalizing fairness in the context of prediction models are described by Paulus & Kent (2020).

Figure 16.11: Receiver Operating Characteristic (ROC) Curves for Two Groups. (Figure reprinted from Fletcher et al. (2021), Figure 2, p. 3. Fletcher, R. R., Nakeshimana, A., & Olubeko, O. (2021). Addressing fairness, bias, and appropriate use of artificial intelligence and machine learning in global health. Frontiers in Artificial Intelligence, 3(116). https://doi.org/10.3389/frai.2020.561802)

It is not possible to meet all three types of fairness simultaneously (i.e., equal selection rates, sensitivity, and specificity across groups) unless the base rates are the same across groups or the selection is perfectly accurate (Fletcher et al., 2021). In the medical context, equal odds is the most common approach to fairness. However, using the cutoff associated with equal odds typically reduces overall classification accuracy. And changing the cutoff for specific groups can lead to negative consequences. If equal odds results in classification accuracy that is too low, it may be worth considering separate assessment procedures/tests for each group. In general, it is best to follow one of these approaches to fairness. Fairness is difficult to get right, so try to minimize negative impact. Many fairness proponents argue for simpler rules. In the 1991 Civil Rights Act, score adjustments based on race, gender, and ethnicity (e.g., within-race norming or race-conscious score adjustments) were made illegal in personnel selection (L. S. Gottfredson, 1994).

Another perspective on fairness is that selection procedures should predict job performance and nothing else; if test scores are correlated with group membership (e.g., race, socioeconomic status, or gender), the test should not be used (Helms, 2006). That is, according to Helms, we should not use any test that assesses anything other than the construct of interest (job performance). Unfortunately, however, no such measures exist. Every measure assesses multiple things, and factors such as poverty can have long-lasting impacts across many domains.

Another perspective on fairness is to make the selection rates equal the rates of success within each group (Thorndike, 1971). According to this perspective, if you want to do selection, you should hire all people and then look at job performance. If, among successful employees, 60% are White and 40% are Black, then set this selection rate for each group (i.e., hiring 80% White individuals and 20% Black individuals is not okay). According to this perspective, a selection system is only fair if the majority–minority differences on the selection device are equal in magnitude to the majority–minority differences in job performance. Selection criteria should be based on prior distributions of success rates. However, you likely will not ever really know the true base rate in these situations. No one uses this approach because you would have a period in which you have to accept everyone to find the percentage that works. Also, this would only work in a narrow window of time because the selection pool changes over time.

There are lots of groups and subgroups. Ensuring fairness is very complex, and there is no way to accomplish the goal of being equally fair to all people. Therefore, do the best you can and try to minimize negative impact.

16.5 Correcting For Bias

16.5.1 What to Do When Detecting Bias

When examining item bias (using differential item functioning/DIF or measurement non-invariance) with many items (or measures) across many groups, there can be many tests, which will make it likely that DIF/non-invariance will be detected, especially with a large sample. Some detected DIF may be artificial or trivial, but other DIF may be real and important to address. It is important to consider how you will proceed when detecting DIF/non-invariance. Considerations of effect size and theory can be important for evaluating the DIF/non-invariance and whether it is negligible or important to address.

When detecting bias, there are several steps to take. First, consider what the bias indicates. Does the bias present adverse impact for a minority group? For what reasons might the bias exist? Second, examine the effect size of the bias. If the effects are small, if the bias does not present adverse impact for a minority group, and if there is no compelling theoretical reason for the bias, the bias might not be sufficient to scrap the instrument for the population. Some detected bias may be artificial, but other bias may be real. Gender and cultural differences have shown a number of statistically significant effects for a number of different assessment purposes, but many of the observed effects are quite small and likely trivial, and they do not present compelling reasons to change the assessment (Youngstrom & Van Meter, 2016).

However, if you find bias, correct for it! There are a number of score adjustment and non-score adjustment approaches to correct for bias, as described in Sections 16.5.2 and 16.5.3. If the bias occurs at the item level (e.g., test structure bias), it is generally recommended to remove or resolve items that show non-negligible bias. There are three primary options: (1) drop the item for both groups, (2) drop the item for one group but keep it for the other group, or (3) freely estimate the parameters for the item across groups. Addressing items that show larger bias can also reduce artificial bias in other items (Hagquist & Andrich, 2017). Thus, researchers are encouraged to handle item bias sequentially from high to low in magnitude. If the bias occurs at the test score level (e.g., predictive bias), score adjustments may be considered.

If you do not correct for bias, consider the impact of the test, procedure, and selection procedure when interpreting scores. Interpret scores with caution and provide necessary caveats in resulting papers or reports regarding the interpretations in question. In sum, it is important to examine the possibility of bias—it is important to consider how much “erroneous junk” you are introducing into your research.

16.5.2 Score Adjustment to Correct for Bias

Score adjustment involves adjusting scores for a particular group or groups.

16.5.2.1 Why Adjust Scores?

There may be several reasons to adjust scores for various groups in a given situation. First, there may be social goals to adjust scores. For example, we may want our selection device to yield personnel that better represent the nation or region, including diversity of genders, races, majors, social classes, etc. Score adjustments are typically discussed with respect to racial minority differences due to historical and systemic inequities. Our society aims to provide equal opportunity, including the opportunity to gain a fair share (i.e., proportional representation) of jobs. A diversity of perspectives in a job is a strength; a diversity of perspectives can lead to greater creativity and improved problem-solving. A second potential reason that we may want to apply score adjustment is to correct for bias. A third potential reason that we may want to apply score adjustment is to improve the fairness of a test.

16.5.2.2 Types of Score Adjustment

There are a number of potential techniques that have been used in attempts to correct for bias, i.e., to reduce the negative impact of the test on an under-represented group. What is considered an under-represented group may depend on the context. For instance, men are under-represented compared to women as nurses, preschool teachers, and college students. However, men may not face the same systemic challenges as women, so even though men may show under-representation in some domains, it is arguable whether scores should be adjusted to increase their representation. Techniques for score adjustment include:

16.5.2.2.1 Bonus Points

Providing bonus points involves adding a constant number of points to the scores of all individuals who are members of a particular group, with the goal of eliminating or reducing group differences. Bonus points are used to correct for predictive bias due to differences in intercepts between groups. An example of bonus points is military veterans in placement for civil service jobs—points are added to the initial score for all veterans (e.g., add 5 points to test scores of all veterans). An example of using bonus points as a score adjustment is depicted in Figure 16.12.

Figure 16.12: Using Bonus Points as a Scoring Adjustment.
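
A minimal sketch of adding bonus points, with hypothetical scores and group labels:

```r
# Bonus-points sketch; scores and group labels are hypothetical.
scores <- c(120, 125, 118, 130, 122, 116)
group  <- c("majority", "majority", "minority",
            "majority", "minority", "minority")

bonus <- 5  # constant added for the minority group
adjusted <- scores + ifelse(group == "minority", bonus, 0)
data.frame(scores, group, adjusted)
```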

There are several pros of bonus points. If the distribution of each group is the same, bonus points will effectively reduce group differences. Moreover, they are a simple way of influencing test selection and procedure without changing the test, which is a great advantage. There are several cons of bonus points. If there are differences in group standard deviations, adding bonus points may not actually correct for bias. The use of bonus points also obscures what is actually being done to scores, so other methods, such as separate cutoffs, are more explicit. In addition, the simplicity of bonus points is also a great disadvantage: because it is easily understood, it is often not viewed as “fair” that some people are getting extra points that others do not.

16.5.2.2.2 Within-Group Norming

A norm is the standard of performance to which a person’s performance can be compared. Within-group norming treats the person’s group in the sample as the norm. Within-group norming converts an individual’s score to standardized scores (e.g., T scores) or percentiles within one’s own group. People are then selected based on the highest standardized scores across groups. Within-group norming is used to correct for predictive bias due to differences in slopes between groups. An example of using within-group norming as a score adjustment is depicted in Figure 16.13.

Figure 16.13: Using Within-Group Norming as a Scoring Adjustment.
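
A minimal sketch of within-group norming, standardizing within each (hypothetical) group and ranking everyone on the standardized metric:

```r
# Within-group norming sketch; scores and group labels are hypothetical.
scores <- c(120, 125, 118, 130, 122, 116)
group  <- c("majority", "majority", "minority",
            "majority", "minority", "minority")

# Standardize (z-score) within each group, then rank everyone together.
z_within <- ave(scores, group, FUN = function(x) as.numeric(scale(x)))
data.frame(scores, group, z_within)[order(z_within, decreasing = TRUE), ]
```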

There are several pros of within-group norming. First, it accounts for differences in group standard deviations and means, so it does not have the same problem as bonus points and is generally more effective at eliminating adverse impact. Second, some general (non-group-specific) norms are clearly irrelevant for characterizing a person’s functioning. Group-specific norms aim to describe a person’s performance relative to people with a similar background, thus potentially reducing cultural bias. Third, group-specific norms may better reflect cultural, educational, socioeconomic, and other factors that may influence a person’s score (Burlew et al., 2019). Fourth, group-specific norms may increase specificity and reduce over-pathologizing by preventing diagnoses for people who might not have a condition (Manly & Echemendia, 2007).

There are several cons of within-group norming. First, group differences could be maintained if one decides to norm based on a reference sample or, when scores are skewed, a local sample, especially when using standardized scores; percentile scores, however, will consistently eliminate adverse impact. Second, using group-specific norms may obscure background variables that explain the underlying reasons for group-related differences in test performance (Manly, 2005; Manly & Echemendia, 2007). Third, group-specific norms do not address the problem if the measure shows test bias (Burlew et al., 2019). Fourth, group-specific norms may reduce sensitivity to detect conditions (Manly & Echemendia, 2007). For instance, they may prevent people from getting treatment from which they would benefit. It is worth noting that within-group norming on the basis of sex, gender, and ethnicity is illegal as a basis for personnel selection under the 1991 Civil Rights Act.

As an example of within-group norming, the National Football League used to use race-norming for the identification of concussions. The effect of race-norming, however, was that it lowered Black players’ concussion risk scores, which prevented many Black players from being identified as having sustained a concussion and from receiving needed treatment. Race-norming compared Black football players’ cognitive test scores to group-specific norms: the cognitive test scores of Black people in the general population (not to common norms). Using Black-specific norms assumed that Black football players showed lower cognitive ability than other groups, so a low cognitive ability score for a Black player was less likely to be flagged as concerning. Thus, the race-specific norms led to lower identified rates of concussions among Black football players compared to White football players. Due to the adverse impact, Black players sued the National Football League, and the league stopped the controversial practice of race-norming for the identification of concussions (https://www.washingtonpost.com/sports/2021/06/03/nfl-concussion-settlement-race-norming/; archived at https://perma.cc/KN3L-5Z7R).

A common question is whether to use group-specific norms or common norms. Group-specific norms are a controversial practice, and the answer depends. If you are interested in a person’s absolute functioning (e.g., for determining whether someone is concussed or whether they are suitable to drive), recommendations are to use common norms, not group-specific norms (Barrash et al., 2010; Silverberg & Millis, 2009). If, by contrast, you are interested in a person’s relative functioning compared to a specific group, within-group norming could make sense if there is an appropriate reference group. The question of which norms to use is complex; psychologists should evaluate the costs and benefits of each norm and use the norm with the greatest benefit and the least cost for the client (Manly & Echemendia, 2007).

16.5.2.2.3 Separate Cutoffs

Using separate cutoffs involves using a separate cutoff score per group and selecting the top number from each group. That is, using separate cutoffs involves using different criteria for each group. Using separate cutoffs functions the same as adding bonus points, but it has greater transparency—i.e., you are lowering the standard for one group compared to another group. An example of using separate cutoffs as a score adjustment is depicted in Figure 16.14.

Figure 16.14: Using Separate Cutoffs as a Scoring Adjustment. In this example, the cutoff for the majority group is 128; the cutoff for the minority group is 123.

16.5.2.2.4 Top-Down Selection from Different Lists

Top-down selection from different lists involves taking the best from two different lists according to a preset rule as to how many to select from each group. Top-down selection from different lists functions the same as within-group norming. An example of using top-down selection from different lists as a score adjustment is depicted in Figure 16.15.

Figure 16.15: Using Top-Down Selection From Different Lists as a Scoring Adjustment. In this example, the top three candidates are selected from each group.

16.5.2.2.5 Banding

Banding uses a tier system based on the assumption that individuals within a specific score range have equivalent scores. So that we do not over-interpret small score differences, scores within the same band are treated as equivalent, and the order of selection within the band can be modified depending on selection goals. The standard error of measurement (SEM) is used to estimate the precision (reliability) of the test scores, and it is used as the width of the band.

Consider an example: if a person received a score with a confidence interval of 18–22, then scores between 18 and 22 are not reliably different, owing to random fluctuation (measurement error). Therefore, scores in that range are considered the same, and we take a band of scores. However, banding by itself may not result in increased selection of lower-scoring groups. The band provides a subsample of applicants so that we can use other criteria (other than the test) to select a candidate. Giving “minority preference” involves selecting members of the minority group in a given band before selecting members of the majority group. An example of using banding as a score adjustment is depicted in Figure 16.16.

Figure 16.16: Using Banding as a Scoring Adjustment.

The problem with banding is that bands are set by the standard error of measurement: you can select the first group from the first band, but whom do you select after the first band? There is no rationale for where to “stop” the band, because there are indistinguishable scores on the edges of each band relative to the next band. That is, 17 is indistinguishable from 18 (in terms of its confidence interval), 16 is indistinguishable from 17, and so on. Therefore, banding works okay for the top scores, but if you are going to hire a lot of candidates, it is a problem. A solution to this problem is to use a sliding band, as described later.

16.5.2.2.6 Banding with Bonus Points

Banding is often used with bonus points to reduce the negative impact for minority groups. An example of using banding with bonus points as a score adjustment is depicted in Figure 16.17.

Figure 16.17: Using Banding With Bonus Points as a Scoring Adjustment.

16.5.2.2.7 Sliding Band

Using a sliding band is a solution to the problem of which bands to use when banding. Using a sliding band can help increase the number of minority candidates selected. Using the top band, you select all members of the minority group in the top band, then select members of the majority group with the top score of the band, then slide the band down (based on the SEM), and repeat. You work your way down through bands of scores that are indistinguishable based on the SEM until you have selected enough candidates.

For instance, if the top score is 22 and the SEM is 4 points, the first band would be: [18, 22]. Here is how you would proceed:

  1. Select the minority group members who have a score between 18 to 22.
  2. Select the majority group members who have a score of 22.
  3. Slide the band down based on the SEM to the next highest score: [17, 21].
  4. Select the minority group members who have a score between 17 to 21.
  5. Select the majority group members who have a score of 21.
  6. Slide the band down based on the SEM to the next highest score: [16, 20].
  7. And so on

An example of using a sliding band as a score adjustment is depicted in Figure 16.18.

Figure 16.18: Using a Sliding Band as a Scoring Adjustment.
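
A minimal sketch of the sliding-band procedure described in the steps above (all names and data are hypothetical; the minority-preference rule shown is one of several possible within-band rules):

```r
# Sliding-band sketch; data and selection rule are hypothetical/illustrative.
sliding_band_select <- function(score, group, sem, n_select) {
  selected <- logical(length(score))
  while (sum(selected) < n_select && !all(selected)) {
    remaining <- which(!selected)
    top  <- max(score[remaining])
    band <- remaining[score[remaining] >= top - sem]  # current band: [top - sem, top]
    # Within the band: minority members first, then majority members at the
    # band's top score; the band then slides down to the next highest score.
    picks <- c(band[group[band] == "minority"],
               band[group[band] == "majority" & score[band] == top])
    picks <- picks[seq_len(min(length(picks), n_select - sum(selected)))]
    selected[picks] <- TRUE
  }
  which(selected)
}

set.seed(10)
score <- sample(10:22, 30, replace = TRUE)
group <- sample(c("majority", "minority"), 30, replace = TRUE)
sliding_band_select(score, group, sem = 4, n_select = 10)
```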

In sum, using a sliding band, scores that are not significantly lower than the highest remaining score are not treated as different. Using a sliding band has the same effects on decisions as bonus points equal to the width of the band. For example, if the SEM is 3, a sliding band leads to the same decisions as 3 bonus points; any scores within 3 points of the highest score are considered equal.

A sliding band is popular because of its scientific and statistical rationale. Also, it is more confusing and, therefore, preferred by some because its use may be less likely to lead to lawsuits. However, a sliding band may not always eliminate adverse impact. A sliding band has never been overturned in court (or at least, not yet).

16.5.2.2.8 Separate Tests

Using separate tests for each group is another option to reduce bias. For instance, you might use one test for the majority group and a different test for the minority group, making sure that each test is valid for the relevant group. Using separate tests is an extreme version of top-down selection and within-group norming. Using separate tests would be an option if a measure shows different slopes predictive bias.

One way of developing separate tests is to use empirical keying by group: different items for each group are selected based on each item’s association with the criterion in each group. Empirical keying is an example of dustbowl empiricism (i.e., relying on empiricism rather than theory). However, theory can also inform the item selection.
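
A minimal sketch of empirical keying by group, retaining for each group the items whose item-criterion correlations exceed an (arbitrary) threshold; all data and the threshold are hypothetical:

```r
# Empirical keying by group; data and threshold are hypothetical.
set.seed(11)
items     <- matrix(rnorm(200 * 12), nrow = 200,
                    dimnames = list(NULL, paste0("item", 1:12)))
criterion <- rowMeans(items[, 1:4]) + rnorm(200)  # only items 1-4 are predictive
group     <- rep(c("majority", "minority"), each = 100)

# For each group, keep the items most associated with the criterion in
# that group.
keyed_items <- lapply(split(seq_along(group), group), function(idx) {
  r <- cor(items[idx, ], criterion[idx])
  colnames(items)[abs(r) > 0.20]
})
keyed_items
```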

16.5.2.2.9 Item Elimination based on Group Differences

Items that show large group differences in scores can be eliminated from the test. If you remove enough items showing differences between groups, you can get similar scores between groups and can get equal group selection. A problem with item elimination based on group differences is that, if you get rid of predictive items, then the two goals of equal selection and predictive power are not both met. If you use this method, you often have to be willing for the measure to show decreases in predictive power.

16.5.2.3 Use of Score Adjustment

Score adjustment can be used in a number of different domains, including tests of aptitude and intelligence. Score adjustment also comes up in other areas. For example, the number of drinks it takes to be considered binge drinking differs between men (five) and women (four). Although the list of score adjustment options is long, they all really reduce to two ways:

  1. Bonus points
  2. Within-group norming

Bonus points and within-group norming are the techniques that are most often used in the real world. These techniques differ in their degree of obscurity—i.e., confusion that is caused not for scientific reasons, but for social, political, and dissemination and implementation reasons. Often procedures that are hard to understand are preferred because it is hard to argue against, critique, or game the system. Basically, you have two options for score adjustment. One option is to adjust scores by raising scores in one group or lowering the criterion in one group. The second primary option is to renorm or change the scores. In sum, you can change the scores, or you can change the decisions you make based on the scores.

16.5.3 Other Ways to Correct for Bias

Because score adjustment is controversial, it is also important to consider other potential ways to correct for bias that do not involve score adjustment. Strategies other than score adjustment to correct for bias are described by Sackett et al. (2001).

16.5.3.1 Use Multiple Predictors

In general, high-stakes decisions should not be made based on the results from one test. So, for instance, do not make hiring decisions based just on aptitude assessments. For example, college admissions decisions are not made just based on SAT scores, but also one’s grades, personal statement, extracurricular activities, letters of recommendation, etc. Using multiple predictors works best when the predictors are not correlated with the assessment that has adverse impact, which is difficult to achieve.

There are larger majority–minority subgroup differences in verbal and cognitive ability tests than in noncognitive skills (e.g., motivation, personality, and interpersonal skills). So, it is important to include assessment of relevant noncognitive skills. Include as many relevant aspects of the construct as possible for content validity. For a job, consider as many factors as possible that are relevant for success, e.g., cognitive and noncognitive abilities.

16.5.3.2 Change the Criterion

Another option is to change the criterion so that the predictive validity of tests is less skewed. It may be that the selection instrument is not biased but the way in which we are thinking about selection procedures is biased. For example, for judging the quality of universities, there are many different criteria we could use. It could be valuable to examine the various criteria; you might find what is driving the adverse impact.

16.5.3.3 Remove Biased Items

Using item response theory or confirmatory factor analysis, you can identify items that function differently across groups (i.e., differential item functioning/DIF or measurement non-invariance). For instance, you can identify items that show different discrimination/factor loadings or difficulty/intercepts by group. You do not just want to remove items that show mean-level differences in scores (or different rates of endorsement) for one group than another, because there may be true group differences in their level on particular items. If an item is clearly invalid in one group but valid in another group, another option is to keep the item in one group, and to remove it in another group.

Be careful when removing items because removing items can lead to poorer content validity—i.e., items may no longer be a representative set of the content of the construct. Removing items also reduces a measure’s reliability and ability to detect individual differences (Hagquist, 2019; Hagquist & Andrich, 2017). DIF effects tend to be small and inconsistent; removing items showing DIF may not have a big impact.

16.5.3.4 Resolve Biased Items

Another option for items that show differential item functioning (in IRT) or measurement non-invariance (in CFA) is to resolve, rather than remove, the items. Resolving an item involves allowing it to have a different discrimination/factor loading and/or difficulty/intercept parameter in each group. Allowing item parameters to differ across groups has a very small effect on reliability and person separation, so it can be preferable to removing items (Hagquist, 2019; Hagquist & Andrich, 2017).
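
As a preview of how resolving an item can be implemented in CFA, here is a minimal sketch using lavaan’s built-in HolzingerSwineford1939 example data (used again later in this chapter); treating x3 as the non-invariant item is purely for illustration:

Code
library("lavaan")

# one-factor model for illustration
model <- 'visual =~ x1 + x2 + x3'

# constrain loadings and intercepts to be equal across groups,
# but "resolve" x3 by freeing its loading and intercept in each group
fitPartial <- cfa(
  model,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts"),
  group.partial = c("visual =~ x3", "x3 ~ 1"))

summary(fitPartial)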

16.5.3.5 Use Alternative Modes of Testing

Another option is to use alternative modes of testing. For example, you could present test items via audio or video rather than requiring a person to read the items or write answers. Typical written and computerized exams are oriented toward the upper-middle class, which is a problem with the testing procedure rather than with the examinees. McClelland (1973) argued that we need more real-life testing. Real-life testing could help address stereotype threat and the effects of learning disabilities. However, testing in different modalities could change the construct(s) being assessed.

16.5.3.6 Use Work Records

Using work records is based on McClelland’s (1973) argument to use more realistic and authentic assessments of job-relevant abilities. Evidence on the value of work records for personnel selection is mixed. In some cases, using work records can actually increase adverse impact on under-represented groups, because members of the majority group often already know how to get into the relevant job, or are already in it; therefore, they have a leg up. Using work records would be more defensible if people were trained first and then tested, but organizations rarely invest the time to do this.

16.5.3.7 Increase Time Limit

Another option is to allot people more testing time, as long as doing so does not change the construct. Time limits often introduce measurement error because scores conflate pace and quality of work. Increasing time limits requires convincing stakeholders that job performance typically reflects not “how fast you do things” but “how well you do them”, i.e., that speed does not correlate with the outcome of interest. The utility of increasing time limits depends on the domain; in some domains, efficiency is crucial (e.g., medicine, aviation). In any case, increasing time limits is not that effective in reducing group differences, and it may actually increase group differences.

16.5.3.8 Use Motivation Sets

Using motivation sets involves finding ways to increase testing motivation among minority groups. It is probably an error to think that a test assesses aptitude alone; we should also consider an individual’s motivation during testing. That is, part of the score reflects ability and part reflects motivation, as formalized below. You should try to maximize each examinee’s motivation so that the person’s score on the measure better captures their true ability. Motivation sets could include, for example, using more realistic test stimuli that are clearly applicable to the school or job requirements (i.e., that have face validity) to motivate all test takers.
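
One way to formalize this idea (a conceptual sketch, not a model fit in this chapter) is to decompose an observed score into ability- and motivation-related components plus error:

\[
X = \beta_1 \cdot \text{ability} + \beta_2 \cdot \text{motivation} + e
\]

If motivation is maximized, and thus roughly equalized, across examinees, differences in \(X\) more closely track differences in ability.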

16.5.3.9 Use Instructional Sets

Using instructional sets involves coaching and training. For instance, you could inform examinees about the test content, provide study materials, and recommend test-taking strategies. This could narrow the gap between groups, given the implicit assumption that the majority group already receives informal (“light”) training. Using instructional sets aims to reduce error variance due to test anxiety, unfamiliar test formats, and poor test-taking skills.

Giving minority groups better access to test preparation is based on the assumption that group differences emerge in part because of differential access to test preparation materials. This could theoretically help reduce test score differences across groups systematically. Standardized tests such as the SAT, GRE, LSAT, GMAT, and MCAT embrace coaching and training; for instance, ETS provides training materials for free. After training, scores on standardized tests show some, but minimal, improvement. In general, training yields some improvement on quantitative subscales but minimal change on verbal subscales. However, the improvements tend to apply across groups, and they do not seem to lessen group differences in scores.

16.6 Getting Started

16.6.1 Load Libraries

Code
library("petersenlab") #to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("lavaan")
library("semTools")
library("semPlot")
library("mirt")
library("dmacs") #to install: install.packages("remotes"); remotes::install_github("ddueber/dmacs")
library("strucchange")
library("MOTE")
library("tidyverse")
library("here")
library("tinytex")

16.6.2 Prepare Data

16.6.2.1 Load Data

cnlsy is a subset of a data set from the Children of the National Longitudinal Survey of Youth (CNLSY). The CNLSY is a publicly available longitudinal data set provided by the Bureau of Labor Statistics (https://perma.cc/EH38-HDRN). The CNLSY data file for these examples is located on the book’s page of the Open Science Framework (https://osf.io/3pwza).

Code
cnlsy <- read_csv(here("Data", "cnlsy.csv"))

16.6.2.2 Simulate Data

For reproducibility, I set the seed below. Using the same seed will yield the same answer every time. There is nothing special about this particular seed.

Code
sampleSize <- 4000

set.seed(52242)

# note: the two-element group vector below is recycled across all rows,
# so group alternates male/female for the full sample
mydataBias <- data.frame(
  ID = 1:sampleSize,
  group = factor(c("male","female"),
                 levels = c("male","female")),
  unbiasedPredictor1 = NA,
  unbiasedPredictor2 = NA,
  unbiasedPredictor3 = NA,
  unbiasedCriterion1 = NA,
  unbiasedCriterion2 = NA,
  unbiasedCriterion3 = NA,
  predictor = rnorm(sampleSize, mean = 100, sd = 15),
  criterion1 = NA,
  criterion2 = NA,
  criterion3 = NA,
  criterion4 = NA,
  criterion5 = NA)

mydataBias$unbiasedPredictor1 <- rnorm(sampleSize, mean = 100, sd = 15)
mydataBias$unbiasedPredictor2[which(mydataBias$group == "male")] <- 
  rnorm(length(which(mydataBias$group == "male")), mean = 70, sd = 15)
mydataBias$unbiasedPredictor2[which(mydataBias$group == "female")] <- 
  rnorm(length(which(mydataBias$group == "female")), mean = 130, sd = 15)
mydataBias$unbiasedPredictor3[which(mydataBias$group == "male")] <- 
  rnorm(length(which(mydataBias$group == "male")), mean = 130, sd = 15)
mydataBias$unbiasedPredictor3[which(mydataBias$group == "female")] <- 
  rnorm(length(which(mydataBias$group == "female")), mean = 70, sd = 15)

mydataBias$unbiasedCriterion1 <- 1 * mydataBias$unbiasedPredictor1 + 
  rnorm(sampleSize, mean = 0, sd = 15)
mydataBias$unbiasedCriterion2 <- 1 * mydataBias$unbiasedPredictor2 + 
  rnorm(sampleSize, mean = 0, sd = 15)
mydataBias$unbiasedCriterion3 <- 1 * mydataBias$unbiasedPredictor3 + 
  rnorm(sampleSize, mean = 0, sd = 15)

mydataBias$criterion1[which(mydataBias$group == "male")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "male")] + 
  rnorm(length(which(mydataBias$group == "male")), mean = 0, sd = 5)
mydataBias$criterion1[which(mydataBias$group == "female")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "female")] + 
  rnorm(length(which(mydataBias$group == "female")), mean = 0, sd = 5)
mydataBias$criterion2[which(mydataBias$group == "male")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "male")] + 
  rnorm(length(which(mydataBias$group == "male")), mean = 10, sd = 5)
mydataBias$criterion2[which(mydataBias$group == "female")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "female")] + 
  rnorm(length(which(mydataBias$group == "female")), mean = 0, sd = 5)
mydataBias$criterion3[which(mydataBias$group == "male")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "male")] + 
  rnorm(length(which(mydataBias$group == "male")), mean = 0, sd = 5)
mydataBias$criterion3[which(mydataBias$group == "female")] <- 
  .3 * mydataBias$predictor[which(mydataBias$group == "female")] + 
  rnorm(length(which(mydataBias$group == "female")), mean = 0, sd = 5)
mydataBias$criterion4[which(mydataBias$group == "male")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "male")] + 
  rnorm(length(which(mydataBias$group == "male")), mean = 0, sd = 5)
mydataBias$criterion4[which(mydataBias$group == "female")] <- 
  .3 * mydataBias$predictor[which(mydataBias$group == "female")] + 
  rnorm(length(which(mydataBias$group == "female")), mean = 30, sd = 5)
mydataBias$criterion5[which(mydataBias$group == "male")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "male")] + 
  rnorm(length(which(mydataBias$group == "male")), mean = 0, sd = 30)
mydataBias$criterion5[which(mydataBias$group == "female")] <- 
  .7 * mydataBias$predictor[which(mydataBias$group == "female")] + 
  rnorm(length(which(mydataBias$group == "female")), mean = 0, sd = 5)

16.6.2.3 Add Missing Data

Adding missing data to the data frames makes the examples more realistic, because real-life data typically include missingness, and it helps you get in the habit of writing code that accounts for missing data.

HolzingerSwineford1939 is a data set from the lavaan package (Rosseel et al., 2022) that contains mental ability test scores (x1–x9) for seventh- and eighth-grade children.

Code
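# randomly set 1% of values (excluding the ID and group columns) to missing:
# unlist the columns into a vector, sample cells to set to NA, then rebuild the data frame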
varNames <- names(mydataBias)
dimensionsDf <- dim(mydataBias[,-c(1,2)])
unlistedDf <- unlist(mydataBias[,-c(1,2)])
unlistedDf[sample(
  1:length(unlistedDf),
  size = .01 * length(unlistedDf))] <- NA
mydataBias <- cbind(
  mydataBias[,c("ID","group")],
  as.data.frame(
    matrix(
      unlistedDf,
      ncol = dimensionsDf[2])))
names(mydataBias) <- varNames

# repeat the procedure for the HolzingerSwineford1939 mental ability items (x1–x9)
varNames <- names(HolzingerSwineford1939)
dimensionsDf <- dim(HolzingerSwineford1939[,paste("x", 1:9, sep = "")])
unlistedDf <- unlist(HolzingerSwineford1939[,paste("x", 1:9, sep = "")])
unlistedDf[sample(
  1:length(unlistedDf),
  size = .01 * length(unlistedDf))] <- NA
HolzingerSwineford1939 <- cbind(
  HolzingerSwineford1939[,1:6],
  as.data.frame(matrix(
    unlistedDf,
    ncol = dimensionsDf[2])))
names(HolzingerSwineford1939) <- varNames

16.7 Examples of Unbiased Tests (in Terms of Predictive Bias)

16.7.1 Unbiased test where males and females have equal means on predictor and criterion

Figure 16.19 depicts an example of an unbiased test where males and females have equal means on the predictor and criterion. The test is unbiased because there are no significant differences in the regression lines (of predictor predicting criterion) between males and females.

Code
summary(lm(
  unbiasedCriterion1 ~ unbiasedPredictor1 + group + unbiasedPredictor1:group,
  data = mydataBias))

Call:
lm(formula = unbiasedCriterion1 ~ unbiasedPredictor1 + group + 
    unbiasedPredictor1:group, data = mydataBias)

Residuals:
    Min      1Q  Median      3Q     Max 
-54.623 -10.050  -0.025  10.373  66.811 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     0.930697   2.295058   0.406    0.685    
unbiasedPredictor1              0.996976   0.022624  44.066   <2e-16 ***
groupfemale                    -1.165200   3.250481  -0.358    0.720    
unbiasedPredictor1:groupfemale -0.004397   0.032115  -0.137    0.891    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.13 on 3913 degrees of freedom
  (83 observations deleted due to missingness)
Multiple R-squared:  0.4963,    Adjusted R-squared:  0.496 
F-statistic:  1285 on 3 and 3913 DF,  p-value: < 2.2e-16
Code
plot(
  unbiasedCriterion1 ~ unbiasedPredictor1,
  data = mydataBias,
  xlim = c(
    0, 
    max(c(
      mydataBias$unbiasedCriterion1,
      mydataBias$unbiasedPredictor1),
      na.rm = TRUE)),
  ylim = c(
    0, 
    max(c(
      mydataBias$unbiasedCriterion1,
      mydataBias$unbiasedPredictor1),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$unbiasedPredictor1[which(mydataBias$group == "male")],
  mydataBias$unbiasedCriterion1[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(mydataBias$unbiasedPredictor1[which(mydataBias$group == "female")],
       mydataBias$unbiasedCriterion1[which(mydataBias$group == "female")],
       pch = 1,
       col = "red")
abline(lm(
  unbiasedCriterion1 ~ unbiasedPredictor1,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  unbiasedCriterion1 ~ unbiasedPredictor1,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.19: Unbiased Test Where Males and Females Have Equal Means on Predictor and Criterion.

16.7.2 Unbiased test where females have higher means than males on predictor and criterion

Figure 16.20 depicts an example of an unbiased test where females have higher means than males on the predictor and criterion. The test is unbiased because there are no differences in the regression lines (of predictor predicting criterion) between males and females.

Code
summary(lm(
  unbiasedCriterion2 ~ unbiasedPredictor2 + group + unbiasedPredictor2:group,
  data = mydataBias))

Call:
lm(formula = unbiasedCriterion2 ~ unbiasedPredictor2 + group + 
    unbiasedPredictor2:group, data = mydataBias)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.302  -9.989   0.010   9.860  53.791 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -1.29332    1.57773  -0.820    0.412    
unbiasedPredictor2              1.02006    0.02218  46.000   <2e-16 ***
groupfemale                     1.21294    3.28319   0.369    0.712    
unbiasedPredictor2:groupfemale -0.01501    0.03125  -0.480    0.631    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.81 on 3919 degrees of freedom
  (77 observations deleted due to missingness)
Multiple R-squared:  0.8412,    Adjusted R-squared:  0.8411 
F-statistic:  6921 on 3 and 3919 DF,  p-value: < 2.2e-16
Code
plot(
  unbiasedCriterion2 ~ unbiasedPredictor2,
  data = mydataBias,
  xlim = c(
    0, 
    max(c(
      mydataBias$unbiasedCriterion2,
      mydataBias$unbiasedPredictor2),
      na.rm = TRUE)),
  ylim = c(
    0, 
    max(c(
      mydataBias$unbiasedCriterion2,
      mydataBias$unbiasedPredictor2),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$unbiasedPredictor2[which(mydataBias$group == "male")],
  mydataBias$unbiasedCriterion2[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$unbiasedPredictor2[which(mydataBias$group == "female")],
  mydataBias$unbiasedCriterion2[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  unbiasedCriterion2 ~ unbiasedPredictor2,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  unbiasedCriterion2 ~ unbiasedPredictor2,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.20: Unbiased Test Where Females Have Higher Means Than Males on Predictor and Criterion.

16.7.3 Unbiased test where males have higher means than females on predictor and criterion

Figure 16.21 depicts an example of an unbiased test where males have higher means than females on the predictor and criterion. The test is unbiased because there are no differences in the regression lines (of predictor predicting criterion) between males and females.

Code
summary(lm(
  unbiasedCriterion3 ~ unbiasedPredictor3 + group + unbiasedPredictor3:group,
  data = mydataBias))

Call:
lm(formula = unbiasedCriterion3 ~ unbiasedPredictor3 + group + 
    unbiasedPredictor3:group, data = mydataBias)

Residuals:
    Min      1Q  Median      3Q     Max 
-48.613 -10.115  -0.068   9.598  57.126 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -1.68352    2.84985  -0.591    0.555    
unbiasedPredictor3              1.01072    0.02179  46.375   <2e-16 ***
groupfemale                     1.42842    3.26227   0.438    0.662    
unbiasedPredictor3:groupfemale -0.01187    0.03109  -0.382    0.703    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.82 on 3916 degrees of freedom
  (80 observations deleted due to missingness)
Multiple R-squared:  0.8376,    Adjusted R-squared:  0.8375 
F-statistic:  6732 on 3 and 3916 DF,  p-value: < 2.2e-16
Code
plot(
  unbiasedCriterion3 ~ unbiasedPredictor3,
  data = mydataBias,
  xlim = c(
    0,
    max(c(
      mydataBias$unbiasedCriterion3,
      mydataBias$unbiasedPredictor3),
      na.rm = TRUE)),
  ylim = c(
    0,
    max(c(
      mydataBias$unbiasedCriterion3,
      mydataBias$unbiasedPredictor3),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$unbiasedPredictor3[which(mydataBias$group == "male")],
  mydataBias$unbiasedCriterion3[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$unbiasedPredictor3[which(mydataBias$group == "female")],
  mydataBias$unbiasedCriterion3[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  unbiasedCriterion3 ~ unbiasedPredictor3,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  unbiasedCriterion3 ~ unbiasedPredictor3,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.21: Unbiased Test Where Males Have Higher Means Than Females on Predictor and Criterion.

16.8 Predictive Bias: Different Regression Lines

16.8.1 Example of unbiased prediction (no differences in intercepts or slopes)

Figure 16.22 depicts an example of an unbiased test where males and females have equal means on the predictor and criterion. The test is unbiased because there are no differences in the regression lines (of predictor predicting criterion1) between males and females.

Code
summary(lm(
  criterion1 ~ predictor + group + predictor:group,
  data = mydataBias))

Call:
lm(formula = criterion1 ~ predictor + group + predictor:group, 
    data = mydataBias)

Residuals:
    Min      1Q  Median      3Q     Max 
-18.831  -3.338   0.004   3.330  19.335 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -0.264357   0.747269  -0.354    0.724    
predictor              0.701726   0.007370  95.207   <2e-16 ***
groupfemale           -0.515860   1.059267  -0.487    0.626    
predictor:groupfemale  0.006002   0.010476   0.573    0.567    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.952 on 3908 degrees of freedom
  (88 observations deleted due to missingness)
Multiple R-squared:  0.8225,    Adjusted R-squared:  0.8223 
F-statistic:  6035 on 3 and 3908 DF,  p-value: < 2.2e-16
Code
plot(
  criterion1 ~ predictor,
  data = mydataBias,
  xlim = c(
    0,
    max(c(
      mydataBias$criterion1,
      mydataBias$predictor),
      na.rm = TRUE)),
  ylim = c(
    0, 
    max(c(
      mydataBias$criterion1,
      mydataBias$predictor),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$predictor[which(mydataBias$group == "male")],
  mydataBias$criterion1[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$predictor[which(mydataBias$group == "female")],
  mydataBias$criterion1[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  criterion1 ~ predictor,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  criterion1 ~ predictor,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.22: Example of Unbiased Prediction (No Differences in Intercepts or Slopes Between Males and Females).

16.8.2 Example of intercept bias

Figure 16.23 depicts an example of a biased test due to intercept bias. There are differences in the intercepts of the regression lines (of predictor predicting criterion2) between males and females: males have a higher intercept than females. That is, the same score on the predictor results in higher predictions for males than females.

Code
summary(lm(
  criterion2 ~ predictor + group + predictor:group,
  data = mydataBias))

Call:
lm(formula = criterion2 ~ predictor + group + predictor:group, 
    data = mydataBias)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.6853  -3.4153  -0.0385   3.3979  18.0864 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            10.054048   0.760084  13.228   <2e-16 ***
predictor               0.697987   0.007499  93.075   <2e-16 ***
groupfemale           -10.288893   1.074168  -9.578   <2e-16 ***
predictor:groupfemale   0.004748   0.010625   0.447    0.655    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.037 on 3913 degrees of freedom
  (83 observations deleted due to missingness)
Multiple R-squared:  0.8452,    Adjusted R-squared:  0.8451 
F-statistic:  7124 on 3 and 3913 DF,  p-value: < 2.2e-16
Code
plot(
  criterion2 ~ predictor,
  data = mydataBias,
  xlim = c(
    0,
    max(c(
      mydataBias$criterion2,
      mydataBias$predictor), na.rm = TRUE)),
  ylim = c(
    0,
    max(c(
      mydataBias$criterion2,
      mydataBias$predictor),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$predictor[which(mydataBias$group == "male")],
  mydataBias$criterion2[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$predictor[which(mydataBias$group == "female")],
  mydataBias$criterion2[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  criterion2 ~ predictor,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  criterion2 ~ predictor,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.23: Example of Intercept Bias in Prediction (Different Intercepts Between Males and Females).

16.8.3 Example of slope bias

Figure 16.24 depicts an example of a biased test due to slope bias. There are differences in the slopes of the regression lines (of predictor predicting criterion3) between males and females: males have a higher slope than females. That is, scores have stronger predictive validity for males than females.

Code
summary(lm(
  criterion3 ~ predictor + group + predictor:group,
  data = mydataBias))

Call:
lm(formula = criterion3 ~ predictor + group + predictor:group, 
    data = mydataBias)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.3183  -3.3715   0.0256   3.3844  19.3484 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -1.041940   0.757914  -1.375    0.169    
predictor              0.708670   0.007475  94.804   <2e-16 ***
groupfemale            1.478534   1.074064   1.377    0.169    
predictor:groupfemale -0.413233   0.010621 -38.907   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.031 on 3927 degrees of freedom
  (69 observations deleted due to missingness)
Multiple R-squared:  0.9489,    Adjusted R-squared:  0.9489 
F-statistic: 2.432e+04 on 3 and 3927 DF,  p-value: < 2.2e-16
Code
plot(
  criterion3 ~ predictor,
  data = mydataBias,
  xlim = c(
    0,
    max(c(
      mydataBias$criterion3,
      mydataBias$predictor),
      na.rm = TRUE)),
  ylim = c(
    0,
    max(c(
      mydataBias$criterion3,
      mydataBias$predictor),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$predictor[which(mydataBias$group == "male")],
  mydataBias$criterion3[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$predictor[which(mydataBias$group == "female")],
  mydataBias$criterion3[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  criterion3 ~ predictor,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  criterion3 ~ predictor,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.24: Example of Slope Bias in Prediction (Different Slopes Between Males And Females).

16.8.4 Example of intercept and slope bias

Figure 16.25 depicts an example of a biased test due to intercept and slope bias. There are differences in the intercepts and slopes of the regression lines (of predictor predicting criterion4) between males and females: males have a higher slope than females. That is, scores have stronger predictive validity for males than females. Females have a higher intercept than males. That is, at lower scores, the same score on the predictor results in higher predictions for females than males; at higher scores on the predictor, the same score results in higher predictions for males than females.

Code
summary(lm(
  criterion4 ~ predictor + group + predictor:group,
  data = mydataBias))

Call:
lm(formula = criterion4 ~ predictor + group + predictor:group, 
    data = mydataBias)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.2805  -3.4400   0.0269   3.3199  17.8824 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -0.276228   0.767080   -0.36    0.719    
predictor              0.702178   0.007565   92.81   <2e-16 ***
groupfemale           29.156167   1.086254   26.84   <2e-16 ***
predictor:groupfemale -0.392016   0.010743  -36.49   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.099 on 3930 degrees of freedom
  (66 observations deleted due to missingness)
Multiple R-squared:  0.7843,    Adjusted R-squared:  0.7842 
F-statistic:  4764 on 3 and 3930 DF,  p-value: < 2.2e-16
Code
plot(
  criterion4 ~ predictor,
  data = mydataBias,
  xlim = c(
    0,
    max(c(
      mydataBias$criterion4,
      mydataBias$predictor),
      na.rm = TRUE)),
  ylim = c(
    0,
    max(c(
      mydataBias$criterion4,
      mydataBias$predictor),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$predictor[which(mydataBias$group == "male")],
  mydataBias$criterion4[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(
  mydataBias$predictor[which(mydataBias$group == "female")],
  mydataBias$criterion4[which(mydataBias$group == "female")],
  pch = 1,
  col = "red")
abline(lm(
  criterion4 ~ predictor,
  data = mydataBias[which(mydataBias$group == "male"),]),
  lty = 1,
  col = "blue")
abline(lm(
  criterion4 ~ predictor,
  data = mydataBias[which(mydataBias$group == "female"),]),
  lty = 2,
  col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.25: Example of Intercept And Slope Bias in Prediction (Different Intercepts and Slopes Between Males and Females).

16.8.5 Example of different measurement reliability/error across groups

In the example depicted in Figure 16.26, there are differences in the measurement reliability/error on the criterion between males and females: males’ scores have a lower reliability (higher measurement error) on the criterion than females’ scores. That is, we are more confident about a female’s level on the criterion given a particular score on the predictor than we are about a male’s level on the criterion given a particular score on the predictor.

Code
summary(lm(
  criterion5 ~ predictor + group + predictor:group,
  data = mydataBias))

Call:
lm(formula = criterion5 ~ predictor + group + predictor:group, 
    data = mydataBias)

Residuals:
     Min       1Q   Median       3Q      Max 
-109.295   -7.065    0.077    6.836  108.428 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -9.28386    3.32617  -2.791  0.00528 ** 
predictor              0.79172    0.03280  24.136  < 2e-16 ***
groupfemale            9.12452    4.70470   1.939  0.05252 .  
predictor:groupfemale -0.09083    0.04654  -1.952  0.05102 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.03 on 3920 degrees of freedom
  (76 observations deleted due to missingness)
Multiple R-squared:  0.2087,    Adjusted R-squared:  0.2081 
F-statistic: 344.6 on 3 and 3920 DF,  p-value: < 2.2e-16
Code
plot(
  criterion5 ~ predictor,
  data = mydataBias,
  xlim = c(
    0,
    max(c(mydataBias$criterion5,
          mydataBias$predictor),
        na.rm = TRUE)),
  ylim = c(
    0, 
    max(c(
      mydataBias$criterion5,
      mydataBias$predictor),
      na.rm = TRUE)),
  type = "n",
  xlab = "predictor",
  ylab = "criterion")
points(
  mydataBias$predictor[which(mydataBias$group == "male")],
  mydataBias$criterion5[which(mydataBias$group == "male")],
  pch = 20,
  col = "blue")
points(mydataBias$predictor[which(mydataBias$group == "female")],
       mydataBias$criterion5[which(mydataBias$group == "female")],
       pch = 1,
       col = "red")
abline(lm(criterion5 ~ predictor,
          data = mydataBias[which(mydataBias$group == "male"),]),
       lty = 1,
       col = "blue")
abline(lm(criterion5 ~ predictor,
          data = mydataBias[which(mydataBias$group == "female"),]),
       lty = 2,
       col = "red")
legend(
  "bottomright",
  c("Male","Female"),
  lty = c(1,2),
  pch = c(20,1),
  col = c("blue","red"))

Figure 16.26: Example of Different Measurement Reliability/Error Across Groups.

16.9 Differential Item Functioning (DIF)

Differential item functioning (DIF) indicates that one or more items function differently across groups. That is, one or more of the item parameters for an item differs across groups. For instance, the severity or discrimination of an item could be higher in one group than in another. If an item functions differently across groups, it can lead to biased scores for particular groups. Thus, when observing group differences in level on the measure, or group differences in the measure’s association with other measures, it is unclear whether the observed group differences reflect true group differences or differences in the functioning of the measure across groups.
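
For illustration (a generic equation, not output from the analyses below), consider a binary item under the two-parameter logistic (2PL) model, in which the probability that person \(i\) in group \(g\) endorses the item is:

\[
P(X_i = 1 \mid \theta_i, g) = \frac{1}{1 + e^{-a_g(\theta_i - b_g)}}
\]

The item is free of DIF when the discrimination (\(a_g\)) and difficulty (\(b_g\)) parameters are equal across groups. Polytomous items, like those analyzed below, extend this idea with multiple difficulty/severity parameters (e.g., \(b_1\) and \(b_2\)).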

For instance, consider an item such as “disobedience to authority”, which is thought to reflect externalizing behavior in childhood. However, in adulthood, disobedience to authority could reflect prosocial functions, such as protesting against societally unjust actions, and may show weaker construct validity with respect to externalizing problems, as operationalized by a weaker discrimination coefficient in adulthood than in childhood. Tests of differential item functioning are equivalent to tests of measurement invariance.

Approaches to addressing DIF are described in Section 16.5.1. If we identify that an item shows non-negligible DIF, we have three primary options (described above): (1) drop the item for both groups, (2) drop the item for one group but keep it for the other group, or (3) freely estimate the parameters (discrimination and difficulty) for the item across groups.

Tests of DIF were conducted using the mirt package (Chalmers, 2020).

16.9.1 Item Descriptive Statistics

Code
itemstats(
  data = cnlsy[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7")],
  group = cnlsy$sex,
  ts.tables = TRUE)
$female
$female$overall
 N.complete    N mean_total.score sd_total.score ave.r  sd.r alpha
       1620 5641            2.554          2.132 0.255 0.083 0.681

$female$itemstats
                      N  mean    sd total.r total.r_if_rm alpha_if_rm
bpi_antisocialT1_1 1928 0.538 0.587   0.662         0.456       0.626
bpi_antisocialT1_2 1925 0.278 0.510   0.648         0.471       0.624
bpi_antisocialT1_3 1925 0.456 0.654   0.625         0.376       0.657
bpi_antisocialT1_4 1933 0.113 0.355   0.523         0.384       0.654
bpi_antisocialT1_5 1649 0.186 0.431   0.594         0.438       0.637
bpi_antisocialT1_6 1658 0.103 0.345   0.572         0.446       0.643
bpi_antisocialT1_7 1944 0.834 0.639   0.553         0.292       0.683

$female$proportions
                       0     1     2    NA
bpi_antisocialT1_1 0.174 0.151 0.016 0.658
bpi_antisocialT1_2 0.257 0.075 0.010 0.659
bpi_antisocialT1_3 0.216 0.094 0.031 0.659
bpi_antisocialT1_4 0.308 0.030 0.004 0.657
bpi_antisocialT1_5 0.243 0.044 0.005 0.708
bpi_antisocialT1_6 0.268 0.023 0.004 0.706
bpi_antisocialT1_7 0.104 0.194 0.046 0.655

$female$total.score_frequency
       0   1   2   3   4  5  6  7  8  9 10 11 12 13
Freq 217 364 349 283 166 91 67 38 12 10 11  4  4  4

$female$total.score_means
                          0        1        2
bpi_antisocialT1_1 1.354919 3.339237 7.216867
bpi_antisocialT1_2 1.846343 4.173789 8.192308
bpi_antisocialT1_3 1.593417 3.846682 5.406667
bpi_antisocialT1_4 2.204952 5.090278 9.045455
bpi_antisocialT1_5 2.010386 4.946721 7.892857
bpi_antisocialT1_6 2.189959 5.701613 9.227273
bpi_antisocialT1_7 1.004357 2.778970 4.746725

$female$total.score_sds
                          0        1        2
bpi_antisocialT1_1 1.262270 1.668474 2.763209
bpi_antisocialT1_2 1.480206 1.809814 2.664498
bpi_antisocialT1_3 1.329902 1.837884 2.783211
bpi_antisocialT1_4 1.743521 2.280698 2.802673
bpi_antisocialT1_5 1.566126 2.196431 3.258485
bpi_antisocialT1_6 1.684322 2.215906 2.844072
bpi_antisocialT1_7 1.232733 1.798814 2.475591


$male
$male$overall
 N.complete    N mean_total.score sd_total.score ave.r  sd.r alpha
       1645 5891             3.16          2.385 0.266 0.081 0.703

$male$itemstats
                      N  mean    sd total.r total.r_if_rm alpha_if_rm
bpi_antisocialT1_1 1948 0.615 0.585   0.634         0.449       0.660
bpi_antisocialT1_2 1947 0.372 0.559   0.655         0.485       0.650
bpi_antisocialT1_3 1949 0.520 0.668   0.582         0.346       0.692
bpi_antisocialT1_4 1948 0.233 0.486   0.596         0.441       0.666
bpi_antisocialT1_5 1676 0.367 0.547   0.627         0.454       0.662
bpi_antisocialT1_6 1688 0.177 0.446   0.556         0.406       0.675
bpi_antisocialT1_7 1962 0.834 0.646   0.582         0.357       0.687

$male$proportions
                       0     1     2    NA
bpi_antisocialT1_1 0.145 0.168 0.017 0.669
bpi_antisocialT1_2 0.220 0.097 0.013 0.669
bpi_antisocialT1_3 0.191 0.107 0.033 0.669
bpi_antisocialT1_4 0.263 0.058 0.010 0.669
bpi_antisocialT1_5 0.190 0.085 0.010 0.715
bpi_antisocialT1_6 0.243 0.035 0.008 0.713
bpi_antisocialT1_7 0.102 0.185 0.046 0.667

$male$total.score_frequency
       0   1   2   3   4   5   6  7  8  9 10 11 12 13 14
Freq 163 294 310 259 208 144 104 74 40 25  9  7  5  1  2

$male$total.score_means
                          0        1        2
bpi_antisocialT1_1 1.622478 3.944056 7.397849
bpi_antisocialT1_2 2.134191 4.767821 8.106061
bpi_antisocialT1_3 2.016754 4.400763 5.819277
bpi_antisocialT1_4 2.484091 5.539286 8.177778
bpi_antisocialT1_5 2.174111 4.817073 7.910714
bpi_antisocialT1_6 2.616762 5.810680 8.093023
bpi_antisocialT1_7 1.412500 3.375679 5.782787

$male$total.score_sds
                          0        1        2
bpi_antisocialT1_1 1.515237 1.982218 2.454542
bpi_antisocialT1_2 1.678482 1.955386 2.437709
bpi_antisocialT1_3 1.654589 2.069463 2.774973
bpi_antisocialT1_4 1.845772 2.119687 2.534210
bpi_antisocialT1_5 1.709715 2.047072 2.725481
bpi_antisocialT1_6 1.911288 2.181537 2.982600
bpi_antisocialT1_7 1.522500 1.906515 2.652344

16.9.2 Unconstrained Model

16.9.2.1 Fit Model

Code
unconstrainedModel <- multipleGroup(
  data = cnlsy[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7")],
  model = 1,
  group = cnlsy$sex,
  SE = TRUE)

16.9.2.2 Model Summary

Items that appear to differ:

  • items 5 and 6 are more discriminating (\(a\) parameter) among females than males
  • item 4 has higher difficulty/severity (\(b_1\) and \(b_2\) parameters) among females than males
Code
summary(unconstrainedModel)

----------
GROUP: female 
                      F1    h2
bpi_antisocialT1_1 0.698 0.487
bpi_antisocialT1_2 0.706 0.499
bpi_antisocialT1_3 0.532 0.283
bpi_antisocialT1_4 0.683 0.466
bpi_antisocialT1_5 0.763 0.582
bpi_antisocialT1_6 0.838 0.702
bpi_antisocialT1_7 0.429 0.184

SS loadings:  3.202 
Proportion Var:  0.457 

Factor correlations: 

   F1
F1  1

----------
GROUP: male 
                      F1    h2
bpi_antisocialT1_1 0.663 0.440
bpi_antisocialT1_2 0.736 0.542
bpi_antisocialT1_3 0.531 0.282
bpi_antisocialT1_4 0.713 0.508
bpi_antisocialT1_5 0.691 0.477
bpi_antisocialT1_6 0.702 0.492
bpi_antisocialT1_7 0.507 0.257

SS loadings:  2.999 
Proportion Var:  0.428 

Factor correlations: 

   F1
F1  1
Code
coef(unconstrainedModel,
     simplify = TRUE,
     IRTpars = TRUE)
$female
$items
                       a     b1    b2
bpi_antisocialT1_1 1.657  0.029 2.451
bpi_antisocialT1_2 1.697  0.953 2.763
bpi_antisocialT1_3 1.069  0.650 2.585
bpi_antisocialT1_4 1.590  1.895 3.465
bpi_antisocialT1_5 2.008  1.281 2.936
bpi_antisocialT1_6 2.614  1.625 2.770
bpi_antisocialT1_7 0.807 -1.184 2.583

$means
F1 
 0 

$cov
   F1
F1  1


$male
$items
                       a     b1    b2
bpi_antisocialT1_1 1.507 -0.238 2.505
bpi_antisocialT1_2 1.853  0.582 2.455
bpi_antisocialT1_3 1.067  0.384 2.480
bpi_antisocialT1_4 1.729  1.165 2.765
bpi_antisocialT1_5 1.625  0.635 2.757
bpi_antisocialT1_6 1.677  1.494 2.873
bpi_antisocialT1_7 1.002 -0.986 2.143

$means
F1 
 0 

$cov
   F1
F1  1

16.9.3 Constrained Model

Constrain item parameters to be equal across groups (to use as baseline model for identifying DIF).

16.9.3.1 Fit Model

Code
constrainedModel <- multipleGroup(
  data = cnlsy[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7")],
  model = 1,
  group = cnlsy$sex,
  invariance = c(c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7"),
    "free_means", "free_var"),
  SE = TRUE)

16.9.3.2 Model Summary

Code
summary(constrainedModel)

----------
GROUP: female 
                      F1    h2
bpi_antisocialT1_1 0.659 0.434
bpi_antisocialT1_2 0.709 0.503
bpi_antisocialT1_3 0.514 0.264
bpi_antisocialT1_4 0.704 0.495
bpi_antisocialT1_5 0.737 0.543
bpi_antisocialT1_6 0.768 0.590
bpi_antisocialT1_7 0.441 0.195

SS loadings:  3.024 
Proportion Var:  0.432 

Factor correlations: 

   F1
F1  1

----------
GROUP: male 
                      F1    h2
bpi_antisocialT1_1 0.669 0.447
bpi_antisocialT1_2 0.719 0.516
bpi_antisocialT1_3 0.524 0.275
bpi_antisocialT1_4 0.713 0.508
bpi_antisocialT1_5 0.745 0.556
bpi_antisocialT1_6 0.776 0.602
bpi_antisocialT1_7 0.451 0.203

SS loadings:  3.107 
Proportion Var:  0.444 

Factor correlations: 

   F1
F1  1
Code
coef(
  constrainedModel,
  simplify = TRUE,
  IRTpars = TRUE)
$female
$items
                       a     b1    b2
bpi_antisocialT1_1 1.492  0.089 2.791
bpi_antisocialT1_2 1.713  0.978 2.883
bpi_antisocialT1_3 1.020  0.733 2.836
bpi_antisocialT1_4 1.686  1.687 3.265
bpi_antisocialT1_5 1.854  1.138 3.014
bpi_antisocialT1_6 2.040  1.779 3.034
bpi_antisocialT1_7 0.837 -0.952 2.697

$means
F1 
 0 

$cov
   F1
F1  1


$male
$items
                       a     b1    b2
bpi_antisocialT1_1 1.492  0.089 2.791
bpi_antisocialT1_2 1.713  0.978 2.883
bpi_antisocialT1_3 1.020  0.733 2.836
bpi_antisocialT1_4 1.686  1.687 3.265
bpi_antisocialT1_5 1.854  1.138 3.014
bpi_antisocialT1_6 2.040  1.779 3.034
bpi_antisocialT1_7 0.837 -0.952 2.697

$means
  F1 
0.39 

$cov
      F1
F1 1.054

16.9.4 Compare model fit of constrained model to unconstrained model

The constrained model and the unconstrained model are considered “nested” models. The constrained model is nested within the unconstrained model because it is a special case of the unconstrained model: it can be obtained by fixing the unconstrained model’s item parameters to be equal across groups. The fit of nested models can be compared with a chi-square difference test.
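
Specifically, the chi-square difference test (a likelihood ratio test) compares the two models’ log-likelihoods:

\[
\chi^2_{\text{diff}} = -2(\ell_{\text{constrained}} - \ell_{\text{unconstrained}})
\]

with degrees of freedom equal to the difference in the number of freely estimated parameters between the two models.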

Code
anova(constrainedModel, unconstrainedModel)

The constrained model fits significantly worse than the unconstrained model, which suggests that item parameters differ between males and females.

16.9.5 Identify DIF by iteratively removing constraints from fully constrained model

One way to identify DIF is to iteratively remove constraints from a model in which all parameters are constrained to be the same across groups. Removing constraints allows individual items to have different item parameters across groups, to identify which items yield a significant improvement in model fit when allowing their discrimination and/or severity to differ across groups. Items 1, 3, 4, 5, 6, and 7 showed DIF in discrimination and/or severity, based on a significant chi-square difference test.

16.9.5.1 Items that differ in discrimination and/or severity

Code
difDropItems <- DIF(
  constrainedModel,
  c("a1","d1","d2"),
  scheme = "drop",
  simplify = TRUE)

difDropItems

16.9.5.2 Items that differ in discrimination

Items 1, 3, 4, and 5 showed DIF in discrimination, based on a significant chi-square difference test.

Code
difDropItemsDiscrimination <- DIF(
  constrainedModel,
  c("a1"),
  scheme = "drop",
  simplify = TRUE)

difDropItemsDiscrimination

16.9.5.3 Items that differ in severity

Items 1, 3, 4, 5, and 7 showed DIF in difficulty/severity, based on a significant chi-square difference test.

Code
difDropItemsSeverity <- DIF(
  constrainedModel,
  c("d1","d2"),
  scheme = "drop",
  simplify = TRUE)

difDropItemsSeverity

16.9.6 Identify DIF by iteratively adding constraints to unconstrained model

16.9.6.1 Items that differ in discrimination and/or severity

Another way to identify DIF is to iteratively add constraints to a model in which all parameters are allowed to differ across groups. Adding constraints forces individual items to have the same item parameters across groups, to identify which items yield a significant worsening in model fit when constraining their discrimination and/or severity to be the same across groups. Items 1, 2, 3, 4, 5, and 6 showed DIF in discrimination and/or severity, based on a significant chi-square difference test. The DIF in discrimination and/or severity is depicted in Figure 16.27.

Code
difAddItems <- DIF(
  unconstrainedModel,
  c("a1","d1","d2"),
  scheme = "add",
  simplify = TRUE,
  plotdif = TRUE)

Figure 16.27: Item Probability Functions for Examining Differential Item Functioning in Terms of Discrimination and/or Severity.

Code
difAddItems

16.9.6.2 Items that differ in discrimination

Items 6 and 7 showed DIF in discrimination, based on a significant chi-square difference test. The DIF in discrimination is depicted in Figure 16.28.

Code
difAddItemsDiscrimination <- DIF(
  unconstrainedModel,
  c("a1"),
  scheme = "add",
  simplify = TRUE,
  plotdif = TRUE)

Figure 16.28: Item Probability Functions for Examining Differential Item Functioning in Terms of Discrimination.

Code
difAddItemsDiscrimination

16.9.6.3 Items that differ in severity

Items 1, 2, 3, 4, 5, and 6 showed DIF in difficulty/severity, based on a significant chi-square difference test. The DIF in difficulty/severity is depicted in Figure 16.29.

Code
difAddItemsSeverity <- DIF(
  unconstrainedModel,
  c("d1","d2"),
  scheme = "add",
  simplify = TRUE,
  plotdif = TRUE)

Figure 16.29: Item Probability Functions for Examining Differential Item Functioning in Terms of Severity.

Code
difAddItemsSeverity

16.9.7 Compute effect size of DIF

Effect size measures of DIF were computed based on expected scores (Meade, 2010) using the mirt package (Chalmers, 2020). Some researchers recommend using a simulation-based procedure to examine the impact of DIF on screening accuracy (Gonzalez & Pelham, 2021).

16.9.7.1 Test-level DIF

In addition to considering differential item functioning, we can also consider whether the test as a whole (i.e., the collection of items) differs in its functioning by group, called differential test functioning. Differential test functioning is depicted in Figure 16.30. Estimates of differential test functioning are in Table ??.

Code
empirical_ES(
  unconstrainedModel,
  DIF = FALSE)

In general, the measure showed greater difficulty for females than for males. That is, at a given construct level, males were more likely than females to endorse the items. Put differently, it takes a higher construct level for females to obtain the same score on the measure as males.

Code
empirical_ES(
  unconstrainedModel,
  DIF = FALSE,
  plot = TRUE)

Figure 16.30: Differential Test Functioning by Sex.

16.9.7.2 Item-level DIF

Differential item functioning is depicted in Figure 16.31. Estimates of differential item functioning are in Table ??.

Code
empirical_ES(
  unconstrainedModel,
  DIF = TRUE) %>% 
  round(., 2)

The measure-level differences in functioning appear to be largely driven by the greater difficulty of items 4 and 5 for females than for males. That is, at a given construct level, males were more likely than females to endorse items 4 and 5 in particular. Put differently, it takes a higher construct level for females than for males to endorse items 4 and 5.

Code
empirical_ES(
  unconstrainedModel,
  DIF = TRUE,
  plot = TRUE)

Figure 16.31: Differential Item Functioning by Sex.

16.9.8 Item plots

Plots of item response category characteristic curves by sex are in Figures 16.32–16.38 below.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_1",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.32: Item Response Category Characteristic Curves by Sex: Item 1.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_2",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.33: Item Response Category Characteristic Curves by Sex: Item 2.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_3",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.34: Item Response Category Characteristic Curves by Sex: Item 3.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_4",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.35: Item Response Category Characteristic Curves by Sex: Item 4.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_5",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.36: Item Response Category Characteristic Curves by Sex: Item 5.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_6",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.37: Item Response Category Characteristic Curves by Sex: Item 6.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_7",
  type = "trace",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.38: Item Response Category Characteristic Curves by Sex: Item 7.

Plots of item information functions by sex are in Figures 16.39–16.45 below. Items 1, 5, and 6 showed greater information for females than for males. Items 2, 4, and 7 showed greater information for males than for females.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_1",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.39: Item Information Curves by Sex: Item 1.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_2",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.40: Item Information Curves by Sex: Item 2.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_3",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.41: Item Information Curves by Sex: Item 3.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_4",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.42: Item Information Curves by Sex: Item 4.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_5",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.43: Item Information Curves by Sex: Item 5.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_6",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.44: Item Information Curves by Sex: Item 6.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_7",
  type = "info",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.45: Item Information Curves by Sex: Item 7.

Plots of expected item scores by sex are in Figures 16.46–16.52 below.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_1",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.46: Expected Item Score by Sex: Item 1.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_2",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.47: Expected Item Score by Sex: Item 2.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_3",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.48: Expected Item Score by Sex: Item 3.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_4",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.49: Expected Item Score by Sex: Item 4.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_5",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.50: Expected Item Score by Sex: Item 5.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_6",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.51: Expected Item Score by Sex: Item 6.

Code
itemplot(
  unconstrainedModel,
  "bpi_antisocialT1_7",
  type = "score",
  par.settings = standard.theme("pdf", color = FALSE))

Figure 16.52: Expected Item Score by Sex: Item 7.

16.9.9 Addressing DIF

Based on the analyses above, item 5 shows the largest magnitude of DIF. Specifically, item 5 has a stronger discrimination parameter for women than men. So, we should handle item 5 first. If we deem the DIF for item 5 to be non-negligible, we have three primary options: (1) drop item 5 for both men and women, (2) drop item 5 for men but keep it for women, or (3) freely estimate the parameters (discrimination and difficulty) for item 5 across groups.

16.9.9.1 Drop item for both groups

The first option is to drop item 5 for both men and women. You might do this if you want a measure that includes only items that function equivalently across groups. However, this can lead to lower reliability and weaker ability to detect individual differences.

Code
constrainedModelDropItem5 <- multipleGroup(
  data = cnlsy[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_6","bpi_antisocialT1_7")],
  model = 1,
  group = cnlsy$sex,
  invariance = c(c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_6","bpi_antisocialT1_7"),
    "free_means", "free_var"),
  SE = TRUE)

coef(constrainedModelDropItem5, simplify = TRUE)

16.9.9.2 Drop item for one group but not another group

A second option is to drop item 5 for men but to keep it for women. You might do this if the item is invalid for men but still valid for women. The coefficients still show item parameters for item 5 for men, but they were constrained to equal the women’s parameters, and men’s data were excluded from the estimation of item 5 (by setting their responses to missing).

Code
dropItem5ForMen <- cnlsy
dropItem5ForMen[which(
  dropItem5ForMen$sex == "male"),
  "bpi_antisocialT1_5"] <- NA

constrainedModelDropItem5ForMen <- multipleGroup(
  data = dropItem5ForMen[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7")],
  model = 1,
  group = dropItem5ForMen$sex,
  invariance = c(c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7"),
    "free_means", "free_var"),
  SE = TRUE)

coef(constrainedModelDropItem5ForMen, simplify = TRUE)

16.9.9.3 Freely estimate item to have different parameters across groups

Alternatively, we can resolve the DIF by allowing the item to have different parameters (discrimination and difficulty) in each group. You might do this if the item is valid for both men and women but nonetheless has meaningfully different item parameters across groups.

Code
constrainedModelResolveItem5 <- multipleGroup(
  data = cnlsy[,c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_5","bpi_antisocialT1_6",
    "bpi_antisocialT1_7")],
  model = 1,
  group = cnlsy$sex,
  invariance = c(c(
    "bpi_antisocialT1_1","bpi_antisocialT1_2","bpi_antisocialT1_3",
    "bpi_antisocialT1_4","bpi_antisocialT1_6","bpi_antisocialT1_7"),
    "free_means", "free_var"),
  SE = TRUE)

coef(constrainedModelResolveItem5, simplify = TRUE)

16.9.9.4 Compare model fit

We can compare the model fit against the fully constrained model. In this case, all three of these approaches for handling DIF resulted in improvement in model fit based on AIC and, for the two nested models, the chi-square difference test.

Code
anova(constrainedModel, constrainedModelDropItem5)
Code
anova(constrainedModel, constrainedModelDropItem5ForMen)
Code
anova(constrainedModel, constrainedModelResolveItem5)

16.9.9.5 Next steps

Whichever of these approaches we select, we would then identify and handle the remaining non-negligible DIF sequentially by magnitude. For instance, if we chose to resolve the DIF in item 5 by allowing the item to have different parameters across groups, we would then iteratively drop constraints to see which items continue to show non-negligible DIF.

Code
difDropItemsNew <- DIF(
  constrainedModelResolveItem5,
  c("a1","d1","d2"),
  scheme = "drop",
  simplify = TRUE)
Table 16.1: Differential Item Functioning After Resolving DIF in Item 5.
groups converged AIC SABIC HQ BIC X2 df p
bpi_antisocialT1_1 female,male TRUE -5.3749227 3.9116912 1.3031195 13.4443213 11.3749227 3 0.0098620
bpi_antisocialT1_2 female,male TRUE 3.9053742 13.1919881 10.5834164 22.7246182 2.0946258 3 0.5530008
bpi_antisocialT1_3 female,male TRUE 1.5024730 10.7890869 8.1805152 20.3217170 4.4975270 3 0.2125110
bpi_antisocialT1_4 female,male TRUE -26.4516356 -17.1650217 -19.7735934 -7.6323916 32.4516356 3 0.0000004
bpi_antisocialT1_5 female,male TRUE 0.0008518 0.0008518 0.0008518 0.0008518 -0.0008518 0 NaN
bpi_antisocialT1_6 female,male TRUE -9.9119308 -0.6253169 -3.2338886 8.9073132 15.9119308 3 0.0011821
bpi_antisocialT1_7 female,male TRUE -16.5525142 -7.2659003 -9.8744720 2.2667298 22.5525142 3 0.0000501

Estimates of DIF after resolving item 5 are in Table 16.1. Of the remaining items, item 4 appears to show the largest DIF. If we deem item 4 to show non-negligible DIF, we would address it using one of the three approaches above, then re-examine which items continue to show non-negligible DIF, addressing them iteratively until no non-negligible DIF remains.

16.10 Measurement/Factorial Invariance

Before making comparisons across groups in terms of associations between constructs or level on the construct, it is important to establish measurement invariance (also called factorial invariance) across the groups (Millsap, 2011). Tests of measurement invariance are equivalent to tests of differential item functioning. Measurement invariance can help provide greater confidence that the measure functions equivalently across groups, that is, that the items have the same strength of association with the latent factor and that the latent factor is on the same metric (i.e., the items have the same level when accounting for the latent factor). For instance, if you observe differences between groups in level of depression without establishing measurement invariance across the groups, you do not know whether the observed differences reflect true group-related differences in depression or differences in the functioning of the measure across the groups.

Measurement invariance can be tested using many methods, including confirmatory factor analysis and IRT. CFA approaches to testing measurement invariance include multi-group CFA (MGCFA), multiple-indicator, multiple-causes (MIMIC) models, moderated nonlinear factor analysis (MNLFA), score-based tests (T. Wang et al., 2014), and the alignment method (Lai, 2021). The IRT approach to testing measurement (non-)invariance is to examine whether items show differential item functioning.
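
As an illustration of one of these approaches, a minimal MIMIC sketch in lavaan might look like the following. This is a sketch under stated assumptions, not the method used later in this chapter: the dataset (mydata), the items (y1–y3), and the dummy-coded grouping variable (group01) are all hypothetical.

Code
library("lavaan")

#Hypothetical MIMIC model: the dummy-coded group predicts the latent
#factor (a latent mean difference) and one item directly; a significant
#direct effect on y2 would suggest uniform DIF in that item
mimicModel <- '
 depression =~ y1 + y2 + y3
 depression ~ group01
 y2 ~ group01
'

mimicFit <- sem(
  mimicModel,
  data = mydata)

summary(mimicFit, standardized = TRUE)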

The tests of measurement invariance in confirmatory factor analysis (CFA) models below were fit using the lavaan package (Rosseel et al., 2022). The examples were adapted from the lavaan documentation: https://lavaan.ugent.be/tutorial/groups.html (archived at https://perma.cc/2FBK-RSAH). Procedures for testing measurement invariance are outlined in Putnick & Bornstein (2016). In addition to nested chi-square difference tests (\(\chi^2_{\text{diff}}\)), I also demonstrate permutation procedures for testing measurement invariance, as described by Jorgensen et al. (2018). You are also encouraged to read about MIMIC models (Cheng et al., 2016; W.-C. Wang et al., 2009), score-based tests (T. Wang et al., 2014), and MNLFA, which allows testing measurement invariance across continuous moderators (Bauer et al., 2020; Curran et al., 2014; N. C. Gottfredson et al., 2019). MNLFA is implemented in the mnlfa package (Robitzsch, 2019). You can also generate syntax for conducting MNLFA in Mplus software (Muthén & Muthén, 2019) using the aMNLFA package (V. Cole et al., 2018). The alignment method allows many groups to be compared (Han et al., 2019).

Model fit of nested models can be compared with a chi-square difference test, as a way of testing measurement invariance. In this approach, model fit is first evaluated in a configural invariance model, in which the same number of factors is specified in each group and the same indicators load on the same factors in each group. Then, successive constraints are imposed across groups, including constraining factor loadings, intercepts, and residuals. The metric (“weak factorial”) invariance model is similar to the configural invariance model, but it constrains the factor loadings to be the same across groups. The scalar (“strong factorial”) invariance model keeps the constraints of the metric invariance model, but it also constrains the intercepts to be the same across groups. The residual (“strict factorial”) invariance model keeps the constraints of the scalar invariance model, but it also constrains the residuals to be the same across groups. The fit of each constrained model is compared to the fit of the previous model without such constraints: the metric invariance model is compared to the configural invariance model, the scalar invariance model is compared to the metric invariance model, and the residual invariance model is compared to the scalar invariance model.
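
For a compact illustration of this sequence, lavaan's group.equal shortcut can impose each set of constraints without writing out the full syntax. The following is a minimal sketch (not the full-syntax approach demonstrated below), using the cfaModel syntax and HolzingerSwineford1939 data from the examples later in this section.

Code
library("lavaan")

#Configural invariance: same factor structure, no equality constraints
fitConfigural <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school")

#Metric ("weak factorial") invariance: equal factor loadings
fitMetric <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings"))

#Scalar ("strong factorial") invariance: equal loadings and intercepts
fitScalar <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts"))

#Residual ("strict factorial") invariance: equal loadings, intercepts,
#and residual variances
fitResidual <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts", "residuals"))

#Chi-square difference test of each model against the preceding one
lavTestLRT(fitConfigural, fitMetric, fitScalar, fitResidual)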

The chi-square difference test is sensitive to sample size, and trivial differences in fit can be detected with large samples (Cheung & Rensvold, 2002). As a result, researchers also recommend examining change in additional criteria, including CFI, RMSEA, and SRMR (F. F. Chen, 2007). Cheung & Rensvold (2002) recommend a cutoff of \(\Delta \text{CFI} \leq -.01\) (i.e., a decrease in CFI of .01 or more) for identifying measurement non-invariance. F. F. Chen (2007) recommends the following cutoffs for identifying measurement non-invariance (a sketch for computing these changes appears after this list):

  • with a small sample size (total \(N \leq 300\)):
    • testing invariance of factor loadings:
      • \(\Delta \text{CFI} \leq -.005\)
      • supplemented by \(\Delta \text{RMSEA} \geq .010\) or \(\Delta \text{SRMR} \geq .025\)
    • testing invariance of intercepts or residuals:
      • \(\Delta \text{CFI} \leq -.005\)
      • supplemented by \(\Delta \text{RMSEA} \geq .010\) or \(\Delta \text{SRMR} \geq .005\)
  • with an adequate sample size (total \(N > 300\)):
    • testing invariance of factor loadings:
      • \(\Delta \text{CFI} \leq -.010\)
      • supplemented by \(\Delta \text{RMSEA} \geq .015\) or \(\Delta \text{SRMR} \geq .030\)
    • testing invariance of intercepts or residuals:
      • \(\Delta \text{CFI} \leq -.005\)
      • supplemented by \(\Delta \text{RMSEA} \geq .015\) or \(\Delta \text{SRMR} \geq .010\)
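
The following minimal sketch computes these change-in-fit criteria when moving from the configural to the metric invariance model, assuming the fitConfigural and fitMetric objects from the sketch above.

Code
#Change in fit indices from the configural to the metric invariance
#model; negative delta-CFI values indicate worse fit under the added
#constraints
fitIndices <- c("cfi", "rmsea", "srmr")

fitMeasures(fitMetric, fitIndices) -
  fitMeasures(fitConfigural, fitIndices)

A \(\Delta \text{CFI}\) at or below the relevant cutoff, supplemented by a \(\Delta \text{RMSEA}\) or \(\Delta \text{SRMR}\) at or above their cutoffs, would suggest non-invariance of the constrained parameters.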

Little et al. (2007) suggested that researchers establish at least partial invariance of factor loadings (metric invariance) to compare covariances across groups, and at least partial invariance of intercepts (scalar invariance) to compare mean levels across groups. Partial invariance refers to invariance with some but not all indicators. So, to examine associations with other variables, invariance of at least some factor loadings would be preferable. To examine differences in level or growth, invariance of at least some factor loadings and intercepts would be preferable. Residual invariance is considered overly restrictive, and it is not generally expected that one establish residual invariance (Little, 2013).
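
For example, lavaan's group.partial argument can specify partial invariance. The following is a minimal sketch using the cfaModel syntax and data from the examples below, and assuming, hypothetically, that the intercept of x3 had been flagged as non-invariant.

Code
#Partial scalar invariance: loadings and intercepts constrained equal
#across groups, except the intercept of x3, which is freely estimated
#in each group
fitPartialScalar <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts"),
  group.partial = c("x3~1"))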

Although measurement invariance is important to test, it is also worth noting that tests of measurement invariance and DIF rest on various fundamentally untestable assumptions related to scale setting (Raykov et al., 2020). For instance, to give the latent factor units, researchers oftentimes fix one item's factor loading (the marker variable) to be equal across groups. Results of factorial invariance tests can depend highly on which item is used as the anchor item for setting the scale of the latent factor (Belzak & Bauer, 2020). Subsequent tests of measurement invariance then rest on the fundamentally untestable assumption that changes in the level of the latent factor precipitate the same amount of change in that item across groups. If this scaling is not proper, it could lead to improper conclusions. Researchers are, therefore, not conclusively able to establish measurement invariance.

Several possible approaches may help address this. First, it can be helpful to place greater focus on the degree of measurement invariance and the associated confidence (rather than the presence versus absence of measurement invariance). To the extent that non-invariance is trivial in effect size, the researcher can be more confident that the measure assesses the construct in a comparable way across groups. A number of studies describe how to estimate the effect size of measurement non-invariance (e.g., Gunn et al., 2020; Liu et al., 2017) or differential item functioning (e.g., Gonzalez & Pelham, 2021; Meade, 2010). Second, several robustness checks may be informative. One important sensitivity analysis is to see whether measurement invariance holds when using different marker variables; this would provide greater evidence that apparent measurement invariance is not specific to one marker variable (i.e., one particular set of assumptions). A second robustness check is to use effects coding, in which the average of the items' factor loadings is fixed to 1, so the latent variable is on the metric of all of the items rather than just one item (Little et al., 2006). A third robustness check is to use regularization, an alternative method for selecting anchor items and identifying differential item functioning based on a machine learning technique that penalizes and removes parameters that have little impact on model fit (Bauer et al., 2020; Belzak & Bauer, 2020).
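
As a sketch of the first robustness check (again using the data from the examples below), one could refit the invariance models after reordering the indicators, because lavaan treats the first indicator of each factor as the marker variable when the latent variance is not standardized. Converging conclusions across different markers would increase confidence that the results are not an artifact of the scale-setting choice.

Code
#Marker variables x1, x4, and x7 (the first indicator of each factor)
cfaModelMarkerA <- '
 visual  =~ x1 + x2 + x3
 textual =~ x4 + x5 + x6
 speed   =~ x7 + x8 + x9
'

#Alternative marker variables x2, x5, and x8
cfaModelMarkerB <- '
 visual  =~ x2 + x1 + x3
 textual =~ x5 + x4 + x6
 speed   =~ x8 + x7 + x9
'

fitScalarMarkerA <- cfa(
  cfaModelMarkerA,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts"))

fitScalarMarkerB <- cfa(
  cfaModelMarkerB,
  data = HolzingerSwineford1939,
  group = "school",
  group.equal = c("loadings", "intercepts"))

#Compare conclusions (e.g., partial-invariance decisions) across the
#two marker choices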

Another issue with measurement invariance is that the null hypothesis that adding constraints across groups will not worsen fit is likely always false, because there are often at least slight differences across groups in factor loadings, intercepts, or residuals that reflect sampling variability. However, we are not interested in trivial differences across groups that reflect sampling variability. Instead, we are interested in the extent to which there are substantive, meaningful differences across the groups of large enough magnitude to be practically meaningful. Thus, it can also be helpful to consider whether the measures show approximate measurement invariance (Van De Schoot et al., 2015). For details on how to test approximate measurement invariance, see Van De Schoot et al. (2013).

When detecting measurement non-invariance in a given parameter (e.g., factor loadings, intercepts, or residuals), one can identify the specific items that show measurement non-invariance in one of two primary ways: (1) starting with a model that allows the given parameter to differ across groups, a researcher can iteratively add constraints to identify the item(s) for which measurement invariance fails, or (2) starting with a model that constrains the given parameter to be the same across groups, a researcher can iteratively remove constraints to identify the item(s) for which measurement invariance becomes established (and by process of elimination, the items for which measurement invariance does not become established).
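
For instance, under the second strategy, lavaan's lavTestScore() function provides a score (Lagrange multiplier) test for each across-group equality constraint in a constrained model. The following is a sketch assuming the fitMetric object from the earlier sketch.

Code
#Univariate score tests for each across-group equality constraint;
#a significant test suggests that releasing that constraint would
#significantly improve model fit (i.e., that parameter may be
#non-invariant)
lavTestScore(fitMetric)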

Approaches to addressing measurement non-invariance are described in Section 16.5.1. If we identify that an item shows non-negligible non-invariance, we have three primary options (described above): (1) drop the item for both groups, (2) drop the item for one group but keep it for the other group, or (3) freely estimate the parameters (factor loadings and/or intercepts) for the item across groups.

Using the traditional chi-square difference test, tests of measurement invariance compare the model fit to a model with perfect fit. As the sample size grows larger, smaller differences in model fit will be detected as significant, thus rejecting measurement invariance even if the model fits well. So, it can also be useful to compare the model fit to a null hypothesis that the fit is poor, instead of the null hypothesis that the model is a perfect fit. Chi-square equivalence tests use the poor model fit as the null hypothesis. Thus, a significant chi-square equivalence test suggests that the equality constraints are plausible, providing support for the alternative hypothesis that the model fit is acceptable (Counsell et al., 2020).

16.10.1 Specify Models

16.10.1.1 Null Model

Fix residual variances and intercepts of manifest variables to be equal across groups.

Code
nullModel <- '
#Fix residual variances of manifest variables to be equal across groups
x1 ~~ c(psi1, psi1)*x1
x2 ~~ c(psi2, psi2)*x2
x3 ~~ c(psi3, psi3)*x3
x4 ~~ c(psi4, psi4)*x4
x5 ~~ c(psi5, psi5)*x5
x6 ~~ c(psi6, psi6)*x6
x7 ~~ c(psi7, psi7)*x7
x8 ~~ c(psi8, psi8)*x8
x9 ~~ c(psi9, psi9)*x9

#Fix intercepts of manifest variables to be equal across groups
x1 ~ c(tau1, tau1)*1
x2 ~ c(tau2, tau2)*1
x3 ~ c(tau3, tau3)*1
x4 ~ c(tau4, tau4)*1
x5 ~ c(tau5, tau5)*1
x6 ~ c(tau6, tau6)*1
x7 ~ c(tau7, tau7)*1
x8 ~ c(tau8, tau8)*1
x9 ~ c(tau9, tau9)*1
'

16.10.1.2 CFA Model

Code
cfaModel <- '
 #Factor loadings
 visual  =~ x1 + x2 + x3
 textual =~ x4 + x5 + x6
 speed   =~ x7 + x8 + x9
'

16.10.1.3 Configural Invariance

Specify the same number of factors in each group, with the same indicators loading on the same factors in each group.

Code
cfaModel_configuralInvariance <- '
 #Factor loadings (free the factor loading of the first indicator)
 visual  =~ NA*x1 + x2 + x3
 textual =~ NA*x4 + x5 + x6
 speed   =~ NA*x7 + x8 + x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one
 visual ~~ 1*visual
 textual ~~ 1*textual
 speed ~~ 1*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Free intercepts of manifest variables
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

16.10.1.4 Metric (“Weak Factorial”) Invariance

Specify invariance of factor loadings across groups.

Code
cfaModel_metricInvariance <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Free intercepts of manifest variables
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

16.10.1.5 Scalar (“Strong Factorial”) Invariance

Specify invariance of factor loadings and intercepts across groups.

Code
cfaModel_scalarInvariance <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero in group 1; free latent means in group 2
 visual ~ c(0, NA)*1
 textual ~ c(0, NA)*1
 speed ~ c(0, NA)*1
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ c(intx2, intx2)*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

16.10.1.6 Residual (“Strict Factorial”) Invariance

Specify invariance of factor loadings, intercepts, and residuals across groups.

Code
cfaModel_residualInvariance <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero in group 1; free latent means in group 2
 visual ~ c(0, NA)*1
 textual ~ c(0, NA)*1
 speed ~ c(0, NA)*1
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Fix residual variances of manifest variables across groups
 x1 ~~ c(residx1, residx1)*x1
 x2 ~~ c(residx2, residx2)*x2
 x3 ~~ c(residx3, residx3)*x3
 x4 ~~ c(residx4, residx4)*x4
 x5 ~~ c(residx5, residx5)*x5
 x6 ~~ c(residx6, residx6)*x6
 x7 ~~ c(residx7, residx7)*x7
 x8 ~~ c(residx8, residx8)*x8
 x9 ~~ c(residx9, residx9)*x9
 
 #Fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ c(intx2, intx2)*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

16.10.2 Specify the Fit Indices of Interest

Code
myAFIs <- c("chisq", "chisq.scaled",
            "rmsea", "cfi", "tli", "srmr",
            "rmsea.robust", "cfi.robust", "tli.robust")
moreAFIs <- NULL # c("gammaHat","gammaHat.scaled")

16.10.3 Null Model

16.10.3.1 Fit the Model

Code
nullModelFit <- lavaan(
  nullModel,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

16.10.3.2 Model Summary

Code
summary(
  nullModelFit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        36
  Number of equality constraints                    18

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               657.507     618.302
  Degrees of freedom                                90          90
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.063
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                          314.801     296.031
    1                                          342.706     322.272

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.000       0.000
  Tucker-Lewis Index (TLI)                       0.147       0.154
                                                                  
  Robust Comparative Fit Index (CFI)                         0.000
  Robust Tucker-Lewis Index (TLI)                            0.150

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2979.251   -2979.251
  Scaling correction factor                                  0.541
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5994.503    5994.503
  Bayesian (BIC)                              6058.526    6058.526
  Sample-size adjusted Bayesian (SABIC)       6001.459    6001.459

Root Mean Square Error of Approximation:

  RMSEA                                          0.221       0.213
  90 Percent confidence interval - lower         0.205       0.198
  90 Percent confidence interval - upper         0.237       0.228
  P-value H_0: RMSEA <= 0.050                    0.000       0.000
  P-value H_0: RMSEA >= 0.080                    1.000       1.000
                                                                  
  Robust RMSEA                                               0.257
  90 Percent confidence interval - lower                     0.238
  90 Percent confidence interval - upper                     0.277
  P-value H_0: Robust RMSEA <= 0.050                         0.000
  P-value H_0: Robust RMSEA >= 0.080                         1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.281       0.281

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    x1      (tau1)    4.923    0.079   62.591    0.000    4.923    4.249
    x2      (tau2)    6.060    0.076   79.251    0.000    6.060    5.249
    x3      (tau3)    2.191    0.076   28.807    0.000    2.191    1.974
    x4      (tau4)    3.050    0.080   37.996    0.000    3.050    2.603
    x5      (tau5)    4.317    0.084   51.540    0.000    4.317    3.475
    x6      (tau6)    2.179    0.070   31.159    0.000    2.179    2.140
    x7      (tau7)    4.175    0.069   60.464    0.000    4.175    4.013
    x8      (tau8)    5.509    0.068   80.899    0.000    5.509    5.467
    x9      (tau9)    5.371    0.069   78.103    0.000    5.371    5.278

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    x1      (psi1)    1.343    0.134    9.992    0.000    1.343    1.000
    x2      (psi2)    1.333    0.139    9.570    0.000    1.333    1.000
    x3      (psi3)    1.232    0.090   13.697    0.000    1.232    1.000
    x4      (psi4)    1.373    0.143    9.607    0.000    1.373    1.000
    x5      (psi5)    1.544    0.130   11.887    0.000    1.544    1.000
    x6      (psi6)    1.037    0.132    7.842    0.000    1.037    1.000
    x7      (psi7)    1.083    0.092   11.749    0.000    1.083    1.000
    x8      (psi8)    1.015    0.132    7.719    0.000    1.015    1.000
    x9      (psi9)    1.036    0.111    9.332    0.000    1.036    1.000


Group 2 [1]:

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    x1      (tau1)    4.923    0.079   62.591    0.000    4.923    4.249
    x2      (tau2)    6.060    0.076   79.251    0.000    6.060    5.249
    x3      (tau3)    2.191    0.076   28.807    0.000    2.191    1.974
    x4      (tau4)    3.050    0.080   37.996    0.000    3.050    2.603
    x5      (tau5)    4.317    0.084   51.540    0.000    4.317    3.475
    x6      (tau6)    2.179    0.070   31.159    0.000    2.179    2.140
    x7      (tau7)    4.175    0.069   60.464    0.000    4.175    4.013
    x8      (tau8)    5.509    0.068   80.899    0.000    5.509    5.467
    x9      (tau9)    5.371    0.069   78.103    0.000    5.371    5.278

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    x1      (psi1)    1.343    0.134    9.992    0.000    1.343    1.000
    x2      (psi2)    1.333    0.139    9.570    0.000    1.333    1.000
    x3      (psi3)    1.232    0.090   13.697    0.000    1.232    1.000
    x4      (psi4)    1.373    0.143    9.607    0.000    1.373    1.000
    x5      (psi5)    1.544    0.130   11.887    0.000    1.544    1.000
    x6      (psi6)    1.037    0.132    7.842    0.000    1.037    1.000
    x7      (psi7)    1.083    0.092   11.749    0.000    1.083    1.000
    x8      (psi8)    1.015    0.132    7.719    0.000    1.015    1.000
    x9      (psi9)    1.036    0.111    9.332    0.000    1.036    1.000

16.10.4 Configural Invariance

Specify the same number of factors in each group, with the same indicators loading on the same factors in each group.

16.10.4.1 Model Syntax

Code
configuralInvarianceModel <- measEq.syntax(
  configural.model = cfaModel,
  data = HolzingerSwineford1939,
  ID.fac = "std.lv",
  group = "school")
Code
cat(as.character(configuralInvarianceModel))
## LOADINGS:

visual =~ c(NA, NA)*x1 + c(lambda.1_1.g1, lambda.1_1.g2)*x1
visual =~ c(NA, NA)*x2 + c(lambda.2_1.g1, lambda.2_1.g2)*x2
visual =~ c(NA, NA)*x3 + c(lambda.3_1.g1, lambda.3_1.g2)*x3
textual =~ c(NA, NA)*x4 + c(lambda.4_2.g1, lambda.4_2.g2)*x4
textual =~ c(NA, NA)*x5 + c(lambda.5_2.g1, lambda.5_2.g2)*x5
textual =~ c(NA, NA)*x6 + c(lambda.6_2.g1, lambda.6_2.g2)*x6
speed =~ c(NA, NA)*x7 + c(lambda.7_3.g1, lambda.7_3.g2)*x7
speed =~ c(NA, NA)*x8 + c(lambda.8_3.g1, lambda.8_3.g2)*x8
speed =~ c(NA, NA)*x9 + c(lambda.9_3.g1, lambda.9_3.g2)*x9

## INTERCEPTS:

x1 ~ c(NA, NA)*1 + c(nu.1.g1, nu.1.g2)*1
x2 ~ c(NA, NA)*1 + c(nu.2.g1, nu.2.g2)*1
x3 ~ c(NA, NA)*1 + c(nu.3.g1, nu.3.g2)*1
x4 ~ c(NA, NA)*1 + c(nu.4.g1, nu.4.g2)*1
x5 ~ c(NA, NA)*1 + c(nu.5.g1, nu.5.g2)*1
x6 ~ c(NA, NA)*1 + c(nu.6.g1, nu.6.g2)*1
x7 ~ c(NA, NA)*1 + c(nu.7.g1, nu.7.g2)*1
x8 ~ c(NA, NA)*1 + c(nu.8.g1, nu.8.g2)*1
x9 ~ c(NA, NA)*1 + c(nu.9.g1, nu.9.g2)*1

## UNIQUE-FACTOR VARIANCES:

x1 ~~ c(NA, NA)*x1 + c(theta.1_1.g1, theta.1_1.g2)*x1
x2 ~~ c(NA, NA)*x2 + c(theta.2_2.g1, theta.2_2.g2)*x2
x3 ~~ c(NA, NA)*x3 + c(theta.3_3.g1, theta.3_3.g2)*x3
x4 ~~ c(NA, NA)*x4 + c(theta.4_4.g1, theta.4_4.g2)*x4
x5 ~~ c(NA, NA)*x5 + c(theta.5_5.g1, theta.5_5.g2)*x5
x6 ~~ c(NA, NA)*x6 + c(theta.6_6.g1, theta.6_6.g2)*x6
x7 ~~ c(NA, NA)*x7 + c(theta.7_7.g1, theta.7_7.g2)*x7
x8 ~~ c(NA, NA)*x8 + c(theta.8_8.g1, theta.8_8.g2)*x8
x9 ~~ c(NA, NA)*x9 + c(theta.9_9.g1, theta.9_9.g2)*x9

## LATENT MEANS/INTERCEPTS:

visual ~ c(0, 0)*1 + c(alpha.1.g1, alpha.1.g2)*1
textual ~ c(0, 0)*1 + c(alpha.2.g1, alpha.2.g2)*1
speed ~ c(0, 0)*1 + c(alpha.3.g1, alpha.3.g2)*1

## COMMON-FACTOR VARIANCES:

visual ~~ c(1, 1)*visual + c(psi.1_1.g1, psi.1_1.g2)*visual
textual ~~ c(1, 1)*textual + c(psi.2_2.g1, psi.2_2.g2)*textual
speed ~~ c(1, 1)*speed + c(psi.3_3.g1, psi.3_3.g2)*speed

## COMMON-FACTOR COVARIANCES:

visual ~~ c(NA, NA)*textual + c(psi.2_1.g1, psi.2_1.g2)*textual
visual ~~ c(NA, NA)*speed + c(psi.3_1.g1, psi.3_1.g2)*speed
textual ~~ c(NA, NA)*speed + c(psi.3_2.g1, psi.3_2.g2)*speed
Code
configuralInvarianceSyntax <- as.character(configuralInvarianceModel)

16.10.4.1.1 Summary of Model Features

Code
summary(configuralInvarianceModel)
This lavaan model syntax specifies a CFA with 9 manifest indicators of 3 common factor(s).

To identify the location and scale of each common factor, the factor means and variances were fixed to 0 and 1, respectively, unless equality constraints on measurement parameters allow them to be freed.

Pattern matrix indicating num(eric), ord(ered), and lat(ent) indicators per factor:

   visual textual speed
x1 num                 
x2 num                 
x3 num                 
x4        num          
x5        num          
x6        num          
x7                num  
x8                num  
x9                num  


This model hypothesizes only configural invariance.

16.10.4.1.2 Model Syntax in Table Form

Code
lavaanify(cfaModel, ngroups = 2)

16.10.4.2 Fit the Model

Code
configuralInvarianceModel_fit <- cfa(
  configuralInvarianceSyntax,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")
Code
configuralInvarianceModelFullSyntax_fit <- lavaan(
  cfaModel_configuralInvariance,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

16.10.4.3 Model Summary

Code
summary(
  configuralInvarianceModel_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 62 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                71.443      70.904
  Degrees of freedom                                48          48
  P-value (Chi-square)                           0.016       0.017
  Scaling correction factor                                  1.008
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           43.974      43.643
    1                                           27.469      27.262

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.956       0.954
  Tucker-Lewis Index (TLI)                       0.934       0.931
                                                                  
  Robust Comparative Fit Index (CFI)                         0.955
  Robust Tucker-Lewis Index (TLI)                            0.933

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2686.219   -2686.219
  Scaling correction factor                                  1.114
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5492.439    5492.439
  Bayesian (BIC)                              5705.848    5705.848
  Sample-size adjusted Bayesian (SABIC)       5515.627    5515.627

Root Mean Square Error of Approximation:

  RMSEA                                          0.061       0.061
  90 Percent confidence interval - lower         0.027       0.026
  90 Percent confidence interval - upper         0.090       0.089
  P-value H_0: RMSEA <= 0.050                    0.252       0.264
  P-value H_0: RMSEA >= 0.080                    0.151       0.142
                                                                  
  Robust RMSEA                                               0.072
  90 Percent confidence interval - lower                     0.018
  90 Percent confidence interval - upper                     0.111
  P-value H_0: Robust RMSEA <= 0.050                         0.192
  P-value H_0: Robust RMSEA >= 0.080                         0.397

Standardized Root Mean Square Residual:

  SRMR                                           0.067       0.067

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.986    0.165    5.980    0.000    0.986    0.818
    x2      (l.2_)    0.540    0.150    3.598    0.000    0.540    0.449
    x3      (l.3_)    0.748    0.122    6.123    0.000    0.748    0.664
  textual =~                                                            
    x4      (l.4_)    0.910    0.105    8.698    0.000    0.910    0.794
    x5      (l.5_)    1.186    0.115   10.345    0.000    1.186    0.902
    x6      (l.6_)    0.783    0.107    7.298    0.000    0.783    0.807
  speed =~                                                              
    x7      (l.7_)    0.442    0.201    2.199    0.028    0.442    0.437
    x8      (l.8_)    0.594    0.200    2.969    0.003    0.594    0.637
    x9      (l.9_)    0.610    0.211    2.892    0.004    0.610    0.612

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.453    0.125    3.614    0.000    0.453    0.453
    speed  (p.3_1)    0.423    0.225    1.882    0.060    0.423    0.423
  textual ~~                                                            
    speed  (p.3_2)    0.311    0.115    2.707    0.007    0.311    0.311

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (n.1.)    4.973    0.114   43.612    0.000    4.973    4.122
   .x2      (n.2.)    5.978    0.110   54.281    0.000    5.978    4.965
   .x3      (n.3.)    2.375    0.106   22.330    0.000    2.375    2.108
   .x4      (n.4.)    2.818    0.105   26.854    0.000    2.818    2.462
   .x5      (n.5.)    4.025    0.120   33.603    0.000    4.025    3.058
   .x6      (n.6.)    1.975    0.087   22.640    0.000    1.975    2.036
   .x7      (n.7.)    4.363    0.093   46.747    0.000    4.363    4.314
   .x8      (n.8.)    5.464    0.088   61.957    0.000    5.464    5.856
   .x9      (n.9.)    5.404    0.093   58.016    0.000    5.404    5.426
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.482    0.265    1.822    0.068    0.482    0.331
   .x2      (t.2_)    1.158    0.199    5.817    0.000    1.158    0.798
   .x3      (t.3_)    0.711    0.163    4.372    0.000    0.711    0.560
   .x4      (t.4_)    0.484    0.093    5.198    0.000    0.484    0.369
   .x5      (t.5_)    0.324    0.128    2.542    0.011    0.324    0.187
   .x6      (t.6_)    0.328    0.075    4.364    0.000    0.328    0.348
   .x7      (t.7_)    0.827    0.172    4.810    0.000    0.827    0.809
   .x8      (t.8_)    0.518    0.204    2.535    0.011    0.518    0.595
   .x9      (t.9_)    0.620    0.261    2.375    0.018    0.620    0.625
    visual  (p.1_)    1.000                               1.000    1.000
    textual (p.2_)    1.000                               1.000    1.000
    speed   (p.3_)    1.000                               1.000    1.000

R-Square:
                   Estimate
    x1                0.669
    x2                0.202
    x3                0.440
    x4                0.631
    x5                0.813
    x6                0.652
    x7                0.191
    x8                0.405
    x9                0.375


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.741    0.127    5.830    0.000    0.741    0.661
    x2      (l.2_)    0.459    0.107    4.309    0.000    0.459    0.424
    x3      (l.3_)    0.783    0.103    7.615    0.000    0.783    0.745
  textual =~                                                            
    x4      (l.4_)    0.971    0.098    9.898    0.000    0.971    0.867
    x5      (l.5_)    0.931    0.106    8.806    0.000    0.931    0.815
    x6      (l.6_)    0.879    0.113    7.764    0.000    0.879    0.823
  speed =~                                                              
    x7      (l.7_)    0.629    0.105    5.982    0.000    0.629    0.605
    x8      (l.8_)    0.852    0.146    5.815    0.000    0.852    0.797
    x9      (l.9_)    0.712    0.150    4.755    0.000    0.712    0.687

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.551    0.104    5.299    0.000    0.551    0.551
    speed  (p.3_1)    0.627    0.142    4.410    0.000    0.627    0.627
  textual ~~                                                            
    speed  (p.3_2)    0.350    0.174    2.015    0.044    0.350    0.350

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (n.1.)    4.882    0.105   46.528    0.000    4.882    4.354
   .x2      (n.2.)    6.160    0.103   59.887    0.000    6.160    5.692
   .x3      (n.3.)    1.990    0.098   20.392    0.000    1.990    1.894
   .x4      (n.4.)    3.305    0.104   31.664    0.000    3.305    2.951
   .x5      (n.5.)    4.669    0.104   44.694    0.000    4.669    4.089
   .x6      (n.6.)    2.444    0.103   23.796    0.000    2.444    2.286
   .x7      (n.7.)    4.004    0.097   41.179    0.000    4.004    3.850
   .x8      (n.8.)    5.570    0.099   56.357    0.000    5.570    5.211
   .x9      (n.9.)    5.374    0.099   54.445    0.000    5.374    5.189
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.708    0.178    3.977    0.000    0.708    0.563
   .x2      (t.2_)    0.961    0.189    5.083    0.000    0.961    0.820
   .x3      (t.3_)    0.492    0.123    3.984    0.000    0.492    0.445
   .x4      (t.4_)    0.311    0.083    3.745    0.000    0.311    0.248
   .x5      (t.5_)    0.438    0.096    4.575    0.000    0.438    0.336
   .x6      (t.6_)    0.369    0.095    3.867    0.000    0.369    0.323
   .x7      (t.7_)    0.686    0.118    5.795    0.000    0.686    0.634
   .x8      (t.8_)    0.417    0.220    1.899    0.058    0.417    0.365
   .x9      (t.9_)    0.566    0.156    3.639    0.000    0.566    0.528
    visual  (p.1_)    1.000                               1.000    1.000
    textual (p.2_)    1.000                               1.000    1.000
    speed   (p.3_)    1.000                               1.000    1.000

R-Square:
                   Estimate
    x1                0.437
    x2                0.180
    x3                0.555
    x4                0.752
    x5                0.664
    x6                0.677
    x7                0.366
    x8                0.635
    x9                0.472
Code
summary(
  configuralInvarianceModelFullSyntax_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 62 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        60

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                71.443      70.904
  Degrees of freedom                                48          48
  P-value (Chi-square)                           0.016       0.017
  Scaling correction factor                                  1.008
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           43.974      43.643
    1                                           27.469      27.262

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.956       0.954
  Tucker-Lewis Index (TLI)                       0.934       0.931
                                                                  
  Robust Comparative Fit Index (CFI)                         0.955
  Robust Tucker-Lewis Index (TLI)                            0.933

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2686.219   -2686.219
  Scaling correction factor                                  1.114
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5492.439    5492.439
  Bayesian (BIC)                              5705.848    5705.848
  Sample-size adjusted Bayesian (SABIC)       5515.627    5515.627

Root Mean Square Error of Approximation:

  RMSEA                                          0.061       0.061
  90 Percent confidence interval - lower         0.027       0.026
  90 Percent confidence interval - upper         0.090       0.089
  P-value H_0: RMSEA <= 0.050                    0.252       0.264
  P-value H_0: RMSEA >= 0.080                    0.151       0.142
                                                                  
  Robust RMSEA                                               0.072
  90 Percent confidence interval - lower                     0.018
  90 Percent confidence interval - upper                     0.111
  P-value H_0: Robust RMSEA <= 0.050                         0.192
  P-value H_0: Robust RMSEA >= 0.080                         0.397

Standardized Root Mean Square Residual:

  SRMR                                           0.067       0.067

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1                0.986    0.165    5.980    0.000    0.986    0.818
    x2                0.540    0.150    3.598    0.000    0.540    0.449
    x3                0.748    0.122    6.123    0.000    0.748    0.664
  textual =~                                                            
    x4                0.910    0.105    8.698    0.000    0.910    0.794
    x5                1.186    0.115   10.345    0.000    1.186    0.902
    x6                0.783    0.107    7.298    0.000    0.783    0.807
  speed =~                                                              
    x7                0.442    0.201    2.199    0.028    0.442    0.437
    x8                0.594    0.200    2.969    0.003    0.594    0.637
    x9                0.610    0.211    2.892    0.004    0.610    0.612

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.453    0.125    3.614    0.000    0.453    0.453
    speed             0.423    0.225    1.882    0.060    0.423    0.423
  textual ~~                                                            
    speed             0.311    0.115    2.707    0.007    0.311    0.311

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1                4.973    0.114   43.612    0.000    4.973    4.122
   .x2                5.978    0.110   54.281    0.000    5.978    4.965
   .x3                2.375    0.106   22.330    0.000    2.375    2.108
   .x4                2.818    0.105   26.854    0.000    2.818    2.462
   .x5                4.025    0.120   33.603    0.000    4.025    3.058
   .x6                1.975    0.087   22.640    0.000    1.975    2.036
   .x7                4.363    0.093   46.747    0.000    4.363    4.314
   .x8                5.464    0.088   61.957    0.000    5.464    5.856
   .x9                5.404    0.093   58.016    0.000    5.404    5.426

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000
   .x1                0.482    0.265    1.822    0.068    0.482    0.331
   .x2                1.158    0.199    5.817    0.000    1.158    0.798
   .x3                0.711    0.163    4.372    0.000    0.711    0.560
   .x4                0.484    0.093    5.198    0.000    0.484    0.369
   .x5                0.324    0.128    2.542    0.011    0.324    0.187
   .x6                0.328    0.075    4.364    0.000    0.328    0.348
   .x7                0.827    0.172    4.810    0.000    0.827    0.809
   .x8                0.518    0.204    2.535    0.011    0.518    0.595
   .x9                0.620    0.261    2.375    0.018    0.620    0.625

R-Square:
                   Estimate
    x1                0.669
    x2                0.202
    x3                0.440
    x4                0.631
    x5                0.813
    x6                0.652
    x7                0.191
    x8                0.405
    x9                0.375


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1                0.741    0.127    5.830    0.000    0.741    0.661
    x2                0.459    0.107    4.309    0.000    0.459    0.424
    x3                0.783    0.103    7.615    0.000    0.783    0.745
  textual =~                                                            
    x4                0.971    0.098    9.898    0.000    0.971    0.867
    x5                0.931    0.106    8.806    0.000    0.931    0.815
    x6                0.879    0.113    7.764    0.000    0.879    0.823
  speed =~                                                              
    x7                0.629    0.105    5.982    0.000    0.629    0.605
    x8                0.852    0.146    5.815    0.000    0.852    0.797
    x9                0.712    0.150    4.755    0.000    0.712    0.687

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.551    0.104    5.299    0.000    0.551    0.551
    speed             0.627    0.142    4.410    0.000    0.627    0.627
  textual ~~                                                            
    speed             0.350    0.174    2.015    0.044    0.350    0.350

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1                4.882    0.105   46.528    0.000    4.882    4.354
   .x2                6.160    0.103   59.887    0.000    6.160    5.692
   .x3                1.990    0.098   20.392    0.000    1.990    1.894
   .x4                3.305    0.104   31.664    0.000    3.305    2.951
   .x5                4.669    0.104   44.694    0.000    4.669    4.089
   .x6                2.444    0.103   23.796    0.000    2.444    2.286
   .x7                4.004    0.097   41.179    0.000    4.004    3.850
   .x8                5.570    0.099   56.357    0.000    5.570    5.211
   .x9                5.374    0.099   54.445    0.000    5.374    5.189

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000
   .x1                0.708    0.178    3.977    0.000    0.708    0.563
   .x2                0.961    0.189    5.083    0.000    0.961    0.820
   .x3                0.492    0.123    3.984    0.000    0.492    0.445
   .x4                0.311    0.083    3.745    0.000    0.311    0.248
   .x5                0.438    0.096    4.575    0.000    0.438    0.336
   .x6                0.369    0.095    3.867    0.000    0.369    0.323
   .x7                0.686    0.118    5.795    0.000    0.686    0.634
   .x8                0.417    0.220    1.899    0.058    0.417    0.365
   .x9                0.566    0.156    3.639    0.000    0.566    0.528

R-Square:
                   Estimate
    x1                0.437
    x2                0.180
    x3                0.555
    x4                0.752
    x5                0.664
    x6                0.677
    x7                0.366
    x8                0.635
    x9                0.472

16.10.4.4 Model Fit

You can specify the null model as the baseline model using: baseline.model = nullModelFit
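
For instance, a minimal call might look like the following (assuming a version of lavaan whose fitMeasures() supports the baseline.model argument):

Code
#Compute incremental fit indices (e.g., CFI, TLI) relative to the null
#model specified above, rather than lavaan's default baseline model
fitMeasures(
  configuralInvarianceModel_fit,
  baseline.model = nullModelFit)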

Code
fitMeasures(configuralInvarianceModel_fit)
                         npar                          fmin 
                       60.000                         0.138 
                        chisq                            df 
                       71.443                        48.000 
                       pvalue                  chisq.scaled 
                        0.016                        70.904 
                    df.scaled                 pvalue.scaled 
                       48.000                         0.017 
         chisq.scaling.factor                baseline.chisq 
                        1.008                       604.129 
                  baseline.df               baseline.pvalue 
                       72.000                         0.000 
        baseline.chisq.scaled            baseline.df.scaled 
                      571.412                        72.000 
       baseline.pvalue.scaled baseline.chisq.scaling.factor 
                        0.000                         1.057 
                          cfi                           tli 
                        0.956                         0.934 
                   cfi.scaled                    tli.scaled 
                        0.954                         0.931 
                   cfi.robust                    tli.robust 
                        0.955                         0.933 
                         nnfi                           rfi 
                        0.934                         0.823 
                          nfi                          pnfi 
                        0.882                         0.588 
                          ifi                           rni 
                        0.958                         0.956 
                  nnfi.scaled                    rfi.scaled 
                        0.931                         0.814 
                   nfi.scaled                   pnfi.scaled 
                        0.876                         0.584 
                   ifi.scaled                    rni.scaled 
                        0.956                         0.954 
                  nnfi.robust                    rni.robust 
                        0.933                         0.955 
                         logl             unrestricted.logl 
                    -2686.219                     -2650.498 
                          aic                           bic 
                     5492.439                      5705.848 
                       ntotal                          bic2 
                      259.000                      5515.627 
            scaling.factor.h1             scaling.factor.h0 
                        1.066                         1.114 
                        rmsea                rmsea.ci.lower 
                        0.061                         0.027 
               rmsea.ci.upper                rmsea.ci.level 
                        0.090                         0.900 
                 rmsea.pvalue                rmsea.close.h0 
                        0.252                         0.050 
        rmsea.notclose.pvalue             rmsea.notclose.h0 
                        0.151                         0.080 
                 rmsea.scaled         rmsea.ci.lower.scaled 
                        0.061                         0.026 
        rmsea.ci.upper.scaled           rmsea.pvalue.scaled 
                        0.089                         0.264 
 rmsea.notclose.pvalue.scaled                  rmsea.robust 
                        0.142                         0.072 
        rmsea.ci.lower.robust         rmsea.ci.upper.robust 
                        0.018                         0.111 
          rmsea.pvalue.robust  rmsea.notclose.pvalue.robust 
                        0.192                         0.397 
                          rmr                    rmr_nomean 
                        0.080                         0.088 
                         srmr                  srmr_bentler 
                        0.067                         0.067 
          srmr_bentler_nomean                          crmr 
                        0.073                         0.073 
                  crmr_nomean                    srmr_mplus 
                        0.081                         0.067 
            srmr_mplus_nomean                         cn_05 
                        0.073                       237.261 
                        cn_01                           gfi 
                      268.119                         0.994 
                         agfi                          pgfi 
                        0.987                         0.442 
                          mfi                          ecvi 
                        0.956                         0.739 
Code
configuralInvarianceModelFitIndices <- fitMeasures(
  configuralInvarianceModel_fit)[c(
    "cfi.robust", "rmsea.robust", "srmr")]

configuralInvarianceModel_chisquare <- fitMeasures(
  configuralInvarianceModel_fit)[c("chisq.scaled")]

configuralInvarianceModel_chisquareScaling <- fitMeasures(
  configuralInvarianceModel_fit)[c("chisq.scaling.factor")]

configuralInvarianceModel_df <- fitMeasures(
  configuralInvarianceModel_fit)[c("df.scaled")]

configuralInvarianceModel_N <- lavInspect(
  configuralInvarianceModel_fit,
  what = "ntotal")

16.10.4.5 Effect size of non-invariance

16.10.4.5.1 dMACS

The effect size of measurement non-invariance, as described by Nye et al. (2019), was calculated using the dmacs package (Dueber, 2019).

Code
lavaan_dmacs(configuralInvarianceModel_fit)
$DMACS
      visual   textual     speed
x1 0.2250656        NA        NA
x2 0.1731621        NA        NA
x3 0.3523180        NA        NA
x4        NA 0.4264504        NA
x5        NA 0.5748120        NA
x6        NA 0.4769889        NA
x7        NA        NA 0.3927474
x8        NA        NA 0.2762461
x9        NA        NA 0.1037844

$ItemDeltaMean
        visual   textual       speed
x1 -0.09091876        NA          NA
x2  0.18252780        NA          NA
x3 -0.38511799        NA          NA
x4          NA 0.4868954          NA
x5          NA 0.6445379          NA
x6          NA 0.4690029          NA
x7          NA        NA -0.35907161
x8          NA        NA  0.10636300
x9          NA        NA -0.02947139

$MeanDiff
   visual   textual     speed 
-0.293509  1.600436 -0.282180 

16.10.4.6 Equivalence Test

The petersenlab package (Petersen, 2024b) contains the equiv_chi() function from Counsell et al. (2020) that performs an equivalence test: https://osf.io/cqu8v. An equivalence test reverses the usual null hypothesis: rather than testing whether the model fits worse than a well-fitting standard, it tests the null hypothesis that the model fits no better than a poorly fitting comparison model. Here, we operationalize a mediocre-fitting model as one whose population RMSEA is .08 or greater, so the equivalence test evaluates whether our invariance model fits significantly better than a mediocre-fitting model. Thus, a statistically significant p-value indicates that our invariance model fits significantly better than the mediocre-fitting model (i.e., invariance is supported), whereas a non-significant p-value indicates that our invariance model does not fit significantly better than the mediocre-fitting model (i.e., invariance fails).

The chi-square equivalence test is non-significant, suggesting that the model fit of the configural invariance model is not acceptable (i.e., it is not better than the mediocre-fitting model). In other words, configural invariance failed.

Code
equiv_chi(
  alpha = .05,
  chi = configuralInvarianceModel_chisquare,
  df = configuralInvarianceModel_df,
  m = 2,
  N_sample = configuralInvarianceModel_N,
  popRMSEA = .08)
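
The logic of the test can be sketched as follows. This is an illustration only, not the petersenlab implementation: it uses the conventional single-group noncentrality parameter \((N - 1) \times df \times \text{RMSEA}_0^2\), whereas equiv_chi() follows Counsell et al. (2020) and additionally adjusts for the number of groups (its m argument). The function name equivChiSketch is hypothetical.

Code
# Illustrative sketch of an RMSEA-based equivalence test; equivChiSketch is
# a hypothetical helper, not part of any package. Assumption: single-group
# noncentrality convention (equiv_chi() also adjusts for the number of groups).
equivChiSketch <- function(chi, df, N, popRMSEA = .08) {
  ncp <- (N - 1) * df * popRMSEA^2  # noncentrality implied by popRMSEA
  pchisq(chi, df, ncp = ncp)  # P(chi-square this small | mediocre population fit)
}

equivChiSketch(
  chi = configuralInvarianceModel_chisquare,
  df = configuralInvarianceModel_df,
  N = configuralInvarianceModel_N)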

16.10.4.7 Permutation Test

Permutation procedures for testing measurement invariance are described in Jorgensen et al. (2018). The permutation test evaluates the null hypothesis that the model fit is not worse than the best-possible fitting model. A significant p-value indicates that the model fits significantly worse than the best-possible fitting model, whereas a non-significant p-value indicates that it does not.
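
Conceptually, the permutation method builds an empirical null distribution by randomly shuffling group membership and refitting the model many times. A minimal sketch of that logic is below; it assumes the cfaModel syntax and data from above, and it is not the semTools implementation (permuteMeasEq additionally compares the change in fit between constrained and unconstrained models).

Code
# Conceptual sketch of the permutation logic (not semTools' implementation):
# shuffle group labels, refit, and collect a fit index to form an empirical
# null distribution under which group membership is arbitrary.
set.seed(52242)
permutedCFI <- replicate(10, {  # few permutations, for speed
  permutedData <- HolzingerSwineford1939
  permutedData$school <- sample(permutedData$school)  # shuffle group labels
  permutedFit <- cfa(
    cfaModel,
    data = permutedData,
    group = "school",
    missing = "ML",
    estimator = "MLR")
  fitMeasures(permutedFit)["cfi.robust"]
})

summary(permutedCFI)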

Code
numPermutations <- 100

For reproducibility, I set the seed below. Using the same seed will yield the same answer every time; there is nothing special about this particular seed. You can specify the null model as the baseline model using: baseline.model = nullModelFit. Warning: this code takes a while to run with \(100\) permutations. You can reduce the number of permutations to make it run faster.

Code
set.seed(52242)
configuralInvarianceTest <- permuteMeasEq(
  nPermute = numPermutations,
  modelType = "mgcfa",
  con = configuralInvarianceModel_fit,
  uncon = NULL,
  AFIs = myAFIs,
  moreAFIs = moreAFIs,
  parallelType = "multicore", #only 'snow' works on Windows, but right now, it is throwing an error
  iseed = 52242)
Code
RNGkind("default", "default", "default")
set.seed(52242)

configuralInvarianceTest
Omnibus p value based on parametric chi-squared difference test:

Chisq diff    Df diff Pr(>Chisq) 
    70.904     48.000      0.017 


Omnibus p values based on nonparametric permutation method: 

             AFI.Difference p.value
chisq                71.443    0.59
chisq.scaled         70.904    0.57
rmsea                 0.061    0.59
cfi                   0.956    0.58
tli                   0.934    0.58
srmr                  0.067    0.53
rmsea.robust          0.072    0.56
cfi.robust            0.955    0.56
tli.robust            0.933    0.56

The p-values are non-significant, indicating that the model does not fit significantly worse than the best-possible fitting model. In other words, configural invariance held.

16.10.4.8 Internal Consistency Reliability

Internal consistency reliability of items composing the latent factors, as quantified by omega (\(\omega\)) and average variance extracted (AVE), was estimated using the semTools package (Jorgensen et al., 2021).
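
For reference, with a unit-variance factor (as identified here via ID.fac = "std.lv") and uncorrelated residuals, one common formulation of these indices for the items loading on a given factor is:

\[ \omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_{ii}}, \qquad \text{AVE} = \frac{\sum_i \lambda_i^2}{\sum_i \lambda_i^2 + \sum_i \theta_{ii}}, \]

where \(\lambda_i\) are the factor loadings and \(\theta_{ii}\) are the items' residual (unique) variances.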

Code
compRelSEM(configuralInvarianceModel_fit)
Code
AVE(configuralInvarianceModel_fit)

16.10.4.9 Path Diagram

A path diagram of the model generated using the semPlot package (Epskamp, 2022) is in the figures below (“1” = group 1; “2” = group 2).

Code
semPaths(
  configuralInvarianceModel_fit,
  what = "est",
  layout = "tree2",
  ask = FALSE,
  edge.label.cex = 1.2)
Figure 16.53: Configural Invariance Model in Confirmatory Factor Analysis (1 = Group 1; 2 = Group 2).

Figure 16.54: Configural Invariance Model in Confirmatory Factor Analysis (1 = Group 1; 2 = Group 2).

16.10.5 Metric (“Weak Factorial”) Invariance Model

Specify invariance of factor loadings across groups.

16.10.5.1 Model Syntax

Code
metricInvarianceModel <- measEq.syntax(
  configural.model = cfaModel,
  data = HolzingerSwineford1939,
  ID.fac = "std.lv",
  group = "school",
  group.equal = "loadings")
Code
cat(as.character(metricInvarianceModel))
## LOADINGS:

visual =~ c(NA, NA)*x1 + c(lambda.1_1, lambda.1_1)*x1
visual =~ c(NA, NA)*x2 + c(lambda.2_1, lambda.2_1)*x2
visual =~ c(NA, NA)*x3 + c(lambda.3_1, lambda.3_1)*x3
textual =~ c(NA, NA)*x4 + c(lambda.4_2, lambda.4_2)*x4
textual =~ c(NA, NA)*x5 + c(lambda.5_2, lambda.5_2)*x5
textual =~ c(NA, NA)*x6 + c(lambda.6_2, lambda.6_2)*x6
speed =~ c(NA, NA)*x7 + c(lambda.7_3, lambda.7_3)*x7
speed =~ c(NA, NA)*x8 + c(lambda.8_3, lambda.8_3)*x8
speed =~ c(NA, NA)*x9 + c(lambda.9_3, lambda.9_3)*x9

## INTERCEPTS:

x1 ~ c(NA, NA)*1 + c(nu.1.g1, nu.1.g2)*1
x2 ~ c(NA, NA)*1 + c(nu.2.g1, nu.2.g2)*1
x3 ~ c(NA, NA)*1 + c(nu.3.g1, nu.3.g2)*1
x4 ~ c(NA, NA)*1 + c(nu.4.g1, nu.4.g2)*1
x5 ~ c(NA, NA)*1 + c(nu.5.g1, nu.5.g2)*1
x6 ~ c(NA, NA)*1 + c(nu.6.g1, nu.6.g2)*1
x7 ~ c(NA, NA)*1 + c(nu.7.g1, nu.7.g2)*1
x8 ~ c(NA, NA)*1 + c(nu.8.g1, nu.8.g2)*1
x9 ~ c(NA, NA)*1 + c(nu.9.g1, nu.9.g2)*1

## UNIQUE-FACTOR VARIANCES:

x1 ~~ c(NA, NA)*x1 + c(theta.1_1.g1, theta.1_1.g2)*x1
x2 ~~ c(NA, NA)*x2 + c(theta.2_2.g1, theta.2_2.g2)*x2
x3 ~~ c(NA, NA)*x3 + c(theta.3_3.g1, theta.3_3.g2)*x3
x4 ~~ c(NA, NA)*x4 + c(theta.4_4.g1, theta.4_4.g2)*x4
x5 ~~ c(NA, NA)*x5 + c(theta.5_5.g1, theta.5_5.g2)*x5
x6 ~~ c(NA, NA)*x6 + c(theta.6_6.g1, theta.6_6.g2)*x6
x7 ~~ c(NA, NA)*x7 + c(theta.7_7.g1, theta.7_7.g2)*x7
x8 ~~ c(NA, NA)*x8 + c(theta.8_8.g1, theta.8_8.g2)*x8
x9 ~~ c(NA, NA)*x9 + c(theta.9_9.g1, theta.9_9.g2)*x9

## LATENT MEANS/INTERCEPTS:

visual ~ c(0, 0)*1 + c(alpha.1.g1, alpha.1.g2)*1
textual ~ c(0, 0)*1 + c(alpha.2.g1, alpha.2.g2)*1
speed ~ c(0, 0)*1 + c(alpha.3.g1, alpha.3.g2)*1

## COMMON-FACTOR VARIANCES:

visual ~~ c(1, NA)*visual + c(psi.1_1.g1, psi.1_1.g2)*visual
textual ~~ c(1, NA)*textual + c(psi.2_2.g1, psi.2_2.g2)*textual
speed ~~ c(1, NA)*speed + c(psi.3_3.g1, psi.3_3.g2)*speed

## COMMON-FACTOR COVARIANCES:

visual ~~ c(NA, NA)*textual + c(psi.2_1.g1, psi.2_1.g2)*textual
visual ~~ c(NA, NA)*speed + c(psi.3_1.g1, psi.3_1.g2)*speed
textual ~~ c(NA, NA)*speed + c(psi.3_2.g1, psi.3_2.g2)*speed
Code
metricInvarianceSyntax <- as.character(metricInvarianceModel)
16.10.5.1.1 Summary of Model Features
Code
summary(metricInvarianceModel)
This lavaan model syntax specifies a CFA with 9 manifest indicators of 3 common factor(s).

To identify the location and scale of each common factor, the factor means and variances were fixed to 0 and 1, respectively, unless equality constraints on measurement parameters allow them to be freed.

Pattern matrix indicating num(eric), ord(ered), and lat(ent) indicators per factor:

   visual textual speed
x1 num                 
x2 num                 
x3 num                 
x4        num          
x5        num          
x6        num          
x7                num  
x8                num  
x9                num  

The following types of parameter were constrained to equality across groups:

    loadings

16.10.5.2 Model Syntax in Table Form

Code
lavaanify(metricInvarianceModel, ngroups = 2)
Code
lavaanify(cfaModel_metricInvariance, ngroups = 2)

16.10.5.3 Fit the Model

Code
metricInvarianceModel_fit <- cfa(
  metricInvarianceSyntax,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")
Code
metricInvarianceModelFullSyntax_fit <- lavaan(
  cfaModel_metricInvariance,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

16.10.5.4 Model Summary

Code
summary(
  metricInvarianceModel_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 67 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        63
  Number of equality constraints                     9

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                78.011      75.195
  Degrees of freedom                                54          54
  P-value (Chi-square)                           0.018       0.030
  Scaling correction factor                                  1.037
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           47.087      45.387
    1                                           30.924      29.808

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.955       0.958
  Tucker-Lewis Index (TLI)                       0.940       0.943
                                                                  
  Robust Comparative Fit Index (CFI)                         0.946
  Robust Tucker-Lewis Index (TLI)                            0.928

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2689.503   -2689.503
  Scaling correction factor                                  0.939
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5487.007    5487.007
  Bayesian (BIC)                              5679.075    5679.075
  Sample-size adjusted Bayesian (SABIC)       5507.876    5507.876

Root Mean Square Error of Approximation:

  RMSEA                                          0.059       0.055
  90 Percent confidence interval - lower         0.025       0.019
  90 Percent confidence interval - upper         0.086       0.082
  P-value H_0: RMSEA <= 0.050                    0.296       0.368
  P-value H_0: RMSEA >= 0.080                    0.104       0.069
                                                                  
  Robust RMSEA                                               0.075
  90 Percent confidence interval - lower                     0.030
  90 Percent confidence interval - upper                     0.110
  P-value H_0: Robust RMSEA <= 0.050                         0.147
  P-value H_0: Robust RMSEA >= 0.080                         0.431

Standardized Root Mean Square Residual:

  SRMR                                           0.073       0.073

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.913    0.130    7.016    0.000    0.913    0.769
    x2      (l.2_)    0.539    0.097    5.557    0.000    0.539    0.450
    x3      (l.3_)    0.819    0.088    9.333    0.000    0.819    0.714
  textual =~                                                            
    x4      (l.4_)    0.950    0.090   10.553    0.000    0.950    0.811
    x5      (l.5_)    1.069    0.117    9.122    0.000    1.069    0.852
    x6      (l.6_)    0.824    0.092    8.926    0.000    0.824    0.832
  speed =~                                                              
    x7      (l.7_)    0.471    0.085    5.523    0.000    0.471    0.465
    x8      (l.8_)    0.636    0.108    5.897    0.000    0.636    0.679
    x9      (l.9_)    0.557    0.112    4.971    0.000    0.557    0.563

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.455    0.127    3.578    0.000    0.455    0.455
    speed  (p.3_1)    0.396    0.139    2.837    0.005    0.396    0.396
  textual ~~                                                            
    speed  (p.3_2)    0.318    0.115    2.773    0.006    0.318    0.318

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (n.1.)    4.974    0.113   44.143    0.000    4.974    4.189
   .x2      (n.2.)    5.977    0.110   54.446    0.000    5.977    4.983
   .x3      (n.3.)    2.375    0.106   22.343    0.000    2.375    2.069
   .x4      (n.4.)    2.817    0.105   26.778    0.000    2.817    2.404
   .x5      (n.5.)    4.022    0.119   33.914    0.000    4.022    3.206
   .x6      (n.6.)    1.971    0.087   22.590    0.000    1.971    1.990
   .x7      (n.7.)    4.362    0.093   46.812    0.000    4.362    4.303
   .x8      (n.8.)    5.464    0.088   61.938    0.000    5.464    5.834
   .x9      (n.9.)    5.402    0.094   57.768    0.000    5.402    5.456
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.576    0.179    3.221    0.001    0.576    0.409
   .x2      (t.2_)    1.147    0.185    6.212    0.000    1.147    0.798
   .x3      (t.3_)    0.646    0.134    4.811    0.000    0.646    0.491
   .x4      (t.4_)    0.470    0.090    5.216    0.000    0.470    0.343
   .x5      (t.5_)    0.431    0.109    3.941    0.000    0.431    0.274
   .x6      (t.6_)    0.302    0.075    4.033    0.000    0.302    0.308
   .x7      (t.7_)    0.806    0.122    6.590    0.000    0.806    0.784
   .x8      (t.8_)    0.473    0.120    3.928    0.000    0.473    0.539
   .x9      (t.9_)    0.670    0.149    4.495    0.000    0.670    0.683
    visual  (p.1_)    1.000                               1.000    1.000
    textual (p.2_)    1.000                               1.000    1.000
    speed   (p.3_)    1.000                               1.000    1.000

R-Square:
                   Estimate
    x1                0.591
    x2                0.202
    x3                0.509
    x4                0.657
    x5                0.726
    x6                0.692
    x7                0.216
    x8                0.461
    x9                0.317


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.913    0.130    7.016    0.000    0.801    0.704
    x2      (l.2_)    0.539    0.097    5.557    0.000    0.473    0.435
    x3      (l.3_)    0.819    0.088    9.333    0.000    0.719    0.696
  textual =~                                                            
    x4      (l.4_)    0.950    0.090   10.553    0.000    0.914    0.836
    x5      (l.5_)    1.069    0.117    9.122    0.000    1.029    0.866
    x6      (l.6_)    0.824    0.092    8.926    0.000    0.793    0.772
  speed =~                                                              
    x7      (l.7_)    0.471    0.085    5.523    0.000    0.619    0.596
    x8      (l.8_)    0.636    0.108    5.897    0.000    0.835    0.783
    x9      (l.9_)    0.557    0.112    4.971    0.000    0.732    0.703

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.462    0.137    3.379    0.001    0.547    0.547
    speed  (p.3_1)    0.732    0.231    3.177    0.001    0.636    0.636
  textual ~~                                                            
    speed  (p.3_2)    0.473    0.241    1.966    0.049    0.375    0.375

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (n.1.)    4.881    0.105   46.577    0.000    4.881    4.289
   .x2      (n.2.)    6.159    0.103   59.876    0.000    6.159    5.664
   .x3      (n.3.)    1.993    0.098   20.368    0.000    1.993    1.932
   .x4      (n.4.)    3.299    0.103   31.949    0.000    3.299    3.017
   .x5      (n.5.)    4.674    0.105   44.381    0.000    4.674    3.932
   .x6      (n.6.)    2.436    0.101   24.159    0.000    2.436    2.371
   .x7      (n.7.)    4.002    0.097   41.140    0.000    4.002    3.855
   .x8      (n.8.)    5.571    0.099   56.335    0.000    5.571    5.226
   .x9      (n.9.)    5.375    0.098   54.583    0.000    5.375    5.165
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.653    0.167    3.900    0.000    0.653    0.504
   .x2      (t.2_)    0.958    0.186    5.165    0.000    0.958    0.811
   .x3      (t.3_)    0.548    0.117    4.672    0.000    0.548    0.515
   .x4      (t.4_)    0.360    0.089    4.046    0.000    0.360    0.301
   .x5      (t.5_)    0.354    0.096    3.679    0.000    0.354    0.250
   .x6      (t.6_)    0.427    0.103    4.149    0.000    0.427    0.404
   .x7      (t.7_)    0.695    0.120    5.805    0.000    0.695    0.645
   .x8      (t.8_)    0.439    0.204    2.149    0.032    0.439    0.387
   .x9      (t.9_)    0.547    0.148    3.697    0.000    0.547    0.506
    visual  (p.1_)    0.770    0.210    3.663    0.000    1.000    1.000
    textual (p.2_)    0.926    0.228    4.058    0.000    1.000    1.000
    speed   (p.3_)    1.724    0.512    3.364    0.001    1.000    1.000

R-Square:
                   Estimate
    x1                0.496
    x2                0.189
    x3                0.485
    x4                0.699
    x5                0.750
    x6                0.596
    x7                0.355
    x8                0.613
    x9                0.494
Code
summary(
  metricInvarianceModelFullSyntax_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 67 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        63
  Number of equality constraints                     9

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                78.011      75.195
  Degrees of freedom                                54          54
  P-value (Chi-square)                           0.018       0.030
  Scaling correction factor                                  1.037
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           47.087      45.387
    1                                           30.924      29.808

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.955       0.958
  Tucker-Lewis Index (TLI)                       0.940       0.943
                                                                  
  Robust Comparative Fit Index (CFI)                         0.946
  Robust Tucker-Lewis Index (TLI)                            0.928

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2689.503   -2689.503
  Scaling correction factor                                  0.939
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5487.007    5487.007
  Bayesian (BIC)                              5679.075    5679.075
  Sample-size adjusted Bayesian (SABIC)       5507.876    5507.876

Root Mean Square Error of Approximation:

  RMSEA                                          0.059       0.055
  90 Percent confidence interval - lower         0.025       0.019
  90 Percent confidence interval - upper         0.086       0.082
  P-value H_0: RMSEA <= 0.050                    0.296       0.368
  P-value H_0: RMSEA >= 0.080                    0.104       0.069
                                                                  
  Robust RMSEA                                               0.075
  90 Percent confidence interval - lower                     0.030
  90 Percent confidence interval - upper                     0.110
  P-value H_0: Robust RMSEA <= 0.050                         0.147
  P-value H_0: Robust RMSEA >= 0.080                         0.431

Standardized Root Mean Square Residual:

  SRMR                                           0.073       0.073

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.913    0.130    7.016    0.000    0.913    0.769
    x2      (lmb2)    0.539    0.097    5.557    0.000    0.539    0.450
    x3      (lmb3)    0.819    0.088    9.333    0.000    0.819    0.714
  textual =~                                                            
    x4      (lmb4)    0.950    0.090   10.553    0.000    0.950    0.811
    x5      (lmb5)    1.069    0.117    9.122    0.000    1.069    0.852
    x6      (lmb6)    0.824    0.092    8.926    0.000    0.824    0.832
  speed =~                                                              
    x7      (lmb7)    0.471    0.085    5.523    0.000    0.471    0.465
    x8      (lmb8)    0.636    0.108    5.897    0.000    0.636    0.679
    x9      (lmb9)    0.557    0.112    4.971    0.000    0.557    0.563

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.455    0.127    3.578    0.000    0.455    0.455
    speed             0.396    0.139    2.837    0.005    0.396    0.396
  textual ~~                                                            
    speed             0.318    0.115    2.773    0.006    0.318    0.318

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1                4.974    0.113   44.143    0.000    4.974    4.189
   .x2                5.977    0.110   54.446    0.000    5.977    4.983
   .x3                2.375    0.106   22.343    0.000    2.375    2.069
   .x4                2.817    0.105   26.778    0.000    2.817    2.404
   .x5                4.022    0.119   33.914    0.000    4.022    3.206
   .x6                1.971    0.087   22.590    0.000    1.971    1.990
   .x7                4.362    0.093   46.812    0.000    4.362    4.303
   .x8                5.464    0.088   61.938    0.000    5.464    5.834
   .x9                5.402    0.094   57.768    0.000    5.402    5.456

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000
   .x1                0.576    0.179    3.221    0.001    0.576    0.409
   .x2                1.147    0.185    6.212    0.000    1.147    0.798
   .x3                0.646    0.134    4.811    0.000    0.646    0.491
   .x4                0.470    0.090    5.216    0.000    0.470    0.343
   .x5                0.431    0.109    3.941    0.000    0.431    0.274
   .x6                0.302    0.075    4.033    0.000    0.302    0.308
   .x7                0.806    0.122    6.590    0.000    0.806    0.784
   .x8                0.473    0.120    3.928    0.000    0.473    0.539
   .x9                0.670    0.149    4.495    0.000    0.670    0.683

R-Square:
                   Estimate
    x1                0.591
    x2                0.202
    x3                0.509
    x4                0.657
    x5                0.726
    x6                0.692
    x7                0.216
    x8                0.461
    x9                0.317


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.913    0.130    7.016    0.000    0.801    0.704
    x2      (lmb2)    0.539    0.097    5.557    0.000    0.473    0.435
    x3      (lmb3)    0.819    0.088    9.333    0.000    0.719    0.696
  textual =~                                                            
    x4      (lmb4)    0.950    0.090   10.553    0.000    0.914    0.836
    x5      (lmb5)    1.069    0.117    9.122    0.000    1.029    0.866
    x6      (lmb6)    0.824    0.092    8.926    0.000    0.793    0.772
  speed =~                                                              
    x7      (lmb7)    0.471    0.085    5.523    0.000    0.619    0.596
    x8      (lmb8)    0.636    0.108    5.897    0.000    0.835    0.783
    x9      (lmb9)    0.557    0.112    4.971    0.000    0.732    0.703

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.462    0.137    3.379    0.001    0.547    0.547
    speed             0.732    0.231    3.177    0.001    0.636    0.636
  textual ~~                                                            
    speed             0.473    0.241    1.966    0.049    0.375    0.375

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1                4.881    0.105   46.577    0.000    4.881    4.289
   .x2                6.159    0.103   59.876    0.000    6.159    5.664
   .x3                1.993    0.098   20.368    0.000    1.993    1.932
   .x4                3.299    0.103   31.949    0.000    3.299    3.017
   .x5                4.674    0.105   44.381    0.000    4.674    3.932
   .x6                2.436    0.101   24.159    0.000    2.436    2.371
   .x7                4.002    0.097   41.140    0.000    4.002    3.855
   .x8                5.571    0.099   56.335    0.000    5.571    5.226
   .x9                5.375    0.098   54.583    0.000    5.375    5.165

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.770    0.210    3.663    0.000    1.000    1.000
    textual           0.926    0.228    4.058    0.000    1.000    1.000
    speed             1.724    0.512    3.364    0.001    1.000    1.000
   .x1                0.653    0.167    3.900    0.000    0.653    0.504
   .x2                0.958    0.186    5.165    0.000    0.958    0.811
   .x3                0.548    0.117    4.672    0.000    0.548    0.515
   .x4                0.360    0.089    4.046    0.000    0.360    0.301
   .x5                0.354    0.096    3.679    0.000    0.354    0.250
   .x6                0.427    0.103    4.149    0.000    0.427    0.404
   .x7                0.695    0.120    5.805    0.000    0.695    0.645
   .x8                0.439    0.204    2.149    0.032    0.439    0.387
   .x9                0.547    0.148    3.697    0.000    0.547    0.506

R-Square:
                   Estimate
    x1                0.496
    x2                0.189
    x3                0.485
    x4                0.699
    x5                0.750
    x6                0.596
    x7                0.355
    x8                0.613
    x9                0.494

16.10.5.5 Model Fit

You can specify the null model as the baseline model using: baseline.model = nullModelFit.

Code
fitMeasures(metricInvarianceModel_fit)
                         npar                          fmin 
                       54.000                         0.151 
                        chisq                            df 
                       78.011                        54.000 
                       pvalue                  chisq.scaled 
                        0.018                        75.195 
                    df.scaled                 pvalue.scaled 
                       54.000                         0.030 
         chisq.scaling.factor                baseline.chisq 
                        1.037                       604.129 
                  baseline.df               baseline.pvalue 
                       72.000                         0.000 
        baseline.chisq.scaled            baseline.df.scaled 
                      571.412                        72.000 
       baseline.pvalue.scaled baseline.chisq.scaling.factor 
                        0.000                         1.057 
                          cfi                           tli 
                        0.955                         0.940 
                   cfi.scaled                    tli.scaled 
                        0.958                         0.943 
                   cfi.robust                    tli.robust 
                        0.946                         0.928 
                         nnfi                           rfi 
                        0.940                         0.828 
                          nfi                          pnfi 
                        0.871                         0.653 
                          ifi                           rni 
                        0.956                         0.955 
                  nnfi.scaled                    rfi.scaled 
                        0.943                         0.825 
                   nfi.scaled                   pnfi.scaled 
                        0.868                         0.651 
                   ifi.scaled                    rni.scaled 
                        0.959                         0.958 
                  nnfi.robust                    rni.robust 
                        0.928                         0.946 
                         logl             unrestricted.logl 
                    -2689.503                     -2650.498 
                          aic                           bic 
                     5487.007                      5679.075 
                       ntotal                          bic2 
                      259.000                      5507.876 
            scaling.factor.h1             scaling.factor.h0 
                        1.066                         0.939 
                        rmsea                rmsea.ci.lower 
                        0.059                         0.025 
               rmsea.ci.upper                rmsea.ci.level 
                        0.086                         0.900 
                 rmsea.pvalue                rmsea.close.h0 
                        0.296                         0.050 
        rmsea.notclose.pvalue             rmsea.notclose.h0 
                        0.104                         0.080 
                 rmsea.scaled         rmsea.ci.lower.scaled 
                        0.055                         0.019 
        rmsea.ci.upper.scaled           rmsea.pvalue.scaled 
                        0.082                         0.368 
 rmsea.notclose.pvalue.scaled                  rmsea.robust 
                        0.069                         0.075 
        rmsea.ci.lower.robust         rmsea.ci.upper.robust 
                        0.030                         0.110 
          rmsea.pvalue.robust  rmsea.notclose.pvalue.robust 
                        0.147                         0.431 
                          rmr                    rmr_nomean 
                        0.090                         0.099 
                         srmr                  srmr_bentler 
                        0.073                         0.073 
          srmr_bentler_nomean                          crmr 
                        0.080                         0.075 
                  crmr_nomean                    srmr_mplus 
                        0.084                         0.078 
            srmr_mplus_nomean                         cn_05 
                        0.078                       240.551 
                        cn_01                           gfi 
                      270.151                         0.995 
                         agfi                          pgfi 
                        0.990                         0.498 
                          mfi                          ecvi 
                        0.955                         0.718 
Code
metricInvarianceModelFitIndices <- fitMeasures(
  metricInvarianceModel_fit)[c(
    "cfi.robust", "rmsea.robust", "srmr")]

metricInvarianceModel_chisquare <- fitMeasures(
  metricInvarianceModel_fit)[c("chisq.scaled")]

metricInvarianceModel_chisquareScaling <- fitMeasures(
  metricInvarianceModel_fit)[c("chisq.scaling.factor")]

metricInvarianceModel_df <- fitMeasures(
  metricInvarianceModel_fit)[c("df.scaled")]

metricInvarianceModel_N <- lavInspect(
  metricInvarianceModel_fit, what = "ntotal")

16.10.5.6 Compare Model Fit

16.10.5.6.1 Nested Model (\(\chi^2\)) Difference Test

The configural invariance model and the metric (“weak factorial”) invariance model are considered “nested” models. The metric invariance model is nested within the configural invariance model because the configural invariance model estimates all of the parameters of the metric invariance model along with additional free parameters; equivalently, the metric invariance model is the configural invariance model with equality constraints added on the factor loadings. Model fit of nested models can be compared with a chi-square difference test, also known as a likelihood ratio test or deviance test. A significant chi-square difference test indicates that the simpler model with additional constraints (the metric invariance model) fits significantly worse than the more complex model with fewer constraints (the configural invariance model).
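
In its unscaled form, the test statistic is simply the difference between the two models' chi-square values, evaluated against the difference in their degrees of freedom:

\[ \Delta \chi^2 = \chi^2_{\text{metric}} - \chi^2_{\text{configural}}, \qquad \Delta df = df_{\text{metric}} - df_{\text{configural}} \]

With a robust estimator such as MLR, a scaled version of this difference is used instead, as described below.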

Code
anova(configuralInvarianceModel_fit, metricInvarianceModel_fit)

The metric invariance model did not fit significantly worse than the configural invariance model, so metric invariance held. This provides evidence of measurement invariance of the factor loadings across groups, which in turn supports examining whether the groups show different associations of the factors with other constructs (Little et al., 2007).

The petersenlab package (Petersen, 2024b) contains the satorraBentlerScaledChiSquareDifferenceTestStatistic() function that performs a Satorra-Bentler scaled chi-square difference test, which is appropriate when models are fit with a robust estimator such as MLR (as here). In the code below, \(c_0\) and \(c_1\) are the scaling correction factors for the nested model and the comparison model, respectively; \(d_0\) and \(d_1\) are the degrees of freedom of the nested model and the comparison model, respectively; and \(T_0\) and \(T_1\) are the chi-square values of the nested model and the comparison model, respectively.
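
Under the usual formulation (with \(T_0\) and \(T_1\) on the uncorrected scale), the scaled difference statistic is

\[ c_d = \frac{d_0 c_0 - d_1 c_1}{d_0 - d_1}, \qquad T_d = \frac{T_0 - T_1}{c_d}, \]

and \(T_d\) is referred to a \(\chi^2\) distribution with \(d_0 - d_1\) degrees of freedom.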

Code
metricInvarianceModel_chisquareDiff <- 
  satorraBentlerScaledChiSquareDifferenceTestStatistic(
    T0 = metricInvarianceModel_chisquare,
    c0 = metricInvarianceModel_chisquareScaling,
    d0 = metricInvarianceModel_df,
    T1 = configuralInvarianceModel_chisquare,
    c1 = configuralInvarianceModel_chisquareScaling,
    d1 = configuralInvarianceModel_df)

metricInvarianceModel_chisquareDiff
chisq.scaled 
    5.146394 
16.10.5.6.2 Compare Other Fit Criteria

The change in alternative fit indices (AFIs) from the configural invariance model to the metric invariance model is computed below. A common rule of thumb treats a decrease in CFI of .01 or less as a negligible worsening of fit.
Code
round(metricInvarianceModelFitIndices - configuralInvarianceModelFitIndices, digits = 3)
  cfi.robust rmsea.robust         srmr 
      -0.009        0.003        0.007 
16.10.5.6.3 Score-Based Test

Score-based tests of measurement invariance are implemented using the strucchange package and are described by T. Wang et al. (2014).

Code
coef(metricInvarianceModel_fit)
  lambda.1_1   lambda.2_1   lambda.3_1   lambda.4_2   lambda.5_2   lambda.6_2 
       0.913        0.539        0.819        0.950        1.069        0.824 
  lambda.7_3   lambda.8_3   lambda.9_3      nu.1.g1      nu.2.g1      nu.3.g1 
       0.471        0.636        0.557        4.974        5.977        2.375 
     nu.4.g1      nu.5.g1      nu.6.g1      nu.7.g1      nu.8.g1      nu.9.g1 
       2.817        4.022        1.971        4.362        5.464        5.402 
theta.1_1.g1 theta.2_2.g1 theta.3_3.g1 theta.4_4.g1 theta.5_5.g1 theta.6_6.g1 
       0.576        1.147        0.646        0.470        0.431        0.302 
theta.7_7.g1 theta.8_8.g1 theta.9_9.g1   psi.2_1.g1   psi.3_1.g1   psi.3_2.g1 
       0.806        0.473        0.670        0.455        0.396        0.318 
  lambda.1_1   lambda.2_1   lambda.3_1   lambda.4_2   lambda.5_2   lambda.6_2 
       0.913        0.539        0.819        0.950        1.069        0.824 
  lambda.7_3   lambda.8_3   lambda.9_3      nu.1.g2      nu.2.g2      nu.3.g2 
       0.471        0.636        0.557        4.881        6.159        1.993 
     nu.4.g2      nu.5.g2      nu.6.g2      nu.7.g2      nu.8.g2      nu.9.g2 
       3.299        4.674        2.436        4.002        5.571        5.375 
theta.1_1.g2 theta.2_2.g2 theta.3_3.g2 theta.4_4.g2 theta.5_5.g2 theta.6_6.g2 
       0.653        0.958        0.548        0.360        0.354        0.427 
theta.7_7.g2 theta.8_8.g2 theta.9_9.g2   psi.1_1.g2   psi.2_2.g2   psi.3_3.g2 
       0.695        0.439        0.547        0.770        0.926        1.724 
  psi.2_1.g2   psi.3_1.g2   psi.3_2.g2 
       0.462        0.732        0.473 
Code
coef(metricInvarianceModel_fit)[1:9]
lambda.1_1 lambda.2_1 lambda.3_1 lambda.4_2 lambda.5_2 lambda.6_2 lambda.7_3 
 0.9131519  0.5394530  0.8190934  0.9499853  1.0692197  0.8238386  0.4712981 
lambda.8_3 lambda.9_3 
 0.6358889  0.5573554 
Code
sctest(
  metricInvarianceModel_fit,
  order.by = HolzingerSwineford1939$school,
  parm = 1:9,
  functional = "LMuo")
Error in `[<-`(`*tmp*`, wi, , value = -scores.H1 %*% Delta): subscript out of bounds

The sctest() call above throws an error for this model. A score-based test and expected parameter change (EPC) estimates (Oberski, 2014; Oberski et al., 2015) are also provided by the lavaan package (Rosseel et al., 2022).

Code
lavTestScore(
  metricInvarianceModel_fit,
  epc = TRUE
)
$test

total score test:

   test    X2 df p.value
1 score 5.672  9   0.772

$uni

univariate score tests:

   lhs op   rhs    X2 df p.value
1 .p1. == .p37. 0.772  1   0.380
2 .p2. == .p38. 0.050  1   0.823
3 .p3. == .p39. 1.108  1   0.293
4 .p4. == .p40. 0.996  1   0.318
5 .p5. == .p41. 4.389  1   0.036
6 .p6. == .p42. 1.382  1   0.240
7 .p7. == .p43. 0.012  1   0.914
8 .p8. == .p44. 0.052  1   0.819
9 .p9. == .p45. 0.108  1   0.742

$epc

expected parameter changes (epc) and expected parameter values (epv):

       lhs op     rhs block group free        label plabel   est    epc   epv
1   visual =~      x1     1     1    1   lambda.1_1   .p1. 0.913  0.070 0.983
2   visual =~      x2     1     1    2   lambda.2_1   .p2. 0.539  0.014 0.554
3   visual =~      x3     1     1    3   lambda.3_1   .p3. 0.819 -0.079 0.740
4  textual =~      x4     1     1    4   lambda.4_2   .p4. 0.950 -0.050 0.900
5  textual =~      x5     1     1    5   lambda.5_2   .p5. 1.069  0.093 1.162
6  textual =~      x6     1     1    6   lambda.6_2   .p6. 0.824 -0.046 0.778
7    speed =~      x7     1     1    7   lambda.7_3   .p7. 0.471 -0.007 0.464
8    speed =~      x8     1     1    8   lambda.8_3   .p8. 0.636 -0.022 0.614
9    speed =~      x9     1     1    9   lambda.9_3   .p9. 0.557  0.030 0.588
10      x1 ~1             1     1   10      nu.1.g1  .p10. 4.974  0.000 4.974
11      x2 ~1             1     1   11      nu.2.g1  .p11. 5.977  0.000 5.977
12      x3 ~1             1     1   12      nu.3.g1  .p12. 2.375  0.000 2.375
13      x4 ~1             1     1   13      nu.4.g1  .p13. 2.817  0.000 2.817
14      x5 ~1             1     1   14      nu.5.g1  .p14. 4.022  0.000 4.022
15      x6 ~1             1     1   15      nu.6.g1  .p15. 1.971  0.000 1.971
16      x7 ~1             1     1   16      nu.7.g1  .p16. 4.362  0.000 4.362
17      x8 ~1             1     1   17      nu.8.g1  .p17. 5.464  0.000 5.464
18      x9 ~1             1     1   18      nu.9.g1  .p18. 5.402  0.000 5.402
19      x1 ~~      x1     1     1   19 theta.1_1.g1  .p19. 0.576 -0.090 0.486
20      x2 ~~      x2     1     1   20 theta.2_2.g1  .p20. 1.147 -0.003 1.144
21      x3 ~~      x3     1     1   21 theta.3_3.g1  .p21. 0.646  0.080 0.727
22      x4 ~~      x4     1     1   22 theta.4_4.g1  .p22. 0.470  0.031 0.501
23      x5 ~~      x5     1     1   23 theta.5_5.g1  .p23. 0.431 -0.077 0.354
24      x6 ~~      x6     1     1   24 theta.6_6.g1  .p24. 0.302  0.029 0.331
25      x7 ~~      x7     1     1   25 theta.7_7.g1  .p25. 0.806  0.002 0.808
26      x8 ~~      x8     1     1   26 theta.8_8.g1  .p26. 0.473  0.021 0.494
27      x9 ~~      x9     1     1   27 theta.9_9.g1  .p27. 0.670 -0.022 0.648
28  visual ~1             1     1    0   alpha.1.g1  .p28. 0.000     NA    NA
29 textual ~1             1     1    0   alpha.2.g1  .p29. 0.000     NA    NA
30   speed ~1             1     1    0   alpha.3.g1  .p30. 0.000     NA    NA
31  visual ~~  visual     1     1    0   psi.1_1.g1  .p31. 1.000     NA    NA
32 textual ~~ textual     1     1    0   psi.2_2.g1  .p32. 1.000     NA    NA
33   speed ~~   speed     1     1    0   psi.3_3.g1  .p33. 1.000     NA    NA
34  visual ~~ textual     1     1   28   psi.2_1.g1  .p34. 0.455 -0.004 0.451
35  visual ~~   speed     1     1   29   psi.3_1.g1  .p35. 0.396 -0.001 0.395
36 textual ~~   speed     1     1   30   psi.3_2.g1  .p36. 0.318  0.001 0.319
37  visual =~      x1     2     2   31   lambda.1_1  .p37. 0.913 -0.058 0.855
38  visual =~      x2     2     2   32   lambda.2_1  .p38. 0.539 -0.018 0.521
39  visual =~      x3     2     2   33   lambda.3_1  .p39. 0.819  0.068 0.887
40 textual =~      x4     2     2   34   lambda.4_2  .p40. 0.950  0.052 1.002
41 textual =~      x5     2     2   35   lambda.5_2  .p41. 1.069 -0.095 0.974
42 textual =~      x6     2     2   36   lambda.6_2  .p42. 0.824  0.063 0.887
43   speed =~      x7     2     2   37   lambda.7_3  .p43. 0.471  0.003 0.474
44   speed =~      x8     2     2   38   lambda.8_3  .p44. 0.636  0.007 0.643
45   speed =~      x9     2     2   39   lambda.9_3  .p45. 0.557 -0.011 0.547
46      x1 ~1             2     2   40      nu.1.g2  .p46. 4.881  0.000 4.881
47      x2 ~1             2     2   41      nu.2.g2  .p47. 6.159  0.000 6.159
48      x3 ~1             2     2   42      nu.3.g2  .p48. 1.993  0.000 1.993
49      x4 ~1             2     2   43      nu.4.g2  .p49. 3.299  0.000 3.299
50      x5 ~1             2     2   44      nu.5.g2  .p50. 4.674  0.000 4.674
51      x6 ~1             2     2   45      nu.6.g2  .p51. 2.436  0.000 2.436
52      x7 ~1             2     2   46      nu.7.g2  .p52. 4.002  0.000 4.002
53      x8 ~1             2     2   47      nu.8.g2  .p53. 5.571  0.000 5.571
54      x9 ~1             2     2   48      nu.9.g2  .p54. 5.375  0.000 5.375
55      x1 ~~      x1     2     2   49 theta.1_1.g2  .p55. 0.653  0.046 0.699
56      x2 ~~      x2     2     2   50 theta.2_2.g2  .p56. 0.958  0.005 0.964
57      x3 ~~      x3     2     2   51 theta.3_3.g2  .p57. 0.548 -0.042 0.506
58      x4 ~~      x4     2     2   52 theta.4_4.g2  .p58. 0.360 -0.041 0.319
59      x5 ~~      x5     2     2   53 theta.5_5.g2  .p59. 0.354  0.079 0.433
60      x6 ~~      x6     2     2   54 theta.6_6.g2  .p60. 0.427 -0.029 0.398
61      x7 ~~      x7     2     2   55 theta.7_7.g2  .p61. 0.695 -0.001 0.694
62      x8 ~~      x8     2     2   56 theta.8_8.g2  .p62. 0.439 -0.009 0.430
63      x9 ~~      x9     2     2   57 theta.9_9.g2  .p63. 0.547  0.009 0.557
64  visual ~1             2     2    0   alpha.1.g2  .p64. 0.000     NA    NA
65 textual ~1             2     2    0   alpha.2.g2  .p65. 0.000     NA    NA
66   speed ~1             2     2    0   alpha.3.g2  .p66. 0.000     NA    NA
67  visual ~~  visual     2     2   58   psi.1_1.g2  .p67. 0.770 -0.004 0.766
68 textual ~~ textual     2     2   59   psi.2_2.g2  .p68. 0.926  0.000 0.926
69   speed ~~   speed     2     2   60   psi.3_3.g2  .p69. 1.724  0.000 1.724
70  visual ~~ textual     2     2   61   psi.2_1.g2  .p70. 0.462  0.000 0.461
71  visual ~~   speed     2     2   62   psi.3_1.g2  .p71. 0.732 -0.002 0.730
72 textual ~~   speed     2     2   63   psi.3_2.g2  .p72. 0.473  0.001 0.474
   sepc.lv sepc.all sepc.nox
1    0.070    0.059    0.059
2    0.014    0.012    0.012
3   -0.079   -0.069   -0.069
4   -0.050   -0.043   -0.043
5    0.093    0.074    0.074
6   -0.046   -0.047   -0.047
7   -0.007   -0.007   -0.007
8   -0.022   -0.023   -0.023
9    0.030    0.031    0.031
10   0.000    0.000    0.000
11   0.000    0.000    0.000
12   0.000    0.000    0.000
13   0.000    0.000    0.000
14   0.000    0.000    0.000
15   0.000    0.000    0.000
16   0.000    0.000    0.000
17   0.000    0.000    0.000
18   0.000    0.000    0.000
19  -0.576   -0.409   -0.409
20  -1.147   -0.798   -0.798
21   0.646    0.491    0.491
22   0.470    0.343    0.343
23  -0.431   -0.274   -0.274
24   0.302    0.308    0.308
25   0.806    0.784    0.784
26   0.473    0.539    0.539
27  -0.670   -0.683   -0.683
28      NA       NA       NA
29      NA       NA       NA
30      NA       NA       NA
31      NA       NA       NA
32      NA       NA       NA
33      NA       NA       NA
34  -0.004   -0.004   -0.004
35  -0.001   -0.001   -0.001
36   0.001    0.001    0.001
37  -0.051   -0.045   -0.045
38  -0.016   -0.015   -0.015
39   0.059    0.058    0.058
40   0.050    0.046    0.046
41  -0.091   -0.077   -0.077
42   0.061    0.059    0.059
43   0.004    0.004    0.004
44   0.009    0.008    0.008
45  -0.014   -0.013   -0.013
46   0.000    0.000    0.000
47   0.000    0.000    0.000
48   0.000    0.000    0.000
49   0.000    0.000    0.000
50   0.000    0.000    0.000
51   0.000    0.000    0.000
52   0.000    0.000    0.000
53   0.000    0.000    0.000
54   0.000    0.000    0.000
55   0.653    0.504    0.504
56   0.958    0.811    0.811
57  -0.548   -0.515   -0.515
58  -0.360   -0.301   -0.301
59   0.354    0.250    0.250
60  -0.427   -0.404   -0.404
61  -0.695   -0.645   -0.645
62  -0.439   -0.387   -0.387
63   0.547    0.506    0.506
64      NA       NA       NA
65      NA       NA       NA
66      NA       NA       NA
67  -1.000   -1.000   -1.000
68   1.000    1.000    1.000
69   1.000    1.000    1.000
70   0.000    0.000    0.000
71  -0.002   -0.002   -0.002
72   0.001    0.001    0.001
16.10.5.6.4 Equivalence Test

The petersenlab package (Petersen, 2024b) contains the equiv_chi() function from Counsell et al. (2020) that performs an equivalence test: https://osf.io/cqu8v. The chi-square equivalence test is non-significant, suggesting that the model fit of the metric invariance model is not acceptable (i.e., it is not better than the mediocre-fitting model).

Code
equiv_chi(
  alpha = .05,
  chi = metricInvarianceModel_chisquare,
  df = metricInvarianceModel_df,
  m = 2,
  N_sample = metricInvarianceModel_N,
  popRMSEA = .08)

Moreover, the equivalence test of the chi-square difference test is non-significant, suggesting that the worsening of model fit from the configural invariance model to the metric invariance model is not small enough to be considered negligible. In other words, metric invariance failed.

Code
metricInvarianceModel_dfDiff <- 
  metricInvarianceModel_df - configuralInvarianceModel_df

equiv_chi(
  alpha = .05,
  chi = metricInvarianceModel_chisquareDiff,
  df = metricInvarianceModel_dfDiff,
  m = 2,
  N_sample = metricInvarianceModel_N,
  popRMSEA = .08)
16.10.5.6.5 Permutation Test

Permutation procedures for testing measurement invariance are described in Jorgensen et al. (2018).

For reproducibility, I set the seed below. Using the same seed will yield the same answer every time; there is nothing special about this particular seed. You can specify the null model as the baseline model using: baseline.model = nullModelFit. Warning: this code takes a while to run with \(100\) permutations. You can reduce the number of permutations to make it run faster.

Code
set.seed(52242)
metricInvarianceTest <- permuteMeasEq(
  nPermute = numPermutations,
  modelType = "mgcfa",
  con = metricInvarianceModel_fit,
  uncon = configuralInvarianceModel_fit,
  AFIs = myAFIs,
  moreAFIs = moreAFIs,
  parallelType = "multicore", #only 'snow' works on Windows, but right now, it is throwing an error
  iseed = 52242)
Code
RNGkind("default", "default", "default")
set.seed(52242)

metricInvarianceTest
Omnibus p value based on parametric chi-squared difference test:

Chisq diff    Df diff Pr(>Chisq) 
     5.146      6.000      0.525 


Omnibus p values based on nonparametric permutation method: 

             AFI.Difference p.value
chisq                 6.568    0.51
chisq.scaled          4.291    0.64
rmsea                -0.003    0.54
cfi                  -0.001    0.51
tli                   0.006    0.55
srmr                  0.007    0.47
rmsea.robust          0.003    0.12
cfi.robust           -0.009    0.13
tli.robust           -0.005    0.12

The p-values are non-significant, indicating that the model does not fit significantly worse than the configural invariance model. In other words, metric invariance held.

16.10.5.7 Internal Consistency Reliability

Internal consistency reliability of items composing the latent factors, as quantified by omega (\(\omega\)) and average variance extracted (AVE), was estimated using the semTools package (Jorgensen et al., 2021).

Code
compRelSEM(metricInvarianceModel_fit)
Code
AVE(metricInvarianceModel_fit)

16.10.5.8 Path Diagram

A path diagram of the model generated using the semPlot package (Epskamp, 2022) is below.

Code
semPaths(
  metricInvarianceModel_fit,
  what = "est",
  layout = "tree2",
  ask = FALSE,
  edge.label.cex = 1.2)
Figure 16.55: Metric Invariance Model in Confirmatory Factor Analysis.

Figure 16.56: Metric Invariance Model in Confirmatory Factor Analysis.

16.10.6 Scalar (“Strong Factorial”) Invariance Model

Specify invariance of factor loadings and intercepts across groups.

16.10.6.1 Model Syntax

Code
scalarInvarianceModel <- measEq.syntax(
  configural.model = cfaModel,
  data = HolzingerSwineford1939,
  ID.fac = "std.lv",
  group = "school",
  group.equal = c("loadings","intercepts"))
Code
cat(as.character(scalarInvarianceModel))
## LOADINGS:

visual =~ c(NA, NA)*x1 + c(lambda.1_1, lambda.1_1)*x1
visual =~ c(NA, NA)*x2 + c(lambda.2_1, lambda.2_1)*x2
visual =~ c(NA, NA)*x3 + c(lambda.3_1, lambda.3_1)*x3
textual =~ c(NA, NA)*x4 + c(lambda.4_2, lambda.4_2)*x4
textual =~ c(NA, NA)*x5 + c(lambda.5_2, lambda.5_2)*x5
textual =~ c(NA, NA)*x6 + c(lambda.6_2, lambda.6_2)*x6
speed =~ c(NA, NA)*x7 + c(lambda.7_3, lambda.7_3)*x7
speed =~ c(NA, NA)*x8 + c(lambda.8_3, lambda.8_3)*x8
speed =~ c(NA, NA)*x9 + c(lambda.9_3, lambda.9_3)*x9

## INTERCEPTS:

x1 ~ c(NA, NA)*1 + c(nu.1, nu.1)*1
x2 ~ c(NA, NA)*1 + c(nu.2, nu.2)*1
x3 ~ c(NA, NA)*1 + c(nu.3, nu.3)*1
x4 ~ c(NA, NA)*1 + c(nu.4, nu.4)*1
x5 ~ c(NA, NA)*1 + c(nu.5, nu.5)*1
x6 ~ c(NA, NA)*1 + c(nu.6, nu.6)*1
x7 ~ c(NA, NA)*1 + c(nu.7, nu.7)*1
x8 ~ c(NA, NA)*1 + c(nu.8, nu.8)*1
x9 ~ c(NA, NA)*1 + c(nu.9, nu.9)*1

## UNIQUE-FACTOR VARIANCES:

x1 ~~ c(NA, NA)*x1 + c(theta.1_1.g1, theta.1_1.g2)*x1
x2 ~~ c(NA, NA)*x2 + c(theta.2_2.g1, theta.2_2.g2)*x2
x3 ~~ c(NA, NA)*x3 + c(theta.3_3.g1, theta.3_3.g2)*x3
x4 ~~ c(NA, NA)*x4 + c(theta.4_4.g1, theta.4_4.g2)*x4
x5 ~~ c(NA, NA)*x5 + c(theta.5_5.g1, theta.5_5.g2)*x5
x6 ~~ c(NA, NA)*x6 + c(theta.6_6.g1, theta.6_6.g2)*x6
x7 ~~ c(NA, NA)*x7 + c(theta.7_7.g1, theta.7_7.g2)*x7
x8 ~~ c(NA, NA)*x8 + c(theta.8_8.g1, theta.8_8.g2)*x8
x9 ~~ c(NA, NA)*x9 + c(theta.9_9.g1, theta.9_9.g2)*x9

## LATENT MEANS/INTERCEPTS:

visual ~ c(0, NA)*1 + c(alpha.1.g1, alpha.1.g2)*1
textual ~ c(0, NA)*1 + c(alpha.2.g1, alpha.2.g2)*1
speed ~ c(0, NA)*1 + c(alpha.3.g1, alpha.3.g2)*1

## COMMON-FACTOR VARIANCES:

visual ~~ c(1, NA)*visual + c(psi.1_1.g1, psi.1_1.g2)*visual
textual ~~ c(1, NA)*textual + c(psi.2_2.g1, psi.2_2.g2)*textual
speed ~~ c(1, NA)*speed + c(psi.3_3.g1, psi.3_3.g2)*speed

## COMMON-FACTOR COVARIANCES:

visual ~~ c(NA, NA)*textual + c(psi.2_1.g1, psi.2_1.g2)*textual
visual ~~ c(NA, NA)*speed + c(psi.3_1.g1, psi.3_1.g2)*speed
textual ~~ c(NA, NA)*speed + c(psi.3_2.g1, psi.3_2.g2)*speed
Code
scalarInvarianceSyntax <- as.character(scalarInvarianceModel)
16.10.6.1.1 Summary of Model Features
Code
summary(scalarInvarianceModel)
This lavaan model syntax specifies a CFA with 9 manifest indicators of 3 common factor(s).

To identify the location and scale of each common factor, the factor means and variances were fixed to 0 and 1, respectively, unless equality constraints on measurement parameters allow them to be freed.

Pattern matrix indicating num(eric), ord(ered), and lat(ent) indicators per factor:

   visual textual speed
x1 num                 
x2 num                 
x3 num                 
x4        num          
x5        num          
x6        num          
x7                num  
x8                num  
x9                num  

The following types of parameter were constrained to equality across groups:

    loadings
    intercepts

16.10.6.2 Model Syntax in Table Form

Code
lavaanify(scalarInvarianceModel, ngroups = 2)
Code
lavaanify(cfaModel_scalarInvariance, ngroups = 2)

16.10.6.3 Fit the Model

Code
scalarInvarianceModel_fit <- cfa(
  scalarInvarianceSyntax,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")
Code
scalarInvarianceModelFullSyntax_fit <- lavaan(
  cfaModel_scalarInvariance,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

16.10.6.4 Model Summary

Code
summary(
  scalarInvarianceModel_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 67 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        66
  Number of equality constraints                    18

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                97.784      94.366
  Degrees of freedom                                60          60
  P-value (Chi-square)                           0.001       0.003
  Scaling correction factor                                  1.036
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           57.155      55.157
    1                                           40.629      39.209

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.929       0.931
  Tucker-Lewis Index (TLI)                       0.915       0.917
                                                                  
  Robust Comparative Fit Index (CFI)                         0.928
  Robust Tucker-Lewis Index (TLI)                            0.914

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2699.390   -2699.390
  Scaling correction factor                                  0.803
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5494.779    5494.779
  Bayesian (BIC)                              5665.507    5665.507
  Sample-size adjusted Bayesian (SABIC)       5513.330    5513.330

Root Mean Square Error of Approximation:

  RMSEA                                          0.070       0.067
  90 Percent confidence interval - lower         0.043       0.040
  90 Percent confidence interval - upper         0.094       0.091
  P-value H_0: RMSEA <= 0.050                    0.101       0.141
  P-value H_0: RMSEA >= 0.080                    0.260       0.193
                                                                  
  Robust RMSEA                                               0.082
  90 Percent confidence interval - lower                     0.046
  90 Percent confidence interval - upper                     0.114
  P-value H_0: Robust RMSEA <= 0.050                         0.070
  P-value H_0: Robust RMSEA >= 0.080                         0.561

Standardized Root Mean Square Residual:

  SRMR                                           0.079       0.079

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.900    0.131    6.892    0.000    0.900    0.760
    x2      (l.2_)    0.516    0.097    5.320    0.000    0.516    0.429
    x3      (l.3_)    0.835    0.090    9.300    0.000    0.835    0.718
  textual =~                                                            
    x4      (l.4_)    0.940    0.089   10.506    0.000    0.940    0.806
    x5      (l.5_)    1.083    0.118    9.182    0.000    1.083    0.858
    x6      (l.6_)    0.824    0.088    9.345    0.000    0.824    0.832
  speed =~                                                              
    x7      (l.7_)    0.460    0.094    4.903    0.000    0.460    0.447
    x8      (l.8_)    0.615    0.106    5.794    0.000    0.615    0.655
    x9      (l.9_)    0.571    0.116    4.933    0.000    0.571    0.578

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.455    0.128    3.556    0.000    0.455    0.455
    speed  (p.3_1)    0.420    0.141    2.968    0.003    0.420    0.420
  textual ~~                                                            
    speed  (p.3_2)    0.316    0.115    2.742    0.006    0.316    0.316

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (nu.1)    5.017    0.106   47.479    0.000    5.017    4.235
   .x2      (nu.2)    6.127    0.093   65.843    0.000    6.127    5.095
   .x3      (nu.3)    2.258    0.103   21.837    0.000    2.258    1.941
   .x4      (nu.4)    2.790    0.098   28.357    0.000    2.790    2.392
   .x5      (nu.5)    4.043    0.116   34.780    0.000    4.043    3.203
   .x6      (nu.6)    1.971    0.080   24.757    0.000    1.971    1.989
   .x7      (nu.7)    4.191    0.080   52.551    0.000    4.191    4.069
   .x8      (nu.8)    5.543    0.081   68.124    0.000    5.543    5.909
   .x9      (nu.9)    5.411    0.086   63.043    0.000    5.411    5.475
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.593    0.183    3.249    0.001    0.593    0.423
   .x2      (t.2_)    1.180    0.185    6.377    0.000    1.180    0.816
   .x3      (t.3_)    0.655    0.145    4.510    0.000    0.655    0.484
   .x4      (t.4_)    0.476    0.088    5.381    0.000    0.476    0.350
   .x5      (t.5_)    0.420    0.110    3.828    0.000    0.420    0.264
   .x6      (t.6_)    0.302    0.076    3.982    0.000    0.302    0.308
   .x7      (t.7_)    0.849    0.135    6.312    0.000    0.849    0.800
   .x8      (t.8_)    0.502    0.121    4.163    0.000    0.502    0.570
   .x9      (t.9_)    0.651    0.152    4.289    0.000    0.651    0.666
    visual  (p.1_)    1.000                               1.000    1.000
    textual (p.2_)    1.000                               1.000    1.000
    speed   (p.3_)    1.000                               1.000    1.000

R-Square:
                   Estimate
    x1                0.577
    x2                0.184
    x3                0.516
    x4                0.650
    x5                0.736
    x6                0.692
    x7                0.200
    x8                0.430
    x9                0.334


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.900    0.131    6.892    0.000    0.790    0.697
    x2      (l.2_)    0.516    0.097    5.320    0.000    0.452    0.415
    x3      (l.3_)    0.835    0.090    9.300    0.000    0.733    0.703
  textual =~                                                            
    x4      (l.4_)    0.940    0.089   10.506    0.000    0.901    0.828
    x5      (l.5_)    1.083    0.118    9.182    0.000    1.038    0.871
    x6      (l.6_)    0.824    0.088    9.345    0.000    0.789    0.769
  speed =~                                                              
    x7      (l.7_)    0.460    0.094    4.903    0.000    0.605    0.576
    x8      (l.8_)    0.615    0.106    5.794    0.000    0.808    0.758
    x9      (l.9_)    0.571    0.116    4.933    0.000    0.750    0.720

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.460    0.136    3.377    0.001    0.547    0.547
    speed  (p.3_1)    0.749    0.234    3.205    0.001    0.650    0.650
  textual ~~                                                            
    speed  (p.3_2)    0.488    0.238    2.050    0.040    0.388    0.388

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (nu.1)    5.017    0.106   47.479    0.000    5.017    4.425
   .x2      (nu.2)    6.127    0.093   65.843    0.000    6.127    5.617
   .x3      (nu.3)    2.258    0.103   21.837    0.000    2.258    2.165
   .x4      (nu.4)    2.790    0.098   28.357    0.000    2.790    2.563
   .x5      (nu.5)    4.043    0.116   34.780    0.000    4.043    3.392
   .x6      (nu.6)    1.971    0.080   24.757    0.000    1.971    1.920
   .x7      (nu.7)    4.191    0.080   52.551    0.000    4.191    3.992
   .x8      (nu.8)    5.543    0.081   68.124    0.000    5.543    5.204
   .x9      (nu.9)    5.411    0.086   63.043    0.000    5.411    5.194
    visual  (a.1.)   -0.205    0.156   -1.314    0.189   -0.233   -0.233
    textual (a.2.)    0.565    0.149    3.785    0.000    0.590    0.590
    speed   (a.3.)   -0.075    0.190   -0.394    0.694   -0.057   -0.057

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.662    0.167    3.973    0.000    0.662    0.515
   .x2      (t.2_)    0.985    0.191    5.152    0.000    0.985    0.828
   .x3      (t.3_)    0.551    0.126    4.386    0.000    0.551    0.506
   .x4      (t.4_)    0.373    0.086    4.318    0.000    0.373    0.315
   .x5      (t.5_)    0.344    0.096    3.576    0.000    0.344    0.242
   .x6      (t.6_)    0.430    0.103    4.180    0.000    0.430    0.408
   .x7      (t.7_)    0.737    0.129    5.722    0.000    0.737    0.669
   .x8      (t.8_)    0.482    0.215    2.242    0.025    0.482    0.425
   .x9      (t.9_)    0.522    0.153    3.409    0.001    0.522    0.481
    visual  (p.1_)    0.770    0.209    3.678    0.000    1.000    1.000
    textual (p.2_)    0.918    0.231    3.977    0.000    1.000    1.000
    speed   (p.3_)    1.726    0.524    3.296    0.001    1.000    1.000

R-Square:
                   Estimate
    x1                0.485
    x2                0.172
    x3                0.494
    x4                0.685
    x5                0.758
    x6                0.592
    x7                0.331
    x8                0.575
    x9                0.519
Code
summary(
  scalarInvarianceModelFullSyntax_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 62 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        66
  Number of equality constraints                    18

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                97.784      94.366
  Degrees of freedom                                60          60
  P-value (Chi-square)                           0.001       0.003
  Scaling correction factor                                  1.036
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           57.155      55.157
    1                                           40.629      39.209

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.929       0.931
  Tucker-Lewis Index (TLI)                       0.915       0.917
                                                                  
  Robust Comparative Fit Index (CFI)                         0.928
  Robust Tucker-Lewis Index (TLI)                            0.914

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2699.390   -2699.390
  Scaling correction factor                                  0.803
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5494.779    5494.779
  Bayesian (BIC)                              5665.507    5665.507
  Sample-size adjusted Bayesian (SABIC)       5513.330    5513.330

Root Mean Square Error of Approximation:

  RMSEA                                          0.070       0.067
  90 Percent confidence interval - lower         0.043       0.040
  90 Percent confidence interval - upper         0.094       0.091
  P-value H_0: RMSEA <= 0.050                    0.101       0.141
  P-value H_0: RMSEA >= 0.080                    0.260       0.193
                                                                  
  Robust RMSEA                                               0.082
  90 Percent confidence interval - lower                     0.046
  90 Percent confidence interval - upper                     0.114
  P-value H_0: Robust RMSEA <= 0.050                         0.070
  P-value H_0: Robust RMSEA >= 0.080                         0.561

Standardized Root Mean Square Residual:

  SRMR                                           0.079       0.079

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.900    0.131    6.892    0.000    0.900    0.760
    x2      (lmb2)    0.516    0.097    5.320    0.000    0.516    0.429
    x3      (lmb3)    0.835    0.090    9.300    0.000    0.835    0.718
  textual =~                                                            
    x4      (lmb4)    0.940    0.089   10.506    0.000    0.940    0.806
    x5      (lmb5)    1.083    0.118    9.182    0.000    1.083    0.858
    x6      (lmb6)    0.824    0.088    9.345    0.000    0.824    0.832
  speed =~                                                              
    x7      (lmb7)    0.460    0.094    4.903    0.000    0.460    0.447
    x8      (lmb8)    0.615    0.106    5.794    0.000    0.615    0.655
    x9      (lmb9)    0.571    0.116    4.934    0.000    0.571    0.578

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.455    0.128    3.557    0.000    0.455    0.455
    speed             0.420    0.141    2.968    0.003    0.420    0.420
  textual ~~                                                            
    speed             0.316    0.115    2.742    0.006    0.316    0.316

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1      (int1)    5.017    0.106   47.479    0.000    5.017    4.235
   .x2      (int2)    6.127    0.093   65.843    0.000    6.127    5.095
   .x3      (int3)    2.258    0.103   21.837    0.000    2.258    1.941
   .x4      (int4)    2.790    0.098   28.357    0.000    2.790    2.392
   .x5      (int5)    4.043    0.116   34.780    0.000    4.043    3.203
   .x6      (int6)    1.971    0.080   24.757    0.000    1.971    1.989
   .x7      (int7)    4.191    0.080   52.551    0.000    4.191    4.069
   .x8      (int8)    5.543    0.081   68.124    0.000    5.543    5.909
   .x9      (int9)    5.411    0.086   63.043    0.000    5.411    5.475

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000
   .x1                0.593    0.183    3.249    0.001    0.593    0.423
   .x2                1.180    0.185    6.377    0.000    1.180    0.816
   .x3                0.655    0.145    4.510    0.000    0.655    0.484
   .x4                0.476    0.088    5.381    0.000    0.476    0.350
   .x5                0.420    0.110    3.828    0.000    0.420    0.264
   .x6                0.302    0.076    3.982    0.000    0.302    0.308
   .x7                0.849    0.135    6.312    0.000    0.849    0.800
   .x8                0.502    0.121    4.163    0.000    0.502    0.570
   .x9                0.651    0.152    4.289    0.000    0.651    0.666

R-Square:
                   Estimate
    x1                0.577
    x2                0.184
    x3                0.516
    x4                0.650
    x5                0.736
    x6                0.692
    x7                0.200
    x8                0.430
    x9                0.334


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.900    0.131    6.892    0.000    0.790    0.697
    x2      (lmb2)    0.516    0.097    5.320    0.000    0.452    0.415
    x3      (lmb3)    0.835    0.090    9.300    0.000    0.733    0.703
  textual =~                                                            
    x4      (lmb4)    0.940    0.089   10.506    0.000    0.901    0.828
    x5      (lmb5)    1.083    0.118    9.182    0.000    1.038    0.871
    x6      (lmb6)    0.824    0.088    9.345    0.000    0.789    0.769
  speed =~                                                              
    x7      (lmb7)    0.460    0.094    4.903    0.000    0.605    0.576
    x8      (lmb8)    0.615    0.106    5.794    0.000    0.808    0.758
    x9      (lmb9)    0.571    0.116    4.934    0.000    0.750    0.720

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.460    0.136    3.377    0.001    0.547    0.547
    speed             0.749    0.234    3.205    0.001    0.650    0.650
  textual ~~                                                            
    speed             0.488    0.238    2.050    0.040    0.388    0.388

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual           -0.205    0.156   -1.314    0.189   -0.233   -0.233
    textual           0.565    0.149    3.785    0.000    0.590    0.590
    speed            -0.075    0.190   -0.394    0.694   -0.057   -0.057
   .x1      (int1)    5.017    0.106   47.479    0.000    5.017    4.425
   .x2      (int2)    6.127    0.093   65.843    0.000    6.127    5.617
   .x3      (int3)    2.258    0.103   21.837    0.000    2.258    2.165
   .x4      (int4)    2.790    0.098   28.357    0.000    2.790    2.563
   .x5      (int5)    4.043    0.116   34.780    0.000    4.043    3.392
   .x6      (int6)    1.971    0.080   24.757    0.000    1.971    1.920
   .x7      (int7)    4.191    0.080   52.551    0.000    4.191    3.992
   .x8      (int8)    5.543    0.081   68.124    0.000    5.543    5.204
   .x9      (int9)    5.411    0.086   63.043    0.000    5.411    5.194

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.770    0.209    3.678    0.000    1.000    1.000
    textual           0.918    0.231    3.977    0.000    1.000    1.000
    speed             1.726    0.524    3.296    0.001    1.000    1.000
   .x1                0.662    0.167    3.973    0.000    0.662    0.515
   .x2                0.985    0.191    5.152    0.000    0.985    0.828
   .x3                0.551    0.126    4.386    0.000    0.551    0.506
   .x4                0.373    0.086    4.318    0.000    0.373    0.315
   .x5                0.344    0.096    3.576    0.000    0.344    0.242
   .x6                0.430    0.103    4.180    0.000    0.430    0.408
   .x7                0.737    0.129    5.722    0.000    0.737    0.669
   .x8                0.482    0.215    2.242    0.025    0.482    0.425
   .x9                0.522    0.153    3.409    0.001    0.522    0.481

R-Square:
                   Estimate
    x1                0.485
    x2                0.172
    x3                0.494
    x4                0.685
    x5                0.758
    x6                0.592
    x7                0.331
    x8                0.575
    x9                0.519

16.10.6.5 Model Fit

You can specify the null model as the baseline model using: baseline.model = nullModelFit
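For example (a minimal illustration, assuming a previously fitted null model object named nullModelFit, as in earlier sections), the incremental fit indices such as the CFI and TLI would then be computed relative to that null model:

Code
# illustration only: nullModelFit is assumed to be a previously fitted null model
fitMeasures(
  scalarInvarianceModel_fit,
  baseline.model = nullModelFit)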

Code
fitMeasures(scalarInvarianceModel_fit)
                         npar                          fmin 
                       48.000                         0.189 
                        chisq                            df 
                       97.784                        60.000 
                       pvalue                  chisq.scaled 
                        0.001                        94.366 
                    df.scaled                 pvalue.scaled 
                       60.000                         0.003 
         chisq.scaling.factor                baseline.chisq 
                        1.036                       604.129 
                  baseline.df               baseline.pvalue 
                       72.000                         0.000 
        baseline.chisq.scaled            baseline.df.scaled 
                      571.412                        72.000 
       baseline.pvalue.scaled baseline.chisq.scaling.factor 
                        0.000                         1.057 
                          cfi                           tli 
                        0.929                         0.915 
                   cfi.scaled                    tli.scaled 
                        0.931                         0.917 
                   cfi.robust                    tli.robust 
                        0.928                         0.914 
                         nnfi                           rfi 
                        0.915                         0.806 
                          nfi                          pnfi 
                        0.838                         0.698 
                          ifi                           rni 
                        0.931                         0.929 
                  nnfi.scaled                    rfi.scaled 
                        0.917                         0.802 
                   nfi.scaled                   pnfi.scaled 
                        0.835                         0.696 
                   ifi.scaled                    rni.scaled 
                        0.933                         0.931 
                  nnfi.robust                    rni.robust 
                        0.914                         0.928 
                         logl             unrestricted.logl 
                    -2699.390                     -2650.498 
                          aic                           bic 
                     5494.779                      5665.507 
                       ntotal                          bic2 
                      259.000                      5513.330 
            scaling.factor.h1             scaling.factor.h0 
                        1.066                         0.803 
                        rmsea                rmsea.ci.lower 
                        0.070                         0.043 
               rmsea.ci.upper                rmsea.ci.level 
                        0.094                         0.900 
                 rmsea.pvalue                rmsea.close.h0 
                        0.101                         0.050 
        rmsea.notclose.pvalue             rmsea.notclose.h0 
                        0.260                         0.080 
                 rmsea.scaled         rmsea.ci.lower.scaled 
                        0.067                         0.040 
        rmsea.ci.upper.scaled           rmsea.pvalue.scaled 
                        0.091                         0.141 
 rmsea.notclose.pvalue.scaled                  rmsea.robust 
                        0.193                         0.082 
        rmsea.ci.lower.robust         rmsea.ci.upper.robust 
                        0.046                         0.114 
          rmsea.pvalue.robust  rmsea.notclose.pvalue.robust 
                        0.070                         0.561 
                          rmr                    rmr_nomean 
                        0.095                         0.097 
                         srmr                  srmr_bentler 
                        0.079                         0.079 
          srmr_bentler_nomean                          crmr 
                        0.079                         0.081 
                  crmr_nomean                    srmr_mplus 
                        0.082                         0.086 
            srmr_mplus_nomean                         cn_05 
                        0.077                       210.464 
                        cn_01                           gfi 
                      235.091                         0.994 
                         agfi                          pgfi 
                        0.990                         0.552 
                          mfi                          ecvi 
                        0.930                         0.748 
Code
scalarInvarianceModelFitIndices <- fitMeasures(
  scalarInvarianceModel_fit)[c(
    "cfi.robust", "rmsea.robust", "srmr")]

scalarInvarianceModel_chisquare <- fitMeasures(
  scalarInvarianceModel_fit)[c("chisq.scaled")]

scalarInvarianceModel_chisquareScaling <- fitMeasures(
  scalarInvarianceModel_fit)[c("chisq.scaling.factor")]

scalarInvarianceModel_df <- fitMeasures(
  scalarInvarianceModel_fit)[c("df.scaled")]

scalarInvarianceModel_N <- lavInspect(
  scalarInvarianceModel_fit, what = "ntotal")

16.10.6.6 Compare Model Fit

16.10.6.6.1 Nested Model (\(\chi^2\)) Difference Test

The metric invariance model and the scalar (“strong factorial”) invariance model are considered “nested” models. The scalar invariance model is nested within the metric invariance model because the scalar invariance model includes all of the constraints of the metric invariance model along with additional constraints: the indicator intercepts are also constrained to equality across groups (here, constraining the nine indicator intercepts while freeing the three latent means in the second group yields a difference of six degrees of freedom). Model fit of nested models can be compared with a chi-square difference test, also known as a likelihood ratio test or deviance test. A significant chi-square difference test would indicate that the more parsimonious model with additional constraints (the scalar invariance model) fits significantly worse than the more complex model with fewer constraints (the metric invariance model).

Code
anova(metricInvarianceModel_fit, scalarInvarianceModel_fit)

The scalar invariance model fit significantly worse than the metric invariance model, so scalar invariance did not hold. This provides evidence of measurement non-invariance of indicator intercepts across groups. Measurement non-invariance of indicator intercepts poses challenges to meaningfully comparing levels on the latent factor across groups (Little et al., 2007).

The petersenlab package (Petersen, 2024b) contains the satorraBentlerScaledChiSquareDifferenceTestStatistic() function that performs a Satorra-Bentler scaled chi-square difference test:

Code
scalarInvarianceModel_chisquareDiff <- 
  satorraBentlerScaledChiSquareDifferenceTestStatistic(
    T0 = scalarInvarianceModel_chisquare, # scaled chi-square of the more constrained (scalar) model
    c0 = scalarInvarianceModel_chisquareScaling, # scaling correction factor of the scalar model
    d0 = scalarInvarianceModel_df, # degrees of freedom of the scalar model
    T1 = metricInvarianceModel_chisquare, # scaled chi-square of the less constrained (metric) model
    c1 = metricInvarianceModel_chisquareScaling, # scaling correction factor of the metric model
    d1 = metricInvarianceModel_df) # degrees of freedom of the metric model

scalarInvarianceModel_chisquareDiff
chisq.scaled 
    19.28849 
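For illustration, below is a minimal sketch of the underlying Satorra-Bentler scaled difference computation, using the scaled test statistics, scaling correction factors, and degrees of freedom extracted earlier (the petersenlab function performs this computation for you; cd and TRd are just local variable names used here):

Code
# scaling correction factor for the difference test: a weighted
# combination of the two models' scaling correction factors
cd <- (scalarInvarianceModel_df * scalarInvarianceModel_chisquareScaling -
  metricInvarianceModel_df * metricInvarianceModel_chisquareScaling) /
  (scalarInvarianceModel_df - metricInvarianceModel_df)

# scaled difference statistic: the difference between the models'
# standard (unscaled) chi-square statistics, rescaled by cd
TRd <- (scalarInvarianceModel_chisquare * scalarInvarianceModel_chisquareScaling -
  metricInvarianceModel_chisquare * metricInvarianceModel_chisquareScaling) / cd

# p-value, with df equal to the difference in model degrees of freedom
pchisq(
  TRd,
  df = scalarInvarianceModel_df - metricInvarianceModel_df,
  lower.tail = FALSE)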
16.10.6.6.2 Compare Other Fit Criteria
Code
round(
  scalarInvarianceModelFitIndices - metricInvarianceModelFitIndices,
  digits = 3)
  cfi.robust rmsea.robust         srmr 
      -0.018        0.007        0.006 

The robust CFI decreased by .018, and the robust RMSEA and SRMR increased slightly, consistent with the worsening of fit indicated by the chi-square difference test.
16.10.6.6.3 Score-Based Test

Score-based tests of measurement invariance are implemented using the strucchange package and are described by T. Wang et al. (2014).

Code
coef(scalarInvarianceModel_fit)
  lambda.1_1   lambda.2_1   lambda.3_1   lambda.4_2   lambda.5_2   lambda.6_2 
       0.900        0.516        0.835        0.940        1.083        0.824 
  lambda.7_3   lambda.8_3   lambda.9_3         nu.1         nu.2         nu.3 
       0.460        0.615        0.571        5.017        6.127        2.258 
        nu.4         nu.5         nu.6         nu.7         nu.8         nu.9 
       2.790        4.043        1.971        4.191        5.543        5.411 
theta.1_1.g1 theta.2_2.g1 theta.3_3.g1 theta.4_4.g1 theta.5_5.g1 theta.6_6.g1 
       0.593        1.180        0.655        0.476        0.420        0.302 
theta.7_7.g1 theta.8_8.g1 theta.9_9.g1   psi.2_1.g1   psi.3_1.g1   psi.3_2.g1 
       0.849        0.502        0.651        0.455        0.420        0.316 
  lambda.1_1   lambda.2_1   lambda.3_1   lambda.4_2   lambda.5_2   lambda.6_2 
       0.900        0.516        0.835        0.940        1.083        0.824 
  lambda.7_3   lambda.8_3   lambda.9_3         nu.1         nu.2         nu.3 
       0.460        0.615        0.571        5.017        6.127        2.258 
        nu.4         nu.5         nu.6         nu.7         nu.8         nu.9 
       2.790        4.043        1.971        4.191        5.543        5.411 
theta.1_1.g2 theta.2_2.g2 theta.3_3.g2 theta.4_4.g2 theta.5_5.g2 theta.6_6.g2 
       0.662        0.985        0.551        0.373        0.344        0.430 
theta.7_7.g2 theta.8_8.g2 theta.9_9.g2   alpha.1.g2   alpha.2.g2   alpha.3.g2 
       0.737        0.482        0.522       -0.205        0.565       -0.075 
  psi.1_1.g2   psi.2_2.g2   psi.3_3.g2   psi.2_1.g2   psi.3_1.g2   psi.3_2.g2 
       0.770        0.918        1.726        0.460        0.749        0.488 
Code
coef(scalarInvarianceModel_fit)[10:18]
    nu.1     nu.2     nu.3     nu.4     nu.5     nu.6     nu.7     nu.8 
5.017046 6.127367 2.258070 2.789714 4.042892 1.970548 4.191389 5.543168 
    nu.9 
5.410889 
Code
sctest(
  scalarInvarianceModel_fit,
  order.by = HolzingerSwineford1939$school,
  parm = 10:18,
  functional = "LMuo")
Error in `[<-`(`*tmp*`, wi, , value = -scores.H1 %*% Delta): subscript out of bounds

A score-based test and expected parameter change (EPC) estimates (Oberski, 2014; Oberski et al., 2015) are provided by the lavaan package (Rosseel et al., 2022).

Code
lavTestScore(
  scalarInvarianceModel_fit,
  epc = TRUE
)
$test

total score test:

   test     X2 df p.value
1 score 25.395 18   0.114

$uni

univariate score tests:

     lhs op   rhs    X2 df p.value
1   .p1. == .p37. 1.089  1   0.297
2   .p2. == .p38. 0.323  1   0.570
3   .p3. == .p39. 2.160  1   0.142
4   .p4. == .p40. 0.548  1   0.459
5   .p5. == .p41. 3.142  1   0.076
6   .p6. == .p42. 1.243  1   0.265
7   .p7. == .p43. 0.147  1   0.702
8   .p8. == .p44. 0.009  1   0.925
9   .p9. == .p45. 0.202  1   0.653
10 .p10. == .p46. 1.340  1   0.247
11 .p11. == .p47. 4.970  1   0.026
12 .p12. == .p48. 6.462  1   0.011
13 .p13. == .p49. 0.449  1   0.503
14 .p14. == .p50. 0.400  1   0.527
15 .p15. == .p51. 0.000  1   0.986
16 .p16. == .p52. 9.223  1   0.002
17 .p17. == .p53. 4.666  1   0.031
18 .p18. == .p54. 0.030  1   0.862

$epc

expected parameter changes (epc) and expected parameter values (epv):

       lhs op     rhs block group free        label plabel    est    epc    epv
1   visual =~      x1     1     1    1   lambda.1_1   .p1.  0.900  0.088  0.988
2   visual =~      x2     1     1    2   lambda.2_1   .p2.  0.516  0.043  0.559
3   visual =~      x3     1     1    3   lambda.3_1   .p3.  0.835 -0.111  0.724
4  textual =~      x4     1     1    4   lambda.4_2   .p4.  0.940 -0.040  0.900
5  textual =~      x5     1     1    5   lambda.5_2   .p5.  1.083  0.082  1.166
6  textual =~      x6     1     1    6   lambda.6_2   .p6.  0.824 -0.046  0.778
7    speed =~      x7     1     1    7   lambda.7_3   .p7.  0.460 -0.035  0.425
8    speed =~      x8     1     1    8   lambda.8_3   .p8.  0.615 -0.012  0.602
9    speed =~      x9     1     1    9   lambda.9_3   .p9.  0.571  0.038  0.609
10      x1 ~1             1     1   10         nu.1  .p10.  5.017 -0.045  4.973
11      x2 ~1             1     1   11         nu.2  .p11.  6.127 -0.151  5.976
12      x3 ~1             1     1   12         nu.3  .p12.  2.258  0.116  2.374
13      x4 ~1             1     1   13         nu.4  .p13.  2.790  0.027  2.817
14      x5 ~1             1     1   14         nu.5  .p14.  4.043 -0.021  4.022
15      x6 ~1             1     1   15         nu.6  .p15.  1.971  0.001  1.971
16      x7 ~1             1     1   16         nu.7  .p16.  4.191  0.171  4.363
17      x8 ~1             1     1   17         nu.8  .p17.  5.543 -0.079  5.464
18      x9 ~1             1     1   18         nu.9  .p18.  5.411 -0.009  5.402
19      x1 ~~      x1     1     1   19 theta.1_1.g1  .p19.  0.593 -0.114  0.480
20      x2 ~~      x2     1     1   20 theta.2_2.g1  .p20.  1.180 -0.014  1.166
21      x3 ~~      x3     1     1   21 theta.3_3.g1  .p21.  0.655  0.116  0.771
22      x4 ~~      x4     1     1   22 theta.4_4.g1  .p22.  0.476  0.025  0.501
23      x5 ~~      x5     1     1   23 theta.5_5.g1  .p23.  0.420 -0.071  0.349
24      x6 ~~      x6     1     1   24 theta.6_6.g1  .p24.  0.302  0.029  0.331
25      x7 ~~      x7     1     1   25 theta.7_7.g1  .p25.  0.849  0.016  0.866
26      x8 ~~      x8     1     1   26 theta.8_8.g1  .p26.  0.502  0.013  0.515
27      x9 ~~      x9     1     1   27 theta.9_9.g1  .p27.  0.651 -0.028  0.623
28  visual ~1             1     1    0   alpha.1.g1  .p28.  0.000     NA     NA
29 textual ~1             1     1    0   alpha.2.g1  .p29.  0.000     NA     NA
30   speed ~1             1     1    0   alpha.3.g1  .p30.  0.000     NA     NA
31  visual ~~  visual     1     1    0   psi.1_1.g1  .p31.  1.000     NA     NA
32 textual ~~ textual     1     1    0   psi.2_2.g1  .p32.  1.000     NA     NA
33   speed ~~   speed     1     1    0   psi.3_3.g1  .p33.  1.000     NA     NA
34  visual ~~ textual     1     1   28   psi.2_1.g1  .p34.  0.455 -0.004  0.451
35  visual ~~   speed     1     1   29   psi.3_1.g1  .p35.  0.420 -0.002  0.417
36 textual ~~   speed     1     1   30   psi.3_2.g1  .p36.  0.316  0.000  0.315
37  visual =~      x1     2     2   31   lambda.1_1  .p37.  0.900 -0.044  0.856
38  visual =~      x2     2     2   32   lambda.2_1  .p38.  0.516  0.003  0.519
39  visual =~      x3     2     2   33   lambda.3_1  .p39.  0.835  0.041  0.876
40 textual =~      x4     2     2   34   lambda.4_2  .p40.  0.940  0.065  1.006
41 textual =~      x5     2     2   35   lambda.5_2  .p41.  1.083 -0.106  0.977
42 textual =~      x6     2     2   36   lambda.6_2  .p42.  0.824  0.067  0.891
43   speed =~      x7     2     2   37   lambda.7_3  .p43.  0.460  0.003  0.463
44   speed =~      x8     2     2   38   lambda.8_3  .p44.  0.615  0.011  0.626
45   speed =~      x9     2     2   39   lambda.9_3  .p45.  0.571 -0.014  0.557
46      x1 ~1             2     2   40         nu.1  .p46.  5.017  0.027  5.044
47      x2 ~1             2     2   41         nu.2  .p47.  6.127  0.132  6.259
48      x3 ~1             2     2   42         nu.3  .p48.  2.258 -0.096  2.162
49      x4 ~1             2     2   43         nu.4  .p49.  2.790 -0.059  2.731
50      x5 ~1             2     2   44         nu.5  .p50.  4.043  0.079  4.122
51      x6 ~1             2     2   45         nu.6  .p51.  1.971 -0.037  1.933
52      x7 ~1             2     2   46         nu.7  .p52.  4.191 -0.150  4.041
53      x8 ~1             2     2   47         nu.8  .p53.  5.543  0.082  5.625
54      x9 ~1             2     2   48         nu.9  .p54.  5.411  0.013  5.424
55      x1 ~~      x1     2     2   49 theta.1_1.g2  .p55.  0.662  0.033  0.694
56      x2 ~~      x2     2     2   50 theta.2_2.g2  .p56.  0.985 -0.001  0.985
57      x3 ~~      x3     2     2   51 theta.3_3.g2  .p57.  0.551 -0.027  0.523
58      x4 ~~      x4     2     2   52 theta.4_4.g2  .p58.  0.373 -0.049  0.324
59      x5 ~~      x5     2     2   53 theta.5_5.g2  .p59.  0.344  0.093  0.437
60      x6 ~~      x6     2     2   54 theta.6_6.g2  .p60.  0.430 -0.032  0.398
61      x7 ~~      x7     2     2   55 theta.7_7.g2  .p61.  0.737 -0.001  0.736
62      x8 ~~      x8     2     2   56 theta.8_8.g2  .p62.  0.482 -0.013  0.469
63      x9 ~~      x9     2     2   57 theta.9_9.g2  .p63.  0.522  0.014  0.536
64  visual ~1             2     2   58   alpha.1.g2  .p64. -0.205  0.012 -0.193
65 textual ~1             2     2   59   alpha.2.g2  .p65.  0.565  0.000  0.565
66   speed ~1             2     2   60   alpha.3.g2  .p66. -0.075 -0.011 -0.086
67  visual ~~  visual     2     2   61   psi.1_1.g2  .p67.  0.770 -0.001  0.769
68 textual ~~ textual     2     2   62   psi.2_2.g2  .p68.  0.918  0.000  0.918
69   speed ~~   speed     2     2   63   psi.3_3.g2  .p69.  1.726  0.000  1.726
70  visual ~~ textual     2     2   64   psi.2_1.g2  .p70.  0.460  0.001  0.461
71  visual ~~   speed     2     2   65   psi.3_1.g2  .p71.  0.749 -0.001  0.748
72 textual ~~   speed     2     2   66   psi.3_2.g2  .p72.  0.488  0.002  0.489
   sepc.lv sepc.all sepc.nox
1    0.088    0.074    0.074
2    0.043    0.036    0.036
3   -0.111   -0.095   -0.095
4   -0.040   -0.034   -0.034
5    0.082    0.065    0.065
6   -0.046   -0.047   -0.047
7   -0.035   -0.034   -0.034
8   -0.012   -0.013   -0.013
9    0.038    0.039    0.039
10  -0.045   -0.038   -0.038
11  -0.151   -0.126   -0.126
12   0.116    0.100    0.100
13   0.027    0.023    0.023
14  -0.021   -0.016   -0.016
15   0.001    0.001    0.001
16   0.171    0.166    0.166
17  -0.079   -0.085   -0.085
18  -0.009   -0.009   -0.009
19  -0.593   -0.423   -0.423
20  -1.180   -0.816   -0.816
21   0.655    0.484    0.484
22   0.476    0.350    0.350
23  -0.420   -0.264   -0.264
24   0.302    0.308    0.308
25   0.849    0.800    0.800
26   0.502    0.570    0.570
27  -0.651   -0.666   -0.666
28      NA       NA       NA
29      NA       NA       NA
30      NA       NA       NA
31      NA       NA       NA
32      NA       NA       NA
33      NA       NA       NA
34  -0.004   -0.004   -0.004
35  -0.002   -0.002   -0.002
36   0.000    0.000    0.000
37  -0.039   -0.034   -0.034
38   0.003    0.002    0.002
39   0.036    0.035    0.035
40   0.063    0.058    0.058
41  -0.102   -0.085   -0.085
42   0.064    0.062    0.062
43   0.004    0.004    0.004
44   0.014    0.013    0.013
45  -0.018   -0.017   -0.017
46   0.027    0.024    0.024
47   0.132    0.121    0.121
48  -0.096   -0.092   -0.092
49  -0.059   -0.054   -0.054
50   0.079    0.066    0.066
51  -0.037   -0.036   -0.036
52  -0.150   -0.143   -0.143
53   0.082    0.077    0.077
54   0.013    0.012    0.012
55   0.662    0.515    0.515
56  -0.985   -0.828   -0.828
57  -0.551   -0.506   -0.506
58  -0.373   -0.315   -0.315
59   0.344    0.242    0.242
60  -0.430   -0.408   -0.408
61  -0.737   -0.669   -0.669
62  -0.482   -0.425   -0.425
63   0.522    0.481    0.481
64   0.014    0.014    0.014
65   0.000    0.000    0.000
66  -0.009   -0.009   -0.009
67  -1.000   -1.000   -1.000
68   1.000    1.000    1.000
69  -1.000   -1.000   -1.000
70   0.002    0.002    0.002
71  -0.001   -0.001   -0.001
72   0.001    0.001    0.001

Although the total (multivariate) score test was not significant, the significant univariate score tests involve the equality constraints on the intercepts of x2, x3, x7, and x8 (nu.2, nu.3, nu.7, and nu.8), suggesting that these intercepts are the largest sources of intercept non-invariance.
16.10.6.6.4 Equivalence Test

The petersenlab package (Petersen, 2024b) contains the equiv_chi() function from Counsell et al. (2020) that performs an equivalence test: https://osf.io/cqu8v. Unlike the traditional chi-square test, a significant equivalence test would indicate that the model's misfit falls within the acceptable bound. Here, the chi-square equivalence test is non-significant, suggesting that the model fit is not acceptable.

Code
equiv_chi(
  alpha = .05,
  chi = scalarInvarianceModel_chisquare,
  df = scalarInvarianceModel_df,
  m = 2,
  N_sample = scalarInvarianceModel_N,
  popRMSEA = .08)

Moreover, the equivalence test of the chi-square difference test is non-significant, suggesting that the degree of worsening of model fit is not acceptable. In other words, scalar invariance failed.

Code
scalarInvarianceModel_dfDiff <- 
  scalarInvarianceModel_df - metricInvarianceModel_df

equiv_chi(
  alpha = .05,
  chi = scalarInvarianceModel_chisquareDiff,
  df = scalarInvarianceModel_dfDiff,
  m = 2,
  N_sample = scalarInvarianceModel_N,
  popRMSEA = .08)
16.10.6.6.5 Permutation Test

Permutation procedures for testing measurement invariance are described by Jorgensen et al. (2018). For reproducibility, I set the seed below; using the same seed will yield the same answer every time, and there is nothing special about this particular seed. You can specify the null model as the baseline model using: baseline.model = nullModelFit

Warning: this code takes a while to run because it uses 100 permutations. You can reduce the number of permutations to make it run faster.

Code
set.seed(52242)
scalarInvarianceTest <- permuteMeasEq(
  nPermute = numPermutations,
  modelType = "mgcfa",
  con = scalarInvarianceModel_fit, # the more constrained (scalar invariance) model
  uncon = metricInvarianceModel_fit, # the less constrained (metric invariance) model
  AFIs = myAFIs,
  moreAFIs = moreAFIs,
  parallelType = "multicore", #only 'snow' works on Windows, but right now, it is throwing an error
  iseed = 52242)
Code
RNGkind("default", "default", "default")
set.seed(52242)

scalarInvarianceTest
Omnibus p value based on parametric chi-squared difference test:

Chisq diff    Df diff Pr(>Chisq) 
    19.288      6.000      0.004 


Omnibus p values based on nonparametric permutation method: 

             AFI.Difference p.value
chisq                19.773    0.00
chisq.scaled         19.171    0.00
rmsea                 0.011    0.00
cfi                  -0.026    0.00
tli                  -0.025    0.00
srmr                  0.006    0.03
rmsea.robust          0.007    0.02
cfi.robust           -0.018    0.00
tli.robust           -0.015    0.01

The p-values are significant, indicating that the scalar invariance model fit significantly worse than the metric invariance model. In other words, scalar invariance failed.

16.10.6.7 Internal Consistency Reliability

Internal consistency reliability of items composing the latent factors, as quantified by omega (\(\omega\)) and average variance extracted (AVE), was estimated using the semTools package (Jorgensen et al., 2021).
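
As a reminder of what these indices capture, for a factor with standardized loadings \(\lambda_i\) and corresponding residual variances \(\theta_{ii}\) (and assuming no correlated residuals), omega and AVE take the following general forms:

\[\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_{ii}}\]

\[\text{AVE} = \frac{\sum_i \lambda_i^2}{\sum_i \lambda_i^2 + \sum_i \theta_{ii}}\]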

Code
compRelSEM(scalarInvarianceModel_fit)
Code
AVE(scalarInvarianceModel_fit)

16.10.6.8 Path Diagram

A path diagram of the model generated using the semPlot package (Epskamp, 2022) is below.

Code
semPaths(
  scalarInvarianceModel_fit,
  what = "est",
  layout = "tree2",
  ask = FALSE,
  edge.label.cex = 1.2)
Figure 16.57: Scalar Invariance Model in Confirmatory Factor Analysis.

Figure 16.58: Scalar Invariance Model in Confirmatory Factor Analysis.

16.10.7 Residual (“Strict Factorial”) Invariance Model

Specify invariance of factor loadings, intercepts, and residuals across groups.

16.10.7.1 Model Syntax

Code
residualInvarianceModel <- measEq.syntax(
  configural.model = cfaModel,
  data = HolzingerSwineford1939,
  ID.fac = "std.lv",
  group = "school",
  group.equal = c("loadings","intercepts","residuals"))
Code
cat(as.character(residualInvarianceModel))
## LOADINGS:

visual =~ c(NA, NA)*x1 + c(lambda.1_1, lambda.1_1)*x1
visual =~ c(NA, NA)*x2 + c(lambda.2_1, lambda.2_1)*x2
visual =~ c(NA, NA)*x3 + c(lambda.3_1, lambda.3_1)*x3
textual =~ c(NA, NA)*x4 + c(lambda.4_2, lambda.4_2)*x4
textual =~ c(NA, NA)*x5 + c(lambda.5_2, lambda.5_2)*x5
textual =~ c(NA, NA)*x6 + c(lambda.6_2, lambda.6_2)*x6
speed =~ c(NA, NA)*x7 + c(lambda.7_3, lambda.7_3)*x7
speed =~ c(NA, NA)*x8 + c(lambda.8_3, lambda.8_3)*x8
speed =~ c(NA, NA)*x9 + c(lambda.9_3, lambda.9_3)*x9

## INTERCEPTS:

x1 ~ c(NA, NA)*1 + c(nu.1, nu.1)*1
x2 ~ c(NA, NA)*1 + c(nu.2, nu.2)*1
x3 ~ c(NA, NA)*1 + c(nu.3, nu.3)*1
x4 ~ c(NA, NA)*1 + c(nu.4, nu.4)*1
x5 ~ c(NA, NA)*1 + c(nu.5, nu.5)*1
x6 ~ c(NA, NA)*1 + c(nu.6, nu.6)*1
x7 ~ c(NA, NA)*1 + c(nu.7, nu.7)*1
x8 ~ c(NA, NA)*1 + c(nu.8, nu.8)*1
x9 ~ c(NA, NA)*1 + c(nu.9, nu.9)*1

## UNIQUE-FACTOR VARIANCES:

x1 ~~ c(NA, NA)*x1 + c(theta.1_1, theta.1_1)*x1
x2 ~~ c(NA, NA)*x2 + c(theta.2_2, theta.2_2)*x2
x3 ~~ c(NA, NA)*x3 + c(theta.3_3, theta.3_3)*x3
x4 ~~ c(NA, NA)*x4 + c(theta.4_4, theta.4_4)*x4
x5 ~~ c(NA, NA)*x5 + c(theta.5_5, theta.5_5)*x5
x6 ~~ c(NA, NA)*x6 + c(theta.6_6, theta.6_6)*x6
x7 ~~ c(NA, NA)*x7 + c(theta.7_7, theta.7_7)*x7
x8 ~~ c(NA, NA)*x8 + c(theta.8_8, theta.8_8)*x8
x9 ~~ c(NA, NA)*x9 + c(theta.9_9, theta.9_9)*x9

## LATENT MEANS/INTERCEPTS:

visual ~ c(0, NA)*1 + c(alpha.1.g1, alpha.1.g2)*1
textual ~ c(0, NA)*1 + c(alpha.2.g1, alpha.2.g2)*1
speed ~ c(0, NA)*1 + c(alpha.3.g1, alpha.3.g2)*1

## COMMON-FACTOR VARIANCES:

visual ~~ c(1, NA)*visual + c(psi.1_1.g1, psi.1_1.g2)*visual
textual ~~ c(1, NA)*textual + c(psi.2_2.g1, psi.2_2.g2)*textual
speed ~~ c(1, NA)*speed + c(psi.3_3.g1, psi.3_3.g2)*speed

## COMMON-FACTOR COVARIANCES:

visual ~~ c(NA, NA)*textual + c(psi.2_1.g1, psi.2_1.g2)*textual
visual ~~ c(NA, NA)*speed + c(psi.3_1.g1, psi.3_1.g2)*speed
textual ~~ c(NA, NA)*speed + c(psi.3_2.g1, psi.3_2.g2)*speed
Code
residualInvarianceSyntax <- as.character(residualInvarianceModel)
16.10.7.1.1 Summary of Model Features
Code
summary(residualInvarianceModel)
This lavaan model syntax specifies a CFA with 9 manifest indicators of 3 common factor(s).

To identify the location and scale of each common factor, the factor means and variances were fixed to 0 and 1, respectively, unless equality constraints on measurement parameters allow them to be freed.

Pattern matrix indicating num(eric), ord(ered), and lat(ent) indicators per factor:

   visual textual speed
x1 num                 
x2 num                 
x3 num                 
x4        num          
x5        num          
x6        num          
x7                num  
x8                num  
x9                num  

The following types of parameter were constrained to equality across groups:

    loadings
    intercepts
    residuals

16.10.7.2 Model Syntax in Table Form

Code
lavaanify(residualInvarianceModel, ngroups = 2)
Code
lavaanify(cfaModel_residualInvariance, ngroups = 2)

16.10.7.3 Fit the Model

Code
residualInvarianceModel_fit <- cfa(
  residualInvarianceSyntax,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")
Code
residualInvarianceModelFullSyntax_fit <- lavaan(
  cfaModel_residualInvariance,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

16.10.7.4 Model Summary

Code
summary(
  residualInvarianceModel_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 61 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        66
  Number of equality constraints                    27

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               102.581      97.522
  Degrees of freedom                                69          69
  P-value (Chi-square)                           0.005       0.014
  Scaling correction factor                                  1.052
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           58.408      55.528
    1                                           44.173      41.994

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.937       0.943
  Tucker-Lewis Index (TLI)                       0.934       0.940
                                                                  
  Robust Comparative Fit Index (CFI)                         0.940
  Robust Tucker-Lewis Index (TLI)                            0.938

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2701.788   -2701.788
  Scaling correction factor                                  0.645
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5481.576    5481.576
  Bayesian (BIC)                              5620.293    5620.293
  Sample-size adjusted Bayesian (SABIC)       5496.648    5496.648

Root Mean Square Error of Approximation:

  RMSEA                                          0.061       0.056
  90 Percent confidence interval - lower         0.034       0.028
  90 Percent confidence interval - upper         0.085       0.080
  P-value H_0: RMSEA <= 0.050                    0.221       0.322
  P-value H_0: RMSEA >= 0.080                    0.103       0.054
                                                                  
  Robust RMSEA                                               0.070
  90 Percent confidence interval - lower                     0.030
  90 Percent confidence interval - upper                     0.101
  P-value H_0: Robust RMSEA <= 0.050                         0.173
  P-value H_0: Robust RMSEA >= 0.080                         0.320

Standardized Root Mean Square Residual:

  SRMR                                           0.082       0.082

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.892    0.123    7.263    0.000    0.892    0.743
    x2      (l.2_)    0.523    0.099    5.294    0.000    0.523    0.449
    x3      (l.3_)    0.848    0.092    9.224    0.000    0.848    0.742
  textual =~                                                            
    x4      (l.4_)    0.945    0.089   10.651    0.000    0.945    0.826
    x5      (l.5_)    1.089    0.115    9.473    0.000    1.089    0.870
    x6      (l.6_)    0.828    0.084    9.871    0.000    0.828    0.807
  speed =~                                                              
    x7      (l.7_)    0.469    0.095    4.939    0.000    0.469    0.465
    x8      (l.8_)    0.631    0.103    6.129    0.000    0.631    0.670
    x9      (l.9_)    0.586    0.112    5.251    0.000    0.586    0.607

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.430    0.132    3.262    0.001    0.430    0.430
    speed  (p.3_1)    0.414    0.139    2.975    0.003    0.414    0.414
  textual ~~                                                            
    speed  (p.3_2)    0.303    0.112    2.712    0.007    0.303    0.303

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (nu.1)    5.023    0.105   47.688    0.000    5.023    4.182
   .x2      (nu.2)    6.117    0.090   68.293    0.000    6.117    5.249
   .x3      (nu.3)    2.270    0.104   21.872    0.000    2.270    1.986
   .x4      (nu.4)    2.796    0.098   28.415    0.000    2.796    2.445
   .x5      (nu.5)    4.041    0.115   35.290    0.000    4.041    3.225
   .x6      (nu.6)    1.971    0.080   24.742    0.000    1.971    1.920
   .x7      (nu.7)    4.203    0.078   54.170    0.000    4.203    4.169
   .x8      (nu.8)    5.542    0.082   67.222    0.000    5.542    5.883
   .x9      (nu.9)    5.411    0.086   62.848    0.000    5.411    5.606
    visual  (a.1.)    0.000                               0.000    0.000
    textual (a.2.)    0.000                               0.000    0.000
    speed   (a.3.)    0.000                               0.000    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.647    0.141    4.573    0.000    0.647    0.448
   .x2      (t.2_)    1.084    0.133    8.131    0.000    1.084    0.798
   .x3      (t.3_)    0.588    0.109    5.404    0.000    0.588    0.450
   .x4      (t.4_)    0.415    0.066    6.324    0.000    0.415    0.317
   .x5      (t.5_)    0.383    0.084    4.549    0.000    0.383    0.244
   .x6      (t.6_)    0.368    0.064    5.787    0.000    0.368    0.349
   .x7      (t.7_)    0.796    0.098    8.157    0.000    0.796    0.784
   .x8      (t.8_)    0.489    0.128    3.830    0.000    0.489    0.551
   .x9      (t.9_)    0.588    0.122    4.838    0.000    0.588    0.631
    visual  (p.1_)    1.000                               1.000    1.000
    textual (p.2_)    1.000                               1.000    1.000
    speed   (p.3_)    1.000                               1.000    1.000

R-Square:
                   Estimate
    x1                0.552
    x2                0.202
    x3                0.550
    x4                0.683
    x5                0.756
    x6                0.651
    x7                0.216
    x8                0.449
    x9                0.369


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (l.1_)    0.892    0.123    7.263    0.000    0.772    0.692
    x2      (l.2_)    0.523    0.099    5.294    0.000    0.453    0.399
    x3      (l.3_)    0.848    0.092    9.224    0.000    0.733    0.691
  textual =~                                                            
    x4      (l.4_)    0.945    0.089   10.651    0.000    0.900    0.813
    x5      (l.5_)    1.089    0.115    9.473    0.000    1.038    0.859
    x6      (l.6_)    0.828    0.084    9.871    0.000    0.789    0.793
  speed =~                                                              
    x7      (l.7_)    0.469    0.095    4.939    0.000    0.594    0.554
    x8      (l.8_)    0.631    0.103    6.129    0.000    0.799    0.753
    x9      (l.9_)    0.586    0.112    5.251    0.000    0.742    0.695

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textul  (p.2_)    0.460    0.135    3.412    0.001    0.558    0.558
    speed  (p.3_1)    0.729    0.223    3.268    0.001    0.665    0.665
  textual ~~                                                            
    speed  (p.3_2)    0.465    0.226    2.053    0.040    0.385    0.385

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (nu.1)    5.023    0.105   47.688    0.000    5.023    4.507
   .x2      (nu.2)    6.117    0.090   68.293    0.000    6.117    5.388
   .x3      (nu.3)    2.270    0.104   21.872    0.000    2.270    2.140
   .x4      (nu.4)    2.796    0.098   28.415    0.000    2.796    2.525
   .x5      (nu.5)    4.041    0.115   35.290    0.000    4.041    3.344
   .x6      (nu.6)    1.971    0.080   24.742    0.000    1.971    1.980
   .x7      (nu.7)    4.203    0.078   54.170    0.000    4.203    3.921
   .x8      (nu.8)    5.542    0.082   67.222    0.000    5.542    5.218
   .x9      (nu.9)    5.411    0.086   62.848    0.000    5.411    5.070
    visual  (a.1.)   -0.211    0.157   -1.342    0.179   -0.244   -0.244
    textual (a.2.)    0.559    0.147    3.809    0.000    0.587    0.587
    speed   (a.3.)   -0.073    0.188   -0.390    0.696   -0.058   -0.058

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1      (t.1_)    0.647    0.141    4.573    0.000    0.647    0.521
   .x2      (t.2_)    1.084    0.133    8.131    0.000    1.084    0.841
   .x3      (t.3_)    0.588    0.109    5.404    0.000    0.588    0.522
   .x4      (t.4_)    0.415    0.066    6.324    0.000    0.415    0.339
   .x5      (t.5_)    0.383    0.084    4.549    0.000    0.383    0.262
   .x6      (t.6_)    0.368    0.064    5.787    0.000    0.368    0.372
   .x7      (t.7_)    0.796    0.098    8.157    0.000    0.796    0.693
   .x8      (t.8_)    0.489    0.128    3.830    0.000    0.489    0.434
   .x9      (t.9_)    0.588    0.122    4.838    0.000    0.588    0.516
    visual  (p.1_)    0.748    0.185    4.050    0.000    1.000    1.000
    textual (p.2_)    0.908    0.226    4.009    0.000    1.000    1.000
    speed   (p.3_)    1.604    0.491    3.266    0.001    1.000    1.000

R-Square:
                   Estimate
    x1                0.479
    x2                0.159
    x3                0.478
    x4                0.661
    x5                0.738
    x6                0.628
    x7                0.307
    x8                0.566
    x9                0.484
Code
summary(
  residualInvarianceModelFullSyntax_fit,
  fit.measures = TRUE,
  standardized = TRUE,
  rsquare = TRUE)
lavaan 0.6.17 ended normally after 62 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        66
  Number of equality constraints                    27

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               102.581      97.522
  Degrees of freedom                                69          69
  P-value (Chi-square)                           0.005       0.014
  Scaling correction factor                                  1.052
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           58.408      55.528
    1                                           44.173      41.994

Model Test Baseline Model:

  Test statistic                               604.129     571.412
  Degrees of freedom                                72          72
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.057

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.937       0.943
  Tucker-Lewis Index (TLI)                       0.934       0.940
                                                                  
  Robust Comparative Fit Index (CFI)                         0.940
  Robust Tucker-Lewis Index (TLI)                            0.938

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2701.788   -2701.788
  Scaling correction factor                                  0.645
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -2650.498   -2650.498
  Scaling correction factor                                  1.066
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                                5481.576    5481.576
  Bayesian (BIC)                              5620.293    5620.293
  Sample-size adjusted Bayesian (SABIC)       5496.648    5496.648

Root Mean Square Error of Approximation:

  RMSEA                                          0.061       0.056
  90 Percent confidence interval - lower         0.034       0.028
  90 Percent confidence interval - upper         0.085       0.080
  P-value H_0: RMSEA <= 0.050                    0.221       0.322
  P-value H_0: RMSEA >= 0.080                    0.103       0.054
                                                                  
  Robust RMSEA                                               0.070
  90 Percent confidence interval - lower                     0.030
  90 Percent confidence interval - upper                     0.101
  P-value H_0: Robust RMSEA <= 0.050                         0.173
  P-value H_0: Robust RMSEA >= 0.080                         0.320

Standardized Root Mean Square Residual:

  SRMR                                           0.082       0.082

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.892    0.123    7.263    0.000    0.892    0.743
    x2      (lmb2)    0.523    0.099    5.294    0.000    0.523    0.449
    x3      (lmb3)    0.848    0.092    9.224    0.000    0.848    0.742
  textual =~                                                            
    x4      (lmb4)    0.945    0.089   10.651    0.000    0.945    0.826
    x5      (lmb5)    1.089    0.115    9.473    0.000    1.089    0.870
    x6      (lmb6)    0.828    0.084    9.871    0.000    0.828    0.807
  speed =~                                                              
    x7      (lmb7)    0.469    0.095    4.939    0.000    0.469    0.465
    x8      (lmb8)    0.631    0.103    6.129    0.000    0.631    0.670
    x9      (lmb9)    0.586    0.112    5.251    0.000    0.586    0.607

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.430    0.132    3.262    0.001    0.430    0.430
    speed             0.414    0.139    2.975    0.003    0.414    0.414
  textual ~~                                                            
    speed             0.303    0.112    2.712    0.007    0.303    0.303

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.000                               0.000    0.000
    textual           0.000                               0.000    0.000
    speed             0.000                               0.000    0.000
   .x1      (int1)    5.023    0.105   47.689    0.000    5.023    4.182
   .x2      (int2)    6.117    0.090   68.293    0.000    6.117    5.249
   .x3      (int3)    2.270    0.104   21.872    0.000    2.270    1.986
   .x4      (int4)    2.796    0.098   28.415    0.000    2.796    2.445
   .x5      (int5)    4.041    0.115   35.290    0.000    4.041    3.225
   .x6      (int6)    1.971    0.080   24.742    0.000    1.971    1.920
   .x7      (int7)    4.203    0.078   54.170    0.000    4.203    4.169
   .x8      (int8)    5.542    0.082   67.222    0.000    5.542    5.883
   .x9      (int9)    5.411    0.086   62.848    0.000    5.411    5.606

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000
   .x1      (rsd1)    0.647    0.141    4.573    0.000    0.647    0.448
   .x2      (rsd2)    1.084    0.133    8.131    0.000    1.084    0.798
   .x3      (rsd3)    0.588    0.109    5.404    0.000    0.588    0.450
   .x4      (rsd4)    0.415    0.066    6.324    0.000    0.415    0.317
   .x5      (rsd5)    0.383    0.084    4.549    0.000    0.383    0.244
   .x6      (rsd6)    0.368    0.064    5.787    0.000    0.368    0.349
   .x7      (rsd7)    0.796    0.098    8.157    0.000    0.796    0.784
   .x8      (rsd8)    0.489    0.128    3.830    0.000    0.489    0.551
   .x9      (rsd9)    0.588    0.122    4.838    0.000    0.588    0.631

R-Square:
                   Estimate
    x1                0.552
    x2                0.202
    x3                0.550
    x4                0.683
    x5                0.756
    x6                0.651
    x7                0.216
    x8                0.449
    x9                0.369


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1      (lmb1)    0.892    0.123    7.263    0.000    0.772    0.692
    x2      (lmb2)    0.523    0.099    5.294    0.000    0.453    0.399
    x3      (lmb3)    0.848    0.092    9.224    0.000    0.733    0.691
  textual =~                                                            
    x4      (lmb4)    0.945    0.089   10.651    0.000    0.900    0.813
    x5      (lmb5)    1.089    0.115    9.473    0.000    1.038    0.859
    x6      (lmb6)    0.828    0.084    9.871    0.000    0.789    0.793
  speed =~                                                              
    x7      (lmb7)    0.469    0.095    4.939    0.000    0.594    0.554
    x8      (lmb8)    0.631    0.103    6.129    0.000    0.799    0.753
    x9      (lmb9)    0.586    0.112    5.251    0.000    0.742    0.695

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.460    0.135    3.412    0.001    0.558    0.558
    speed             0.729    0.223    3.268    0.001    0.665    0.665
  textual ~~                                                            
    speed             0.465    0.226    2.053    0.040    0.385    0.385

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual           -0.211    0.157   -1.342    0.179   -0.244   -0.244
    textual           0.559    0.147    3.809    0.000    0.587    0.587
    speed            -0.073    0.188   -0.390    0.696   -0.058   -0.058
   .x1      (int1)    5.023    0.105   47.689    0.000    5.023    4.507
   .x2      (int2)    6.117    0.090   68.293    0.000    6.117    5.388
   .x3      (int3)    2.270    0.104   21.872    0.000    2.270    2.140
   .x4      (int4)    2.796    0.098   28.415    0.000    2.796    2.525
   .x5      (int5)    4.041    0.115   35.290    0.000    4.041    3.344
   .x6      (int6)    1.971    0.080   24.742    0.000    1.971    1.980
   .x7      (int7)    4.203    0.078   54.170    0.000    4.203    3.921
   .x8      (int8)    5.542    0.082   67.222    0.000    5.542    5.218
   .x9      (int9)    5.411    0.086   62.848    0.000    5.411    5.070

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
    visual            0.748    0.185    4.050    0.000    1.000    1.000
    textual           0.908    0.226    4.009    0.000    1.000    1.000
    speed             1.604    0.491    3.266    0.001    1.000    1.000
   .x1      (rsd1)    0.647    0.141    4.573    0.000    0.647    0.521
   .x2      (rsd2)    1.084    0.133    8.131    0.000    1.084    0.841
   .x3      (rsd3)    0.588    0.109    5.404    0.000    0.588    0.522
   .x4      (rsd4)    0.415    0.066    6.324    0.000    0.415    0.339
   .x5      (rsd5)    0.383    0.084    4.549    0.000    0.383    0.262
   .x6      (rsd6)    0.368    0.064    5.787    0.000    0.368    0.372
   .x7      (rsd7)    0.796    0.098    8.157    0.000    0.796    0.693
   .x8      (rsd8)    0.489    0.128    3.830    0.000    0.489    0.434
   .x9      (rsd9)    0.588    0.122    4.838    0.000    0.588    0.516

R-Square:
                   Estimate
    x1                0.479
    x2                0.159
    x3                0.478
    x4                0.661
    x5                0.738
    x6                0.628
    x7                0.307
    x8                0.566
    x9                0.484

16.10.7.5 Model Fit

You can specify the null model as the baseline model using the baseline.model argument: baseline.model = nullModelFit.
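For example, a minimal sketch, assuming nullModelFit is a null model that was fitted previously:

Code
fitMeasures(
  residualInvarianceModel_fit,
  baseline.model = nullModelFit)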

Code
fitMeasures(residualInvarianceModel_fit)
                         npar                          fmin 
                       39.000                         0.198 
                        chisq                            df 
                      102.581                        69.000 
                       pvalue                  chisq.scaled 
                        0.005                        97.522 
                    df.scaled                 pvalue.scaled 
                       69.000                         0.014 
         chisq.scaling.factor                baseline.chisq 
                        1.052                       604.129 
                  baseline.df               baseline.pvalue 
                       72.000                         0.000 
        baseline.chisq.scaled            baseline.df.scaled 
                      571.412                        72.000 
       baseline.pvalue.scaled baseline.chisq.scaling.factor 
                        0.000                         1.057 
                          cfi                           tli 
                        0.937                         0.934 
                   cfi.scaled                    tli.scaled 
                        0.943                         0.940 
                   cfi.robust                    tli.robust 
                        0.940                         0.938 
                         nnfi                           rfi 
                        0.934                         0.823 
                          nfi                          pnfi 
                        0.830                         0.796 
                          ifi                           rni 
                        0.937                         0.937 
                  nnfi.scaled                    rfi.scaled 
                        0.940                         0.822 
                   nfi.scaled                   pnfi.scaled 
                        0.829                         0.795 
                   ifi.scaled                    rni.scaled 
                        0.943                         0.943 
                  nnfi.robust                    rni.robust 
                        0.938                         0.940 
                         logl             unrestricted.logl 
                    -2701.788                     -2650.498 
                          aic                           bic 
                     5481.576                      5620.293 
                       ntotal                          bic2 
                      259.000                      5496.648 
            scaling.factor.h1             scaling.factor.h0 
                        1.066                         0.645 
                        rmsea                rmsea.ci.lower 
                        0.061                         0.034 
               rmsea.ci.upper                rmsea.ci.level 
                        0.085                         0.900 
                 rmsea.pvalue                rmsea.close.h0 
                        0.221                         0.050 
        rmsea.notclose.pvalue             rmsea.notclose.h0 
                        0.103                         0.080 
                 rmsea.scaled         rmsea.ci.lower.scaled 
                        0.056                         0.028 
        rmsea.ci.upper.scaled           rmsea.pvalue.scaled 
                        0.080                         0.322 
 rmsea.notclose.pvalue.scaled                  rmsea.robust 
                        0.054                         0.070 
        rmsea.ci.lower.robust         rmsea.ci.upper.robust 
                        0.030                         0.101 
          rmsea.pvalue.robust  rmsea.notclose.pvalue.robust 
                        0.173                         0.320 
                          rmr                    rmr_nomean 
                        0.098                         0.101 
                         srmr                  srmr_bentler 
                        0.082                         0.082 
          srmr_bentler_nomean                          crmr 
                        0.082                         0.082 
                  crmr_nomean                    srmr_mplus 
                        0.083                         0.100 
            srmr_mplus_nomean                         cn_05 
                        0.081                       226.698 
                        cn_01                           gfi 
                      251.533                         0.994 
                         agfi                          pgfi 
                        0.990                         0.635 
                          mfi                          ecvi 
                        0.937                         0.697 
Code
residualInvarianceModelFitIndices <- fitMeasures(
  residualInvarianceModel_fit)[c(
    "cfi.robust", "rmsea.robust", "srmr")]

residualInvarianceModel_chisquare <- fitMeasures(
  residualInvarianceModel_fit)[c("chisq.scaled")]

residualInvarianceModel_chisquareScaling <- fitMeasures(
  residualInvarianceModel_fit)[c("chisq.scaling.factor")]

residualInvarianceModel_df <- fitMeasures(
  residualInvarianceModel_fit)[c("df.scaled")]

residualInvarianceModel_N <- lavInspect(
  residualInvarianceModel_fit,
  what = "ntotal")

16.10.7.6 Compare Model Fit

16.10.7.6.1 Nested Model (\(\chi^2\)) Difference Test

The scalar invariance model and the residual invariance model are considered “nested” models. The residual invariance model is nested within the scalar invariance model because it can be obtained from the scalar invariance model by adding equality constraints; that is, the scalar invariance model estimates all of the free parameters of the residual invariance model along with additional ones. Model fit of nested models can be compared with a chi-square difference test, also known as a likelihood ratio test or deviance test. A significant chi-square difference test would indicate that the simpler model with additional constraints (the residual invariance model) fits significantly worse than the more complex model with fewer constraints (the scalar invariance model).

Code
anova(scalarInvarianceModel_fit, residualInvarianceModel_fit)
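Under the hood, a chi-square difference test compares the difference in the models’ chi-square values to a chi-square distribution with degrees of freedom equal to the difference in their degrees of freedom. A minimal sketch of that computation using the unscaled statistics (for MLR-estimated models such as these, the Satorra-Bentler scaled version below is preferred):

Code
#Unscaled chi-square difference test computed by hand
chisqDiff <- fitMeasures(residualInvarianceModel_fit, "chisq") -
  fitMeasures(scalarInvarianceModel_fit, "chisq")

dfDiff <- fitMeasures(residualInvarianceModel_fit, "df") -
  fitMeasures(scalarInvarianceModel_fit, "df")

pchisq(chisqDiff, df = dfDiff, lower.tail = FALSE)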

In this instance, you do not need to examine whether residual invariance held because the preceding level of measurement invariance (scalar invariance) did not hold.

The petersenlab package (Petersen, 2024b) contains the satorraBentlerScaledChiSquareDifferenceTestStatistic() function that performs a Satorra-Bentler scaled chi-square difference test:

Code
residualInvarianceModel_chisquareDiff <- 
  satorraBentlerScaledChiSquareDifferenceTestStatistic(
    T0 = residualInvarianceModel_chisquare,
    c0 = residualInvarianceModel_chisquareScaling,
    d0 = residualInvarianceModel_df,
    T1 = scalarInvarianceModel_chisquare,
    c1 = scalarInvarianceModel_chisquareScaling,
    d1 = scalarInvarianceModel_df)

residualInvarianceModel_chisquareDiff
chisq.scaled 
    4.148739 
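To obtain a p-value for the scaled difference statistic, compare it to a chi-square distribution with degrees of freedom equal to the difference in the two models’ degrees of freedom (9 here); a minimal sketch:

Code
pchisq(
  residualInvarianceModel_chisquareDiff,
  df = residualInvarianceModel_df - scalarInvarianceModel_df,
  lower.tail = FALSE)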
16.10.7.6.2 Compare Other Fit Criteria
Code
round(residualInvarianceModelFitIndices - scalarInvarianceModelFitIndices,
      digits = 3)
  cfi.robust rmsea.robust         srmr 
       0.012       -0.012        0.003 
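Changes in alternative fit indices can also be judged against rough heuristics; for example, a decrease in CFI of more than .01 is a commonly used (though not definitive) benchmark for meaningful worsening. A minimal sketch using the fit indices saved above:

Code
#Does robust CFI worsen (decrease) by more than .01 when the residual
#constraints are added? (a common rule-of-thumb cutoff)
(scalarInvarianceModelFitIndices["cfi.robust"] -
    residualInvarianceModelFitIndices["cfi.robust"]) > .01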
16.10.7.6.3 Score-Based Test

Score-based tests of measurement invariance are implemented using the strucchange package and are described by T. Wang et al. (2014).

Code
coef(residualInvarianceModel_fit)
lambda.1_1 lambda.2_1 lambda.3_1 lambda.4_2 lambda.5_2 lambda.6_2 lambda.7_3 
     0.892      0.523      0.848      0.945      1.089      0.828      0.469 
lambda.8_3 lambda.9_3       nu.1       nu.2       nu.3       nu.4       nu.5 
     0.631      0.586      5.023      6.117      2.270      2.796      4.041 
      nu.6       nu.7       nu.8       nu.9  theta.1_1  theta.2_2  theta.3_3 
     1.971      4.203      5.542      5.411      0.647      1.084      0.588 
 theta.4_4  theta.5_5  theta.6_6  theta.7_7  theta.8_8  theta.9_9 psi.2_1.g1 
     0.415      0.383      0.368      0.796      0.489      0.588      0.430 
psi.3_1.g1 psi.3_2.g1 lambda.1_1 lambda.2_1 lambda.3_1 lambda.4_2 lambda.5_2 
     0.414      0.303      0.892      0.523      0.848      0.945      1.089 
lambda.6_2 lambda.7_3 lambda.8_3 lambda.9_3       nu.1       nu.2       nu.3 
     0.828      0.469      0.631      0.586      5.023      6.117      2.270 
      nu.4       nu.5       nu.6       nu.7       nu.8       nu.9  theta.1_1 
     2.796      4.041      1.971      4.203      5.542      5.411      0.647 
 theta.2_2  theta.3_3  theta.4_4  theta.5_5  theta.6_6  theta.7_7  theta.8_8 
     1.084      0.588      0.415      0.383      0.368      0.796      0.489 
 theta.9_9 alpha.1.g2 alpha.2.g2 alpha.3.g2 psi.1_1.g2 psi.2_2.g2 psi.3_3.g2 
     0.588     -0.211      0.559     -0.073      0.748      0.908      1.604 
psi.2_1.g2 psi.3_1.g2 psi.3_2.g2 
     0.460      0.729      0.465 
Code
coef(residualInvarianceModel_fit)[19:27]
theta.1_1 theta.2_2 theta.3_3 theta.4_4 theta.5_5 theta.6_6 theta.7_7 theta.8_8 
0.6466924 1.0835856 0.5875180 0.4152642 0.3828318 0.3681839 0.7964128 0.4892110 
theta.9_9 
0.5881342 
Code
sctest(
  residualInvarianceModel_fit,
  order.by = HolzingerSwineford1939$school,
  parm = 19:27,
  functional = "LMuo")
Error in `[<-`(`*tmp*`, wi, , value = -scores.H1 %*% Delta): subscript out of bounds

A score-based test and expected parameter change (EPC) estimates (Oberski, 2014; Oberski et al., 2015) are provided by the lavaan package (Rosseel et al., 2022).

Code
lavTestScore(
  residualInvarianceModel_fit,
  epc = TRUE
)
$test

total score test:

   test     X2 df p.value
1 score 30.137 27   0.308

$uni

univariate score tests:

     lhs op   rhs    X2 df p.value
1   .p1. == .p37. 0.084  1   0.773
2   .p2. == .p38. 0.626  1   0.429
3   .p3. == .p39. 0.582  1   0.446
4   .p4. == .p40. 0.116  1   0.734
5   .p5. == .p41. 3.263  1   0.071
6   .p6. == .p42. 2.855  1   0.091
7   .p7. == .p43. 0.083  1   0.774
8   .p8. == .p44. 0.133  1   0.716
9   .p9. == .p45. 0.373  1   0.541
10 .p10. == .p46. 1.251  1   0.263
11 .p11. == .p47. 5.160  1   0.023
12 .p12. == .p48. 6.367  1   0.012
13 .p13. == .p49. 0.426  1   0.514
14 .p14. == .p50. 0.352  1   0.553
15 .p15. == .p51. 0.000  1   0.991
16 .p16. == .p52. 9.164  1   0.002
17 .p17. == .p53. 4.677  1   0.031
18 .p18. == .p54. 0.022  1   0.882
19 .p19. == .p55. 0.082  1   0.774
20 .p20. == .p56. 0.682  1   0.409
21 .p21. == .p57. 0.262  1   0.609
22 .p22. == .p58. 0.722  1   0.396
23 .p23. == .p59. 0.198  1   0.656
24 .p24. == .p60. 1.123  1   0.289
25 .p25. == .p61. 0.362  1   0.548
26 .p26. == .p62. 0.041  1   0.839
27 .p27. == .p63. 0.798  1   0.372

$epc

expected parameter changes (epc) and expected parameter values (epv):

       lhs op     rhs block group free      label plabel    est    epc    epv
1   visual =~      x1     1     1    1 lambda.1_1   .p1.  0.892  0.090  0.982
2   visual =~      x2     1     1    2 lambda.2_1   .p2.  0.523  0.044  0.567
3   visual =~      x3     1     1    3 lambda.3_1   .p3.  0.848 -0.117  0.731
4  textual =~      x4     1     1    4 lambda.4_2   .p4.  0.945 -0.042  0.903
5  textual =~      x5     1     1    5 lambda.5_2   .p5.  1.089  0.080  1.169
6  textual =~      x6     1     1    6 lambda.6_2   .p6.  0.828 -0.056  0.772
7    speed =~      x7     1     1    7 lambda.7_3   .p7.  0.469 -0.038  0.431
8    speed =~      x8     1     1    8 lambda.8_3   .p8.  0.631 -0.028  0.603
9    speed =~      x9     1     1    9 lambda.9_3   .p9.  0.586  0.022  0.608
10      x1 ~1             1     1   10       nu.1  .p10.  5.023 -0.047  4.976
11      x2 ~1             1     1   11       nu.2  .p11.  6.117 -0.141  5.976
12      x3 ~1             1     1   12       nu.3  .p12.  2.270  0.104  2.374
13      x4 ~1             1     1   13       nu.4  .p13.  2.796  0.023  2.819
14      x5 ~1             1     1   14       nu.5  .p14.  4.041 -0.018  4.023
15      x6 ~1             1     1   15       nu.6  .p15.  1.971  0.000  1.970
16      x7 ~1             1     1   16       nu.7  .p16.  4.203  0.160  4.363
17      x8 ~1             1     1   17       nu.8  .p17.  5.542 -0.078  5.465
18      x9 ~1             1     1   18       nu.9  .p18.  5.411 -0.007  5.404
19      x1 ~~      x1     1     1   19  theta.1_1  .p19.  0.647 -0.154  0.493
20      x2 ~~      x2     1     1   20  theta.2_2  .p20.  1.084  0.070  1.153
21      x3 ~~      x3     1     1   21  theta.3_3  .p21.  0.588  0.173  0.761
22      x4 ~~      x4     1     1   22  theta.4_4  .p22.  0.415  0.080  0.495
23      x5 ~~      x5     1     1   23  theta.5_5  .p23.  0.383 -0.045  0.338
24      x6 ~~      x6     1     1   24  theta.6_6  .p24.  0.368 -0.029  0.339
25      x7 ~~      x7     1     1   25  theta.7_7  .p25.  0.796  0.062  0.858
26      x8 ~~      x8     1     1   26  theta.8_8  .p26.  0.489  0.026  0.515
27      x9 ~~      x9     1     1   27  theta.9_9  .p27.  0.588  0.036  0.624
28  visual ~1             1     1    0 alpha.1.g1  .p28.  0.000     NA     NA
29 textual ~1             1     1    0 alpha.2.g1  .p29.  0.000     NA     NA
30   speed ~1             1     1    0 alpha.3.g1  .p30.  0.000     NA     NA
31  visual ~~  visual     1     1    0 psi.1_1.g1  .p31.  1.000     NA     NA
32 textual ~~ textual     1     1    0 psi.2_2.g1  .p32.  1.000     NA     NA
33   speed ~~   speed     1     1    0 psi.3_3.g1  .p33.  1.000     NA     NA
34  visual ~~ textual     1     1   28 psi.2_1.g1  .p34.  0.430  0.002  0.432
35  visual ~~   speed     1     1   29 psi.3_1.g1  .p35.  0.414  0.011  0.424
36 textual ~~   speed     1     1   30 psi.3_2.g1  .p36.  0.303  0.006  0.309
37  visual =~      x1     2     2   31 lambda.1_1  .p37.  0.892 -0.028  0.864
38  visual =~      x2     2     2   32 lambda.2_1  .p38.  0.523 -0.005  0.518
39  visual =~      x3     2     2   33 lambda.3_1  .p39.  0.848  0.032  0.880
40 textual =~      x4     2     2   34 lambda.4_2  .p40.  0.945  0.070  1.015
41 textual =~      x5     2     2   35 lambda.5_2  .p41.  1.089 -0.115  0.975
42 textual =~      x6     2     2   36 lambda.6_2  .p42.  0.828  0.069  0.897
43   speed =~      x7     2     2   37 lambda.7_3  .p43.  0.469  0.008  0.477
44   speed =~      x8     2     2   38 lambda.8_3  .p44.  0.631  0.016  0.647
45   speed =~      x9     2     2   39 lambda.9_3  .p45.  0.586 -0.006  0.580
46      x1 ~1             2     2   40       nu.1  .p46.  5.023  0.027  5.050
47      x2 ~1             2     2   41       nu.2  .p47.  6.117  0.146  6.262
48      x3 ~1             2     2   42       nu.3  .p48.  2.270 -0.103  2.167
49      x4 ~1             2     2   43       nu.4  .p49.  2.796 -0.063  2.733
50      x5 ~1             2     2   44       nu.5  .p50.  4.041  0.083  4.124
51      x6 ~1             2     2   45       nu.6  .p51.  1.971 -0.037  1.933
52      x7 ~1             2     2   46       nu.7  .p52.  4.203 -0.160  4.043
53      x8 ~1             2     2   47       nu.8  .p53.  5.542  0.085  5.627
54      x9 ~1             2     2   48       nu.9  .p54.  5.411  0.014  5.426
55      x1 ~~      x1     2     2   49  theta.1_1  .p55.  0.647  0.043  0.690
56      x2 ~~      x2     2     2   50  theta.2_2  .p56.  1.084 -0.090  0.993
57      x3 ~~      x3     2     2   51  theta.3_3  .p57.  0.588 -0.058  0.529
58      x4 ~~      x4     2     2   52  theta.4_4  .p58.  0.415 -0.097  0.319
59      x5 ~~      x5     2     2   53  theta.5_5  .p59.  0.383  0.067  0.450
60      x6 ~~      x6     2     2   54  theta.6_6  .p60.  0.368  0.025  0.393
61      x7 ~~      x7     2     2   55  theta.7_7  .p61.  0.796 -0.052  0.745
62      x8 ~~      x8     2     2   56  theta.8_8  .p62.  0.489 -0.020  0.469
63      x9 ~~      x9     2     2   57  theta.9_9  .p63.  0.588 -0.058  0.530
64  visual ~1             2     2   58 alpha.1.g2  .p64. -0.211  0.013 -0.198
65 textual ~1             2     2   59 alpha.2.g2  .p65.  0.559  0.000  0.560
66   speed ~1             2     2   60 alpha.3.g2  .p66. -0.073 -0.013 -0.086
67  visual ~~  visual     2     2   61 psi.1_1.g2  .p67.  0.748  0.007  0.755
68 textual ~~ textual     2     2   62 psi.2_2.g2  .p68.  0.908  0.001  0.909
69   speed ~~   speed     2     2   63 psi.3_3.g2  .p69.  1.604  0.006  1.610
70  visual ~~ textual     2     2   64 psi.2_1.g2  .p70.  0.460 -0.001  0.459
71  visual ~~   speed     2     2   65 psi.3_1.g2  .p71.  0.729 -0.010  0.719
72 textual ~~   speed     2     2   66 psi.3_2.g2  .p72.  0.465 -0.006  0.459
   sepc.lv sepc.all sepc.nox
1    0.090    0.075    0.075
2    0.044    0.038    0.038
3   -0.117   -0.102   -0.102
4   -0.042   -0.037   -0.037
5    0.080    0.064    0.064
6   -0.056   -0.055   -0.055
7   -0.038   -0.038   -0.038
8   -0.028   -0.029   -0.029
9    0.022    0.023    0.023
10  -0.047   -0.039   -0.039
11  -0.141   -0.121   -0.121
12   0.104    0.091    0.091
13   0.023    0.020    0.020
14  -0.018   -0.014   -0.014
15   0.000    0.000    0.000
16   0.160    0.159    0.159
17  -0.078   -0.082   -0.082
18  -0.007   -0.007   -0.007
19  -0.647   -0.448   -0.448
20   1.084    0.798    0.798
21   0.588    0.450    0.450
22   0.415    0.317    0.317
23  -0.383   -0.244   -0.244
24  -0.368   -0.349   -0.349
25   0.796    0.784    0.784
26   0.489    0.551    0.551
27   0.588    0.631    0.631
28      NA       NA       NA
29      NA       NA       NA
30      NA       NA       NA
31      NA       NA       NA
32      NA       NA       NA
33      NA       NA       NA
34   0.002    0.002    0.002
35   0.011    0.011    0.011
36   0.006    0.006    0.006
37  -0.024   -0.022   -0.022
38  -0.004   -0.004   -0.004
39   0.028    0.026    0.026
40   0.067    0.061    0.061
41  -0.109   -0.090   -0.090
42   0.066    0.066    0.066
43   0.010    0.010    0.010
44   0.021    0.020    0.020
45  -0.007   -0.007   -0.007
46   0.027    0.025    0.025
47   0.146    0.128    0.128
48  -0.103   -0.097   -0.097
49  -0.063   -0.057   -0.057
50   0.083    0.069    0.069
51  -0.037   -0.038   -0.038
52  -0.160   -0.150   -0.150
53   0.085    0.080    0.080
54   0.014    0.013    0.013
55   0.647    0.521    0.521
56  -1.084   -0.841   -0.841
57  -0.588   -0.522   -0.522
58  -0.415   -0.339   -0.339
59   0.383    0.262    0.262
60   0.368    0.372    0.372
61  -0.796   -0.693   -0.693
62  -0.489   -0.434   -0.434
63  -0.588   -0.516   -0.516
64   0.015    0.015    0.015
65   0.000    0.000    0.000
66  -0.010   -0.010   -0.010
67   1.000    1.000    1.000
68   1.000    1.000    1.000
69   1.000    1.000    1.000
70  -0.001   -0.001   -0.001
71  -0.009   -0.009   -0.009
72  -0.005   -0.005   -0.005
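When many constraints are tested, it can help to filter the univariate score tests to those that are significant; a minimal sketch, assuming the lavTestScore() call above (residualInvariance_scoreTest is a hypothetical name):

Code
residualInvariance_scoreTest <- lavTestScore(
  residualInvarianceModel_fit,
  epc = TRUE)

#Equality constraints whose univariate score tests suggest non-invariance
subset(residualInvariance_scoreTest$uni, p.value < .05)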
16.10.7.6.4 Equivalence Test

The petersenlab package (Petersen, 2024b) contains the equiv_chi() function from Counsell et al. (2020) that performs an equivalence test: https://osf.io/cqu8v. The chi-square equivalence test is non-significant, so we cannot conclude that the model fit is acceptable.

Code
equiv_chi(
  alpha = .05,
  chi = residualInvarianceModel_chisquare,
  df = residualInvarianceModel_df,
  m = 2,
  N_sample = residualInvarianceModel_N,
  popRMSEA = .08)

Moreover, the equivalence test of the chi-square difference test is non-significant, so we cannot conclude that the degree of worsening of model fit is acceptable. In other words, residual invariance was not supported.

Code
residualInvarianceModel_dfDiff <- 
  residualInvarianceModel_df - scalarInvarianceModel_df

equiv_chi(
  alpha = .05,
  chi = residualInvarianceModel_chisquareDiff,
  df = residualInvarianceModel_dfDiff,
  m = 2,
  N_sample = residualInvarianceModel_N,
  popRMSEA = .08)
16.10.7.6.5 Permutation Test

Permutation procedures for testing measurement invariance are described by Jorgensen et al. (2018). For reproducibility, I set the seed below; using the same seed will yield the same answer every time. There is nothing special about this particular seed. You can specify the null model as the baseline model using: baseline.model = nullModelFit.

Warning: this code takes a while to run because it uses \(100\) permutations. You can reduce the number of permutations to make it run faster.

Code
set.seed(52242)
residualInvarianceTest <- permuteMeasEq(
  nPermute = numPermutations,
  modelType = "mgcfa",
  con = residualInvarianceModel_fit,
  uncon = scalarInvarianceModel_fit,
  AFIs = myAFIs,
  moreAFIs = moreAFIs,
  parallelType = "multicore", #only 'snow' works on Windows, but right now, it is throwing an error
  iseed = 52242)
Code
RNGkind("default", "default", "default")
set.seed(52242)

residualInvarianceTest
Omnibus p value based on parametric chi-squared difference test:

Chisq diff    Df diff Pr(>Chisq) 
     4.149      9.000      0.901 


Omnibus p values based on nonparametric permutation method: 

             AFI.Difference p.value
chisq                 4.797    0.94
chisq.scaled          3.156    0.95
rmsea                -0.008    0.93
cfi                   0.008    0.94
tli                   0.019    0.96
srmr                  0.003    0.79
rmsea.robust         -0.012    0.97
cfi.robust            0.012    0.97
tli.robust            0.024    1.00

The p-values are non-significant, indicating that the residual invariance model did not fit significantly worse than the scalar invariance model.

16.10.7.7 Internal Consistency Reliability

Internal consistency reliability of items composing the latent factors, as quantified by omega (\(\omega\)) and average variance extracted (AVE), was estimated using the semTools package (Jorgensen et al., 2021).

Code
compRelSEM(residualInvarianceModel_fit)
Code
AVE(residualInvarianceModel_fit)

16.10.7.8 Path Diagram

A path diagram of the model generated using the semPlot package (Epskamp, 2022) is below.

Code
semPaths(
  residualInvarianceModel_fit,
  what = "est",
  layout = "tree2",
  ask = FALSE,
  edge.label.cex = 1.2)
Figure 16.59: Residual Invariance Model in Confirmatory Factor Analysis.

Figure 16.60: Residual Invariance Model in Confirmatory Factor Analysis.

16.10.8 Addressing Measurement Non-Invariance

In the example above, we detected measurement non-invariance of intercepts, but not factor loadings. Thus, we would want to identify which item(s) show non-invariant intercepts across groups. When detecting measurement non-invariance in a given parameter (e.g., factor loadings, intercepts, or residuals), one can identify the specific items that show measurement non-invariance in one of two primary ways:

  1. starting with a model that allows the given parameter to differ across groups, iteratively add constraints to identify the item(s) for which measurement invariance fails, or
  2. starting with a model that constrains the given parameter to be the same across groups, iteratively remove constraints to identify the item(s) for which measurement invariance becomes established (and, by process of elimination, the item(s) for which it does not).

16.10.8.1 Iteratively add constraints to identify measurement non-invariance

In the example above, we detected measurement non-invariance of intercepts. One approach to identify measurement non-invariance is to iteratively add constraints to a model that allows the given parameter (intercepts) to differ across groups. So, we will use the metric invariance model as the baseline model (in which item intercepts are allowed to differ across groups), and iteratively add constraints to identify which item(s) show non-invariant intercepts across groups.

16.10.8.1.1 Variable 1

Variable 1 does not have non-invariant intercepts.

Code
cfaModel_metricInvarianceV1 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV1_fit <- lavaan(
  cfaModel_metricInvarianceV1,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV1_fit)
16.10.8.1.2 Variable 2

Variable 2 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV2 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ c(intx2, intx2)*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV2_fit <- lavaan(
  cfaModel_metricInvarianceV2,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV2_fit)

Variable 2 appears to have a larger intercept in group 2 than in group 1:

Code
summary(metricInvarianceModel_fit)
lavaan 0.6.17 ended normally after 67 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        63
  Number of equality constraints                     9

  Number of observations per group:                   
    2                                              132
    1                                              127
  Number of missing patterns per group:               
    2                                               48
    1                                               52

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                78.011      75.195
  Degrees of freedom                                54          54
  P-value (Chi-square)                           0.018       0.030
  Scaling correction factor                                  1.037
    Yuan-Bentler correction (Mplus variant)                       
  Test statistic for each group:
    2                                           47.087      45.387
    1                                           30.924      29.808

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian


Group 1 [2]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1      (l.1_)    0.913    0.130    7.016    0.000
    x2      (l.2_)    0.539    0.097    5.557    0.000
    x3      (l.3_)    0.819    0.088    9.333    0.000
  textual =~                                          
    x4      (l.4_)    0.950    0.090   10.553    0.000
    x5      (l.5_)    1.069    0.117    9.122    0.000
    x6      (l.6_)    0.824    0.092    8.926    0.000
  speed =~                                            
    x7      (l.7_)    0.471    0.085    5.523    0.000
    x8      (l.8_)    0.636    0.108    5.897    0.000
    x9      (l.9_)    0.557    0.112    4.971    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textul  (p.2_)    0.455    0.127    3.578    0.000
    speed  (p.3_1)    0.396    0.139    2.837    0.005
  textual ~~                                          
    speed  (p.3_2)    0.318    0.115    2.773    0.006

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1      (n.1.)    4.974    0.113   44.143    0.000
   .x2      (n.2.)    5.977    0.110   54.446    0.000
   .x3      (n.3.)    2.375    0.106   22.343    0.000
   .x4      (n.4.)    2.817    0.105   26.778    0.000
   .x5      (n.5.)    4.022    0.119   33.914    0.000
   .x6      (n.6.)    1.971    0.087   22.590    0.000
   .x7      (n.7.)    4.362    0.093   46.812    0.000
   .x8      (n.8.)    5.464    0.088   61.938    0.000
   .x9      (n.9.)    5.402    0.094   57.768    0.000
    visual  (a.1.)    0.000                           
    textual (a.2.)    0.000                           
    speed   (a.3.)    0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1      (t.1_)    0.576    0.179    3.221    0.001
   .x2      (t.2_)    1.147    0.185    6.212    0.000
   .x3      (t.3_)    0.646    0.134    4.811    0.000
   .x4      (t.4_)    0.470    0.090    5.216    0.000
   .x5      (t.5_)    0.431    0.109    3.941    0.000
   .x6      (t.6_)    0.302    0.075    4.033    0.000
   .x7      (t.7_)    0.806    0.122    6.590    0.000
   .x8      (t.8_)    0.473    0.120    3.928    0.000
   .x9      (t.9_)    0.670    0.149    4.495    0.000
    visual  (p.1_)    1.000                           
    textual (p.2_)    1.000                           
    speed   (p.3_)    1.000                           


Group 2 [1]:

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1      (l.1_)    0.913    0.130    7.016    0.000
    x2      (l.2_)    0.539    0.097    5.557    0.000
    x3      (l.3_)    0.819    0.088    9.333    0.000
  textual =~                                          
    x4      (l.4_)    0.950    0.090   10.553    0.000
    x5      (l.5_)    1.069    0.117    9.122    0.000
    x6      (l.6_)    0.824    0.092    8.926    0.000
  speed =~                                            
    x7      (l.7_)    0.471    0.085    5.523    0.000
    x8      (l.8_)    0.636    0.108    5.897    0.000
    x9      (l.9_)    0.557    0.112    4.971    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textul  (p.2_)    0.462    0.137    3.379    0.001
    speed  (p.3_1)    0.732    0.231    3.177    0.001
  textual ~~                                          
    speed  (p.3_2)    0.473    0.241    1.966    0.049

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1      (n.1.)    4.881    0.105   46.577    0.000
   .x2      (n.2.)    6.159    0.103   59.876    0.000
   .x3      (n.3.)    1.993    0.098   20.368    0.000
   .x4      (n.4.)    3.299    0.103   31.949    0.000
   .x5      (n.5.)    4.674    0.105   44.381    0.000
   .x6      (n.6.)    2.436    0.101   24.159    0.000
   .x7      (n.7.)    4.002    0.097   41.140    0.000
   .x8      (n.8.)    5.571    0.099   56.335    0.000
   .x9      (n.9.)    5.375    0.098   54.583    0.000
    visual  (a.1.)    0.000                           
    textual (a.2.)    0.000                           
    speed   (a.3.)    0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1      (t.1_)    0.653    0.167    3.900    0.000
   .x2      (t.2_)    0.958    0.186    5.165    0.000
   .x3      (t.3_)    0.548    0.117    4.672    0.000
   .x4      (t.4_)    0.360    0.089    4.046    0.000
   .x5      (t.5_)    0.354    0.096    3.679    0.000
   .x6      (t.6_)    0.427    0.103    4.149    0.000
   .x7      (t.7_)    0.695    0.120    5.805    0.000
   .x8      (t.8_)    0.439    0.204    2.149    0.032
   .x9      (t.9_)    0.547    0.148    3.697    0.000
    visual  (p.1_)    0.770    0.210    3.663    0.000
    textual (p.2_)    0.926    0.228    4.058    0.000
    speed   (p.3_)    1.724    0.512    3.364    0.001

Thus, in subsequent models, we would allow variable 2 to have different intercepts across groups.
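For example, a partial scalar invariance model would leave the intercept of x2 free across groups while constraining the remaining intercepts to be equal. A minimal sketch of the intercept block only; the factor loadings, latent means, latent variances, covariances, and residual variances would be specified as in the models above:

Code
 #Partial invariance: free the intercept of x2 across groups; constrain
 #the remaining intercepts to be the same across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1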

16.10.8.1.3 Variable 3

Variable 3 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV3 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV3_fit <- lavaan(
  cfaModel_metricInvarianceV3,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV3_fit)
16.10.8.1.4 Variable 4

Variable 4 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV4 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV4_fit <- lavaan(
  cfaModel_metricInvarianceV4,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV4_fit)
16.10.8.1.5 Variable 5

Variable 5 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV5 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV5_fit <- lavaan(
  cfaModel_metricInvarianceV5,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV5_fit)
16.10.8.1.6 Variable 6

Variable 6 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV6 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV6_fit <- lavaan(
  cfaModel_metricInvarianceV6,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV6_fit)
16.10.8.1.7 Variable 7

Variable 7 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV7 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ NA*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV7_fit <- lavaan(
  cfaModel_metricInvarianceV7,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV7_fit)
16.10.8.1.8 Variable 8

Variable 8 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV8 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ NA*1
'

cfaModel_metricInvarianceV8_fit <- lavaan(
  cfaModel_metricInvarianceV8,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV8_fit)
16.10.8.1.9 Variable 9

Variable 9 has non-invariant intercepts.

Code
cfaModel_metricInvarianceV9 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively fix intercepts of manifest variables across groups
 x1 ~ c(intx1, intx1)*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_metricInvarianceV9_fit <- lavaan(
  cfaModel_metricInvarianceV9,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_metricInvarianceV9_fit)
16.10.8.1.10 Summary

When iteratively adding constraints, item 1 showed measurement invariance of intercepts, but all other items showed measurement non-invariance of intercepts. This could be due to starting with item 1 as the first constrained item, because every subsequent model constrained a given item's intercept in addition to item 1's. We could try applying constraints in a different order to see the extent to which the order influences which items are identified as showing measurement non-invariance.
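Rather than writing out nine nearly identical models, these comparisons could be scripted. Below is a minimal sketch under the same specification as the models above (items, baseModel, interceptLines, and pValues are hypothetical names introduced here); it assumes metricInvarianceModel_fit and the HolzingerSwineford1939 data are in the workspace:

Code
items <- paste0("x", 1:9)

#Shared portion of the model syntax (everything except the intercepts)
baseModel <- paste(
  "visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3",
  "textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6",
  "speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9",
  "visual ~ 0", "textual ~ 0", "speed ~ 0",
  "visual ~~ c(1, NA)*visual",
  "textual ~~ c(1, NA)*textual",
  "speed ~~ c(1, NA)*speed",
  "visual ~~ textual", "visual ~~ speed", "textual ~~ speed",
  paste(paste0(items, " ~~ ", items), collapse = "\n"),
  sep = "\n")

#For each item beyond x1: constrain its intercept (and x1's, as above)
#across groups, free the rest, and test against the metric model
pValues <- sapply(items[-1], function(item) {
  interceptLines <- ifelse(
    items %in% c("x1", item),
    paste0(items, " ~ c(int", items, ", int", items, ")*1"),
    paste0(items, " ~ NA*1"))

  fit <- lavaan(
    paste(baseModel, paste(interceptLines, collapse = "\n"), sep = "\n"),
    data = HolzingerSwineford1939,
    group = "school",
    missing = "ML",
    estimator = "MLR")

  anova(metricInvarianceModel_fit, fit)[2, "Pr(>Chisq)"]
})

round(pValues, 3)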

16.10.8.2 Iteratively drop constraints to identify measurement non-invariance

Another approach to identify measurement non-invariance is to iteratively drop constraints from a model that constrains the given parameter (intercepts) to be the same across groups. So, we will use the scalar invariance model as the baseline model (in which item intercepts are constrained to be the same across groups) and iteratively drop constraints, comparing each model to the metric invariance model (which allows item intercepts to differ across groups), until measurement invariance becomes established. By process of elimination, this identifies which item(s) show non-invariant intercepts across groups.

16.10.8.2.1 Variable 1

Measurement invariance does not yet become established when freeing intercepts of variable 1 across groups. This suggests that, whether or not variable 1 shows non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV1 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ c(intx2, intx2)*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV1_fit <- lavaan(
  cfaModel_scalarInvarianceV1,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV1_fit)
16.10.8.2.2 Variable 2

Measurement invariance does not yet become established when freeing intercepts of variable 2 across groups. This suggests that, whether or not variables 1 and 2 show non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV2 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ c(intx3, intx3)*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV2_fit <- lavaan(
  cfaModel_scalarInvarianceV2,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV2_fit)
16.10.8.2.3 Variable 3

Measurement invariance does not yet become established when freeing intercepts of variable 3 across groups. This suggests that, whether or not variables 1, 2, and 3 show non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV3 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ c(intx4, intx4)*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV3_fit <- lavaan(
  cfaModel_scalarInvarianceV3,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV3_fit)
16.10.8.2.4 Variable 4

Measurement invariance does not yet become established when freeing intercepts of variable 4 across groups. This suggests that, whether or not variables 1, 2, 3, and 4 show non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV4 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ c(intx5, intx5)*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV4_fit <- lavaan(
  cfaModel_scalarInvarianceV4,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV4_fit)
16.10.8.2.5 Variable 5

Measurement invariance does not yet become established when freeing intercepts of variable 5 across groups. This suggests that, whether or not variables 1, 2, 3, 4, and 5 show non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV5 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ c(intx6, intx6)*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV5_fit <- lavaan(
  cfaModel_scalarInvarianceV5,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV5_fit)
16.10.8.2.6 Variable 6

Measurement invariance does not yet become established when freeing intercepts of variable 6 across groups. This suggests that, whether or not variables 1, 2, 3, 4, 5, and 6 show non-invariant intercepts, there are other items that show non-invariant intercepts.

Code
cfaModel_scalarInvarianceV6 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ c(intx7, intx7)*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV6_fit <- lavaan(
  cfaModel_scalarInvarianceV6,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV6_fit)
16.10.8.2.7 Variable 7

Measurement invariance becomes established when freeing intercepts of variable 7 across groups. This suggests that item 7 shows non-invariant intercepts across groups, and that measurement invariance holds when constraining items 8 and 9 to be invariant across groups. There may also be other items, especially among the preceding items (variables 1–6), that show non-invariant intercepts across groups.

Code
cfaModel_scalarInvarianceV7 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV7_fit <- lavaan(
  cfaModel_scalarInvarianceV7,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV7_fit)
16.10.8.2.8 Variable 8

Measurement invariance continues to hold when freeing intercepts of variable 8 across groups. This suggests that variable 8 does not show non-invariant intercepts across groups, at least when the intercepts of variables 1–7 are also freed.

Code
cfaModel_scalarInvarianceV8 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ NA*1
 x9 ~ c(intx9, intx9)*1
'

cfaModel_scalarInvarianceV8_fit <- lavaan(
  cfaModel_scalarInvarianceV8,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV8_fit)
16.10.8.2.9 Variable 9

Measurement invariance continues to hold when freeing intercepts of variable 9 across groups (even after constraining intercepts of variable 8 across groups). This suggests that variable 9 does not show non-invariant intercepts across groups, at least when the intercepts of variables 1–7 are also freed.

Code
cfaModel_scalarInvarianceV9 <- '
 #Fix factor loadings to be the same across groups
 visual  =~ c(lambdax1,lambdax1)*x1 + c(lambdax2,lambdax2)*x2 + c(lambdax3,lambdax3)*x3
 textual =~ c(lambdax4,lambdax4)*x4 + c(lambdax5,lambdax5)*x5 + c(lambdax6,lambdax6)*x6
 speed   =~ c(lambdax7,lambdax7)*x7 + c(lambdax8,lambdax8)*x8 + c(lambdax9,lambdax9)*x9
 
 #Fix latent means to zero
 visual ~ 0
 textual ~ 0
 speed ~ 0
 
 #Fix latent variances to one in group 1; free latent variances in group 2
 visual ~~ c(1, NA)*visual
 textual ~~ c(1, NA)*textual
 speed ~~ c(1, NA)*speed
 
 #Estimate covariances among latent variables
 visual ~~ textual
 visual ~~ speed
 textual ~~ speed
 
 #Estimate residual variances of manifest variables
 x1 ~~ x1
 x2 ~~ x2
 x3 ~~ x3
 x4 ~~ x4
 x5 ~~ x5
 x6 ~~ x6
 x7 ~~ x7
 x8 ~~ x8
 x9 ~~ x9
 
 #Iteratively free intercepts across groups
 x1 ~ NA*1
 x2 ~ NA*1
 x3 ~ NA*1
 x4 ~ NA*1
 x5 ~ NA*1
 x6 ~ NA*1
 x7 ~ NA*1
 x8 ~ c(intx8, intx8)*1
 x9 ~ NA*1
'

cfaModel_scalarInvarianceV9_fit <- lavaan(
  cfaModel_scalarInvarianceV9,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR")

anova(metricInvarianceModel_fit, cfaModel_scalarInvarianceV9_fit)
16.10.8.2.10 Summary

When iteratively dropping constraints, measurement invariance became established once the intercepts for items 1–7 were freed across groups. This suggests that items 8 and 9 show measurement invariance of intercepts across groups, whereas item 7 (and possibly some of items 1–6) shows non-invariant intercepts. This result could depend on our choice of item 1 as the first constraint to be freed. We could free the constraints in a different order to see the extent to which the order influences which items are identified as showing measurement non-invariance.
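
Whatever order is used, decisions about invariance should not rest on chi-square difference tests alone, because such tests are sensitive to sample size; changes in alternative fit indices, such as the change in CFI (Cheung & Rensvold, 2002; Chen, 2007), are a common supplement. Below is a minimal sketch using the compareFit() function of the semTools package, assuming metric and scalar models fitted with lavaan's group.equal shorthand and default identification:

Code
library(lavaan)
library(semTools)

cfaModel <- '
 visual  =~ x1 + x2 + x3
 textual =~ x4 + x5 + x6
 speed   =~ x7 + x8 + x9
'

metricFit <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR",
  group.equal = "loadings")

scalarFit <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts"))

#Nested model comparison: chi-square difference tests plus differences
#in fit indices (e.g., CFI, RMSEA) between the metric and scalar models
summary(compareFit(metricFit, scalarFit))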

16.10.8.3 Next Steps

After identifying which item(s) show measurement non-invariance, we would evaluate the non-invariance according to theory and effect sizes. We might start with the item showing the largest non-invariance. If we deem the non-invariance for this item non-negligible, we have three primary options: (1) drop the item for both groups, (2) drop the item for one group but keep it for the other group, or (3) freely estimate the parameters for the item across groups. We would repeat this process iteratively, in order of magnitude, for the remaining items that show non-negligible non-invariance. Thus, our model might show partial scalar invariance: some items might have intercepts that are constrained to be the same across groups, whereas other items might have intercepts that are allowed to differ across groups. Then, we would test subsequent measurement invariance models (e.g., residual invariance) while keeping the partially freed constraints from the partial scalar invariance model. Once we establish the best-fitting model that imposes as many constraints as are theoretically and empirically justified for the purposes of the study, we would use that model in subsequent tests. As a reminder, full or partial metric invariance is helpful for comparing associations across groups, whereas full or partial scalar invariance is helpful for comparing mean levels across groups.
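
As an illustration of option (3), a partial scalar invariance model can be specified compactly with lavaan's group.partial argument. The sketch below frees only the intercept of item x7 (consistent with the drop-constraints results above) while constraining the remaining loadings and intercepts to be equal across groups; it uses lavaan's default identification, under which the latent means of the second group are freely estimated, permitting latent mean comparisons.

Code
library(lavaan)

cfaModel <- '
 visual  =~ x1 + x2 + x3
 textual =~ x4 + x5 + x6
 speed   =~ x7 + x8 + x9
'

#Partial scalar invariance: equal loadings and intercepts across groups,
#except the intercept of x7, which group.partial frees in each group
partialScalarFit <- cfa(
  cfaModel,
  data = HolzingerSwineford1939,
  group = "school",
  missing = "ML",
  estimator = "MLR",
  group.equal = c("loadings", "intercepts"),
  group.partial = "x7 ~ 1")

summary(partialScalarFit)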

16.11 Conclusion

Bias is a systematic error. Test bias refers to systematic error (in measurement, prediction, etc.) as a function of group membership. The two broad categories of test bias are predictive bias and test structure bias. Predictive bias refers to differences between groups in the relation between the test and the criterion, i.e., different intercepts and/or slopes of the regression line relating the test to the criterion. Test structure bias refers to differences in the internal test characteristics across groups, and it can be evaluated with approaches such as measurement invariance and differential item functioning. When test bias is detected, it is important to address it; there are a number of ways of correcting for bias, including score adjustment and approaches that do not involve score adjustment. However, even if a test does not show bias, that does not mean the test is fair. Fairness is not a scientific question but rather a moral, societal, and ethical question, and there are many different ways of operationalizing fairness. Fairness is a complex question, so do the best you can and try to minimize any negative impact of the assessment procedures.

16.12 Suggested Readings

Putnick & Bornstein (2016); N. S. Cole (1981); Fernández & Abe (2018); Reynolds & Suzuki (2012); Sackett & Wilk (1994); Jonson & Geisinger (2022)

16.13 Exercises

16.13.1 Questions

Note: Several of the following questions use data from the Children of the National Longitudinal Survey of Youth (CNLSY). The CNLSY is a publicly available longitudinal data set provided by the Bureau of Labor Statistics (https://www.bls.gov/nls/nlsy79-children.htm#topical-guide; archived at https://perma.cc/EH38-HDRN). The CNLSY data file for these exercises is located on the book’s page of the Open Science Framework (https://osf.io/3pwza). Children’s behavior problems were rated in 1988 (time 1: T1) and again in 1990 (time 2: T2) on the Behavior Problems Index (BPI). Below are the items corresponding to the Antisocial subscale of the BPI:

  1. cheats or tells lies
  2. bullies or is cruel/mean to others
  3. does not seem to feel sorry after misbehaving
  4. breaks things deliberately
  5. is disobedient at school
  6. has trouble getting along with teachers
  7. has sudden changes in mood or feeling
  1a. Determine whether there is predictive bias, as a function of sex, in predicting antisocial behavior at T2 from antisocial behavior at T1. Describe any predictive bias observed.
  1b. How could you address this predictive bias?
  2. Assume that children are selected to receive preventative services if their score on the Antisocial scale at T1 is > 6. Assume that children at T2 have an antisocial disorder if their score at T2 is > 6. Does the Antisocial scale show fairness in terms of equal outcomes between boys and girls?
  3. A lawyer argues in court that a test that is used to select students for college admissions should be able to be used because it has been demonstrated to show no evidence of bias against particular groups. However, you know that just because a measure may be unbiased does not mean that using the measure’s scores for that purpose is fair. How could such a measure be unbiased yet unfair?
  4. Fit a multi-group item response theory model to the Antisocial subscale at time 2, with the participant’s sex as a grouping factor.
  4a. Does a model with item parameters constrained across sexes fit significantly worse than a model with item parameters that are allowed to vary across sexes?
  4b. Starting with an unconstrained model that allows item parameters to vary across sexes, sequentially add constraints to item parameters across groups. Which items differed in discrimination across groups? Which items differed in severity across groups? Describe any differences.
  4c. How could any differential item functioning be addressed?
  5. Examine the longitudinal measurement invariance of the Antisocial scale across T1 and T2. Use full information maximum likelihood (FIML) to account for missing data. Use robust standard errors to account for non-normally distributed data. Which type(s) of longitudinal measurement invariance failed? Which items and parameters appeared to differ across ages?

16.13.2 Answers

  1a. There is predictive bias in terms of different slopes across sexes. The slope is steeper for boys than for girls, indicating that the measure is not as predictively accurate for girls as it is for boys (see the regression sketch after these answers).
  1b. You could address this predictive bias by using a different predictive measure for girls than for boys, by changing the criterion, or by using within-group norming.
  2. No, the Antisocial scale does not show equal outcomes in terms of selection rate. The selection ratio is much lower for girls \((0.05)\) than for boys \((0.10)\).
  3. The measure could be unbiased if it shows the same intercepts and slopes across groups in predicting college performance. However, a measure can still be unfair even if it is unbiased. If there are different types of errors in different groups, the measure may not be fair. For instance, if there are more false negative errors in Black applicants compared to White applicants and more false positive errors in White applicants than Black applicants, the measure would be unfair against Black applicants. Even if the percent accuracy of prediction is the same for Black and White applicants, the errors have different implications for each group—White applicants who would have failed in college are more likely to be accepted, whereas strong Black applicants who would have succeeded are less likely to be accepted. To address the unfairness of the test, it will be important to use one of the various operationalizations of fairness: equal outcomes, equal opportunity, or equal odds.
  4a. The model with item parameters constrained across sexes fit significantly worse than a model with item parameters that were allowed to vary across sexes, \(\Delta\chi^2[19] = 119.15, p < .001\). This suggests that the item parameters show differential item functioning across boys and girls.
  4b. No items showed significant differences in discrimination across sexes, suggesting that items did not differ in their relation to the latent factor across boys and girls. Items 1, 2, 3, 4, 5, and 6 differed in severity across sexes. These items showed greater severity for girls than for boys. It took a higher level of antisocial behavior for a girl to be endorsed as engaging in one of the following behaviors: cheats or tells lies, bullies or is cruel/mean to others, does not seem to feel sorry after misbehaving, breaks things deliberately, is disobedient at school, and has trouble getting along with teachers.
  4c. The differences in item severity across boys and girls could be addressed by resolving DIF. That is, the items that show severity DIF (items 1–6) would be allowed to differ in their severity parameter across boys and girls, but these items would be constrained to have the same discrimination across groups; non-DIF items would have severity and discrimination parameters constrained to be the same across groups.
  5. The configural invariance model fit well according to RMSEA \((0.09)\) and SRMR \((0.06)\), but fit less well according to CFI \((0.77)\). The metric invariance model did not fit significantly worse than the configural invariance model \((\Delta\chi^2[6] = 8.77, p = 0.187)\). However, the scalar invariance model fit significantly worse than the metric invariance model \((\Delta\chi^2[6] = 21.28, p = 0.002)\), and the residual invariance model fit significantly worse than the scalar invariance model \((\Delta\chi^2[7] = 42.80, p < .001)\). Therefore, scalar and residual invariance failed. The following items showed larger intercepts at T1 compared to T2: items 1, 2, 3, 4, 5, and 7. Only item 6 showed larger intercepts at T2 than T1. The following items showed a larger residual at T1 compared to T2: items 2–7. Only item 1 showed a larger residual at T2 compared to T1.
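
For reference, the predictive bias described in the first answer is commonly tested with a moderated regression that includes a group-by-predictor interaction. Below is a hypothetical sketch; the data frame cnlsy and the variable names antisocialT1, antisocialT2, and sex are placeholders, not the actual CNLSY data file or variable names.

Code
#Hypothetical sketch: cnlsy, antisocialT1, antisocialT2, and sex are
#placeholder names, not the actual CNLSY data file or variable names
predictiveBiasModel <- lm(
  antisocialT2 ~ antisocialT1 * sex,
  data = cnlsy)

#A significant sex coefficient indicates intercept bias; a significant
#antisocialT1:sex interaction indicates slope bias (different slopes)
summary(predictiveBiasModel)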

References

Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research in preemployment testing. Journal of Applied Psychology, 95(4), 648–680. https://doi.org/10.1037/a0018714
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Barrash, J., Stillman, A., Anderson, S. W., Uc, E. Y., Dawson, J. D., & Rizzo, M. (2010). Prediction of driving ability with neuropsychological tests: Demographic adjustments diminish accuracy. Journal of the International Neuropsychological Society, 16(4), 679–686. https://doi.org/10.1017/S1355617710000470
Bauer, D. J., Belzak, W. C. M., & Cole, V. T. (2020). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling: A Multidisciplinary Journal, 27(1), 43–55. https://doi.org/10.1080/10705511.2019.1642754
Belzak, W. C. M., & Bauer, D. J. (2020). Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychological Methods, 25(6), 673–690. https://doi.org/10.1037/met0000253
Brown, R. T., Reynolds, C. R., & Whitaker, J. S. (1999). Bias in mental testing since bias in mental testing. School Psychology Quarterly, 14(3), 208–238. https://doi.org/10.1037/h0089007
Burlew, A. K., Peteet, B. J., McCuistian, C., & Miller-Roenigk, B. D. (2019). Best practices for researching diverse groups. American Journal of Orthopsychiatry, 89(3), 354–368. https://doi.org/10.1037/ort0000350
Camilli, G. (2013). Ongoing issues in test fairness. Educational Research and Evaluation, 19(2–3), 104–120. https://doi.org/10.1080/13803611.2013.767602
Chalmers, P. (2020). mirt: Multidimensional item response theory. https://CRAN.R-project.org/package=mirt
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834
Cheng, Y., Shao, C., & Lathrop, Q. N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educational and Psychological Measurement, 76(1), 43–63. https://doi.org/10.1177/0013164415576187
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. https://doi.org/10.1207/s15328007sem0902_5
Clark, M. J., & Grandy, J. (1984). Sex differences in the academic performance of Scholastic Aptitude Test takers: College board report no. 84-8. College Board Publications.
Cole, N. S. (1981). Bias in testing. American Psychologist, 36(10), 1067–1077. https://doi.org/10.1037/0003-066X.36.10.1067
Cole, V., Gottfredson, N., & Giordano, M. (2018). aMNLFA: Automated fitting of moderated nonlinear factor analysis through the Mplus program. https://CRAN.R-project.org/package=aMNLFA
Committee on the General Aptitude Test Battery, Commission on Behavioral and Social Sciences and Education, & National Research Council. (1989). Fairness in employment testing: Validity generalization, minority issues, and the general aptitude test battery. National Academies Press.
Counsell, A., Cribbie, R. A., & Flora, D. B. (2020). Evaluating equivalence testing methods for measurement invariance. Multivariate Behavioral Research, 55(2), 312–328. https://doi.org/10.1080/00273171.2019.1633617
Curran, P. J., Howard, A. L., Bainter, S. A., Lane, S. T., & McGinley, J. S. (2014). The separation of between-person and within-person components of individual change over time: A latent curve model with structured residuals. Journal of Consulting and Clinical Psychology, 82, 879–894. https://doi.org/10.1037/a0035297
Dorans, N. J. (2017). Contributions to the quantitative assessment of item, test, and score fairness. In R. E. Bennett & M. von Davier (Eds.), Advancing human assessment (pp. 201–230). Springer, Cham.
Dueber, D. (2019). dmacs: Measurement nonequivalence effect size calculator. https://github.com/ddueber/dmacs
Epskamp, S. (2022). semPlot: Path diagrams and visual analysis of various SEM packages’ output. https://github.com/SachaEpskamp/semPlot
Fernández, A. L., & Abe, J. (2018). Bias in cross-cultural neuropsychological testing: Problems and possible solutions. Culture and Brain, 6(1), 1–35. https://doi.org/10.1007/s40167-017-0050-2
Fletcher, R. R., Nakeshimana, A., & Olubeko, O. (2021). Addressing fairness, bias, and appropriate use of artificial intelligence and machine learning in global health. Frontiers in Artificial Intelligence, 3(116). https://doi.org/10.3389/frai.2020.561802
Gipps, C., & Stobart, G. (2009). Fairness in assessment. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 105–118). Springer Netherlands. https://doi.org/10.1007/978-1-4020-9964-9_6
Gonzalez, O., & Pelham, W. E. (2021). When does differential item functioning matter for screening? A method for empirical evaluation. Assessment, 28(2), 446–456. https://doi.org/10.1177/1073191120913618
Gottfredson, L. S. (1994). The science and politics of race-norming. American Psychologist, 49(11), 955–963. https://doi.org/10.1037/0003-066X.49.11.955
Gottfredson, N. C., Cole, V. T., Giordano, M. L., Bauer, D. J., Hussong, A. M., & Ennett, S. T. (2019). Simplifying the implementation of modern scale scoring methods with an automated R package: Automated moderated nonlinear factor analysis (aMNLFA). Addictive Behaviors, 94, 65–73. https://doi.org/10.1016/j.addbeh.2018.10.031
Gunn, H. J., Grimm, K. J., & Edwards, M. C. (2020). Evaluation of six effect size measures of measurement non-invariance for continuous outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 503–514. https://doi.org/10.1080/10705511.2019.1689507
Hagquist, C. (2019). Explaining differential item functioning focusing on the crucial role of external information – an example from the measurement of adolescent mental health. BMC Medical Research Methodology, 19(1), 185. https://doi.org/10.1186/s12874-019-0828-3
Hagquist, C., & Andrich, D. (2017). Recent advances in analysis of differential item functioning in health research using the Rasch model. Health and Quality of Life Outcomes, 15(1), 181. https://doi.org/10.1186/s12955-017-0755-0
Hall, G. C. N., Bansal, A., & Lopez, I. R. (1999). Ethnicity and psychopathology: A meta-analytic review of 31 years of comparative MMPI/MMPI-2 research. Psychological Assessment, 11(2), 186–197. https://doi.org/10.1037/1040-3590.11.2.186
Han, K., Colarelli, S. M., & Weed, N. C. (2019). Methodological and statistical advances in the consideration of cultural diversity in assessment: A critical review of group classification and measurement invariance testing. Psychological Assessment, 31(12), 1481–1496. https://doi.org/10.1037/pas0000731
Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment: A quantitative perspective. American Psychologist, 61(8), 845–859. https://doi.org/10.1037/0003-066X.61.8.845
Jensen, A. R. (1980). Précis of bias in mental testing. Behavioral and Brain Sciences, 3(3), 325–333. https://doi.org/10.1017/S0140525X00005161
Jonson, J. L., & Geisinger, K. F. (2022). Fairness in educational and psychological testing: Examining theoretical, research, practice, and policy implications of the 2014 standards. American Educational Research Association.
Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23(4), 708–728. https://doi.org/10.1037/met0000152
Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2021). semTools: Useful tools for structural equation modeling. https://github.com/simsem/semTools/wiki
Kuncel, N. R., & Hezlett, S. A. (2010). Fact and fiction in cognitive ability testing for admissions and hiring decisions. Current Directions in Psychological Science, 19(6), 339–345. https://doi.org/10.1177/0963721410389459
Lai, M. H. C. (2021). Adjusting for measurement noninvariance with alignment in growth modeling. Multivariate Behavioral Research, 1–18. https://doi.org/10.1080/00273171.2021.1941730
Lee, K., Bull, R., & Ho, R. M. H. (2013). Developmental changes in executive functioning. Child Development, 84(6), 1933–1953. https://doi.org/10.1111/cdev.12096
Little, T. D. (2013). Longitudinal structural equation modeling. The Guilford Press.
Little, T. D., Preacher, K. J., Selig, J. P., & Card, N. A. (2007). New developments in latent variable panel analyses of longitudinal data. International Journal of Behavioral Development, 31(4), 357–365. https://doi.org/10.1177/0165025407077757
Little, T. D., Slegers, D. W., & Card, N. A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation Modeling, 13(1), 59–72. https://doi.org/10.1207/s15328007sem1301_3
Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. https://doi.org/10.1037/met0000075
Manly, J. J. (2005). Advantages and disadvantages of separate norms for African Americans. The Clinical Neuropsychologist, 19(2), 270–275. https://doi.org/10.1080/13854040590945346
Manly, J. J., & Echemendia, R. J. (2007). Race-specific norms: Using the model of hypertension to understand issues of race, culture, and education in neuropsychology. Archives of Clinical Neuropsychology, 22(3), 319–325. https://doi.org/10.1016/j.acn.2007.01.006
McClelland, D. C. (1973). Testing for competence rather than for “intelligence.” American Psychologist, 28, 1–14. https://doi.org/10.1037/h0034092
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. Journal of Applied Psychology, 95(4), 728–743. https://doi.org/10.1037/a0018966
Melikyan, Z. A., Agranovich, A. V., & Puente, A. E. (2019). Fairness in psychological testing. In G. Goldstein, D. N. Allen, & J. DeLuca (Eds.), Handbook of psychological assessment (fourth edition) (pp. 551–572). Academic Press. https://doi.org/10.1016/B978-0-12-802203-0.00018-3
Millsap, R. E. (2011). Statistical approaches to measurement invariance. Taylor & Francis.
Muthén, L. K., & Muthén, B. O. (2019). Mplus version 8.4. Muthén & Muthén.
Nye, C. D., Bradburn, J., Olenick, J., Bialko, C., & Drasgow, F. (2019). How big are my effects? Examining the magnitude of effect sizes in studies of measurement equivalence. Organizational Research Methods, 22(3), 678–709. https://doi.org/10.1177/1094428118761122
Oberski, D. L. (2014). Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Political Analysis, 22(1), 45–60. https://doi.org/10.1093/pan/mpt014
Oberski, D. L., Vermunt, J. K., & Moors, G. B. D. (2015). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Political Analysis, 23(4), 550–563. https://doi.org/10.1093/pan/mpv020
Paulus, J. K., & Kent, D. M. (2020). Predictably unequal: Understanding and addressing concerns that algorithmic clinical prediction may increase health disparities. Npj Digital Medicine, 3(1), 99. https://doi.org/10.1038/s41746-020-0304-9
Petersen, I. T. (2024b). petersenlab: A collection of R functions by the Petersen Lab. https://doi.org/10.5281/zenodo.7602890
Petersen, I. T., Choe, D. E., & LeBeau, B. (2020). Studying a moving target in development: The challenge and opportunity of heterotypic continuity. Developmental Review, 58, 100935. https://doi.org/10.1016/j.dr.2020.100935
Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Developmental Review, 41, 71–90. https://doi.org/10.1016/j.dr.2016.06.004
Raykov, T., Marcoulides, G. A., Harrison, M., & Zhang, M. (2020). On the dependability of a popular procedure for studying measurement invariance: A cause for concern? Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 649–656. https://doi.org/10.1080/10705511.2019.1610409
Reynolds, C. R., Altmann, R. A., & Allen, D. N. (2021). The problem of bias in psychological assessment. In C. R. Reynolds, R. A. Altmann, & D. N. Allen (Eds.), Mastering modern psychological testing: Theory and methods (pp. 573–613). Springer International Publishing. https://doi.org/10.1007/978-3-030-59455-8_15
Reynolds, C. R., & Suzuki, L. A. (2012). Bias in psychological assessment: An empirical review and recommendations. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology, Vol. 10: Assessment psychology, Part 1: Assessment issues (2nd ed., pp. 82–113). John Wiley & Sons.
Robitzsch, A. (2019). mnlfa: Moderated nonlinear factor analysis. https://CRAN.R-project.org/package=mnlfa
Rosseel, Y., Jorgensen, T. D., & Rockwood, N. (2022). lavaan: Latent variable analysis. https://lavaan.ugent.be
Sackett, P. R., Borneman, M. J., & Connelly, B. S. (2008). High stakes testing in higher education and employment: Appraising the evidence for validity and fairness. American Psychologist, 63, 215–227. https://doi.org/10.1037/0003-066X.63.4.215
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education. American Psychologist, 56, 302–318. https://doi.org/10.1037/0003-066X.56.4.302
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49(11), 929–954. https://doi.org/10.1037/0003-066X.49.11.929
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings. American Psychologist, 36(10), 1128–1137. https://doi.org/10.1037/0003-066X.36.10.1128
Silverberg, N. D., & Millis, S. R. (2009). Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity. Journal of the International Neuropsychological Society, 15(1), 94–102. https://doi.org/10.1017/S1355617708090139
Thorndike, R. L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8(2), 63–70. https://doi.org/10.1111/j.1745-3984.1971.tb00907.x
Van De Schoot, R., Kluytmans, A., Tummers, L., Lugtig, P., Hox, J., & Muthén, B. (2013). Facing off with Scylla and Charybdis: A comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Frontiers in Psychology, 4(770). https://doi.org/10.3389/fpsyg.2013.00770
Van De Schoot, R., Schmidt, P., De Beuckelaer, A., Lek, K., & Zondervan-Zwijnenburg, M. (2015). Editorial: Measurement invariance. Frontiers in Psychology, 6(1064). https://doi.org/10.3389/fpsyg.2015.01064
Wang, T., Merkle, E. C., & Zeileis, A. (2014). Score-based tests of measurement invariance: Use in practice. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.00438
Wang, W.-C., Shih, C.-L., & Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69(5), 713–731. https://doi.org/10.1177/0013164409332228
Youngstrom, E. A., & Van Meter, A. (2016). Empirically supported assessment of children and adolescents. Clinical Psychology: Science and Practice, 23(4), 327–347. https://doi.org/10.1111/cpsp.12172
Zieky, M. J. (2006). Fairness review in assessment. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 359–376). Routledge. https://doi.org/10.4324/9780203874776.ch16
Zieky, M. J. (2013). Fairness review in assessment. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology (pp. 293–302). American Psychological Association. https://doi.org/10.1037/14047-017

Feedback

Please consider providing feedback about this textbook, so that I can make it as helpful as possible. You can provide feedback at the following link: https://forms.gle/95iW4p47cuaphTek6