I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
22 Factor Analysis
“All models are wrong, but some are useful.”
— George Box (1979, p. 202)
22.1 Getting Started
22.1.1 Load Packages
22.1.2 Load Data
22.1.3 Prepare Data
22.1.3.1 Season-Averages
```r
# Assumes dplyr (e.g., via the tidyverse) is loaded and that `player_stats_seasonal`
# and `nfl_nextGenStats_weekly` exist (see Load Packages and Load Data)

# Convert season totals to per-game averages
player_stats_seasonal_avgPerGame <- player_stats_seasonal %>%
  mutate(
    completionsPerGame = completions / games,
    attemptsPerGame = attempts / games,
    passing_yardsPerGame = passing_yards / games,
    passing_tdsPerGame = passing_tds / games,
    passing_interceptionsPerGame = passing_interceptions / games,
    sacks_sufferedPerGame = sacks_suffered / games,
    sack_yards_lostPerGame = sack_yards_lost / games,
    sack_fumblesPerGame = sack_fumbles / games,
    sack_fumbles_lostPerGame = sack_fumbles_lost / games,
    passing_air_yardsPerGame = passing_air_yards / games,
    passing_yards_after_catchPerGame = passing_yards_after_catch / games,
    passing_first_downsPerGame = passing_first_downs / games,
    #passing_epaPerGame = passing_epa / games,
    #passing_cpoePerGame = passing_cpoe / games,
    passing_2pt_conversionsPerGame = passing_2pt_conversions / games,
    #pacrPerGame = pacr / games
    pass_40_ydsPerGame = pass_40_yds / games,
    pass_incPerGame = pass_inc / games,
    #pass_comp_pctPerGame = pass_comp_pct / games,
    fumblesPerGame = fumbles / games
  )

# Average the weekly Next Gen Stats within each player-season (regular season only)
nfl_nextGenStats_seasonal <- nfl_nextGenStats_weekly %>%
  filter(season_type == "REG") %>%
  # Drop identifier and text columns so that only numeric statistics are averaged
  select(-any_of(c(
    "week", "season_type", "player_display_name", "player_position", "team_abbr",
    "player_first_name", "player_last_name", "player_jersey_number", "player_short_name"
  ))) %>%
  group_by(player_gsis_id, season) %>%
  summarise(
    across(everything(), ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )
```
22.1.3.2 Merge Data
22.1.3.3 Specify Variables
```r
faVars <- c(
  "completionsPerGame", "attemptsPerGame", "passing_yardsPerGame", "passing_tdsPerGame",
  "passing_air_yardsPerGame", "passing_yards_after_catchPerGame", "passing_first_downsPerGame",
  "avg_completed_air_yards", "avg_intended_air_yards", "aggressiveness", "max_completed_air_distance",
  "avg_air_distance", "max_air_distance", "avg_air_yards_to_sticks", "passing_cpoe", "pass_comp_pct",
  "passer_rating", "completion_percentage_above_expectation"
)
```
22.1.3.4 Subset Data
22.1.3.5 Standardize Variables
Standardizing variables is not strictly necessary in factor analysis; however, standardization can be helpful to prevent some variables from having considerably larger variances than others.
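As a minimal sketch of standardization (using simulated toy values rather than the book's data, so the variable values are assumptions), base R's `scale()` converts each column to z-scores:

```r
# Toy data (hypothetical values) whose variables have very different variances
set.seed(1)
toy <- data.frame(
  passing_yardsPerGame = rnorm(50, mean = 220, sd = 40),
  passing_tdsPerGame   = rnorm(50, mean = 1.4, sd = 0.5)
)

# scale() centers each column and divides it by its standard deviation
toy_z <- as.data.frame(scale(toy))

round(sapply(toy, var), 2)    # raw variances differ by orders of magnitude
round(sapply(toy_z, var), 2)  # standardized variances all equal 1
round(colMeans(toy_z), 10)    # standardized means are (numerically) zero
```

After standardization, no variable dominates simply because it is measured in larger units.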
22.2 Overview of Factor Analysis
Factor analysis involves the estimation of latent variables. Latent variables are ways of studying and operationalizing theoretical constructs that cannot be directly observed or quantified. Factor analysis is a class of latent variable models that is designed to identify the structure of a measure or set of measures, and ideally, a construct or set of constructs. It aims to identify the optimal latent structure for a group of variables. The goal of factor analysis is to identify simple, parsimonious factors that underlie the “junk” (i.e., scores filled with measurement error) that we observe.
Factor analysis encompasses two general types: confirmatory factor analysis and exploratory factor analysis. Exploratory factor analysis (EFA) is a latent variable modeling approach that is used when the researcher has no a priori hypotheses about how a set of variables is structured. EFA seeks to identify the empirically optimal-fitting model in ways that balance accuracy (i.e., variance accounted for) and parsimony (i.e., simplicity). Confirmatory factor analysis (CFA) is a latent variable modeling approach that is used when a researcher wants to evaluate how well a hypothesized model fits, and the model can be examined in comparison to alternative models. Using a CFA approach, the researcher can pit models representing two theoretical frameworks against each other to see which better accounts for the observed data.
Factor analysis involves observed (manifest) variables and unobserved (latent) factors. Factor analysis assumes that the latent factor influences the manifest variables, and the latent factor therefore reflects the common variance among the variables. A factor model potentially includes factor loadings, residuals (errors or disturbances), intercepts/means, covariances, and regression paths. When depicting a factor analysis model, rectangles represent variables we observe (i.e., manifest variables), and circles represent latent (i.e., unobserved) variables. A regression path indicates a hypothesis that one variable (or factor) influences another, and it is depicted using a single-headed arrow. The standardized regression coefficient (i.e., beta or \(\beta\)) represents the strength of association between the variables or factors. A factor loading is a regression path from a latent factor to an observed (manifest) variable. The standardized factor loading represents the strength of association between the variable and the latent factor, where conceptually, it is intended to reflect the extent to which the latent factor influences the observed variable. A residual is variance in a variable (or factor) that is unexplained by other variables or factors. A variable’s intercept is the expected value of the variable when the factor(s) (onto which it loads) is equal to zero. A covariance is the unstandardized index of the strength of association between two variables (or factors), and it is depicted with a double-headed arrow.
Because a covariance is unstandardized, its scale depends on the scale of the variables. The covariance between two variables is the average product of their deviations from their respective means, as in Equation 10.1. The covariance of a variable with itself is equivalent to its variance, as in Equation 10.2. By contrast, a correlation is a standardized index of the strength of association between two variables. Because a correlation is standardized (bounded between −1 and 1), its scale does not depend on the scales of the variables. A covariance path between two variables represents omitted shared cause(s) of the variables. For instance, if you depict a covariance path between two variables, it means that there is a shared cause of the two variables that is omitted from the model (for instance, if the common cause is not known or was not assessed).
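To illustrate the difference between a covariance and a correlation, here is a small sketch with simulated values (the variables `x` and `y` are hypothetical). It uses the sample formula with \(n - 1\) in the denominator, which is what R's `cov()` computes:

```r
set.seed(2)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100, sd = 0.8)

# Covariance as the (n - 1)-averaged product of deviations from the means
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
all.equal(cov_manual, cov(x, y))

# The covariance of a variable with itself is its variance
all.equal(cov(x, x), var(x))

# Rescaling x (e.g., yards to feet) triples the covariance
# but leaves the correlation unchanged
x_feet <- 3 * x
cov(x_feet, y) / cov(x, y)
all.equal(cor(x_feet, y), cor(x, y))
```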
In factor analysis, the relation between an indicator (\(\text{X}\)) and its underlying latent factor(s) (\(\text{F}\)) can be represented with a regression formula as in Equation 22.1:
\[ X = \lambda \cdot \text{F} + \text{Item Intercept} + \text{Error Term} \tag{22.1}\]
where:
- \(\text{X}\) is the observed value of the indicator
- \(\lambda\) is the factor loading, indicating the strength of the association between the indicator and the latent factor(s)
- \(\text{F}\) is the person’s value on the latent factor(s)
- \(\text{Item Intercept}\) represents the constant term that accounts for the expected value of the indicator when the latent factor(s) are zero
- \(\text{Error Term}\) is the residual, indicating the extent of variance in the indicator that is not explained by the latent factor(s)
When the latent factors are uncorrelated, the (standardized) residual variance for an indicator is calculated as 1 minus the sum of squared standardized factor loadings for a given item (including cross-loadings). A cross-loading is when a variable loads onto more than one latent factor.
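The following sketch works through that calculation for a hypothetical loading matrix (the loadings are made up for illustration). It then simulates indicators from Equation 22.1, with intercepts set to zero, to check that the implied indicator variances are close to 1:

```r
# Hypothetical standardized loadings: 3 indicators, 2 uncorrelated factors
lambda <- matrix(
  c(0.8, 0.1,
    0.7, 0.3,   # x2 has a cross-loading on F2
    0.2, 0.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("x1", "x2", "x3"), c("F1", "F2"))
)

communality  <- rowSums(lambda^2)  # variance explained by the factors
residual_var <- 1 - communality    # standardized residual (unique) variance
round(cbind(communality, residual_var), 2)

# Simulate X = lambda * F + error for many people (intercepts = 0)
set.seed(3)
n  <- 1e5
Fs <- matrix(rnorm(n * 2), ncol = 2)  # uncorrelated factor scores
E  <- sapply(residual_var, function(v) rnorm(n, sd = sqrt(v)))
X  <- Fs %*% t(lambda) + E

round(apply(X, 2, var), 2)  # each indicator's variance is close to 1
```

For `x2`, for example, the residual variance is \(1 - (0.7^2 + 0.3^2) = 0.42\).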
Factor analysis is a powerful technique to help identify the factor structure that underlies a measure or construct. However, given the extensive method variance that influences scores on a measure, factor analysis (and principal component analysis) tends to extract method factors. Method factors are factors that are related to the methods used for assessment rather than the construct of interest. To better estimate construct factors, it is sometimes necessary to estimate both construct and method factors.
22.3 Path Diagrams
A key tool when designing a structural equation model is a conceptual depiction of the hypothesized causal processes. A path diagram depicts the hypothesized causal processes that link two or more variables. Karch (2025a) provides a tool to create lavaan R syntax from a path analytic diagram: https://lavaangui.org.
In a path analysis diagram, rectangles represent variables we observe, and circles represent latent (i.e., unobserved) variables. Single-headed arrows indicate regression paths, where conceptually, one variable is thought to influence another variable. Double-headed arrows indicate covariance paths, where conceptually, two variables are associated for some unknown reason (i.e., an omitted shared cause).
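As a rough illustration of how the elements of a path diagram map onto model syntax, the sketch below writes a small model as a string using lavaan's operators (`=~` for factor loadings, `~` for regression paths, `~~` for covariance paths). The factor name `passing` and the outcome `fantasyPoints` are hypothetical names chosen for illustration; the model is not fitted here.

```r
# Hypothetical model; the operators follow lavaan's model syntax
model_syntax <- '
  # factor loadings: latent factor (circle) =~ manifest indicators (rectangles)
  passing =~ completionsPerGame + passing_yardsPerGame + passing_tdsPerGame

  # regression path (single-headed arrow): outcome ~ predictor
  fantasyPoints ~ passing

  # covariance path (double-headed arrow) between two indicators
  completionsPerGame ~~ passing_yardsPerGame
'

cat(model_syntax)
```

A string like this could then be passed to a lavaan fitting function along with a data frame that contains the named variables.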
22.4 Decisions in Factor Analysis
There are five primary decisions to make in factor analysis:
- what variables to include in the model and how to scale them
- method of factor extraction: whether to use exploratory or confirmatory factor analysis
- if using exploratory factor analysis, whether and how to rotate factors
- how many factors to retain (and what variables load onto which factors)
- how to interpret and use the factors
The answer you get can differ substantially depending on the decisions you make. Below, we provide guidance for each of these decisions.
22.4.1 1. Variables to Include and their Scaling
The first decision when conducting a factor analysis is which variables to include and the scaling of those variables. What factors you extract can differ widely depending on what variables you include in the analysis. For example, if you include many variables from the same source (e.g., self-report), it is possible that you will extract a factor that represents the common variance among the variables from that source (i.e., the self-reported variables). This would be considered a method factor, which works against the goal of estimating latent factors that represent the constructs of interest (as opposed to the measurement methods used to estimate those constructs).
An additional consideration is the scaling of the variables: whether to use raw variables, standardized variables, or variables divided by a constant to make the variables’ variances more similar. Before performing a principal component analysis (PCA), it is generally important to ensure that the variables included in the PCA are on the same scale. PCA seeks to identify components that explain variance in the data, so if the variables are not on the same scale, some variables may contribute considerably more variance than others. A common way of ensuring that variables are on the same scale is to standardize them using, for example, z-scores that have a mean of zero and standard deviation of one. By contrast, factor analysis can better accommodate variables that are on different scales.
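To see why scaling matters for PCA, consider this sketch with three simulated, independent variables (hypothetical values), one of which has a much larger variance than the others. It compares `prcomp()` on the raw versus standardized variables:

```r
set.seed(4)
n <- 200
toy <- data.frame(
  yards = rnorm(n, mean = 220, sd = 40),   # large variance
  tds   = rnorm(n, mean = 1.4, sd = 0.5),  # small variance
  ints  = rnorm(n, mean = 0.8, sd = 0.4)   # small variance
)

pca_raw <- prcomp(toy, scale. = FALSE)  # PCA of the covariance matrix
pca_std <- prcomp(toy, scale. = TRUE)   # PCA of the correlation matrix (z-scores)

# Proportion of total variance captured by the first component
prop_raw <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
prop_std <- pca_std$sdev[1]^2 / sum(pca_std$sdev^2)
round(c(raw = prop_raw, standardized = prop_std), 3)

# On the raw scale, the first component is essentially the high-variance variable
round(pca_raw$rotation[, 1], 3)
```

On the raw scale, the first component mostly reproduces `yards` simply because of its units, even though the three variables were generated independently.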
22.4.2 2. Method of Factor Extraction
22.4.2.1 Exploratory Factor Analysis
Exploratory factor analysis (EFA) is used if you have no a priori hypotheses about the factor structure of the model, but you would like to understand the latent variables represented by your items.
EFA is partly induced from the data. You feed in the data and let the program build the factor model. You can set some parameters going in, including how to extract or rotate the factors. The factors are extracted from the data without specifying the number and pattern of loadings between the items and the latent factors (Bollen, 2002). All cross-loadings are freely estimated.
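As a minimal sketch of this workflow, the example below simulates six variables from two uncorrelated factors (the variable names and loadings are assumptions) and fits an EFA with base R's `factanal()`. This is a stand-in for illustration and may differ from the functions used elsewhere in this chapter.

```r
set.seed(5)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)

# v1-v3 are driven by f1; v4-v6 are driven by f2
sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# We specify only the number of factors and the rotation;
# all loadings (including cross-loadings) are freely estimated
efa <- factanal(sim, factors = 2, rotation = "varimax")

print(efa$loadings, cutoff = 0.3)  # hide loadings smaller than .30
```

We did not tell the model which variables belong to which factor; the grouping of `v1` to `v3` and `v4` to `v6` is recovered from the data.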
22.4.2.2 Confirmatory Factor Analysis
Confirmatory factor analysis (CFA) is used to (dis)confirm a priori hypotheses about the factor structure of the model. CFA is a test of the hypothesis. In CFA, you specify the model and ask how well this model represents the data. The researcher specifies the number, meaning, associations, and pattern of free parameters in the factor loading matrix (Bollen, 2002). A key advantage of CFA is the ability to directly compare alternative models (i.e., factor structures), which is valuable for theory testing (Strauss & Smith, 2009). For instance, you could use CFA to test whether the variance in several measures’ scores is best explained with one factor or two factors. In CFA, cross-loadings are not estimated unless the researcher specifies them.
22.4.3 3. Factor Rotation
When using EFA or principal component analysis, an important (though optional) step is to rotate the factors to make them simpler and more interpretable. To interpret the results of a factor analysis, we examine the factor matrix. The columns refer to the different factors; the rows refer to the different observed variables. The cells in the table are the factor loadings, which are essentially the correlations between each variable and each factor. Our goal is to achieve a model with simple structure because it is easily interpretable. Simple structure means that every variable loads perfectly on one and only one factor, as operationalized by a matrix of factor loadings with values of one and zero and nothing else. An example of a factor matrix that follows simple structure is depicted in Figure 22.1.
An example of a factor analysis model that follows simple structure is depicted in Figure 22.2. Each variable loads onto one and only one factor, which makes it easy to interpret the meaning of each factor, because a given factor represents the common variance among the items that load onto it.
However, pure simple structure only occurs in simulations, not in real-life data. In reality, our unrotated factor analysis model might look like the model in Figure 22.3. In this example, the factor analysis model does not show simple structure because the items have cross-loadings; that is, the items load onto more than one factor. Because all of the items load onto all of the factors, the factors are not very distinct from each other, which makes it difficult to interpret what the factors mean.
As a result of the challenges of interpretability caused by cross-loadings, factor rotations are often performed. An example of an unrotated factor matrix is in Figure 22.4.
In the example factor matrix in Figure 22.4, the factor analysis is not very helpful; it tells us very little because it did not distinguish between the two factors. The variables have similar loadings on Factor 1 and Factor 2. An example of an unrotated factor solution is in Figure 22.5. In the figure, all of the variables are in the midst of the quadrants; they are not on the factors’ axes. Thus, the factors are not very informative.
As a result, to improve the interpretability of the factor analysis, we can do what is called rotation. Rotation leverages the idea that there are infinitely many solutions to the factor analysis model that fit equally well. Rotation involves changing the orientation of the factors by changing the axes so that variables end up with very high (close to one or negative one) or very low (close to zero) loadings, so that it is clear which factors include which variables. That is, rotation reorients the factors and tries to identify the ideal solution (factor) for each variable. It searches for simple structure and keeps searching until it finds a minimum. After rotation, if the rotation was successful for imposing simple structure, each factor will have loadings close to one (or negative one) for some variables and close to zero for other variables. The goal of factor rotation is to achieve simple structure, to help make it easier to interpret the meaning of the factors.
To perform factor rotation, orthogonal rotations are often used. Orthogonal rotations make the rotated factors uncorrelated. An example of a commonly used orthogonal rotation is varimax rotation. Varimax rotation maximizes the sum of the variance of the squared loadings (i.e., so that items have either a very high or very low loading on a factor) and yields axes with a 90-degree angle.
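As a small sketch of an orthogonal rotation, the example below applies base R's `varimax()` to a hypothetical unrotated loading matrix in which every variable loads on both factors. It also checks that the rotated solution reproduces the same communalities and the same implied correlations, which is why rotated and unrotated solutions fit equally well.

```r
# Hypothetical unrotated loadings: 6 variables, all loading on both factors
L_unrot <- matrix(
  c(0.60,  0.50,
    0.55,  0.45,
    0.50,  0.40,
    0.60, -0.50,
    0.55, -0.45,
    0.50, -0.40),
  ncol = 2, byrow = TRUE
)

rot   <- varimax(L_unrot)
L_rot <- unclass(rot$loadings)
round(L_rot, 2)  # each variable now has one large and one near-zero loading

# The rotation matrix is orthogonal (axes stay at 90 degrees)
round(crossprod(rot$rotmat), 6)

# Rotation leaves communalities and model-implied correlations unchanged
all.equal(rowSums(L_unrot^2), rowSums(L_rot^2))
all.equal(L_unrot %*% t(L_unrot), L_rot %*% t(L_rot))
```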
An example of a factor matrix following an orthogonal rotation is depicted in Figure 22.6. An example of a factor solution following an orthogonal rotation is depicted in Figure 22.7.
An example of a factor matrix from SPSS following an orthogonal rotation is depicted in Figure 22.8.
An example of a factor structure from an orthogonal rotation is in Figure 22.9.
Sometimes, however, the two factors and their constituent variables may be correlated. Examples of two correlated factors may be depression and anxiety. When the two factors are correlated in reality, if we make them uncorrelated, this would result in an inaccurate model. Oblique rotation allows for factors to be correlated and yields axes with an angle of less than 90 degrees. However, if the factors have low correlation (e.g., .2 or less), you can likely continue with orthogonal rotation. Nevertheless, just because an oblique rotation allows for correlated factors does not mean that the factors will be correlated, so oblique rotation provides greater flexibility than orthogonal rotation. An example of a factor structure from an oblique rotation is in Figure 22.10. Results from an oblique rotation are more complicated than those from an orthogonal rotation: they provide more output and are harder to interpret. In addition, oblique rotation might not yield a stable answer if you have a relatively small sample size.
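The sketch below illustrates an oblique rotation with simulated data in which the two factors correlate .5 in the population (all names and values are assumptions). It applies base R's `promax()` to unrotated `factanal()` loadings and derives the factor correlation matrix from the rotation matrix:

```r
set.seed(7)
n  <- 1000
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)  # population correlation with f1 = .5

sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Extract two unrotated factors, then apply an oblique (promax) rotation
unrot <- factanal(sim, factors = 2, rotation = "none")
obl   <- promax(unclass(loadings(unrot)))
round(unclass(obl$loadings), 2)  # pattern loadings

# Factor correlations implied by the oblique rotation matrix U: (U'U)^{-1}
phi <- solve(crossprod(obl$rotmat))
round(phi, 2)
```

If the off-diagonal value of `phi` were near zero, an orthogonal rotation would tell essentially the same story. The sign of the factor correlation depends on the (arbitrary) direction of each factor.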
As an example of rotation based on interpretability, consider the Five-Factor Model of Personality (the Big Five), which goes by the acronym, OCEAN: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Although the five factors of personality are somewhat correlated, we can use rotation to ensure they are maximally independent. Upon rotation, extraversion and neuroticism are essentially uncorrelated, as depicted in Figure 22.11. The other pole of extraversion is introversion and the other pole of neuroticism might be emotional stability or calmness.
Simple structure is achieved when each variable loads highly onto as few factors as possible (i.e., each item has only one significant or primary loading). Oftentimes this is not the case, so we choose our rotation method in order to decide if the factors can be correlated (an oblique rotation) or if the factors will be uncorrelated (an orthogonal rotation). If the factors are not correlated with each other, use an orthogonal rotation. The correlation between an item and a factor is a factor loading, which is simply a way to ask how much a variable is correlated with the underlying factor. However, its interpretation is more complicated if there are correlated factors!
An orthogonal rotation (e.g., varimax) can help with simplicity of interpretation because it seeks to yield simple structure without cross-loadings. Cross-loadings are instances where a variable loads onto multiple factors. My recommendation would always be to use an orthogonal rotation if you have reason to believe that finding simple structure in your data is possible; otherwise, the factors are extremely difficult to interpret—what exactly does a cross-loading even mean? However, you should always try an oblique rotation, too, to see how strongly the factors are correlated. Examples of oblique rotations include oblimin and promax.
22.4.4 4. Determining the Number of Factors to Retain
A goal of factor analysis and principal component analysis is simplification or parsimony, while still explaining as much variance as possible. The hope is that you can have fewer factors that explain the associations between the variables than the number of observed variables. It does not make sense to replace 18 variables with 18 latent factors because that would not result in any simplification. But how do you decide on the number of factors?
There are a number of criteria that one can use to help determine how many factors/components to keep:
- Kaiser-Guttman criterion: factors with eigenvalues greater than zero
  - or, for principal component analysis, components with eigenvalues greater than 1
- Cattell’s scree test: the “elbow” in a scree plot minus one; sometimes operationalized with optimal coordinates (OC) or the acceleration factor (AF)
- Parallel analysis: factors that explain more variance than randomly simulated data
- Very simple structure (VSS) criterion: larger is better
- Velicer’s minimum average partial (MAP) test: smaller is better
- Akaike information criterion (AIC): smaller is better
- Bayesian information criterion (BIC): smaller is better
- Sample size-adjusted BIC (SABIC): smaller is better
- Root mean square error of approximation (RMSEA): smaller is better
- Chi-square difference test: smaller is better; a significant test indicates that the more complex model is significantly better fitting than the less complex model
- Standardized root mean square residual (SRMR): smaller is better
- Comparative Fit Index (CFI): larger is better
- Tucker Lewis Index (TLI): larger is better
There is not necessarily a “correct” criterion to use in determining how many factors to keep, so it is generally recommended that researchers use multiple criteria in combination with theory and interpretability.
A scree plot provides lots of information. A scree plot has the factor number on the x-axis and the eigenvalue on the y-axis. The eigenvalue is the variance accounted for by a factor; when using a varimax (orthogonal) rotation, an eigenvalue (or factor variance) is calculated as the sum of squared standardized factor (or component) loadings on that factor. An example of a scree plot is in Figure 22.12.
The total variance is equal to the number of variables you have, so one eigenvalue is approximately one variable’s worth of variance. The first factor accounts for the most variance, the second factor accounts for the second-most variance, and so on. The more factors you add, the less variance is explained by the additional factor.
One criterion for how many factors to keep is the Kaiser-Guttman criterion. According to the Kaiser-Guttman criterion, you should keep any factors whose eigenvalue is greater than 1. That is, for the sake of simplicity, parsimony, and data reduction, you should take any factors that explain more than a single variable would explain. According to the Kaiser-Guttman criterion, we would keep three factors from Figure 22.12 that have eigenvalues greater than 1. The default in SPSS is to retain factors with eigenvalues greater than 1. However, keeping factors whose eigenvalue is greater than 1 is not necessarily the best rule. If you let SPSS do this, you may get many factors with eigenvalues barely above 1 (e.g., factors with an eigenvalue ~ 1.0001) that do not add enough to justify the added complexity. The Kaiser-Guttman criterion usually results in keeping too many factors. Factors with small eigenvalues around 1 could reflect error shared across variables. For instance, factors with small eigenvalues could reflect method variance (i.e., method factor), such as a self-report factor that turns up as a factor in factor analysis, but that may be useless to you as a conceptual factor of a construct of interest.
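As a sketch of these ideas, the example below simulates six variables from two factors (hypothetical values) and computes the eigenvalues of their correlation matrix, which are the quantities plotted in a scree plot for components. It confirms that the eigenvalues sum to the number of variables and counts how many exceed 1:

```r
set.seed(8)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Eigenvalues of the correlation matrix, in decreasing order
eig <- eigen(cor(sim))$values

# Coordinates of a scree plot: component number (x) and eigenvalue (y)
data.frame(component = seq_along(eig), eigenvalue = round(eig, 2))

sum(eig)      # total variance equals the number of variables (6)
sum(eig > 1)  # number of components with eigenvalues greater than 1
```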
Another criterion is Cattell’s scree test, which involves selecting the number of factors from looking at the scree plot. “Scree” refers to the rubble of stones at the bottom of a mountain. According to Cattell’s scree test, you should keep the factors before the last steep drop in eigenvalues (i.e., the factors before the rubble, where the slope approaches zero). The beginning of the scree (or rubble), where the slope approaches zero, is called the “elbow” of a scree plot. Using Cattell’s scree test, you retain the number of factors that explain the most variance prior to the explained variance drop-off, because, ultimately, you want to include only those factors whose inclusion gains you substantially more explained variance. That is, you would keep the number of factors at the elbow of the scree plot minus one. If the last steep drop occurs from Factor 4 to Factor 5 and the elbow is at Factor 5, we would keep four factors. In Figure 22.12, the last steep drop in eigenvalues occurs from Factor 3 to Factor 4; the elbow of the scree plot occurs at Factor 4. Thus, using Cattell’s scree test, we would keep three factors based on Figure 22.12.
There are more sophisticated ways of using a scree plot, but they usually end up at a similar decision. Examples of more sophisticated tests include parallel analysis and very simple structure (VSS) plots. In a parallel analysis, you examine where the eigenvalues from observed data and random data converge, so you do not retain a factor that explains less variance than would be expected by random chance. Using the VSS criterion, the optimal number of factors to retain is the number of factors that maximizes the VSS criterion (Revelle & Rocklin, 1979). The VSS criterion is evaluated with models in which factor loadings for a given item that are less than the maximum factor loading for that item are suppressed to zero, thus forcing simple structure (i.e., no cross-loadings). The goal is finding a factor structure with interpretability so that factors are clearly distinguishable. Thus, we want to identify the number of factors with the highest VSS criterion.
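To make the logic of parallel analysis concrete, here is a simplified sketch in base R. It compares the eigenvalues of the observed correlation matrix against the 95th percentile of eigenvalues from random normal data of the same dimensions. The simulated data, the number of simulations, and the 95th-percentile threshold are all assumptions made for illustration; dedicated functions (such as `fa.parallel()` in the psych package) differ in their details.

```r
set.seed(9)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

obs_eig <- eigen(cor(sim))$values

# Eigenvalues from random, uncorrelated data of the same size (one column per simulation)
n_sims <- 200
rand_eig <- replicate(n_sims, {
  rand <- matrix(rnorm(nrow(sim) * ncol(sim)), nrow = nrow(sim))
  eigen(cor(rand))$values
})
threshold <- apply(rand_eig, 1, quantile, probs = 0.95)

round(cbind(observed = obs_eig, random_95th = threshold), 2)

# Retain the leading components whose eigenvalues exceed the random-data threshold
exceeds  <- obs_eig > threshold
n_retain <- if (all(exceeds)) length(exceeds) else which(!exceeds)[1] - 1
n_retain
```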
In general, my recommendation is to use Cattell’s scree test, and then test the factor solutions with plus or minus one factor, in addition to examining model fit. You should never accept factors with eigenvalues less than zero (or components from PCA with eigenvalues less than one), because they are likely to be largely composed of error. If you are using maximum likelihood factor analysis, you can compare the fit of various models with model fit criteria to see which model fits best for its parsimony. A model will always fit better when you add additional parameters or factors, so you examine if there is significant improvement in model fit when adding the additional factor; that is, we keep adding complexity until additional complexity does not buy us much. Always try a factor solution that is one less and one more than suggested by Cattell’s scree test to buffer your final solution because the purpose of factor analysis is to explain things and to have interpretability. Even if all rules or indicators suggest keeping X number of factors, maybe \(\pm\) one factor helps clarify things. Even though factor analysis is empirical, theory and interpretability should also inform decisions.
22.4.4.1 Model Fit Indices
In factor analysis, we fit a model to observed data, or to the variance-covariance matrix, and we evaluate the degree of model misfit. That is, fit indices evaluate how likely it is that a given causal model gave rise to the observed data. Various model fit indices can be used for evaluating how well a model fits the data and for comparing the fit of two competing models. Fit indices known as absolute fit indices evaluate how well the model fits relative to the best-possible fitting model (i.e., a saturated model). Examples of absolute fit indices include the chi-square test, root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR).
The chi-square test evaluates whether the model has a significant degree of misfit relative to the best-possible fitting model (a saturated model that fits as many parameters as possible; i.e., as many parameters as there are degrees of freedom); the null hypothesis of a chi-square test is that there is no difference between the predicted data (i.e., the data that would be observed if the model were true) and the observed data. Thus, a non-significant chi-square test indicates good model fit. However, because the null hypothesis of the chi-square test is that the model-implied covariance matrix is exactly equal to the observed covariance matrix (i.e., a model of perfect fit), this may be an unrealistic comparison. Models are simplifications of reality, and our models are virtually never expected to be a perfect description of reality. Thus, we would say a model is “useful” and partially validated if “it helps us to understand the relation between variables and does a ‘reasonable’ job of matching the data…A perfect fit may be an inappropriate standard, and a high chi-square estimate may indicate what we already know—that the hypothesized model holds approximately, not perfectly.” (Bollen, 1989, p. 268). The power of the chi-square test depends on sample size, and a large sample will likely detect small differences as significantly worse than the best-possible fitting model (Bollen, 1989).
RMSEA is an index of absolute fit. Lower values indicate better fit.
SRMR is an index of absolute fit with no penalty for model complexity. Lower values indicate better fit.
There are also various fit indices known as incremental, comparative, or relative fit indices that compare whether the model fits better than the worst-possible fitting model (i.e., a “baseline” or “null” model). Incremental fit indices include a chi-square difference test, the comparative fit index (CFI), and the Tucker-Lewis index (TLI). Unlike the chi-square test comparing the model to the best-possible fitting model, a significant chi-square test of the relative fit index indicates better fit—i.e., that the model fits better than the worst-possible fitting model.
CFI is another relative fit index that compares the model to the worst-possible fitting model. Higher values indicate better fit.
TLI is another relative fit index. Higher values indicate better fit.
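As a rough worked example, the sketch below computes RMSEA, CFI, and TLI from chi-square values using commonly cited formulas. The chi-square values, degrees of freedom, and sample size are hypothetical, and software packages can differ in details (for instance, whether \(N\) or \(N - 1\) appears in the RMSEA denominator), so treat this as an illustration rather than a definitive implementation.

```r
# Hypothetical results for the fitted model (m) and the baseline/null model (b)
chisq_m <- 30;  df_m <- 10
chisq_b <- 500; df_b <- 15
n <- 300

# RMSEA: misfit per degree of freedom, adjusted for sample size
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * (n - 1)))

# CFI: proportional improvement in misfit relative to the baseline model
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_m - df_m, chisq_b - df_b, 0)

# TLI: like CFI, but based on chi-square/df ratios (penalizes complexity)
tli <- ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)

round(c(RMSEA = rmsea, CFI = cfi, TLI = tli), 3)
# RMSEA = 0.082, CFI = 0.959, TLI = 0.938
```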
Parsimony fit indices include information criteria, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). BIC penalizes model complexity more so than AIC. Lower AIC and BIC values indicate better fit.
Chi-square difference tests and CFI can be used to compare two nested models. AIC and BIC can be used to compare two non-nested models.
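As a sketch of comparing nested models, the example below fits one-factor and two-factor maximum likelihood models with base R's `factanal()` to simulated two-factor data (an assumption for illustration), and then computes an approximate chi-square difference test. Because `factanal()` reports a likelihood-ratio statistic with a small-sample correction, the difference should be treated as approximate.

```r
set.seed(10)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

fit1 <- factanal(sim, factors = 1)  # less complex model
fit2 <- factanal(sim, factors = 2)  # more complex model

# Chi-square test of each model against the saturated model
rbind(
  one_factor = c(chisq = unname(fit1$STATISTIC), df = fit1$dof, p = unname(fit1$PVAL)),
  two_factor = c(chisq = unname(fit2$STATISTIC), df = fit2$dof, p = unname(fit2$PVAL))
)

# Approximate chi-square difference test between the nested models
chisq_diff <- unname(fit1$STATISTIC - fit2$STATISTIC)
df_diff    <- fit1$dof - fit2$dof
p_diff     <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
c(chisq_diff = chisq_diff, df_diff = df_diff, p = p_diff)
```

A significant difference test suggests that the two-factor model fits better than the one-factor model, which is expected here because the data were generated from two factors.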
Criteria for acceptable fit and good fit of SEM models are in Table 22.1. In addition, dynamic fit indexes have been proposed based on simulation to identify fit index cutoffs that are tailored to the characteristics of the specific model and data (McNeish & Wolf, 2023); dynamic fit indexes are available via the dynamic package (Wolf & McNeish, 2022) or with a webapp.
| SEM Fit Index | Acceptable Fit | Good Fit |
|---|---|---|
| RMSEA | \(\leq\) .08 | \(\leq\) .05 |
| CFI | \(\geq\) .90 | \(\geq\) .95 |
| TLI | \(\geq\) .90 | \(\geq\) .95 |
| SRMR | \(\leq\) .10 | \(\leq\) .08 |
However, good model fit does not necessarily indicate a true model.
In addition to global fit indices, it can also be helpful to examine evidence of local fit, such as the residual covariance matrix. The residual covariance matrix represents the difference between the observed covariance matrix and the model-implied covariance matrix (the observed covariance matrix minus the model-implied covariance matrix). These difference values are called covariance residuals. Standardizing the covariance matrices by converting each to a correlation matrix can be helpful for interpreting the magnitude of any local misfit. The resulting difference is known as a residual correlation matrix, which is composed of correlation residuals. Correlation residuals greater than |.10| are possible evidence for poor local fit (Kline, 2023). If a correlation residual is positive, it suggests that the model underpredicts the observed association between the two variables (i.e., the observed covariance is greater than the model-implied covariance). If a correlation residual is negative, it suggests that the model overpredicts the observed association between the two variables (i.e., the observed covariance is smaller than the model-implied covariance). If the two variables are connected by only indirect pathways, it may be helpful to respecify the model with direct pathways between the two variables, such as a direct effect (i.e., regression path) or a covariance path. For guidance on evaluating local fit, see Kline (2024).
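The sketch below illustrates correlation residuals by deliberately fitting a misspecified one-factor model (with base R's `factanal()`) to simulated two-factor data. The data and the model are assumptions for illustration. The model-implied correlation matrix is reconstructed from the loadings and uniquenesses, and pairs of variables with residuals beyond |.10| are flagged.

```r
set.seed(11)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- data.frame(
  v1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  v2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  v3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  v4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  v6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Misspecified model: one factor for data generated from two factors
fit1 <- factanal(sim, factors = 1)
L    <- unclass(loadings(fit1))

# Model-implied correlations: loadings times their transpose, plus uniquenesses
implied <- L %*% t(L) + diag(fit1$uniquenesses)

# Correlation residuals: observed minus model-implied
resid_cor <- cor(sim) - implied
round(resid_cor, 2)

# Variable pairs with possible local misfit (|residual| > .10)
flagged <- which(abs(resid_cor) > 0.10 & upper.tri(resid_cor), arr.ind = TRUE)
flagged
```

Positive residuals among the variables that the single factor fails to capture indicate that the model underpredicts their observed associations, which points toward adding a second factor.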
22.4.5 5. Interpreting and Using Latent Factors
The next step is interpreting the model and latent factors. One data matrix can lead to many different (correct) models; you must choose one based on the factor structure and theory. Use theory to interpret the model and label the factors. In latent variable models, factors have meaning. You can use them as predictors, mediators, moderators, or outcomes. When possible, it is preferable to examine the associations between the factors and other variables within the same model. If necessary, you can extract factor scores for use in other analyses. Using latent factors also helps disattenuate associations for measurement error; that is, it estimates what the association between variables would be after removing random measurement error.
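As a hedged sketch of examining a factor's association with another variable in the same model, the example below uses the `HolzingerSwineford1939` data that ship with lavaan, not this chapter's data. The choice of indicators (`x4` to `x6`), the factor name `textual`, and the predictor `ageyr` are illustrative assumptions.

```r
library(lavaan)

# Built-in example data; the variable roles are illustrative assumptions
model_syntax <- '
  # Measurement model: one latent factor with three indicators
  textual =~ x4 + x5 + x6

  # Structural part: the latent factor is regressed on an observed predictor
  textual ~ ageyr
'

fit <- sem(model_syntax, data = HolzingerSwineford1939)

# Standardized estimates; the row with op == "~" is the association
# between the predictor and the latent factor, estimated in the same model
estimates <- standardizedSolution(fit)
regression_row <- subset(estimates, op == "~")
print(regression_row)
```

Because the factor and the predictor are estimated together, the regression path is not attenuated by random measurement error in the indicators, which is the advantage described above.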
22.5 Example of Exploratory Factor Analysis
We generated the scree plot in Figure 22.13 using the fa.parallel() function of the psych package (Revelle, 2025). The optimal coordinates and the acceleration factor attempt to operationalize the Cattell scree test: i.e., the “elbow” of the scree plot (Ruscio & Roche, 2012). The optimal coordinates are quantified using a series of linear equations to determine whether observed eigenvalues exceed the predicted values. The acceleration factor is quantified using the acceleration of the curve, that is, the second derivative. The Kaiser-Guttman rule states to keep principal components whose eigenvalues are greater than 1. However, for exploratory factor analysis (as opposed to PCA), the criterion is to keep the factors whose eigenvalues are greater than zero (i.e., not the factors whose eigenvalues are greater than 1) (Dinno, 2014).
The number of factors to keep would depend on which criterion one uses. Based on the rule to keep factors whose eigenvalues are greater than zero and based on parallel analysis, we would keep five factors. However, based on the Cattell scree test (the “elbow” of the scree plot minus one), we would keep three factors. If using the optimal coordinates, we would keep eight factors; if using the acceleration factor, we would keep one factor. Therefore, interpretability of the factors would be important for deciding how many factors to keep.
Code
Parallel analysis suggests that the number of factors = 5 and the number of components = NA
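To make the retention criteria described above more concrete, here is a minimal base R sketch on simulated data. It is an illustration only: it uses eigenvalues of the full correlation matrix (the component version of each rule) rather than the factor-based eigenvalues that fa.parallel() and nScree() use for EFA, and the data-generating values are assumptions chosen so that two dimensions are clearly present.

```r
set.seed(123)
n <- 500
p <- 6

# Simulated data (illustrative only): two uncorrelated factors, three indicators each
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- cbind(
  sapply(1:3, function(i) .8 * f1 + .6 * rnorm(n)),
  sapply(1:3, function(i) .8 * f2 + .6 * rnorm(n))
)

# Eigenvalues of the observed correlation matrix
observed_ev <- eigen(cor(dat), symmetric = TRUE, only.values = TRUE)$values

# Kaiser-Guttman rule (components): count eigenvalues greater than 1
n_kaiser <- sum(observed_ev > 1)

# Parallel analysis: eigenvalues from random data of the same dimensions
random_ev <- replicate(200, {
  noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
})
threshold <- apply(random_ev, 1, quantile, probs = .95)

# Keep leading eigenvalues that exceed their random benchmark (stop at first failure)
n_parallel <- sum(cumprod(observed_ev > threshold))

# Acceleration: second difference of the eigenvalue curve;
# element j of second_diff refers to eigenvalue j + 1
second_diff <- diff(observed_ev, differences = 2)
elbow <- which.max(second_diff) + 1
n_accel <- elbow - 1  # "elbow minus one" rule

print(round(observed_ev, 2))
print(c(kaiser = n_kaiser, parallel = n_parallel, acceleration = n_accel))
```

In data with less clear-cut structure, as in this chapter's example, the criteria need not agree, which is why interpretability also matters when choosing the number of factors.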
We generated the scree plot in Figure 22.14 using the nScree() and plotnScree() functions of the nFactors package (Raiche & Magis, 2025).
Code
#screeDataEFA <- nFactors::nScree(
#  x = cor( # throws error with correlation matrix, so use covariance instead (below)
#    dataForFA[faVars],
#    use = "pairwise.complete.obs"),
#  model = "factors")
screeDataEFA <- nFactors::nScree(
  x = cov(
    dataForFA[faVars],
    use = "pairwise.complete.obs"),
  cor = FALSE,
  model = "factors")
nFactors::plotnScree(screeDataEFA)
We generated the very simple structure (VSS) plots in Figures 22.15 and 22.16 using the vss() and nfactors() functions of the psych package (Revelle, 2025). In addition to VSS plots, the output provides other criteria for determining the optimal number of factors, including the Velicer minimum average partial (MAP) test, the Bayesian information criterion (BIC), the sample size-adjusted BIC (SABIC), and the root mean square error of approximation (RMSEA); for each of these, lower values are better. Depending on the criterion, the optimal number of factors ranges from 2 to 8.
Very Simple Structure
Call: psych::vss(x = dataForFA[faVars], rotate = "oblimin", fm = "minres")
VSS complexity 1 achieves a maximimum of 0.79 with 2 factors
VSS complexity 2 achieves a maximimum of 0.87 with 2 factors
The Velicer MAP achieves a minimum of 0.13 with 3 factors
BIC achieves a minimum of 101984.9 with 8 factors
Sample Size adjusted BIC achieves a minimum of 102102.5 with 8 factors
Statistics by number of factors
vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex
1 0.67 0.00 0.19 135 151248 0 26.4 0.67 0.75 150221 150650 1.0
2 0.79 0.87 0.16 118 138098 0 10.5 0.87 0.76 137201 137576 1.2
3 0.77 0.86 0.13 102 NaN NaN 8.7 0.89 NA NA NA 1.2
4 0.77 0.86 0.14 87 134790 0 8.1 0.90 0.88 134129 134406 1.3
5 0.73 0.81 0.32 73 157011 0 10.0 0.87 1.04 156456 156687 1.3
6 0.69 0.78 0.49 60 NaN NaN 11.8 0.85 NA NA NA 1.4
7 0.62 0.76 1.55 48 NaN NaN 13.4 0.83 NA NA NA 1.6
8 0.48 0.68 NaN 37 102266 0 17.6 0.78 1.17 101985 102102 1.6
eChisq SRMR eCRMS eBIC
1 35972 0.242 0.258 34946
2 10583 0.131 0.150 9686
3 1653 0.052 0.064 877
4 975 0.040 0.053 314
5 492 0.028 0.041 -63
6 238 0.020 0.032 -218
7 116 0.014 0.025 -249
8 39 0.008 0.016 -242
Code
Number of factors
Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
n.obs = n.obs, plot = FALSE, title = title, use = use, cor = cor)
VSS complexity 1 achieves a maximimum of 0.79 with 2 factors
VSS complexity 2 achieves a maximimum of 0.87 with 2 factors
The Velicer MAP achieves a minimum of 0.13 with 3 factors
Empirical BIC achieves a minimum of -249.36 with 7 factors
Sample Size adjusted BIC achieves a minimum of 102102.5 with 8 factors
Statistics by number of factors
vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex
1 0.67 0.00 0.19 135 151248 0 26.4 0.67 0.75 150221 150650 1.0
2 0.79 0.87 0.16 118 138098 0 10.5 0.87 0.76 137201 137576 1.2
3 0.77 0.86 0.13 102 NaN NaN 8.7 0.89 NA NA NA 1.2
4 0.77 0.86 0.14 87 134790 0 8.1 0.90 0.88 134129 134406 1.3
5 0.73 0.81 0.32 73 157011 0 10.0 0.87 1.04 156456 156687 1.3
6 0.69 0.78 0.49 60 NaN NaN 11.8 0.85 NA NA NA 1.4
7 0.62 0.76 1.55 48 NaN NaN 13.4 0.83 NA NA NA 1.6
8 0.48 0.68 NaN 37 102266 0 17.6 0.78 1.17 101985 102102 1.6
9 0.48 0.67 NaN 27 103297 0 18.6 0.77 1.38 103092 103177 1.6
10 0.33 0.49 NaN 18 132907 0 28.0 0.65 1.92 132770 132827 1.8
11 0.35 0.48 NaN 10 NaN NaN 30.1 0.62 NA NA NA 1.8
12 0.41 0.50 NaN 3 NaN NaN 28.5 0.64 NA NA NA 1.6
13 0.40 0.47 NaN -3 NaN NA 31.3 0.61 NA NA NA 1.7
14 0.41 0.45 NaN -8 NaN NA 30.8 0.61 NA NA NA 1.9
15 0.35 0.40 NaN -12 NaN NA 35.9 0.55 NA NA NA 2.0
16 0.35 0.40 NaN -15 NaN NA 35.9 0.55 NA NA NA 2.0
17 0.35 0.40 NaN -17 NaN NA 35.9 0.55 NA NA NA 2.0
18 0.35 0.40 NA -18 NaN NA 35.9 0.55 NA NA NA 2.0
eChisq SRMR eCRMS eBIC
1 35971.9 0.2423 0.258 34946
2 10583.1 0.1314 0.150 9686
3 1652.9 0.0519 0.064 877
4 975.3 0.0399 0.053 314
5 492.1 0.0283 0.041 -63
6 238.5 0.0197 0.032 -218
7 115.5 0.0137 0.025 -249
8 39.2 0.0080 0.016 -242
9 25.1 0.0064 0.015 -180
10 18.7 0.0055 0.016 -118
11 14.7 0.0049 0.019 -61
12 12.0 0.0044 0.032 -11
13 10.2 0.0041 NA NA
14 9.6 0.0040 NA NA
15 9.5 0.0039 NA NA
16 9.5 0.0039 NA NA
17 9.5 0.0039 NA NA
18 9.5 0.0039 NA NA
We fit EFA models using the efa() function of the lavaan package (Rosseel, 2012; Rosseel et al., 2024).
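For readers who want to try the efa() function on a small example, here is a minimal sketch. It uses the `HolzingerSwineford1939` data that ship with lavaan rather than this chapter's data, and the range of factors (1 to 3) and the object names are assumptions made for illustration; only the geomin rotation is taken from the output reported in this section.

```r
library(lavaan)

# Built-in example data (not the chapter's data): nine cognitive test scores
indicator_names <- paste0("x", 1:9)

efa_fits <- efa(
  data = HolzingerSwineford1939,
  ov.names = indicator_names,
  nfactors = 1:3,      # fit one-, two-, and three-factor models
  rotation = "geomin"  # geomin rotation, as in the output reported in this section
)

# Loadings, factor correlations, and a fit overview for each model
summary(efa_fits)

# Selected fit indices, one column per model
fitMeasures(efa_fits, fit.measures = c("chisq", "df", "cfi", "rmsea"))
```

Passing a vector to `nfactors` fits all of the models in one call, which makes it convenient to compare fit across solutions. The results discussed next are from the chapter's quarterback data, not from this sketch.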
The model fits well according to CFI with 5 or more factors; the model fits well according to RMSEA with 6 or more factors. However, retaining five or more factors does not represent much of a simplification (relative to the 18 variables included). Moreover, in the model with four factors, only one variable had a substantial loading (≥ .30 in absolute value) on Factor 1. Thus, even with four factors, one of the factors does not seem to represent the aggregation of multiple variables. For these reasons, and because the “elbow test” for the scree plot suggested three factors, we decided to retain three factors and to see if we could achieve better fit by making additional model modifications (e.g., correlated residuals). Correlated residuals may be necessary, for example, when variables are correlated for reasons other than the latent factors.
This is lavaan 0.6-19 -- running exploratory factor analysis
Estimator ML
Rotation method GEOMIN OBLIQUE
Geomin epsilon 0.001
Rotation algorithm (rstarts) GPA (30)
Standardized metric TRUE
Row weights None
Number of observations 2002
Number of missing patterns 4
Overview models:
aic bic sabic chisq df pvalue cfi rmsea
nfactors = 1 125185.2 125487.7 125316.1 9230.262 135 0 0.220 0.572
nfactors = 2 117980.3 118378.0 118152.4 5189.697 118 0 0.662 0.406
nfactors = 3 114054.9 114542.3 114265.9 3191.805 102 0 0.726 0.403
nfactors = 4 110987.8 111559.2 111235.1 1451.682 87 0 0.896 0.287
nfactors = 5 109938.6 110588.5 110219.9 745.228 73 0 1.000 0.119
nfactors = 6 109666.4 110389.0 109979.2 624.068 60 0 1.000 0.048
nfactors = 7 109305.8 110095.7 109647.7 522.437 48 0 1.000 0.000
Eigenvalues correlation matrix:
ev1 ev2 ev3 ev4 ev5 ev6 ev7 ev8
6.65346 6.50854 2.96513 0.53174 0.39828 0.33664 0.20872 0.11431
ev9 ev10 ev11 ev12 ev13 ev14 ev15 ev16
0.08090 0.05503 0.04064 0.03101 0.02321 0.02024 0.01553 0.00884
ev17 ev18
0.00578 0.00200
Number of factors: 1
Standardized loadings: (* = significant at 1% level)
f1 unique.var communalities
completionsPerGame 0.987* 0.026 0.974
attemptsPerGame 0.974* 0.051 0.949
passing_yardsPerGame 0.994* 0.011 0.989
passing_tdsPerGame 0.865* 0.252 0.748
passing_air_yardsPerGame 0.694* 0.519 0.481
passing_yards_after_catchPerGame 0.773* 0.403 0.597
passing_first_downsPerGame 0.992* 0.016 0.984
avg_completed_air_yards .* 0.919 0.081
avg_intended_air_yards . 0.977 0.023
aggressiveness 0.997 0.003
max_completed_air_distance 0.473* 0.776 0.224
avg_air_distance 1.000 0.000
max_air_distance 0.368* 0.865 0.135
avg_air_yards_to_sticks .* 0.954 0.046
passing_cpoe .* 0.914 0.086
pass_comp_pct 0.301* 0.910 0.090
passer_rating 0.629* 0.605 0.395
completion_percentage_above_expectation 0.514* 0.736 0.264
f1
Sum of squared loadings 7.068
Proportion of total 1.000
Proportion var 0.393
Cumulative var 0.393
Number of factors: 2
Standardized loadings: (* = significant at 1% level)
f1 f2 unique.var
completionsPerGame 0.986* 0.029
attemptsPerGame 0.974* * 0.049
passing_yardsPerGame 0.996* 0.008
passing_tdsPerGame 0.863* 0.254
passing_air_yardsPerGame 0.725* 0.693* 0.032
passing_yards_after_catchPerGame 0.750* -0.611* 0.029
passing_first_downsPerGame 0.990* 0.019
avg_completed_air_yards 0.965* 0.070
avg_intended_air_yards * 0.998* 0.002
aggressiveness . 0.785* 0.366
max_completed_air_distance .* 0.813* 0.288
avg_air_distance * 0.979* 0.032
max_air_distance .* 0.850* 0.260
avg_air_yards_to_sticks 0.991* 0.017
passing_cpoe 0.318* -0.338* 0.777
pass_comp_pct .* 0.913
passer_rating 0.615* . 0.612
completion_percentage_above_expectation 0.478* . 0.730
communalities
completionsPerGame 0.971
attemptsPerGame 0.951
passing_yardsPerGame 0.992
passing_tdsPerGame 0.746
passing_air_yardsPerGame 0.968
passing_yards_after_catchPerGame 0.971
passing_first_downsPerGame 0.981
avg_completed_air_yards 0.930
avg_intended_air_yards 0.998
aggressiveness 0.634
max_completed_air_distance 0.712
avg_air_distance 0.968
max_air_distance 0.740
avg_air_yards_to_sticks 0.983
passing_cpoe 0.223
pass_comp_pct 0.087
passer_rating 0.388
completion_percentage_above_expectation 0.270
f2 f1 total
Sum of sq (obliq) loadings 6.889 6.625 13.514
Proportion of total 0.510 0.490 1.000
Proportion var 0.383 0.368 0.751
Cumulative var 0.383 0.751 0.751
Factor correlations: (* = significant at 1% level)
f1 f2
f1 1.000
f2 -0.039 1.000
Number of factors: 3
Standardized loadings: (* = significant at 1% level)
f1 f2 f3 unique.var
completionsPerGame 0.978* * 0.025
attemptsPerGame 0.993* * * 0.038
passing_yardsPerGame 0.986* * 0.010
passing_tdsPerGame 0.833* * .* 0.255
passing_air_yardsPerGame 0.795* 0.729* 0.022
passing_yards_after_catchPerGame 0.696* -0.603* 0.030
passing_first_downsPerGame 0.978* * 0.020
avg_completed_air_yards 0.789* 0.331* 0.044
avg_intended_air_yards 0.891* .* 0.002
aggressiveness 0.796* 0.350
max_completed_air_distance .* 0.666* 0.377* 0.184
avg_air_distance * 0.875* .* 0.025
max_air_distance .* 0.808* . 0.221
avg_air_yards_to_sticks 0.866* .* 0.012
passing_cpoe 1.005* 0.020
pass_comp_pct -0.449* 1.036* 0.121
passer_rating 0.929* 0.092
completion_percentage_above_expectation 0.962* 0.044
communalities
completionsPerGame 0.975
attemptsPerGame 0.962
passing_yardsPerGame 0.990
passing_tdsPerGame 0.745
passing_air_yardsPerGame 0.978
passing_yards_after_catchPerGame 0.970
passing_first_downsPerGame 0.980
avg_completed_air_yards 0.956
avg_intended_air_yards 0.998
aggressiveness 0.650
max_completed_air_distance 0.816
avg_air_distance 0.975
max_air_distance 0.779
avg_air_yards_to_sticks 0.988
passing_cpoe 0.980
pass_comp_pct 0.879
passer_rating 0.908
completion_percentage_above_expectation 0.956
f2 f1 f3 total
Sum of sq (obliq) loadings 6.046 5.751 4.687 16.484
Proportion of total 0.367 0.349 0.284 1.000
Proportion var 0.336 0.320 0.260 0.916
Cumulative var 0.336 0.655 0.916 0.916
Factor correlations: (* = significant at 1% level)
f1 f2 f3
f1 1.000
f2 -0.146 1.000
f3 0.171 0.429 1.000
Number of factors: 4
Standardized loadings: (* = significant at 1% level)
f1 f2 f3 f4
completionsPerGame 0.970* * *
attemptsPerGame * 1.024*
passing_yardsPerGame .* 0.902*
passing_tdsPerGame 0.381* 0.669*
passing_air_yardsPerGame 0.826* 0.732*
passing_yards_after_catchPerGame .* 0.598* -0.606*
passing_first_downsPerGame .* 0.905* *
avg_completed_air_yards .* 0.959*
avg_intended_air_yards 1.024*
aggressiveness * 0.814*
max_completed_air_distance .* .* 0.778*
avg_air_distance 0.995*
max_air_distance .* 0.869*
avg_air_yards_to_sticks 1.005*
passing_cpoe .* 0.923*
pass_comp_pct -0.451* 1.056*
passer_rating .* 0.799*
completion_percentage_above_expectation .* 0.887*
unique.var communalities
completionsPerGame 0.015 0.985
attemptsPerGame 0.001 0.999
passing_yardsPerGame 0.001 0.999
passing_tdsPerGame 0.204 0.796
passing_air_yardsPerGame 0.015 0.985
passing_yards_after_catchPerGame 0.023 0.977
passing_first_downsPerGame 0.023 0.977
avg_completed_air_yards 0.067 0.933
avg_intended_air_yards 0.003 0.997
aggressiveness 0.319 0.681
max_completed_air_distance 0.295 0.705
avg_air_distance 0.034 0.966
max_air_distance 0.240 0.760
avg_air_yards_to_sticks 0.018 0.982
passing_cpoe 0.033 0.967
pass_comp_pct 0.092 0.908
passer_rating 0.148 0.852
completion_percentage_above_expectation 0.024 0.976
f3 f2 f4 f1 total
Sum of sq (obliq) loadings 6.953 5.378 3.387 0.727 16.445
Proportion of total 0.423 0.327 0.206 0.044 1.000
Proportion var 0.386 0.299 0.188 0.040 0.914
Cumulative var 0.386 0.685 0.873 0.914 0.914
Factor correlations: (* = significant at 1% level)
f1 f2 f3 f4
f1 1.000
f2 0.387 1.000
f3 -0.006 -0.175 1.000
f4 0.280 0.133 0.438 1.000
Number of factors: 5
Standardized loadings: (* = significant at 1% level)
f1 f2 f3 f4 f5
completionsPerGame 0.952* * * .*
attemptsPerGame 0.978* *
passing_yardsPerGame 0.865* .* * *
passing_tdsPerGame 0.647* 0.439* * *
passing_air_yardsPerGame 0.805* * 0.759* *
passing_yards_after_catchPerGame 0.527* 0.300* -0.610* .* *
passing_first_downsPerGame 0.878* .* *
avg_completed_air_yards * 1.007* .* .*
avg_intended_air_yards 1.004* *
aggressiveness .* 0.791*
max_completed_air_distance .* .* 0.733*
avg_air_distance * 0.942* .*
max_air_distance .* 0.660* .* .*
avg_air_yards_to_sticks 0.973* *
passing_cpoe .* 0.901*
pass_comp_pct .* 1.026*
passer_rating 0.323* * 0.757*
completion_percentage_above_expectation .* 0.847*
unique.var communalities
completionsPerGame 0.008 0.992
attemptsPerGame 0.005 0.995
passing_yardsPerGame 0.003 0.997
passing_tdsPerGame 0.187 0.813
passing_air_yardsPerGame 0.016 0.984
passing_yards_after_catchPerGame 0.000 1.000
passing_first_downsPerGame 0.018 0.982
avg_completed_air_yards 0.037 0.963
avg_intended_air_yards 0.002 0.998
aggressiveness 0.431 0.569
max_completed_air_distance 0.345 0.655
avg_air_distance 0.048 0.952
max_air_distance 0.300 0.700
avg_air_yards_to_sticks 0.027 0.973
passing_cpoe 0.017 0.983
pass_comp_pct 0.113 0.887
passer_rating 0.136 0.864
completion_percentage_above_expectation 0.036 0.964
f3 f1 f5 f2 f4 total
Sum of sq (obliq) loadings 6.502 5.158 3.336 1.003 0.272 16.271
Proportion of total 0.400 0.317 0.205 0.062 0.017 1.000
Proportion var 0.361 0.287 0.185 0.056 0.015 0.904
Cumulative var 0.361 0.648 0.833 0.889 0.904 0.904
Factor correlations: (* = significant at 1% level)
f1 f2 f3 f4 f5
f1 1.000
f2 0.371 1.000
f3 -0.124 0.090 1.000
f4 0.190 0.090 0.021 1.000
f5 0.120 0.380 0.444 0.287 1.000
Number of factors: 6
Standardized loadings: (* = significant at 1% level)
f1 f2 f3 f4 f5
completionsPerGame 0.970* * * *
attemptsPerGame 1.001* * * * *
passing_yardsPerGame 0.896* .* * *
passing_tdsPerGame 0.674* 0.395* *
passing_air_yardsPerGame 0.812* .* 0.649*
passing_yards_after_catchPerGame 0.572* 0.317* -0.548* *
passing_first_downsPerGame 0.903* .* * *
avg_completed_air_yards 0.459* 0.593*
avg_intended_air_yards .* .* 0.678*
aggressiveness .* 0.729*
max_completed_air_distance .* .* 0.777*
avg_air_distance * .* . 0.762*
max_air_distance .* .* * 0.626*
avg_air_yards_to_sticks .* . 0.800*
passing_cpoe
pass_comp_pct * .* -0.636*
passer_rating .*
completion_percentage_above_expectation *
f6 unique.var communalities
completionsPerGame * 0.008 0.992
attemptsPerGame * 0.007 0.993
passing_yardsPerGame 0.004 0.996
passing_tdsPerGame * 0.185 0.815
passing_air_yardsPerGame 0.000 1.000
passing_yards_after_catchPerGame .* 0.000 1.000
passing_first_downsPerGame * 0.018 0.982
avg_completed_air_yards .* 0.031 0.969
avg_intended_air_yards 0.004 0.996
aggressiveness . 0.429 0.571
max_completed_air_distance 0.384 0.616
avg_air_distance 0.047 0.953
max_air_distance .* 0.302 0.698
avg_air_yards_to_sticks 0.025 0.975
passing_cpoe 0.989* 0.015 0.985
pass_comp_pct 1.033* 0.046 0.954
passer_rating 0.833* 0.135 0.865
completion_percentage_above_expectation 0.957* 0.042 0.958
f1 f5 f6 f4 f2 f3 total
Sum of sq (obliq) loadings 5.365 4.369 3.627 1.703 0.827 0.427 16.318
Proportion of total 0.329 0.268 0.222 0.104 0.051 0.026 1.000
Proportion var 0.298 0.243 0.202 0.095 0.046 0.024 0.907
Cumulative var 0.298 0.541 0.742 0.837 0.883 0.907 0.907
Factor correlations: (* = significant at 1% level)
f1 f2 f3 f4 f5 f6
f1 1.000
f2 0.368 1.000
f3 -0.005 -0.389 1.000
f4 -0.145 -0.100 0.007 1.000
f5 -0.097 -0.102 0.212 0.874 1.000
f6 0.171 0.237 0.333 0.277 0.355 1.000
Number of factors: 7
Standardized loadings: (* = significant at 1% level)
f1 f2 f3 f4 f5
completionsPerGame 0.961* * * *
attemptsPerGame 0.977* * * * *
passing_yardsPerGame 0.862* .* .* *
passing_tdsPerGame 0.672* 0.432* * *
passing_air_yardsPerGame 0.764* * 0.516* .*
passing_yards_after_catchPerGame 0.551* 0.368* -0.463* *
passing_first_downsPerGame 0.904* .* * *
avg_completed_air_yards 0.544* 0.455*
avg_intended_air_yards 0.540* 0.509*
aggressiveness .* 0.694*
max_completed_air_distance 0.715*
avg_air_distance * 0.583* 0.450*
max_air_distance 0.682* *
avg_air_yards_to_sticks 0.447* 0.617*
passing_cpoe * * *
pass_comp_pct * -0.505*
passer_rating * 0.310* *
completion_percentage_above_expectation *
f6 f7 unique.var
completionsPerGame * .* 0.010
attemptsPerGame * 0.003
passing_yardsPerGame * * 0.002
passing_tdsPerGame * * 0.192
passing_air_yardsPerGame 0.007
passing_yards_after_catchPerGame * .* 0.000
passing_first_downsPerGame * * 0.017
avg_completed_air_yards * 0.025
avg_intended_air_yards 0.001
aggressiveness 0.421
max_completed_air_distance 0.945* 0.000
avg_air_distance * 0.047
max_air_distance 0.415* 0.165
avg_air_yards_to_sticks 0.027
passing_cpoe 0.952* 0.019
pass_comp_pct 0.938* 0.042
passer_rating 0.740* 0.141
completion_percentage_above_expectation 0.986* 0.042
communalities
completionsPerGame 0.990
attemptsPerGame 0.997
passing_yardsPerGame 0.998
passing_tdsPerGame 0.808
passing_air_yardsPerGame 0.993
passing_yards_after_catchPerGame 1.000
passing_first_downsPerGame 0.983
avg_completed_air_yards 0.975
avg_intended_air_yards 0.999
aggressiveness 0.579
max_completed_air_distance 1.000
avg_air_distance 0.953
max_air_distance 0.835
avg_air_yards_to_sticks 0.973
passing_cpoe 0.981
pass_comp_pct 0.958
passer_rating 0.859
completion_percentage_above_expectation 0.958
f1 f7 f5 f3 f4 f2 f6 total
Sum of sq (obliq) loadings 5.146 3.450 2.539 2.401 1.286 1.017 0.999 16.839
Proportion of total 0.306 0.205 0.151 0.143 0.076 0.060 0.059 1.000
Proportion var 0.286 0.192 0.141 0.133 0.071 0.057 0.056 0.935
Cumulative var 0.286 0.478 0.619 0.752 0.823 0.880 0.935 0.935
Factor correlations: (* = significant at 1% level)
f1 f2 f3 f4 f5 f6 f7
f1 1.000
f2 0.379 1.000
f3 -0.019 -0.234 1.000
f4 -0.167 -0.266 0.689 1.000
f5 -0.184 -0.250 0.702 0.819 1.000
f6 0.350 0.336 0.183 -0.294 -0.011 1.000
f7 0.090 0.333 0.595 0.157 0.176 0.438 1.000
A path diagram of the three-factor EFA model in Figure 22.17 was created using the lavaanPlot() function of the lavaanPlot package (Lishinski, 2024).
Here is the syntax for estimating a three-factor EFA using exploratory structural equation modeling (ESEM). Estimating the model in an ESEM framework allows us to make modifications to the model, such as adding correlated residuals and adding predictors or outcomes of the latent factors. The syntax below represents the same model (with the same fit indices) as the three-factor EFA model above.
Code
efa3factor_syntax <- '
# EFA Factor Loadings
efa("efa1")*f1 +
efa("efa1")*f2 +
efa("efa1")*f3 =~ completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame +
avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks + passing_cpoe + pass_comp_pct +
passer_rating + completion_percentage_above_expectation
'
To fit the ESEM model, we use the sem() function of the lavaan package (Rosseel, 2012; Rosseel et al., 2024).
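A minimal sketch of fitting an ESEM model with sem() is below. It again uses lavaan's built-in `HolzingerSwineford1939` data instead of this chapter's data. The `estimator = "MLR"` setting is an assumption inferred from the scaled test statistics reported in this section; the exact arguments used for the chapter's model are not shown here.

```r
library(lavaan)

esem_syntax <- '
  # Three rotated factors that belong to one EFA block (here named "efa1")
  efa("efa1")*f1 +
  efa("efa1")*f2 +
  efa("efa1")*f3 =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
'

esem_fit <- sem(
  esem_syntax,
  data = HolzingerSwineford1939,
  rotation = "geomin",
  estimator = "MLR"  # robust ML; an assumption, see the note above
)

# Global fit: scaled chi-square and robust fit indices
fitMeasures(
  esem_fit,
  c("chisq.scaled", "df.scaled", "cfi.robust", "tli.robust", "rmsea.robust", "srmr")
)

# Local fit: correlation residuals (observed minus model-implied)
residual_correlations <- residuals(esem_fit, type = "cor")$cov
print(round(residual_correlations, 3))
```

Because the model is specified in SEM syntax, further lines (e.g., residual covariances using `~~`, or regressions using `~`) can be appended to the same syntax string. The output that follows in the text is from the chapter's quarterback data, not from this sketch.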
The fit indices suggest that the model does not fit the data well and that additional model modifications are necessary. The fit indices are below.
lavaan 0.6-19 ended normally after 343 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 93
Row rank of the constraints matrix 30
Rotation method GEOMIN OBLIQUE
Geomin epsilon 0.001
Rotation algorithm (rstarts) GPA (30)
Standardized metric TRUE
Row weights None
Number of observations 2002
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 5445.197 3191.805
Degrees of freedom 102 102
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.706
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 23571.926
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.826
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.875 0.868
Tucker-Lewis Index (TLI) 0.813 0.802
Robust Comparative Fit Index (CFI) 0.726
Robust Tucker-Lewis Index (TLI) 0.590
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -56940.469 -56940.469
Scaling correction factor 1.784
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 1.742
for the MLR correction
Akaike (AIC) 114054.938 114054.938
Bayesian (BIC) 114542.303 114542.303
Sample-size adjusted Bayesian (SABIC) 114265.900 114265.900
Root Mean Square Error of Approximation:
RMSEA 0.162 0.123
90 Percent confidence interval - lower 0.158 0.120
90 Percent confidence interval - upper 0.165 0.126
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.403
90 Percent confidence interval - lower 0.379
90 Percent confidence interval - upper 0.427
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.447 0.447
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
f1 =~ efa1
completinsPrGm 7.869 0.092 85.428 0.000 7.869 0.978
attemptsPerGam 12.514 0.137 91.318 0.000 12.514 0.993
pssng_yrdsPrGm 91.876 1.049 87.548 0.000 91.876 0.986
passng_tdsPrGm 0.575 0.011 53.647 0.000 0.575 0.833
pssng_r_yrdsPG 96.231 1.928 49.919 0.000 96.231 0.795
pssng_yrds__PG 45.718 0.994 45.990 0.000 45.718 0.696
pssng_frst_dPG 4.457 0.053 84.518 0.000 4.457 0.978
avg_cmpltd_r_y 0.016 0.048 0.332 0.740 0.016 0.004
avg_ntndd_r_yr -0.041 0.043 -0.958 0.338 -0.041 -0.009
aggressiveness -0.276 0.383 -0.721 0.471 -0.276 -0.035
mx_cmpltd_r_ds 1.680 0.433 3.883 0.000 1.680 0.158
avg_air_distnc -0.210 0.075 -2.782 0.005 -0.210 -0.048
max_air_distnc 1.695 0.436 3.885 0.000 1.695 0.185
avg_r_yrds_t_s 0.008 0.042 0.191 0.848 0.008 0.002
passing_cpoe -0.599 0.241 -2.482 0.013 -0.599 -0.044
pass_comp_pct 0.001 0.001 0.927 0.354 0.001 0.004
passer_rating 2.610 0.977 2.671 0.008 2.610 0.081
cmpltn_prcnt__ -0.597 0.311 -1.917 0.055 -0.597 -0.048
f2 =~ efa1
completinsPrGm 0.028 0.030 0.958 0.338 0.028 0.004
attemptsPerGam 0.404 0.076 5.318 0.000 0.404 0.032
pssng_yrdsPrGm -0.333 0.287 -1.158 0.247 -0.333 -0.004
passng_tdsPrGm -0.032 0.009 -3.437 0.001 -0.032 -0.046
pssng_r_yrdsPG 88.148 1.718 51.310 0.000 88.148 0.729
pssng_yrds__PG -39.560 0.941 -42.027 0.000 -39.560 -0.603
pssng_frst_dPG -0.028 0.014 -2.061 0.039 -0.028 -0.006
avg_cmpltd_r_y 3.206 0.185 17.327 0.000 3.206 0.789
avg_ntndd_r_yr 4.132 0.194 21.350 0.000 4.132 0.891
aggressiveness 6.295 0.901 6.985 0.000 6.295 0.796
mx_cmpltd_r_ds 7.102 0.800 8.874 0.000 7.102 0.666
avg_air_distnc 3.829 0.214 17.865 0.000 3.829 0.875
max_air_distnc 7.382 0.747 9.888 0.000 7.382 0.808
avg_r_yrds_t_s 4.061 0.234 17.365 0.000 4.061 0.866
passing_cpoe -0.257 0.381 -0.675 0.499 -0.257 -0.019
pass_comp_pct -0.060 0.003 -18.734 0.000 -0.060 -0.449
passer_rating 0.531 0.626 0.849 0.396 0.531 0.016
cmpltn_prcnt__ 0.601 0.545 1.102 0.271 0.601 0.049
f3 =~ efa1
completinsPrGm 0.394 0.050 7.844 0.000 0.394 0.049
attemptsPerGam -0.671 0.080 -8.410 0.000 -0.671 -0.053
pssng_yrdsPrGm 3.963 0.461 8.590 0.000 3.963 0.043
passng_tdsPrGm 0.074 0.008 8.760 0.000 0.074 0.108
pssng_r_yrdsPG -2.231 1.674 -1.332 0.183 -2.231 -0.018
pssng_yrds__PG 0.381 0.488 0.782 0.435 0.381 0.006
pssng_frst_dPG 0.249 0.026 9.576 0.000 0.249 0.055
avg_cmpltd_r_y 1.345 0.215 6.270 0.000 1.345 0.331
avg_ntndd_r_yr 0.970 0.179 5.405 0.000 0.970 0.209
aggressiveness 0.081 0.581 0.139 0.890 0.081 0.010
mx_cmpltd_r_ds 4.026 0.947 4.252 0.000 4.026 0.377
avg_air_distnc 0.912 0.199 4.579 0.000 0.912 0.208
max_air_distnc 1.374 0.797 1.725 0.085 1.374 0.150
avg_r_yrds_t_s 1.136 0.208 5.459 0.000 1.136 0.242
passing_cpoe 13.682 0.564 24.271 0.000 13.682 1.005
pass_comp_pct 0.138 0.005 26.711 0.000 0.138 1.036
passer_rating 30.032 2.160 13.907 0.000 30.032 0.929
cmpltn_prcnt__ 11.870 0.680 17.454 0.000 11.870 0.962
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
f1 ~~
f2 -0.146 -0.146 -0.146
f3 0.171 0.171 0.171
f2 ~~
f3 0.429 0.429 0.429
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.180 73.516 0.000 13.216 1.643
.attemptsPerGam 21.828 0.282 77.531 0.000 21.828 1.733
.pssng_yrdsPrGm 147.994 2.082 71.090 0.000 147.994 1.589
.passng_tdsPrGm 0.847 0.015 54.871 0.000 0.847 1.226
.pssng_r_yrdsPG 132.999 2.704 49.186 0.000 132.999 1.099
.pssng_yrds__PG 87.622 1.467 59.725 0.000 87.622 1.335
.pssng_frst_dPG 7.143 0.102 70.156 0.000 7.143 1.568
.avg_cmpltd_r_y 3.703 0.162 22.892 0.000 3.703 0.912
.avg_ntndd_r_yr 5.728 0.159 35.953 0.000 5.728 1.235
.aggressiveness 13.717 0.555 24.699 0.000 13.717 1.735
.mx_cmpltd_r_ds 33.473 0.623 53.716 0.000 33.473 3.139
.avg_air_distnc 19.147 0.169 113.439 0.000 19.147 4.374
.max_air_distnc 42.848 0.654 65.492 0.000 42.848 4.688
.avg_r_yrds_t_s -3.289 0.193 -17.061 0.000 -3.289 -0.701
.passing_cpoe -6.148 0.415 -14.805 0.000 -6.148 -0.451
.pass_comp_pct 0.590 0.003 187.208 0.000 0.590 4.417
.passer_rating 72.013 1.459 49.375 0.000 72.013 2.227
.cmpltn_prcnt__ -5.716 0.507 -11.267 0.000 -5.716 -0.463
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 1.615 0.116 13.903 0.000 1.615 0.025
.attemptsPerGam 6.066 0.383 15.823 0.000 6.066 0.038
.pssng_yrdsPrGm 86.808 8.228 10.550 0.000 86.808 0.010
.passng_tdsPrGm 0.122 0.006 20.594 0.000 0.122 0.255
.pssng_r_yrdsPG 329.008 29.046 11.327 0.000 329.008 0.022
.pssng_yrds__PG 130.973 8.837 14.820 0.000 130.973 0.030
.pssng_frst_dPG 0.411 0.026 15.914 0.000 0.411 0.020
.avg_cmpltd_r_y 0.718 0.074 9.651 0.000 0.718 0.044
.avg_ntndd_r_yr 0.033 0.014 2.441 0.015 0.033 0.002
.aggressiveness 21.877 NA 21.877 0.350
.mx_cmpltd_r_ds 20.943 NA 20.943 0.184
.avg_air_distnc 0.470 0.470 0.025
.max_air_distnc 18.462 NA 18.462 0.221
.avg_r_yrds_t_s 0.272 NA 0.272 0.012
.passing_cpoe 3.682 NA 3.682 0.020
.pass_comp_pct 0.002 0.002 0.121
.passer_rating 96.356 NA 96.356 0.092
.cmpltn_prcnt__ 6.698 NA 6.698 0.044
f1 1.000 1.000 1.000
f2 1.000 1.000 1.000
f3 1.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.975
attemptsPerGam 0.962
pssng_yrdsPrGm 0.990
passng_tdsPrGm 0.745
pssng_r_yrdsPG 0.978
pssng_yrds__PG 0.970
pssng_frst_dPG 0.980
avg_cmpltd_r_y 0.956
avg_ntndd_r_yr 0.998
aggressiveness 0.650
mx_cmpltd_r_ds 0.816
avg_air_distnc 0.975
max_air_distnc 0.779
avg_r_yrds_t_s 0.988
passing_cpoe 0.980
pass_comp_pct 0.879
passer_rating 0.908
cmpltn_prcnt__ 0.956
Code
chisq df pvalue
5445.197 102.000 0.000
chisq.scaled df.scaled pvalue.scaled
3191.805 102.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
1.706 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.162 0.875
tli srmr rmsea.robust
0.813 0.447 0.403
cfi.robust tli.robust
0.726 0.590
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame 0.020 0.000
passing_yardsPerGame -0.004 -0.006 0.000
passing_tdsPerGame -0.022 -0.043 0.014 0.000
passing_air_yardsPerGame 0.003 0.007 0.001 -0.011
passing_yards_after_catchPerGame -0.006 0.001 0.006 0.001
passing_first_downsPerGame 0.000 -0.006 0.003 0.019
avg_completed_air_yards -0.112 -0.094 -0.066 -0.026
avg_intended_air_yards -0.092 -0.082 -0.061 -0.029
aggressiveness -0.052 -0.022 -0.038 -0.027
max_completed_air_distance 0.077 0.099 0.123 0.148
avg_air_distance -0.095 -0.083 -0.064 -0.033
max_air_distance 0.031 0.035 0.046 0.064
avg_air_yards_to_sticks -0.092 -0.081 -0.059 -0.018
passing_cpoe 0.016 0.015 0.019 0.017
pass_comp_pct 0.013 0.010 0.000 -0.017
passer_rating 0.068 0.063 0.101 0.178
completion_percentage_above_expectation -0.006 -0.009 -0.006 -0.010
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame -0.008 0.000
passing_first_downsPerGame -0.003 -0.002 0.000
avg_completed_air_yards -0.080 -0.047 -0.076 0.000
avg_intended_air_yards -0.074 -0.014 -0.067 -0.016
aggressiveness -0.176 0.107 -0.028 -0.113
max_completed_air_distance -0.066 0.214 0.089 -0.180
avg_air_distance -0.097 0.005 -0.075 -0.037
max_air_distance -0.022 0.106 0.030 -0.131
avg_air_yards_to_sticks -0.085 -0.002 -0.060 -0.023
passing_cpoe -0.016 0.042 0.018 -0.383
pass_comp_pct 0.003 0.000 0.001 -0.426
passer_rating -0.171 0.275 0.095 -0.602
completion_percentage_above_expectation 0.005 -0.016 -0.006 -0.301
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness -0.123 0.000
max_completed_air_distance -0.233 -0.287 0.000
avg_air_distance -0.012 -0.127 -0.210 0.000
max_air_distance -0.089 -0.227 -0.006 -0.080 0.000
avg_air_yards_to_sticks -0.007 -0.115 -0.223 -0.018 -0.099
passing_cpoe -0.321 -0.408 -0.276 -0.335 -0.175
pass_comp_pct -0.357 -0.379 -0.248 -0.378 -0.158
passer_rating -0.581 -0.598 -0.369 -0.593 -0.367
completion_percentage_above_expectation -0.248 -0.296 -0.247 -0.264 -0.130
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe -0.340 0.000
pass_comp_pct -0.385 0.018 0.000
passer_rating -0.593 -0.085 0.081 0.000
completion_percentage_above_expectation -0.271 -0.002 -0.016 -0.133 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 0.419
avg_intended_air_yards aggressiveness
0.297 0.292
max_completed_air_distance avg_air_distance
0.435 0.309
max_air_distance avg_air_yards_to_sticks
0.155 0.359
passing_cpoe pass_comp_pct
0.024 -0.002
passer_rating completion_percentage_above_expectation
0.243 0.005
We can examine the model modification indices to identify parameters that, if estimated, would substantially improve model fit. For instance, the modification indices below indicate additional correlated residuals that could substantially improve model fit. However, it is generally not recommended to blindly estimate additional parameters solely based on modification indices, which can lead to data dredging and overfitting. Rather, it is generally advised to consider modification indices in light of theory. Based on the modification indices, we will add several correlated residuals to the model, to help account for why variables are associated with each other for reasons other than their underlying latent factors.
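As a minimal sketch of how modification indices can be inspected in lavaan, the example below fits a small CFA to lavaan's built-in HolzingerSwineford1939 data (a stand-in, not the quarterback data used in this chapter) and lists the candidate correlated residuals with the largest modification indices. The object names (hs_syntax, hs_fit, mi_resid) are illustrative assumptions.

```r
library(lavaan)

# Stand-in model and data (assumption: lavaan's built-in example data,
# not the quarterback data analyzed in this chapter)
hs_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
hs_fit <- cfa(hs_syntax, data = HolzingerSwineford1939, std.lv = TRUE)

# Modification indices for parameters that are not in the model, largest first
mi <- modificationindices(hs_fit, sort. = TRUE)

# Keep only candidate correlated residuals between two different observed variables
observed <- lavNames(hs_fit, type = "ov")
mi_resid <- subset(
  mi,
  op == "~~" & lhs != rhs & lhs %in% observed & rhs %in% observed
)

head(mi_resid[, c("lhs", "op", "rhs", "mi", "sepc.all")], 5)
```

The mi column is the expected drop in the model chi-square if that parameter were freely estimated, and sepc.all is the standardized expected parameter change. A large value flags a candidate, but whether to add the parameter should still be judged against theory.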
Below are factor scores from the model for the first six players:
f1 f2 f3
[1,] -0.09277137 -1.27994975 0.3674886
[2,] 0.22251524 -1.66512897 -0.9524790
[3,] 0.43788965 -1.83507786 -1.1746610
[4,] 0.07350637 0.08712285 0.7603455
[5,] 0.89967499 1.15956280 0.4651716
[6,] -0.02270474 0.12927766 -0.2200487
A path diagram of the three-factor ESEM model is in Figure 22.18.
To make the plot interactive for editing, you can use the plot_lavaan() function of the lavaangui package (Karch, 2025b; Karch, 2025a):
Below is a modification of the three-factor model that adds correlated residuals. For instance, it makes sense that passing completions and attempts remain related to each other even after accounting for their latent factor, because every completion is also an attempt.
Code
efa3factorModified_syntax <- '
# EFA Factor Loadings
efa("efa1")*F1 +
efa("efa1")*F2 +
efa("efa1")*F3 =~ completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame +
avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks + passing_cpoe + pass_comp_pct +
passer_rating + completion_percentage_above_expectation
# Correlated Residuals
completionsPerGame ~~ attemptsPerGame
passing_yardsPerGame ~~ passing_yards_after_catchPerGame
attemptsPerGame ~~ passing_yardsPerGame
attemptsPerGame ~~ passing_air_yardsPerGame
passing_air_yardsPerGame ~~ passing_yards_after_catchPerGame
passing_yards_after_catchPerGame ~~ avg_completed_air_yards
passing_tdsPerGame ~~ passer_rating
completionsPerGame ~~ passing_air_yardsPerGame
max_completed_air_distance ~~ max_air_distance
aggressiveness ~~ completion_percentage_above_expectation
'
lavaan 0.6-19 ended normally after 407 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 103
Row rank of the constraints matrix 40
Rotation method GEOMIN OBLIQUE
Geomin epsilon 0.001
Rotation algorithm (rstarts) GPA (30)
Standardized metric TRUE
Row weights None
Number of observations 2002
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 1399.681 837.974
Degrees of freedom 92 92
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.670
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 23571.926
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.826
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.970 0.968
Tucker-Lewis Index (TLI) 0.949 0.947
Robust Comparative Fit Index (CFI) 0.948
Robust Tucker-Lewis Index (TLI) 0.914
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -54917.711 -54917.711
Scaling correction factor 1.810
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 1.742
for the MLR correction
Akaike (AIC) 110029.421 110029.421
Bayesian (BIC) 110572.806 110572.806
Sample-size adjusted Bayesian (SABIC) 110264.632 110264.632
Root Mean Square Error of Approximation:
RMSEA 0.084 0.064
90 Percent confidence interval - lower 0.080 0.061
90 Percent confidence interval - upper 0.088 0.067
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 0.965 0.000
Robust RMSEA 0.189
90 Percent confidence interval - lower 0.160
90 Percent confidence interval - upper 0.218
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.147 0.147
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 =~ efa1
completinsPrGm 7.817 0.092 85.035 0.000 7.817 0.972
attemptsPerGam 12.334 0.140 88.069 0.000 12.334 0.980
pssng_yrdsPrGm 92.161 1.019 90.450 0.000 92.161 0.990
passng_tdsPrGm 0.590 0.010 58.359 0.000 0.590 0.854
pssng_r_yrdsPG 90.323 1.824 49.520 0.000 90.323 0.747
pssng_yrds__PG 47.955 0.982 48.825 0.000 47.955 0.730
pssng_frst_dPG 4.477 0.051 87.543 0.000 4.477 0.983
avg_cmpltd_r_y -0.001 0.076 -0.019 0.985 -0.001 -0.000
avg_ntndd_r_yr -0.078 0.071 -1.092 0.275 -0.078 -0.022
aggressiveness 0.020 0.185 0.106 0.915 0.020 0.003
mx_cmpltd_r_ds 1.650 0.468 3.524 0.000 1.650 0.194
avg_air_distnc -0.255 0.088 -2.893 0.004 -0.255 -0.077
max_air_distnc 1.441 0.433 3.332 0.001 1.441 0.194
avg_r_yrds_t_s -0.011 0.075 -0.145 0.885 -0.011 -0.003
passing_cpoe -1.059 0.296 -3.582 0.000 -1.059 -0.072
pass_comp_pct 0.001 0.002 0.405 0.686 0.001 0.005
passer_rating 2.624 0.924 2.840 0.005 2.624 0.096
cmpltn_prcnt__ -0.987 0.347 -2.841 0.004 -0.987 -0.076
F2 =~ efa1
completinsPrGm -0.094 0.042 -2.238 0.025 -0.094 -0.012
attemptsPerGam 0.106 0.094 1.135 0.256 0.106 0.008
pssng_yrdsPrGm 0.946 0.334 2.829 0.005 0.946 0.010
passng_tdsPrGm -0.002 0.007 -0.298 0.766 -0.002 -0.003
pssng_r_yrdsPG 84.028 2.081 40.383 0.000 84.028 0.695
pssng_yrds__PG -36.824 1.056 -34.888 0.000 -36.824 -0.560
pssng_frst_dPG 0.001 0.016 0.066 0.948 0.001 0.000
avg_cmpltd_r_y 2.943 0.135 21.797 0.000 2.943 0.951
avg_ntndd_r_yr 3.491 0.134 26.141 0.000 3.491 1.009
aggressiveness 5.160 0.711 7.257 0.000 5.160 0.799
mx_cmpltd_r_ds 5.902 0.618 9.543 0.000 5.902 0.694
avg_air_distnc 3.195 0.144 22.205 0.000 3.195 0.971
max_air_distnc 6.180 0.599 10.311 0.000 6.180 0.834
avg_r_yrds_t_s 3.402 0.159 21.336 0.000 3.402 0.981
passing_cpoe 0.445 0.443 1.005 0.315 0.445 0.030
pass_comp_pct -0.074 0.005 -16.319 0.000 -0.074 -0.559
passer_rating -0.824 0.899 -0.916 0.360 -0.824 -0.030
cmpltn_prcnt__ 0.927 0.529 1.755 0.079 0.927 0.072
F3 =~ efa1
completinsPrGm 0.459 0.070 6.546 0.000 0.459 0.057
attemptsPerGam -0.593 0.117 -5.044 0.000 -0.593 -0.047
pssng_yrdsPrGm 3.155 0.844 3.736 0.000 3.155 0.034
passng_tdsPrGm 0.055 0.010 5.543 0.000 0.055 0.080
pssng_r_yrdsPG 0.495 1.188 0.417 0.677 0.495 0.004
pssng_yrds__PG -1.413 1.336 -1.057 0.290 -1.413 -0.022
pssng_frst_dPG 0.218 0.044 4.962 0.000 0.218 0.048
avg_cmpltd_r_y 0.080 0.139 0.577 0.564 0.080 0.026
avg_ntndd_r_yr -0.087 0.083 -1.040 0.298 -0.087 -0.025
aggressiveness -2.489 0.720 -3.459 0.001 -2.489 -0.385
mx_cmpltd_r_ds 1.865 0.844 2.211 0.027 1.865 0.219
avg_air_distnc -0.028 0.096 -0.290 0.772 -0.028 -0.008
max_air_distnc -0.381 0.675 -0.565 0.572 -0.381 -0.051
avg_r_yrds_t_s 0.043 0.119 0.364 0.716 0.043 0.012
passing_cpoe 14.463 0.607 23.821 0.000 14.463 0.983
pass_comp_pct 0.146 0.005 27.318 0.000 0.146 1.096
passer_rating 25.152 1.873 13.428 0.000 25.152 0.922
cmpltn_prcnt__ 12.370 0.716 17.277 0.000 12.370 0.956
Covariances:
Estimate Std.Err z-value P(>|z|)
.completionsPerGame ~~
.attemptsPerGam 3.293 0.164 20.104 0.000
.passing_yardsPerGame ~~
.pssng_yrds__PG 92.749 7.776 11.928 0.000
.attemptsPerGame ~~
.pssng_yrdsPrGm 3.505 0.591 5.930 0.000
.pssng_r_yrdsPG 53.095 2.680 19.809 0.000
.passing_air_yardsPerGame ~~
.pssng_yrds__PG -225.138 40.624 -5.542 0.000
.passing_yards_after_catchPerGame ~~
.avg_cmpltd_r_y -5.993 0.423 -14.166 0.000
.passing_tdsPerGame ~~
.passer_rating 1.762 0.146 12.084 0.000
.completionsPerGame ~~
.pssng_r_yrdsPG 13.078 1.148 11.390 0.000
.max_completed_air_distance ~~
.max_air_distnc 9.132 0.882 10.356 0.000
.aggressiveness ~~
.cmpltn_prcnt__ 5.806 0.731 7.947 0.000
F1 ~~
F2 -0.100
F3 0.175 NA
F2 ~~
F3 0.490
Std.lv Std.all
3.293 0.776
92.749 0.611
3.505 0.129
53.095 0.606
-225.138 -0.460
-5.993 -0.435
1.762 0.517
13.078 0.314
9.132 0.455
5.806 0.529
-0.100 -0.100
0.175 0.175
0.490 0.490
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.180 73.516 0.000 13.216 1.643
.attemptsPerGam 21.828 0.282 77.531 0.000 21.828 1.734
.pssng_yrdsPrGm 147.994 2.082 71.090 0.000 147.994 1.589
.passng_tdsPrGm 0.847 0.015 54.871 0.000 0.847 1.226
.pssng_r_yrdsPG 132.999 2.704 49.186 0.000 132.999 1.100
.pssng_yrds__PG 87.622 1.467 59.725 0.000 87.622 1.334
.pssng_frst_dPG 7.143 0.102 70.157 0.000 7.143 1.568
.avg_cmpltd_r_y 4.317 0.129 33.367 0.000 4.317 1.395
.avg_ntndd_r_yr 6.447 0.127 50.838 0.000 6.447 1.864
.aggressiveness 15.228 0.496 30.718 0.000 15.228 2.357
.mx_cmpltd_r_ds 34.773 0.502 69.299 0.000 34.773 4.087
.avg_air_distnc 19.820 0.128 154.562 0.000 19.820 6.022
.max_air_distnc 44.212 0.529 83.576 0.000 44.212 5.968
.avg_r_yrds_t_s -2.561 0.143 -17.856 0.000 -2.561 -0.739
.passing_cpoe -7.217 0.460 -15.687 0.000 -7.217 -0.491
.pass_comp_pct 0.589 0.003 185.454 0.000 0.589 4.421
.passer_rating 73.910 1.333 55.459 0.000 73.910 2.708
.cmpltn_prcnt__ -6.539 0.481 -13.599 0.000 -6.539 -0.505
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 2.018 0.103 19.608 0.000 2.018 0.031
.attemptsPerGam 8.913 0.352 25.293 0.000 8.913 0.056
.pssng_yrdsPrGm 82.770 8.515 9.721 0.000 82.770 0.010
.passng_tdsPrGm 0.114 0.006 19.528 0.000 0.114 0.240
.pssng_r_yrdsPG 861.327 93.142 9.247 0.000 861.327 0.059
.pssng_yrds__PG 278.493 19.893 14.000 0.000 278.493 0.065
.pssng_frst_dPG 0.318 0.027 11.906 0.000 0.318 0.015
.avg_cmpltd_r_y 0.681 0.063 10.855 0.000 0.681 0.071
.avg_ntndd_r_yr 0.004 0.014 0.270 0.787 0.004 0.000
.aggressiveness 21.554 NA 21.554 0.516
.mx_cmpltd_r_ds 21.431 NA 21.431 0.296
.avg_air_distnc 0.481 NA 0.481 0.044
.max_air_distnc 18.765 18.765 0.342
.avg_r_yrds_t_s 0.288 0.288 0.024
.passing_cpoe 4.948 4.948 0.023
.pass_comp_pct 0.002 0.002 0.085
.passer_rating 101.498 NA 101.498 0.136
.cmpltn_prcnt__ 5.587 NA 5.587 0.033
F1 1.000 1.000 1.000
F2 1.000 1.000 1.000
F3 1.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.969
attemptsPerGam 0.944
pssng_yrdsPrGm 0.990
passng_tdsPrGm 0.760
pssng_r_yrdsPG 0.941
pssng_yrds__PG 0.935
pssng_frst_dPG 0.985
avg_cmpltd_r_y 0.929
avg_ntndd_r_yr 1.000
aggressiveness 0.484
mx_cmpltd_r_ds 0.704
avg_air_distnc 0.956
max_air_distnc 0.658
avg_r_yrds_t_s 0.976
passing_cpoe 0.977
pass_comp_pct 0.915
passer_rating 0.864
cmpltn_prcnt__ 0.967
The model fits substantially better (though not perfectly) with the additional correlated residuals. Below are the model fit indices:
chisq df pvalue
1399.681 92.000 0.000
chisq.scaled df.scaled pvalue.scaled
837.974 92.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
1.670 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.084 0.970
tli srmr rmsea.robust
0.949 0.147 0.189
cfi.robust tli.robust
0.948 0.914
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame 0.000 0.000
passing_yardsPerGame -0.001 -0.001 0.000
passing_tdsPerGame -0.029 -0.046 0.004 0.000
passing_air_yardsPerGame 0.004 0.003 -0.001 -0.036
passing_yards_after_catchPerGame -0.004 -0.002 0.001 0.017
passing_first_downsPerGame 0.001 0.000 0.000 0.007
avg_completed_air_yards -0.058 -0.050 -0.031 -0.001
avg_intended_air_yards -0.052 -0.046 -0.041 -0.022
aggressiveness -0.031 -0.026 -0.038 -0.026
max_completed_air_distance 0.052 0.073 0.083 0.105
avg_air_distance -0.044 -0.036 -0.032 -0.016
max_air_distance 0.038 0.038 0.035 0.043
avg_air_yards_to_sticks -0.061 -0.054 -0.047 -0.018
passing_cpoe 0.048 0.052 0.053 0.052
pass_comp_pct 0.002 0.001 0.001 -0.003
passer_rating 0.046 0.045 0.080 0.071
completion_percentage_above_expectation 0.019 0.021 0.019 0.015
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame 0.001 0.000
passing_first_downsPerGame -0.002 0.003 0.000
avg_completed_air_yards -0.062 0.004 -0.033 0.000
avg_intended_air_yards -0.053 -0.011 -0.039 -0.009
aggressiveness -0.030 -0.033 -0.019 0.057
max_completed_air_distance -0.061 0.152 0.055 -0.075
avg_air_distance -0.059 0.010 -0.035 -0.019
max_air_distance 0.031 0.041 0.025 -0.049
avg_air_yards_to_sticks -0.071 -0.004 -0.041 -0.007
passing_cpoe -0.064 0.128 0.054 -0.244
pass_comp_pct 0.003 0.000 0.000 -0.150
passer_rating -0.192 0.264 0.076 -0.384
completion_percentage_above_expectation -0.032 0.050 0.022 -0.154
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness 0.050 0.000
max_completed_air_distance -0.144 -0.040 0.000
avg_air_distance -0.001 0.054 -0.110 0.000
max_air_distance -0.012 -0.032 -0.027 0.009 0.000
avg_air_yards_to_sticks -0.001 0.067 -0.134 -0.001 -0.019
passing_cpoe -0.240 -0.105 -0.198 -0.258 -0.066
pass_comp_pct -0.139 -0.050 -0.108 -0.171 0.022
passer_rating -0.413 -0.250 -0.244 -0.426 -0.202
completion_percentage_above_expectation -0.155 -0.047 -0.160 -0.173 -0.008
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe -0.260 0.000
pass_comp_pct -0.173 0.066 0.000
passer_rating -0.429 -0.051 0.102 0.000
completion_percentage_above_expectation -0.179 -0.007 0.019 -0.103 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 0.190
avg_intended_air_yards aggressiveness
0.088 0.067
max_completed_air_distance avg_air_distance
0.247 0.102
max_air_distance avg_air_yards_to_sticks
-0.027 0.137
passing_cpoe pass_comp_pct
0.104 -0.001
passer_rating completion_percentage_above_expectation
0.170 0.073
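Residual correlation matrices like the one above are easier to scan when reduced to the largest discrepancies. The sketch below, which uses lavaan's built-in HolzingerSwineford1939 data as a stand-in for the quarterback data, extracts the residual correlations and lists the pairs whose absolute residual exceeds 0.10. The 0.10 cutoff and the object names are assumptions chosen for illustration.

```r
library(lavaan)

# Stand-in model and data (assumption: not the quarterback data from this chapter)
hs_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
hs_fit <- cfa(hs_syntax, data = HolzingerSwineford1939, std.lv = TRUE)

# Residual (observed minus model-implied) correlations as a plain matrix
res_cor <- unclass(residuals(hs_fit, type = "cor.bollen")$cov)

# Keep each pair once by blanking the upper triangle and the diagonal
res_cor[upper.tri(res_cor, diag = TRUE)] <- NA

# Locate pairs whose absolute residual exceeds the (assumed) 0.10 cutoff
idx <- which(abs(res_cor) > 0.10, arr.ind = TRUE)
large_resid <- data.frame(
  var1 = rownames(res_cor)[idx[, "row"]],
  var2 = colnames(res_cor)[idx[, "col"]],
  resid = res_cor[idx]
)

# Largest absolute residuals first
large_resid[order(-abs(large_resid$resid)), ]
```

Applied to the output above, the same logic would surface pairs such as passer_rating with avg_air_yards_to_sticks (residual correlation of -0.429), which the model still reproduces poorly.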
Below are factor scores from the model for the first six players:
F1 F2 F3
[1,] -0.03484111 -0.91444013 0.3509635
[2,] 0.19471724 -1.64270149 -1.0607739
[3,] 0.43512510 -2.27576482 -1.5450535
[4,] 0.15369020 0.29300546 0.7922651
[5,] 0.83015409 1.09543588 0.4754447
[6,] -0.13076091 0.07123353 -0.1724548
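Factor scores such as these can be obtained from a fitted lavaan model with lavPredict(). The sketch below is a hedged illustration on lavaan's built-in HolzingerSwineford1939 data rather than the quarterback data; the object names are assumptions.

```r
library(lavaan)

# Stand-in model and data (assumption: not the quarterback data from this chapter)
hs_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
hs_fit <- cfa(hs_syntax, data = HolzingerSwineford1939, std.lv = TRUE)

# One row of estimated factor scores per observation, one column per factor
hs_scores <- lavPredict(hs_fit)
head(hs_scores)

# Attach the scores to identifying columns so they can be sorted or merged later
hs_scored <- cbind(HolzingerSwineford1939[, c("id", "school")], hs_scores)
head(hs_scored)
```

Because the factor variances are fixed to 1 here (std.lv = TRUE), the scores are on a roughly standardized scale, so a score near 1 is about one standard deviation above the average on that factor.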
The path diagram of the modified ESEM model with correlated residuals is in Figure 22.19.
Model fit of nested models can be compared with a chi-square difference test. This allows us to evaluate whether the more complex model fits significantly better than the less complex model. In this case, our more complex model is the three-factor model with correlated residuals; the less complex model is the three-factor model without correlated residuals. The modified model with the correlated residuals and the original model are considered “nested” models. The original model is nested within the modified model because the modified model includes all of the terms of the original model along with additional terms.
In this case, the model with the correlated residuals fit significantly better (i.e., has a significantly smaller chi-square value) than the model without the correlated residuals.
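A minimal sketch of a chi-square difference test is shown below. It uses lavaan's built-in HolzingerSwineford1939 data and a single added correlated residual (x7 ~~ x8) purely for illustration; these are assumptions, not the models fitted in this chapter. The test is run with anova() and then reproduced by hand from the two chi-square values.

```r
library(lavaan)

# Less complex model (stand-in; not the chapter's quarterback model)
base_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# More complex model: same terms plus one correlated residual
modified_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  x7 ~~ x8
'

fit_base <- cfa(base_syntax, data = HolzingerSwineford1939)
fit_modified <- cfa(modified_syntax, data = HolzingerSwineford1939)

# Chi-square difference (likelihood ratio) test of the nested models
lrt <- anova(fit_base, fit_modified)
print(lrt)

# The same test by hand: difference in chi-square on the difference in df
chisq_diff <- as.numeric(fitMeasures(fit_base, "chisq") - fitMeasures(fit_modified, "chisq"))
df_diff <- as.numeric(fitMeasures(fit_base, "df") - fitMeasures(fit_modified, "df"))
p_value <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)
c(chisq_diff = chisq_diff, df_diff = df_diff, p_value = p_value)
```

When the models are estimated with a robust estimator that yields scaled test statistics, as in the output in this chapter, anova() computes a scaled difference test instead, so the simple subtraction of chi-square values no longer applies.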
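Here are the variables grouped by the factor on which each had its largest standardized loading (all greater than 0.3). In the modified model output above, a few variables also had cross-loadings above 0.3 in absolute value on a second factor (e.g., passing air yards per game loaded 0.695 on factor 2), but each variable is assigned to its primary factor: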
Code
factor1vars <- c(
"completionsPerGame","attemptsPerGame","passing_yardsPerGame","passing_tdsPerGame",
"passing_air_yardsPerGame","passing_yards_after_catchPerGame","passing_first_downsPerGame")
factor2vars <- c(
"avg_completed_air_yards","avg_intended_air_yards","aggressiveness","max_completed_air_distance",
"avg_air_distance","max_air_distance","avg_air_yards_to_sticks")
factor3vars <- c(
"passing_cpoe","pass_comp_pct","passer_rating","completion_percentage_above_expectation")
The variables that loaded most strongly onto factor 1 appear to reflect Quarterback usage: completions per game, passing attempts per game, passing yards per game, passing touchdowns per game, passing air yards (total horizontal distance the ball travels on all pass attempts) per game, passing yards after the catch per game, and first downs gained per game by passing. Quarterbacks who tend to throw more tend to have higher levels on those variables. Thus, we label factor 1 as “Usage”, which reflects total Quarterback involvement, regardless of efficiency or outcome.
The variables that loaded most strongly onto factor 2 appear to reflect Quarterback aggressiveness: average air yards on completed passes, average air yards on all attempted passes, aggressiveness (percentage of passing attempts thrown into tight windows, where there is a defender within one yard or less of the receiver at the time of the completion or incompletion), average amount of air yards ahead of or behind the first down marker on passing attempts, average air distance (the true three-dimensional distance the ball travels in the air), maximum air distance, and maximum air distance on completed passes. Quarterbacks who throw the ball farther and into tighter windows tend to have higher values on those variables. Thus, we label factor 2 as “Aggressiveness”, which reflects throwing longer, more difficult passes with a tight window.
The variables that loaded most strongly onto factor 3 appear to reflect Quarterback performance: passing completion percentage above expectation, pass completion percentage, and passer rating. Quarterbacks who perform better tend to have higher values on those variables. Thus, we label factor 3 as “Performance”.
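The grouping of variables by their strongest loading can also be derived programmatically from the standardized solution. The sketch below is one way to do so, shown on lavaan's built-in HolzingerSwineford1939 data as a stand-in; the 0.3 cutoff mirrors the one used above, and the object names are assumptions.

```r
library(lavaan)

# Stand-in model and data (assumption: not the quarterback data from this chapter)
hs_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
hs_fit <- cfa(hs_syntax, data = HolzingerSwineford1939, std.lv = TRUE)

# Standardized factor loadings are the rows with the "=~" operator
std <- standardizedSolution(hs_fit)
loadings <- subset(std, op == "=~")

# For each observed variable, keep the factor with the largest absolute loading
# (this matters when variables cross-load, as in an ESEM)
primary <- do.call(
  rbind,
  lapply(split(loadings, loadings$rhs), function(d) d[which.max(abs(d$est.std)), ])
)

# Keep salient primary loadings and list the variables under each factor
salient <- primary[abs(primary$est.std) > 0.3, ]
vars_by_factor <- split(salient$rhs, salient$lhs)
vars_by_factor
```

In this stand-in CFA each variable loads on only one factor, so the primary-loading step changes nothing; it is included because the model in this chapter allows every variable to load on every factor.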
Here are the players and seasons that showed the highest levels of Quarterback “Usage”:
Here are the players and seasons that showed the lowest levels of Quarterback “Usage”:
Here are the players and seasons that showed the highest levels of Quarterback “Aggressiveness”:
Here are the players and seasons that showed the lowest levels of Quarterback “Aggressiveness”:
Here are the players and seasons that showed the highest levels of Quarterback “Performance”:
If we restrict it to Quarterbacks who played at least 10 games in the season, here are the players and seasons that showed the highest levels of Quarterback “Performance”:
Here are the players and seasons that showed the lowest levels of Quarterback “Performance”:
If we restrict it to Quarterbacks who played at least 10 games in the season, here are the players and seasons that showed the lowest levels of Quarterback “Performance”:
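Rankings like these amount to sorting the player-seasons by a factor score, optionally after filtering on games played. The sketch below shows the idea in base R with a small made-up data frame; the players, columns, and values are hypothetical and are not results from the model.

```r
# Hypothetical player-season factor scores (made-up values for illustration only)
qb_scores <- data.frame(
  player = c("QB A", "QB B", "QB C", "QB D", "QB E"),
  season = c(2021, 2021, 2022, 2022, 2023),
  games = c(17, 3, 16, 9, 12),
  performance = c(1.20, 2.10, 0.40, -1.50, -0.30)
)

# Highest "Performance" scores across all player-seasons
top_all <- qb_scores[order(-qb_scores$performance), ]

# Restrict to player-seasons with at least 10 games before ranking
eligible <- qb_scores[qb_scores$games >= 10, ]
top_eligible <- eligible[order(-eligible$performance), ]
bottom_eligible <- eligible[order(eligible$performance), ]

head(top_all, 3)
head(top_eligible, 3)
head(bottom_eligible, 3)
```

The games filter matters because extreme scores often come from small samples: in this toy example the top unrestricted score belongs to a player with only 3 games, who drops out once the 10-game minimum is applied.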
22.6 Example of Confirmatory Factor Analysis
We fit CFA models using the cfa() function of the lavaan package (Rosseel, 2012; Rosseel et al., 2024). We compare one-, two-, and three-factor CFA models.
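As a hedged illustration of the general workflow, the sketch below fits a one-factor and a three-factor CFA to lavaan's built-in HolzingerSwineford1939 data (a stand-in for the quarterback data) and compares their fit. The arguments estimator = "MLR", missing = "ML", and std.lv = TRUE are assumptions chosen to resemble the settings implied by the output in this section (scaled test statistics, missing data patterns, and factor variances fixed to 1). The output in this section also reports standard errors clustered by player_id, which in lavaan corresponds to the cluster argument; it is omitted here because the stand-in data have no comparable grouping.

```r
library(lavaan)

# Stand-in models (assumption: not the quarterback models from this chapter)
one_factor_syntax <- '
  g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
'
three_factor_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Robust ML with full-information handling of missing data;
# std.lv = TRUE fixes each factor variance to 1 so all loadings are estimated
fit_one <- cfa(
  one_factor_syntax, data = HolzingerSwineford1939,
  estimator = "MLR", missing = "ML", std.lv = TRUE
)
fit_three <- cfa(
  three_factor_syntax, data = HolzingerSwineford1939,
  estimator = "MLR", missing = "ML", std.lv = TRUE
)

# Compare a few common fit indices across the two models
measures <- c("cfi", "tli", "rmsea", "srmr")
fit_one_measures <- fitMeasures(fit_one, measures)
fit_three_measures <- fitMeasures(fit_three, measures)
round(fit_one_measures, 3)
round(fit_three_measures, 3)
```

Calling summary() on a fitted model with fit.measures = TRUE, standardized = TRUE, and rsquare = TRUE prints the fuller kind of report shown in this section.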
Below is the syntax for the one-factor CFA model:
Code
cfa1factor_syntax <- '
#Factor loadings
F1 =~ completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame +
avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks + passing_cpoe + pass_comp_pct +
passer_rating + completion_percentage_above_expectation
'
lavaan 0.6-19 ended normally after 187 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 54
Row rank of the constraints matrix 36
Number of observations 2002
Number of clusters [player_id] 396
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 16641.448 8547.405
Degrees of freedom 135 135
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.947
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 21852.033
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.969
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.615 0.612
Tucker-Lewis Index (TLI) 0.564 0.561
Robust Comparative Fit Index (CFI) 0.220
Robust Tucker-Lewis Index (TLI) 0.116
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -62538.594 -62538.594
Scaling correction factor 2.590
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 2.131
for the MLR correction
Akaike (AIC) 125185.188 125185.188
Bayesian (BIC) 125487.691 125487.691
Sample-size adjusted Bayesian (SABIC) 125316.130 125316.130
Root Mean Square Error of Approximation:
RMSEA 0.247 0.176
90 Percent confidence interval - lower 0.244 0.174
90 Percent confidence interval - upper 0.250 0.179
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.571
90 Percent confidence interval - lower 0.553
90 Percent confidence interval - upper 0.589
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.380 0.380
Parameter Estimates:
Standard errors Robust.cluster
Information Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 =~
completinsPrGm 7.938 0.126 62.929 0.000 7.938 0.987
attemptsPerGam 12.274 0.178 69.044 0.000 12.274 0.974
pssng_yrdsPrGm 92.610 1.513 61.202 0.000 92.610 0.994
passng_tdsPrGm 0.597 0.019 31.445 0.000 0.597 0.865
pssng_r_yrdsPG 83.918 3.630 23.117 0.000 83.918 0.694
pssng_yrds__PG 50.712 1.537 32.996 0.000 50.712 0.773
pssng_frst_dPG 4.519 0.080 56.260 0.000 4.519 0.992
avg_cmpltd_r_y 0.411 0.105 3.912 0.000 0.411 0.285
avg_ntndd_r_yr 0.213 0.108 1.976 0.048 0.213 0.152
aggressiveness -0.275 0.478 -0.575 0.565 -0.275 -0.053
mx_cmpltd_r_ds 2.806 0.424 6.615 0.000 2.806 0.473
avg_air_distnc 0.029 0.116 0.249 0.803 0.029 0.020
max_air_distnc 1.959 0.436 4.490 0.000 1.959 0.368
avg_r_yrds_t_s 0.317 0.114 2.783 0.005 0.317 0.214
passing_cpoe 3.527 0.515 6.852 0.000 3.527 0.293
pass_comp_pct 0.040 0.005 7.527 0.000 0.040 0.301
passer_rating 11.688 1.173 9.963 0.000 11.688 0.629
cmpltn_prcnt__ 2.982 0.422 7.063 0.000 2.982 0.514
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.396 33.415 0.000 13.216 1.643
.attemptsPerGam 21.828 0.588 37.096 0.000 21.828 1.733
.pssng_yrdsPrGm 147.994 4.667 31.714 0.000 147.994 1.589
.passng_tdsPrGm 0.847 0.036 23.649 0.000 0.847 1.226
.pssng_r_yrdsPG 133.000 5.791 22.966 0.000 133.000 1.099
.pssng_yrds__PG 87.622 3.002 29.187 0.000 87.622 1.335
.pssng_frst_dPG 7.143 0.230 31.073 0.000 7.143 1.568
.avg_cmpltd_r_y 5.553 0.100 55.435 0.000 5.553 3.848
.avg_ntndd_r_yr 7.933 0.107 74.074 0.000 7.933 5.668
.aggressiveness 16.709 0.420 39.795 0.000 16.709 3.250
.mx_cmpltd_r_ds 37.848 0.436 86.769 0.000 37.848 6.380
.avg_air_distnc 21.196 0.108 195.850 0.000 21.196 14.670
.max_air_distnc 46.718 0.427 109.350 0.000 46.718 8.769
.avg_r_yrds_t_s -1.078 0.113 -9.534 0.000 -1.078 -0.727
.passing_cpoe -2.805 0.429 -6.533 0.000 -2.805 -0.233
.pass_comp_pct 0.588 0.004 136.966 0.000 0.588 4.369
.passer_rating 79.793 1.215 65.679 0.000 79.793 4.291
.cmpltn_prcnt__ -2.434 0.421 -5.789 0.000 -2.434 -0.420
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 1.697 0.123 13.754 0.000 1.697 0.026
.attemptsPerGam 8.032 0.458 17.551 0.000 8.032 0.051
.pssng_yrdsPrGm 99.645 9.143 10.899 0.000 99.645 0.011
.passng_tdsPrGm 0.120 0.007 17.338 0.000 0.120 0.252
.pssng_r_yrdsPG 7595.814 516.591 14.704 0.000 7595.814 0.519
.pssng_yrds__PG 1737.341 130.798 13.283 0.000 1737.341 0.403
.pssng_frst_dPG 0.335 0.025 13.211 0.000 0.335 0.016
.avg_cmpltd_r_y 1.914 0.157 12.166 0.000 1.914 0.919
.avg_ntndd_r_yr 1.914 0.170 11.279 0.000 1.914 0.977
.aggressiveness 26.364 3.307 7.972 0.000 26.364 0.997
.mx_cmpltd_r_ds 27.316 2.789 9.794 0.000 27.316 0.776
.avg_air_distnc 2.087 0.175 11.909 0.000 2.087 1.000
.max_air_distnc 24.549 2.486 9.875 0.000 24.549 0.865
.avg_r_yrds_t_s 2.102 0.210 10.025 0.000 2.102 0.954
.passing_cpoe 132.904 14.303 9.292 0.000 132.904 0.914
.pass_comp_pct 0.016 0.001 10.991 0.000 0.016 0.910
.passer_rating 209.149 20.026 10.444 0.000 209.149 0.605
.cmpltn_prcnt__ 24.783 2.838 8.732 0.000 24.783 0.736
F1 1.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.974
attemptsPerGam 0.949
pssng_yrdsPrGm 0.989
passng_tdsPrGm 0.748
pssng_r_yrdsPG 0.481
pssng_yrds__PG 0.597
pssng_frst_dPG 0.984
avg_cmpltd_r_y 0.081
avg_ntndd_r_yr 0.023
aggressiveness 0.003
mx_cmpltd_r_ds 0.224
avg_air_distnc 0.000
max_air_distnc 0.135
avg_r_yrds_t_s 0.046
passing_cpoe 0.086
pass_comp_pct 0.090
passer_rating 0.395
cmpltn_prcnt__ 0.264
The one-factor model fit poorly according to the fit indices (e.g., CFI = 0.615, RMSEA = 0.247, SRMR = 0.380):
chisq df pvalue
16641.448 135.000 0.000
chisq.scaled df.scaled pvalue.scaled
8547.405 135.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
1.947 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.247 0.615
tli srmr rmsea.robust
0.564 0.380 0.571
cfi.robust tli.robust
0.220 0.116
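To make the link between the chi-square values and the fit indices concrete, the sketch below recomputes the standard (unscaled) CFI, TLI, and RMSEA from the values reported above, using the usual textbook formulas. The function name is an assumption. The RMSEA here uses N in the denominator; using N - 1 gives the same values to three decimals for these models. The robust and scaled versions involve additional scaling corrections and are not reproduced here.

```r
# Standard (unscaled) fit indices from model and baseline chi-square values
fit_from_chisq <- function(chisq, df, chisq_base, df_base, n) {
  # CFI: 1 minus the ratio of model misfit to baseline misfit
  cfi <- 1 - max(chisq - df, 0) / max(chisq - df, chisq_base - df_base, 0)
  # TLI: compares chi-square/df ratios of the baseline and user models
  tli <- ((chisq_base / df_base) - (chisq / df)) / ((chisq_base / df_base) - 1)
  # RMSEA: misfit per degree of freedom, adjusted for sample size
  rmsea <- sqrt(max(chisq - df, 0) / (df * n))
  round(c(cfi = cfi, tli = tli, rmsea = rmsea), 3)
}

# One-factor CFA: chi-square = 16641.448, df = 135, N = 2002
one_factor <- fit_from_chisq(16641.448, 135, 43031.563, 153, 2002)

# Modified three-factor ESEM: chi-square = 1399.681, df = 92, N = 2002
modified_esem <- fit_from_chisq(1399.681, 92, 43031.563, 153, 2002)

one_factor
modified_esem
```

The results match the reported standard values: CFI = 0.615, TLI = 0.564, and RMSEA = 0.247 for the one-factor CFA, and CFI = 0.970, TLI = 0.949, and RMSEA = 0.084 for the modified three-factor ESEM.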
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame 0.023 0.000
passing_yardsPerGame -0.003 -0.003 0.000
passing_tdsPerGame -0.025 -0.050 0.011 0.000
passing_air_yardsPerGame 0.012 0.009 0.003 -0.021
passing_yards_after_catchPerGame -0.008 0.014 0.009 0.005
passing_first_downsPerGame -0.001 -0.006 0.001 0.014
avg_completed_air_yards -0.411 -0.432 -0.378 -0.288
avg_intended_air_yards -0.311 -0.333 -0.293 -0.229
aggressiveness -0.126 -0.111 -0.121 -0.105
max_completed_air_distance -0.231 -0.250 -0.198 -0.120
avg_air_distance -0.221 -0.241 -0.201 -0.150
max_air_distance -0.212 -0.232 -0.209 -0.159
avg_air_yards_to_sticks -0.352 -0.374 -0.331 -0.251
passing_cpoe -0.095 -0.180 -0.102 -0.040
pass_comp_pct -0.001 -0.083 -0.019 0.020
passer_rating -0.272 -0.352 -0.251 -0.084
completion_percentage_above_expectation -0.357 -0.438 -0.370 -0.281
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame -0.428 0.000
passing_first_downsPerGame 0.000 -0.003 0.000
avg_completed_air_yards 0.345 -0.862 -0.381 0.000
avg_intended_air_yards 0.442 -0.792 -0.293 0.910
aggressiveness 0.323 -0.439 -0.108 0.652
max_completed_air_distance 0.279 -0.546 -0.226 0.549
avg_air_distance 0.474 -0.689 -0.206 0.914
max_air_distance 0.403 -0.619 -0.220 0.607
avg_air_yards_to_sticks 0.396 -0.806 -0.327 0.885
passing_cpoe 0.171 -0.339 -0.093 0.192
pass_comp_pct -0.028 -0.052 -0.009 -0.235
passer_rating -0.143 -0.282 -0.247 -0.147
completion_percentage_above_expectation 0.059 -0.614 -0.358 0.246
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness 0.676 0.000
max_completed_air_distance 0.555 0.382 0.000
avg_air_distance 0.971 0.657 0.622 0.000
max_air_distance 0.717 0.467 0.597 0.757 0.000
avg_air_yards_to_sticks 0.953 0.677 0.539 0.957 0.682
passing_cpoe 0.213 -0.054 0.258 0.225 0.228
pass_comp_pct -0.233 -0.367 -0.036 -0.225 -0.100
passer_rating -0.122 -0.242 -0.003 -0.067 -0.086
completion_percentage_above_expectation 0.294 0.111 0.207 0.333 0.226
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe 0.199 0.000
pass_comp_pct -0.249 0.768 0.000
passer_rating -0.148 0.667 0.697 0.000
completion_percentage_above_expectation 0.261 0.813 0.630 0.466 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 -0.272
avg_intended_air_yards aggressiveness
-0.344 -0.155
max_completed_air_distance avg_air_distance
-0.197 -0.322
max_air_distance avg_air_yards_to_sticks
-0.361 -0.314
passing_cpoe pass_comp_pct
-0.226 0.013
passer_rating completion_percentage_above_expectation
-0.057 -0.266
Below are factor scores from the model for the first six players:
F1
[1,] -0.06560882
[2,] 0.16433996
[3,] 0.42149699
[4,] 0.11697062
[5,] 0.88607540
[6,] -0.06929236
The path diagram of the one-factor CFA model is in Figure 22.20.
Below is the syntax for the two-factor CFA model:
Code
cfa2factor_syntax <- '
#Factor loadings
F1 =~ completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame
F2 =~ avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks + passing_cpoe + pass_comp_pct +
passer_rating + completion_percentage_above_expectation
'
lavaan 0.6-19 ended normally after 2164 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 55
Row rank of the constraints matrix 37
Number of observations 2002
Number of clusters [player_id] 396
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 13378.048 6637.854
Degrees of freedom 134 134
P-value (Chi-square) 0.000 0.000
Scaling correction factor 2.015
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 21852.033
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.969
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.691 0.700
Tucker-Lewis Index (TLI) 0.647 0.658
Robust Comparative Fit Index (CFI) 0.503
Robust Tucker-Lewis Index (TLI) 0.433
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -60906.894 -60906.894
Scaling correction factor 2.411
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 2.131
for the MLR correction
Akaike (AIC) 121923.789 121923.789
Bayesian (BIC) 122231.893 122231.893
Sample-size adjusted Bayesian (SABIC) 122057.155 122057.155
Root Mean Square Error of Approximation:
RMSEA 0.222 0.156
90 Percent confidence interval - lower 0.219 0.153
90 Percent confidence interval - upper 0.225 0.158
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.457
90 Percent confidence interval - lower 0.439
90 Percent confidence interval - upper 0.476
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.348 0.348
Parameter Estimates:
Standard errors Robust.cluster
Information Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 =~
completinsPrGm 7.927 0.125 63.503 0.000 7.927 0.987
attemptsPerGam 12.263 0.175 69.900 0.000 12.263 0.975
pssng_yrdsPrGm 92.397 1.500 61.596 0.000 92.397 0.994
passng_tdsPrGm 0.595 0.019 31.402 0.000 0.595 0.863
pssng_r_yrdsPG 83.777 3.595 23.302 0.000 83.777 0.693
pssng_yrds__PG 50.610 1.522 33.262 0.000 50.610 0.772
pssng_frst_dPG 4.509 0.080 56.535 0.000 4.509 0.992
F2 =~
avg_cmpltd_r_y 2.677 2.677 0.951
avg_ntndd_r_yr 3.272 0.090 36.239 0.000 3.272 0.996
aggressiveness 4.724 0.650 7.265 0.000 4.724 0.705
mx_cmpltd_r_ds 6.921 0.566 12.230 0.000 6.921 0.819
avg_air_distnc 3.036 0.128 23.705 0.000 3.036 0.974
max_air_distnc 6.366 0.638 9.975 0.000 6.366 0.820
avg_r_yrds_t_s 3.284 NA 3.284 0.989
passing_cpoe 10.466 0.488 21.450 0.000 10.466 0.872
pass_comp_pct 0.111 0.005 20.441 0.000 0.111 0.844
passer_rating 7.212 2.721 2.651 0.008 7.212 0.397
cmpltn_prcnt__ 2.497 0.806 3.100 0.002 2.497 0.422
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 ~~
F2 0.288 0.039 7.373 0.000 0.288 0.288
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.396 33.415 0.000 13.216 1.646
.attemptsPerGam 21.828 0.588 37.096 0.000 21.828 1.736
.pssng_yrdsPrGm 147.994 4.667 31.714 0.000 147.994 1.592
.passng_tdsPrGm 0.847 0.036 23.649 0.000 0.847 1.228
.pssng_r_yrdsPG 132.999 5.791 22.966 0.000 132.999 1.100
.pssng_yrds__PG 87.622 3.002 29.187 0.000 87.622 1.336
.pssng_frst_dPG 7.143 0.230 31.073 0.000 7.143 1.571
.avg_cmpltd_r_y 5.102 0.110 46.242 0.000 5.102 1.813
.avg_ntndd_r_yr 7.237 0.135 53.525 0.000 7.237 2.202
.aggressiveness 15.414 0.353 43.647 0.000 15.414 2.301
.mx_cmpltd_r_ds 37.550 0.391 96.078 0.000 37.550 4.444
.avg_air_distnc 20.467 0.137 149.518 0.000 20.467 6.566
.max_air_distnc 46.133 0.409 112.709 0.000 46.133 5.943
.avg_r_yrds_t_s -1.725 0.132 -13.112 0.000 -1.725 -0.520
.passing_cpoe -3.291 0.374 -8.801 0.000 -3.291 -0.274
.pass_comp_pct 0.588 0.004 138.923 0.000 0.588 4.452
.passer_rating 83.848 1.230 68.169 0.000 83.848 4.621
.cmpltn_prcnt__ -1.561 0.364 -4.286 0.000 -1.561 -0.264
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 1.620 0.123 13.205 0.000 1.620 0.025
.attemptsPerGam 7.732 0.463 16.707 0.000 7.732 0.049
.pssng_yrdsPrGm 106.683 9.637 11.070 0.000 106.683 0.012
.passng_tdsPrGm 0.122 0.007 17.275 0.000 0.122 0.256
.pssng_r_yrdsPG 7592.718 0.353 21500.229 0.000 7592.718 0.520
.pssng_yrds__PG 1737.909 132.297 13.136 0.000 1737.909 0.404
.pssng_frst_dPG 0.343 0.027 12.923 0.000 0.343 0.017
.avg_cmpltd_r_y 0.750 0.078 9.587 0.000 0.750 0.095
.avg_ntndd_r_yr 0.096 0.027 3.537 0.000 0.096 0.009
.aggressiveness 22.565 2.996 7.533 0.000 22.565 0.503
.mx_cmpltd_r_ds 23.506 2.120 11.088 0.000 23.506 0.329
.avg_air_distnc 0.500 0.042 11.895 0.000 0.500 0.051
.max_air_distnc 19.728 2.181 9.045 0.000 19.728 0.327
.avg_r_yrds_t_s 0.230 0.039 5.859 0.000 0.230 0.021
.passing_cpoe 34.454 3.234 10.653 0.000 34.454 0.239
.pass_comp_pct 0.005 0.000 10.895 0.000 0.005 0.288
.passer_rating 277.206 23.560 11.766 0.000 277.206 0.842
.cmpltn_prcnt__ 28.724 2.988 9.613 0.000 28.724 0.822
F1 1.000 1.000 1.000
F2 1.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.975
attemptsPerGam 0.951
pssng_yrdsPrGm 0.988
passng_tdsPrGm 0.744
pssng_r_yrdsPG 0.480
pssng_yrds__PG 0.596
pssng_frst_dPG 0.983
avg_cmpltd_r_y 0.905
avg_ntndd_r_yr 0.991
aggressiveness 0.497
mx_cmpltd_r_ds 0.671
avg_air_distnc 0.949
max_air_distnc 0.673
avg_r_yrds_t_s 0.979
passing_cpoe 0.761
pass_comp_pct 0.712
passer_rating 0.158
cmpltn_prcnt__ 0.178
The two-factor model did not fit well according to the fit indices:
chisq df pvalue
13378.048 134.000 0.000
chisq.scaled df.scaled pvalue.scaled
6637.854 134.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
2.015 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.222 0.691
tli srmr rmsea.robust
0.647 0.348 0.457
cfi.robust tli.robust
0.503 0.433
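As a rough check, the unscaled CFI, TLI, and RMSEA above can be reproduced by hand from the reported chi-square values. The sketch below uses the standard textbook formulas with numbers copied from the "Standard" column of the two-factor output. The variable names are illustrative, and the software's internal computation may differ in small details.

```r
# Values copied from the two-factor model output above ("Standard" column)
chisq_model    <- 13378.048  # user-model chi-square
df_model       <- 134
chisq_baseline <- 43031.563  # baseline (independence) model chi-square
df_baseline    <- 153
n_obs          <- 2002       # number of observations

# Noncentrality estimates: chi-square minus df, floored at zero
ncp_model    <- max(chisq_model - df_model, 0)
ncp_baseline <- max(chisq_baseline - df_baseline, 0)

# Comparative Fit Index
cfi <- 1 - ncp_model / max(ncp_model, ncp_baseline)

# Tucker-Lewis Index
ratio_model    <- chisq_model / df_model
ratio_baseline <- chisq_baseline / df_baseline
tli <- (ratio_baseline - ratio_model) / (ratio_baseline - 1)

# RMSEA (N in the denominator reproduces the value reported here)
rmsea <- sqrt(max((chisq_model - df_model) / (df_model * n_obs), 0))

round(c(cfi = cfi, tli = tli, rmsea = rmsea), 3)
```

The rounded results agree with the reported values of 0.691 (CFI), 0.647 (TLI), and 0.222 (RMSEA).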
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame 0.021 0.000
passing_yardsPerGame -0.003 -0.003 0.000
passing_tdsPerGame -0.024 -0.048 0.013 0.000
passing_air_yardsPerGame 0.012 0.009 0.004 -0.019
passing_yards_after_catchPerGame -0.008 0.014 0.010 0.007
passing_first_downsPerGame -0.002 -0.007 0.002 0.016
avg_completed_air_yards -0.400 -0.422 -0.367 -0.279
avg_intended_air_yards -0.444 -0.465 -0.426 -0.344
aggressiveness -0.379 -0.361 -0.376 -0.326
max_completed_air_distance 0.003 -0.019 0.038 0.085
avg_air_distance -0.478 -0.495 -0.460 -0.375
max_air_distance -0.083 -0.104 -0.078 -0.045
avg_air_yards_to_sticks -0.422 -0.444 -0.402 -0.312
passing_cpoe -0.055 -0.139 -0.061 -0.004
pass_comp_pct 0.056 -0.027 0.038 0.070
passer_rating 0.235 0.149 0.260 0.361
completion_percentage_above_expectation 0.030 -0.056 0.020 0.059
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame -0.427 0.000
passing_first_downsPerGame 0.001 -0.002 0.000
avg_completed_air_yards 0.352 -0.854 -0.370 0.000
avg_intended_air_yards 0.349 -0.896 -0.426 0.006
aggressiveness 0.145 -0.637 -0.362 -0.034
max_completed_air_distance 0.443 -0.362 0.010 -0.096
avg_air_distance 0.294 -0.890 -0.465 -0.007
max_air_distance 0.494 -0.518 -0.090 -0.069
avg_air_yards_to_sticks 0.346 -0.861 -0.397 0.004
passing_cpoe 0.200 -0.307 -0.052 -0.554
pass_comp_pct 0.012 -0.007 0.048 -0.953
passer_rating 0.214 0.115 0.263 -0.346
completion_percentage_above_expectation 0.331 -0.310 0.031 -0.010
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness -0.034 0.000
max_completed_air_distance -0.189 -0.221 0.000
avg_air_distance 0.005 -0.030 -0.166 0.000
max_air_distance -0.044 -0.131 0.099 -0.034 0.000
avg_air_yards_to_sticks 0.000 -0.032 -0.170 -0.002 -0.051
passing_cpoe -0.611 -0.685 -0.318 -0.619 -0.379
pass_comp_pct -1.027 -0.979 -0.585 -1.041 -0.682
passer_rating -0.422 -0.556 -0.031 -0.442 -0.181
completion_percentage_above_expectation -0.049 -0.214 0.104 -0.068 0.069
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe -0.602 0.000
pass_comp_pct -1.020 0.119 0.000
passer_rating -0.407 0.504 0.550 0.000
completion_percentage_above_expectation -0.047 0.595 0.428 0.622 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 -0.104
avg_intended_air_yards aggressiveness
-0.142 0.039
max_completed_air_distance avg_air_distance
-0.154 -0.097
max_air_distance avg_air_yards_to_sticks
-0.283 -0.117
passing_cpoe pass_comp_pct
-0.189 0.008
passer_rating completion_percentage_above_expectation
-0.213 -0.338
Below are factor scores from the model for the first six players:
F1 F2
[1,] -0.07350235 0.7786304
[2,] 0.16428295 -0.1707025
[3,] 0.42916604 -0.2838464
[4,] 0.11007477 0.6616575
[5,] 0.88867369 0.1204172
[6,] -0.06549830 -0.3928992
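Factor scores of this kind are typically extracted from a fitted lavaan model with `lavPredict()`. The sketch below is a minimal illustration on lavaan's built-in `HolzingerSwineford1939` data, not the quarterback data used in this chapter. The object names (`example_syntax`, `example_fit`, `example_scores`) are hypothetical.

```r
library(lavaan)

# Illustrative two-factor CFA on lavaan's built-in example data,
# NOT the quarterback data used in this chapter
example_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

example_fit <- cfa(
  example_syntax,
  data = HolzingerSwineford1939,
  std.lv = TRUE  # fix latent variances to 1 so that all loadings are estimated
)

# One row per observation, one column per latent factor
example_scores <- lavPredict(example_fit)
head(example_scores)
```

Because the latent variances are fixed to 1, the scores are on a roughly standardized scale, which makes them easier to compare across factors.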
The path diagram of the two-factor CFA model is in Figure 22.21.
Because the one-factor model is nested within the two-factor model, we can compare them using a chi-square difference test. The two-factor model fit considerably better than the one-factor model, as indicated by its lower chi-square value.
Below is the syntax for the three-factor model:
Code
cfa3factor_syntax <- '
#Factor loadings
F1 =~ completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame
F2 =~ avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks
F3 =~ passing_cpoe + pass_comp_pct + passer_rating + completion_percentage_above_expectation
'
lavaan 0.6-19 ended normally after 185 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 57
Row rank of the constraints matrix 39
Number of observations 2002
Number of clusters [player_id] 396
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 10863.020 5689.900
Degrees of freedom 132 132
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.909
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 21852.033
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.969
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.750 0.744
Tucker-Lewis Index (TLI) 0.710 0.703
Robust Comparative Fit Index (CFI) 0.714
Robust Tucker-Lewis Index (TLI) 0.669
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -59649.380 -59649.380
Scaling correction factor 2.643
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 2.131
for the MLR correction
Akaike (AIC) 119412.761 119412.761
Bayesian (BIC) 119732.069 119732.069
Sample-size adjusted Bayesian (SABIC) 119550.977 119550.977
Root Mean Square Error of Approximation:
RMSEA 0.202 0.145
90 Percent confidence interval - lower 0.198 0.143
90 Percent confidence interval - upper 0.205 0.147
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 1.000
Robust RMSEA 0.349
90 Percent confidence interval - lower 0.330
90 Percent confidence interval - upper 0.369
P-value H_0: Robust RMSEA <= 0.050 0.000
P-value H_0: Robust RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.325 0.325
Parameter Estimates:
Standard errors Robust.cluster
Information Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 =~
completinsPrGm 7.932 0.124 63.997 0.000 7.932 0.987
attemptsPerGam 12.270 0.175 69.967 0.000 12.270 0.975
pssng_yrdsPrGm 92.456 1.492 61.983 0.000 92.456 0.994
passng_tdsPrGm 0.595 0.019 31.607 0.000 0.595 0.863
pssng_r_yrdsPG 83.828 3.587 23.367 0.000 83.828 0.693
pssng_yrds__PG 50.640 1.527 33.163 0.000 50.640 0.772
pssng_frst_dPG 4.512 0.079 56.894 0.000 4.512 0.992
F2 =~
avg_cmpltd_r_y 1.111 0.064 17.405 0.000 1.111 0.782
avg_ntndd_r_yr 1.389 0.064 21.836 0.000 1.389 0.992
aggressiveness 2.009 0.300 6.694 0.000 2.009 0.391
mx_cmpltd_r_ds 2.681 0.277 9.679 0.000 2.681 0.475
avg_air_distnc 1.273 0.066 19.223 0.000 1.273 0.878
max_air_distnc 2.619 0.298 8.781 0.000 2.619 0.506
avg_r_yrds_t_s 1.383 0.079 17.463 0.000 1.383 0.937
F3 =~
passing_cpoe 12.128 0.481 25.227 0.000 12.128 0.997
pass_comp_pct 0.121 0.005 23.672 0.000 0.121 0.914
passer_rating 25.942 NA 25.942 0.921
cmpltn_prcnt__ 10.365 0.541 19.158 0.000 10.365 0.967
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 ~~
F2 0.154 0.076 2.036 0.042 0.154 0.154
F3 0.303 0.037 8.212 0.000 0.303 0.303
F2 ~~
F3 0.078 0.132 0.586 0.558 0.078 0.078
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.396 33.415 0.000 13.216 1.645
.attemptsPerGam 21.828 0.588 37.096 0.000 21.828 1.735
.pssng_yrdsPrGm 147.994 4.667 31.714 0.000 147.994 1.591
.passng_tdsPrGm 0.847 0.036 23.649 0.000 0.847 1.228
.pssng_r_yrdsPG 133.000 5.791 22.966 0.000 133.000 1.100
.pssng_yrds__PG 87.622 3.002 29.187 0.000 87.622 1.336
.pssng_frst_dPG 7.143 0.230 31.073 0.000 7.143 1.570
.avg_cmpltd_r_y 5.671 0.095 59.795 0.000 5.671 3.987
.avg_ntndd_r_yr 7.930 0.108 73.374 0.000 7.930 5.663
.aggressiveness 16.414 0.296 55.526 0.000 16.414 3.191
.mx_cmpltd_r_ds 39.035 0.341 114.482 0.000 39.035 6.918
.avg_air_distnc 21.111 0.104 202.637 0.000 21.111 14.548
.max_air_distnc 47.488 0.333 142.765 0.000 47.488 9.175
.avg_r_yrds_t_s -1.029 0.110 -9.332 0.000 -1.029 -0.697
.passing_cpoe -3.494 0.375 -9.328 0.000 -3.494 -0.287
.pass_comp_pct 0.588 0.004 138.904 0.000 0.588 4.426
.passer_rating 80.612 0.997 80.843 0.000 80.612 2.861
.cmpltn_prcnt__ -2.948 0.356 -8.274 0.000 -2.948 -0.275
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 1.625 0.123 13.221 0.000 1.625 0.025
.attemptsPerGam 7.757 0.462 16.775 0.000 7.757 0.049
.pssng_yrdsPrGm 106.237 9.617 11.047 0.000 106.237 0.012
.passng_tdsPrGm 0.121 0.007 17.277 0.000 0.121 0.255
.pssng_r_yrdsPG 7592.808 0.352 21546.125 0.000 7592.808 0.519
.pssng_yrds__PG 1737.972 132.291 13.138 0.000 1737.972 0.404
.pssng_frst_dPG 0.342 0.026 12.943 0.000 0.342 0.017
.avg_cmpltd_r_y 0.787 0.084 9.388 0.000 0.787 0.389
.avg_ntndd_r_yr 0.033 0.021 1.543 0.123 0.033 0.017
.aggressiveness 22.417 2.889 7.760 0.000 22.417 0.847
.mx_cmpltd_r_ds 24.649 2.207 11.167 0.000 24.649 0.774
.avg_air_distnc 0.484 0.037 13.022 0.000 0.484 0.230
.max_air_distnc 19.929 2.186 9.117 0.000 19.929 0.744
.avg_r_yrds_t_s 0.268 0.043 6.167 0.000 0.268 0.123
.passing_cpoe 0.877 1.122 0.782 0.434 0.877 0.006
.pass_comp_pct 0.003 0.000 6.555 0.000 0.003 0.165
.passer_rating 121.020 17.269 7.008 0.000 121.020 0.152
.cmpltn_prcnt__ 7.407 1.173 6.314 0.000 7.407 0.065
F1 1.000 1.000 1.000
F2 1.000 1.000 1.000
F3 1.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.975
attemptsPerGam 0.951
pssng_yrdsPrGm 0.988
passng_tdsPrGm 0.745
pssng_r_yrdsPG 0.481
pssng_yrds__PG 0.596
pssng_frst_dPG 0.983
avg_cmpltd_r_y 0.611
avg_ntndd_r_yr 0.983
aggressiveness 0.153
mx_cmpltd_r_ds 0.226
avg_air_distnc 0.770
max_air_distnc 0.256
avg_r_yrds_t_s 0.877
passing_cpoe 0.994
pass_comp_pct 0.835
passer_rating 0.848
cmpltn_prcnt__ 0.935
The three-factor model did not fit well according to the fit indices:
chisq df pvalue
10863.020 132.000 0.000
chisq.scaled df.scaled pvalue.scaled
5689.900 132.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
1.909 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.202 0.750
tli srmr rmsea.robust
0.710 0.325 0.349
cfi.robust tli.robust
0.714 0.669
However, we know that the three-factor model fit improved considerably when accounting for correlated residuals. So, we plan to examine modification indices to see if we can account for covariances between variables that were not explained by their underlying latent factors.
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame 0.022 0.000
passing_yardsPerGame -0.003 -0.003 0.000
passing_tdsPerGame -0.024 -0.049 0.013 0.000
passing_air_yardsPerGame 0.012 0.009 0.004 -0.020
passing_yards_after_catchPerGame -0.008 0.014 0.010 0.007
passing_first_downsPerGame -0.002 -0.007 0.002 0.016
avg_completed_air_yards -0.249 -0.272 -0.215 -0.146
avg_intended_air_yards -0.313 -0.334 -0.294 -0.229
aggressiveness -0.238 -0.222 -0.234 -0.203
max_completed_air_distance 0.163 0.140 0.200 0.225
avg_air_distance -0.335 -0.354 -0.316 -0.250
max_air_distance 0.073 0.050 0.079 0.091
avg_air_yards_to_sticks -0.284 -0.307 -0.263 -0.191
passing_cpoe -0.105 -0.189 -0.112 -0.048
pass_comp_pct 0.023 -0.060 0.004 0.041
passer_rating 0.072 -0.011 0.097 0.219
completion_percentage_above_expectation -0.139 -0.223 -0.150 -0.089
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame -0.427 0.000
passing_first_downsPerGame 0.000 -0.002 0.000
avg_completed_air_yards 0.458 -0.736 -0.218 0.000
avg_intended_air_yards 0.441 -0.793 -0.294 0.178
aggressiveness 0.244 -0.527 -0.221 0.332
max_completed_air_distance 0.556 -0.237 0.171 0.312
avg_air_distance 0.394 -0.779 -0.321 0.234
max_air_distance 0.603 -0.396 0.067 0.316
avg_air_yards_to_sticks 0.444 -0.753 -0.258 0.214
passing_cpoe 0.165 -0.346 -0.102 0.215
pass_comp_pct -0.012 -0.034 0.014 -0.205
passer_rating 0.100 -0.012 0.100 -0.024
completion_percentage_above_expectation 0.212 -0.443 -0.139 0.333
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness 0.281 0.000
max_completed_air_distance 0.156 0.171 0.000
avg_air_distance 0.104 0.314 0.215 0.000
max_air_distance 0.271 0.250 0.530 0.320 0.000
avg_air_yards_to_sticks 0.057 0.300 0.195 0.140 0.287
passing_cpoe 0.181 -0.100 0.360 0.163 0.297
pass_comp_pct -0.257 -0.411 0.072 -0.281 -0.026
passer_rating -0.097 -0.304 0.261 -0.117 0.109
completion_percentage_above_expectation 0.297 0.054 0.415 0.278 0.377
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe 0.189 0.000
pass_comp_pct -0.251 -0.056 0.000
passer_rating -0.080 -0.067 0.044 0.000
completion_percentage_above_expectation 0.300 -0.001 -0.099 -0.101 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 -0.316
avg_intended_air_yards aggressiveness
-0.344 -0.111
max_completed_air_distance avg_air_distance
-0.368 -0.295
max_air_distance avg_air_yards_to_sticks
-0.463 -0.329
passing_cpoe pass_comp_pct
-0.174 0.011
passer_rating completion_percentage_above_expectation
-0.089 -0.223
Below are the modification indices, which suggest potential model modifications that would improve model fit, such as correlated residuals and cross-loadings.
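As a minimal sketch of how modification indices can be requested in lavaan, the example below fits a small CFA to lavaan's built-in `HolzingerSwineford1939` data (not the quarterback data) and lists the largest indices. The object names and the threshold of 10 are illustrative assumptions.

```r
library(lavaan)

# Illustrative three-factor CFA on lavaan's built-in example data,
# NOT the quarterback data used in this chapter
example_syntax <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
example_fit <- cfa(example_syntax, data = HolzingerSwineford1939)

# Modification indices, largest first; keep only those of at least 10
# (the threshold is an arbitrary choice for this illustration)
example_mi <- modificationindices(
  example_fit,
  sort. = TRUE,
  minimum.value = 10
)

# "=~" rows suggest cross-loadings; "~~" rows suggest correlated residuals
example_mi[, c("lhs", "op", "rhs", "mi", "epc")]
```

Each row estimates how much the model chi-square would drop if that single parameter were freed, so the indices are best treated as suggestions to weigh against theory rather than as automatic changes.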
Below are factor scores from the model for the first six players:
F1 F2 F3
[1,] -0.07259821 0.01834072 0.8517998
[2,] 0.16408224 0.01722349 -0.1903004
[3,] 0.42835856 0.05079569 -0.3243843
[4,] 0.11043343 0.03759429 0.6407341
[5,] 0.88829206 0.13630769 0.2430781
[6,] -0.06581186 -0.02435272 -0.4394584
The path diagram of the three-factor CFA model is in Figure 22.22.
Because the two-factor model is nested within the three-factor model, we can compare them using a chi-square difference test. The three-factor model fit considerably better than the two-factor model, as indicated by its lower chi-square value.
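The arithmetic behind this comparison can be sketched from the statistics printed in the two model summaries. The first part is the naive difference of the unscaled chi-square values. The second part approximates a scaled difference test of the Satorra-Bentler (2001) type, which is commonly recommended when robust (scaled) statistics are used. Because the scaling factors are copied from rounded output, the scaled result is only approximate. Variable names are illustrative.

```r
# Values copied from the model summaries above
chisq_2f <- 13378.048; df_2f <- 134; scale_2f <- 2.015  # two-factor model
chisq_3f <- 10863.020; df_3f <- 132; scale_3f <- 1.909  # three-factor model

# Naive difference test on the unscaled ("Standard") statistics
chisq_diff <- chisq_2f - chisq_3f
df_diff    <- df_2f - df_3f
p_naive    <- pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)

# Scaled difference: divide the unscaled difference by a pooled scaling factor
# (approximate, because the scaling factors above are rounded)
scale_diff        <- (df_2f * scale_2f - df_3f * scale_3f) / df_diff
chisq_diff_scaled <- chisq_diff / scale_diff
p_scaled          <- pchisq(chisq_diff_scaled, df = df_diff, lower.tail = FALSE)

c(chisq_diff = chisq_diff, df_diff = df_diff,
  chisq_diff_scaled = chisq_diff_scaled)
```

Both versions point the same way: the drop in chi-square is very large relative to the two degrees of freedom gained.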
Based on the model modifications suggested by the modification indices for the three-factor model, we modify the model to account for correlated residuals, using the syntax below:
Code
cfa3factorModified_syntax <- '
# Factor loadings
F1 =~ NA*completionsPerGame + attemptsPerGame + passing_yardsPerGame + passing_tdsPerGame +
passing_air_yardsPerGame + passing_yards_after_catchPerGame + passing_first_downsPerGame
F2 =~ NA*avg_completed_air_yards + avg_intended_air_yards + aggressiveness + max_completed_air_distance +
avg_air_distance + max_air_distance + avg_air_yards_to_sticks
F3 =~ NA*passing_cpoe + pass_comp_pct + passer_rating + completion_percentage_above_expectation
# Cross loadings
#F3 =~ attemptsPerGame
# Correlated residuals
passing_air_yardsPerGame ~~ passing_yards_after_catchPerGame
completionsPerGame ~~ attemptsPerGame
attemptsPerGame ~~ passing_yards_after_catchPerGame
passing_yards_after_catchPerGame ~~ passing_first_downsPerGame
completionsPerGame ~~ passing_first_downsPerGame
attemptsPerGame ~~ passing_tdsPerGame
completionsPerGame ~~ passing_tdsPerGame
passing_tdsPerGame ~~ passing_air_yardsPerGame
max_completed_air_distance ~~ max_air_distance
avg_completed_air_yards ~~ max_completed_air_distance
avg_completed_air_yards ~~ avg_air_distance
avg_completed_air_yards ~~ max_air_distance
avg_intended_air_yards ~~ max_completed_air_distance
completionsPerGame ~~ pass_comp_pct
passing_yardsPerGame ~~ avg_completed_air_yards
passing_yardsPerGame ~~ max_completed_air_distance
passing_tdsPerGame ~~ passer_rating
aggressiveness ~~ completion_percentage_above_expectation
# Variances
F1 ~~ 1*F1
F2 ~~ 1*F2
'
lavaan 0.6-19 ended normally after 1347 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 76
Row rank of the constraints matrix 40
Number of observations 2002
Number of clusters [player_id] 396
Number of missing patterns 4
Model Test User Model:
Standard Scaled
Test Statistic 2192.914 1205.054
Degrees of freedom 113 113
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.820
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 43031.563 21852.033
Degrees of freedom 153 153
P-value 0.000 0.000
Scaling correction factor 1.969
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.951 0.950
Tucker-Lewis Index (TLI) 0.934 0.932
Robust Comparative Fit Index (CFI) NA
Robust Tucker-Lewis Index (TLI) NA
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -55314.327 -55314.327
Scaling correction factor 2.593
for the MLR correction
Loglikelihood unrestricted model (H1) -54217.870 -54217.870
Scaling correction factor 2.131
for the MLR correction
Akaike (AIC) 110780.655 110780.655
Bayesian (BIC) 111206.399 111206.399
Sample-size adjusted Bayesian (SABIC) 110964.943 110964.943
Root Mean Square Error of Approximation:
RMSEA 0.096 0.069
90 Percent confidence interval - lower 0.092 0.067
90 Percent confidence interval - upper 0.099 0.072
P-value H_0: RMSEA <= 0.050 0.000 0.000
P-value H_0: RMSEA >= 0.080 1.000 0.000
Robust RMSEA NA
90 Percent confidence interval - lower NA
90 Percent confidence interval - upper NA
P-value H_0: Robust RMSEA <= 0.050 NA
P-value H_0: Robust RMSEA >= 0.080 NA
Standardized Root Mean Square Residual:
SRMR 0.324 0.324
Parameter Estimates:
Standard errors Robust.cluster
Information Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 =~
completinsPrGm 7.880 0.126 62.745 0.000 7.880 0.982
attemptsPerGam 12.208 0.179 68.234 0.000 12.208 0.968
pssng_yrdsPrGm 92.521 1.492 62.020 0.000 92.521 0.996
passng_tdsPrGm 0.596 0.018 32.317 0.000 0.596 0.873
pssng_r_yrdsPG 84.416 3.660 23.064 0.000 84.416 0.696
pssng_yrds__PG 51.194 1.547 33.092 0.000 51.194 0.779
pssng_frst_dPG 4.514 0.080 56.080 0.000 4.514 0.991
F2 =~
avg_cmpltd_r_y 1.105 0.061 18.086 0.000 1.105 0.809
avg_ntndd_r_yr 1.389 0.065 21.387 0.000 1.389 0.986
aggressiveness 1.726 0.299 5.774 0.000 1.726 0.342
mx_cmpltd_r_ds 2.863 0.280 10.235 0.000 2.863 0.506
avg_air_distnc 1.284 0.067 19.227 0.000 1.284 0.881
max_air_distnc 2.648 0.302 8.775 0.000 2.648 0.511
avg_r_yrds_t_s 1.399 0.080 17.480 0.000 1.399 0.943
F3 =~
passing_cpoe 4.414 0.547 8.071 0.000 12.228 0.994
pass_comp_pct 0.042 0.005 8.364 0.000 0.117 0.906
passer_rating 9.934 1.359 7.312 0.000 27.524 0.935
cmpltn_prcnt__ 4.006 0.520 7.697 0.000 11.099 0.973
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.passing_air_yardsPerGame ~~
.pssng_yrds__PG -3491.729 256.243 -13.627 0.000 -3491.729 -0.973
.completionsPerGame ~~
.attemptsPerGam 3.496 0.187 18.726 0.000 3.496 0.725
.attemptsPerGame ~~
.pssng_yrds__PG 15.610 1.055 14.801 0.000 15.610 0.120
.passing_yards_after_catchPerGame ~~
.pssng_frst_dPG -3.403 0.280 -12.173 0.000 -3.403 -0.135
.completionsPerGame ~~
.pssng_frst_dPG 0.129 0.027 4.784 0.000 0.129 0.138
.attemptsPerGame ~~
.passng_tdsPrGm -0.438 0.036 -12.334 0.000 -0.438 -0.415
.completionsPerGame ~~
.passng_tdsPrGm -0.153 0.015 -10.311 0.000 -0.153 -0.300
.passing_tdsPerGame ~~
.pssng_r_yrdsPG -2.829 0.304 -9.296 0.000 -2.829 -0.097
.max_completed_air_distance ~~
.max_air_distnc 11.329 1.521 7.450 0.000 11.329 0.521
.avg_completed_air_yards ~~
.mx_cmpltd_r_ds 1.260 0.233 5.407 0.000 1.260 0.322
.avg_air_distnc -0.112 0.024 -4.772 0.000 -0.112 -0.203
.max_air_distnc -0.401 0.149 -2.686 0.007 -0.401 -0.112
.avg_intended_air_yards ~~
.mx_cmpltd_r_ds -0.460 0.135 -3.408 0.001 -0.460 -0.403
.completionsPerGame ~~
.pass_comp_pct 0.022 0.002 13.658 0.000 0.022 0.265
.passing_yardsPerGame ~~
.avg_cmpltd_r_y 4.794 0.319 15.037 0.000 4.794 0.690
.mx_cmpltd_r_ds 17.131 1.451 11.807 0.000 17.131 0.406
.passing_tdsPerGame ~~
.passer_rating 1.302 0.135 9.654 0.000 1.302 0.374
.aggressiveness ~~
.cmpltn_prcnt__ 5.594 0.987 5.667 0.000 5.594 0.448
F1 ~~
F2 0.224 0.073 3.086 0.002 0.224 0.224
F3 0.865 0.139 6.240 0.000 0.312 0.312
F2 ~~
F3 0.266 0.362 0.737 0.461 0.096 0.096
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.completinsPrGm 13.216 0.396 33.416 0.000 13.216 1.647
.attemptsPerGam 21.828 0.588 37.097 0.000 21.828 1.731
.pssng_yrdsPrGm 147.994 4.667 31.714 0.000 147.994 1.593
.passng_tdsPrGm 0.847 0.036 23.649 0.000 0.847 1.239
.pssng_r_yrdsPG 132.999 5.791 22.966 0.000 132.999 1.097
.pssng_yrds__PG 87.622 3.002 29.188 0.000 87.622 1.333
.pssng_frst_dPG 7.143 0.230 31.074 0.000 7.143 1.568
.avg_cmpltd_r_y 5.648 0.090 63.065 0.000 5.648 4.135
.avg_ntndd_r_yr 7.884 0.105 75.362 0.000 7.884 5.598
.aggressiveness 16.380 0.292 56.166 0.000 16.380 3.244
.mx_cmpltd_r_ds 38.973 0.342 113.903 0.000 38.973 6.892
.avg_air_distnc 21.068 0.101 208.210 0.000 21.068 14.451
.max_air_distnc 47.399 0.331 142.982 0.000 47.399 9.144
.avg_r_yrds_t_s -1.076 0.108 -10.001 0.000 -1.076 -0.725
.passing_cpoe -3.288 0.375 -8.768 0.000 -3.288 -0.267
.pass_comp_pct 0.588 0.004 138.832 0.000 0.588 4.556
.passer_rating 80.907 1.126 71.837 0.000 80.907 2.748
.cmpltn_prcnt__ -2.888 0.370 -7.805 0.000 -2.888 -0.253
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
F1 1.000 1.000 1.000
F2 1.000 1.000 1.000
.completinsPrGm 2.326 0.111 20.869 0.000 2.326 0.036
.attemptsPerGam 9.989 0.452 22.091 0.000 9.989 0.063
.pssng_yrdsPrGm 74.862 6.377 11.739 0.000 74.862 0.009
.passng_tdsPrGm 0.111 0.007 17.008 0.000 0.111 0.238
.pssng_r_yrdsPG 7576.573 522.372 14.504 0.000 7576.573 0.515
.pssng_yrds__PG 1698.105 126.909 13.381 0.000 1698.105 0.393
.pssng_frst_dPG 0.373 0.029 12.742 0.000 0.373 0.018
.avg_cmpltd_r_y 0.645 0.063 10.202 0.000 0.645 0.346
.avg_ntndd_r_yr 0.055 0.020 2.702 0.007 0.055 0.028
.aggressiveness 22.510 2.923 7.701 0.000 22.510 0.883
.mx_cmpltd_r_ds 23.778 1.952 12.179 0.000 23.778 0.744
.avg_air_distnc 0.476 0.037 12.791 0.000 0.476 0.224
.max_air_distnc 19.857 2.179 9.112 0.000 19.857 0.739
.avg_r_yrds_t_s 0.246 0.038 6.552 0.000 0.246 0.112
.passing_cpoe 1.745 0.841 2.075 0.038 1.745 0.012
.pass_comp_pct 0.003 0.000 6.550 0.000 0.003 0.179
.passer_rating 108.999 16.145 6.751 0.000 108.999 0.126
.cmpltn_prcnt__ 6.916 1.038 6.663 0.000 6.916 0.053
F3 7.676 1.904 4.032 0.000 1.000 1.000
R-Square:
Estimate
completinsPrGm 0.964
attemptsPerGam 0.937
pssng_yrdsPrGm 0.991
passng_tdsPrGm 0.762
pssng_r_yrdsPG 0.485
pssng_yrds__PG 0.607
pssng_frst_dPG 0.982
avg_cmpltd_r_y 0.654
avg_ntndd_r_yr 0.972
aggressiveness 0.117
mx_cmpltd_r_ds 0.256
avg_air_distnc 0.776
max_air_distnc 0.261
avg_r_yrds_t_s 0.888
passing_cpoe 0.988
pass_comp_pct 0.821
passer_rating 0.874
cmpltn_prcnt__ 0.947
The fit of the three-factor model with correlated residuals was acceptable according to CFI and TLI but subpar according to RMSEA and SRMR.
chisq df pvalue
2192.914 113.000 0.000
chisq.scaled df.scaled pvalue.scaled
1205.054 113.000 0.000
chisq.scaling.factor baseline.chisq baseline.df
1.820 43031.563 153.000
baseline.pvalue rmsea cfi
0.000 0.096 0.951
tli srmr rmsea.robust
0.934 0.324 NA
cfi.robust tli.robust
NA NA
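The interpretation of these indices can be made explicit by comparing each reported value with a cutoff. The cutoffs in the sketch below are common rules of thumb and are an assumption for illustration. They are not taken from this chapter, and other thresholds are also in use.

```r
# Fit indices reported above for the modified three-factor model
fit_reported <- c(cfi = 0.951, tli = 0.934, rmsea = 0.096, srmr = 0.324)

# ASSUMED rule-of-thumb cutoffs (illustrative; not taken from this chapter)
cutoff <- c(cfi = 0.90, tli = 0.90, rmsea = 0.08, srmr = 0.08)

# CFI and TLI: higher is better; RMSEA and SRMR: lower is better
higher_is_better <- c(cfi = TRUE, tli = TRUE, rmsea = FALSE, srmr = FALSE)

acceptable <- ifelse(
  higher_is_better,
  fit_reported >= cutoff,
  fit_reported <= cutoff
)

data.frame(
  index = names(fit_reported),
  value = fit_reported,
  cutoff = cutoff,
  acceptable = acceptable,
  row.names = NULL
)
```

Under these assumed cutoffs, CFI and TLI pass while RMSEA and SRMR do not, which matches the mixed picture described in the text.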
$type
[1] "cor.bollen"
$cov
cmplPG attmPG pssng_yPG pssng_tPG
completionsPerGame 0.000
attemptsPerGame -0.001 0.000
passing_yardsPerGame 0.001 0.002 0.000
passing_tdsPerGame -0.001 -0.001 0.001 0.000
passing_air_yardsPerGame 0.013 0.011 0.000 0.005
passing_yards_after_catchPerGame -0.011 -0.007 0.002 -0.007
passing_first_downsPerGame 0.001 0.001 0.001 0.007
avg_completed_air_yards -0.308 -0.330 -0.313 -0.200
avg_intended_air_yards -0.378 -0.399 -0.362 -0.290
aggressiveness -0.254 -0.237 -0.251 -0.218
max_completed_air_distance 0.124 0.101 0.127 0.190
avg_air_distance -0.395 -0.413 -0.378 -0.305
max_air_distance 0.038 0.015 0.043 0.059
avg_air_yards_to_sticks -0.348 -0.371 -0.329 -0.251
passing_cpoe -0.112 -0.195 -0.121 -0.058
pass_comp_pct -0.003 -0.064 -0.002 0.033
passer_rating 0.061 -0.022 0.083 0.140
completion_percentage_above_expectation -0.148 -0.232 -0.161 -0.101
pssng_r_PG p___PG pssng_f_PG avg_c__
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame 0.000
passing_yards_after_catchPerGame 0.004 0.000
passing_first_downsPerGame -0.002 0.003 0.000
avg_completed_air_yards 0.416 -0.784 -0.278 0.000
avg_intended_air_yards 0.394 -0.847 -0.361 0.156
aggressiveness 0.233 -0.540 -0.237 0.360
max_completed_air_distance 0.528 -0.269 0.131 0.111
avg_air_distance 0.350 -0.828 -0.382 0.263
max_air_distance 0.578 -0.425 0.031 0.355
avg_air_yards_to_sticks 0.397 -0.806 -0.324 0.183
passing_cpoe 0.158 -0.355 -0.111 0.198
pass_comp_pct -0.017 -0.040 0.008 -0.220
passer_rating 0.090 -0.024 0.087 -0.041
completion_percentage_above_expectation 0.204 -0.453 -0.150 0.316
avg_n__ aggrss mx_c__ avg_r_ mx_r_d
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards 0.000
aggressiveness 0.331 0.000
max_completed_air_distance 0.185 0.183 0.000
avg_air_distance 0.106 0.355 0.186 0.000
max_air_distance 0.269 0.273 0.125 0.314 0.000
avg_air_yards_to_sticks 0.056 0.343 0.163 0.131 0.279
passing_cpoe 0.163 -0.102 0.348 0.147 0.287
pass_comp_pct -0.273 -0.413 0.062 -0.296 -0.034
passer_rating -0.115 -0.306 0.249 -0.134 0.099
completion_percentage_above_expectation 0.280 -0.046 0.403 0.261 0.367
av____ pssng_ pss_c_ pssr_r cmp___
completionsPerGame
attemptsPerGame
passing_yardsPerGame
passing_tdsPerGame
passing_air_yardsPerGame
passing_yards_after_catchPerGame
passing_first_downsPerGame
avg_completed_air_yards
avg_intended_air_yards
aggressiveness
max_completed_air_distance
avg_air_distance
max_air_distance
avg_air_yards_to_sticks 0.000
passing_cpoe 0.171 0.000
pass_comp_pct -0.267 -0.046 0.000
passer_rating -0.098 -0.079 0.038 0.000
completion_percentage_above_expectation 0.283 -0.004 -0.097 -0.120 0.000
$mean
completionsPerGame attemptsPerGame
0.000 0.000
passing_yardsPerGame passing_tdsPerGame
0.000 0.000
passing_air_yardsPerGame passing_yards_after_catchPerGame
0.000 0.000
passing_first_downsPerGame avg_completed_air_yards
0.000 -0.308
avg_intended_air_yards aggressiveness
-0.330 -0.106
max_completed_air_distance avg_air_distance
-0.359 -0.282
max_air_distance avg_air_yards_to_sticks
-0.451 -0.315
passing_cpoe pass_comp_pct
-0.190 0.009
passer_rating completion_percentage_above_expectation
-0.100 -0.228
Error: lavaan->modificationindices():
could not compute modification indices; information matrix is singular
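The modified syntax above fixes the variances of F1 and F2 to 1 but has no corresponding line for F3, and the output reports an estimated variance for F3 (7.676). When a factor's first loading is freed with `NA*` and its variance is also left free, the factor's scale is not identified. That is one possible reason for the singular information matrix, though this is an assumption rather than something the source states. The sketch below, on lavaan's built-in `HolzingerSwineford1939` data with hypothetical object names, shows the pattern in which every latent variance is fixed.

```r
library(lavaan)

# Built-in example data, NOT the quarterback data used in this chapter
example_syntax <- '
  # Free the first loading of each factor (NA*) ...
  visual  =~ NA*x1 + x2 + x3
  textual =~ NA*x4 + x5 + x6
  speed   =~ NA*x7 + x8 + x9

  # ... and fix every latent variance to 1 to set each factor scale
  visual  ~~ 1*visual
  textual ~~ 1*textual
  speed   ~~ 1*speed
'
example_fit <- cfa(example_syntax, data = HolzingerSwineford1939)

# With each factor scale identified, modification indices can be computed
example_mi <- modificationindices(example_fit, sort. = TRUE)
head(example_mi)
```

Whether the same change would resolve the error for the quarterback model is untested here. The many correlated residuals in that model could also contribute.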
Below are factor scores from the model for the first six players:
F1 F2 F3
[1,] 0.155936734 0.06008852 2.5399976
[2,] 0.161137983 0.03162709 -0.2902111
[3,] 0.243284575 0.04885075 -0.3328656
[4,] 0.109019463 0.04151196 1.7281787
[5,] 0.974356926 0.21606794 0.6206635
[6,] -0.009807153 -0.01506287 -1.2393687
The path diagram of the modified three-factor CFA model with correlated residuals is in Figure 22.23.
Here are the variables that loaded onto each of the factors:
Code
factor1vars <- c(
"completionsPerGame","attemptsPerGame","passing_yardsPerGame","passing_tdsPerGame",
"passing_air_yardsPerGame","passing_yards_after_catchPerGame","passing_first_downsPerGame")
factor2vars <- c(
"avg_completed_air_yards","avg_intended_air_yards","aggressiveness","max_completed_air_distance",
"avg_air_distance","max_air_distance","avg_air_yards_to_sticks")
factor3vars <- c(
"passing_cpoe","pass_comp_pct","passer_rating","completion_percentage_above_expectation")
The variables that loaded most strongly onto factor 1 appear to reflect Quarterback usage: completions per game, passing attempts per game, passing yards per game, passing touchdowns per game, passing air yards (total horizontal distance the ball travels on all pass attempts) per game, passing yards after the catch per game, and first downs gained per game by passing. Quarterbacks who throw more tend to have higher values on those variables. Thus, we label factor 1 as “Usage”, which reflects total Quarterback involvement, regardless of efficiency or outcome.
The variables that loaded most strongly onto factor 2 appear to reflect Quarterback aggressiveness: average air yards on completed passes, average air yards on all attempted passes, aggressiveness (percentage of passing attempts thrown into tight windows, where there is a defender within one yard or less of the receiver at the time of the completion or incompletion), average amount of air yards ahead of or behind the first down marker on passing attempts, average air distance (the true three-dimensional distance the ball travels in the air), maximum air distance, and maximum air distance on completed passes. Quarterbacks who throw the ball farther and into tighter windows tend to have higher values on those variables. Thus, we label factor 2 as “Aggressiveness”, which reflects throwing longer, more difficult passes into tight windows.
The variables that loaded most strongly onto factor 3 appear to reflect Quarterback performance: completion percentage above expectation (both passing_cpoe and completion_percentage_above_expectation), pass completion percentage, and passer rating. Quarterbacks who perform better tend to have higher values on those variables. Thus, we label factor 3 as “Performance”.
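The factor-to-indicator mapping described above can be written out as lavaan-style measurement syntax. This is only a sketch: the factor names F1 to F3 match the factor score columns shown earlier, but the helper makeFactorSyntax() is hypothetical, and the resulting string omits the correlated residuals and any other terms in the fitted model. The variable vectors are repeated so that the block runs on its own.

```r
# Indicators for each factor (repeated from above)
factor1vars <- c(
  "completionsPerGame","attemptsPerGame","passing_yardsPerGame","passing_tdsPerGame",
  "passing_air_yardsPerGame","passing_yards_after_catchPerGame","passing_first_downsPerGame")
factor2vars <- c(
  "avg_completed_air_yards","avg_intended_air_yards","aggressiveness","max_completed_air_distance",
  "avg_air_distance","max_air_distance","avg_air_yards_to_sticks")
factor3vars <- c(
  "passing_cpoe","pass_comp_pct","passer_rating","completion_percentage_above_expectation")

# Hypothetical helper: build one "factor =~ indicator1 + indicator2 + ..." line
makeFactorSyntax <- function(factorName, indicators) {
  paste0(factorName, " =~ ", paste(indicators, collapse = " + "))
}

# Combine the three measurement lines into one model string
measurementSyntax <- paste(
  makeFactorSyntax("F1", factor1vars),
  makeFactorSyntax("F2", factor2vars),
  makeFactorSyntax("F3", factor3vars),
  sep = "\n"
)

cat(measurementSyntax)
```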
Here are the players and seasons that showed the highest levels of Quarterback “Usage”:
Code
Here are the players and seasons that showed the lowest levels of Quarterback “Usage”:
Code
Here are the players and seasons that showed the highest levels of Quarterback “Aggressiveness”:
Code
Here are the players and seasons that showed the lowest levels of Quarterback “Aggressiveness”:
Code
Here are the players and seasons that showed the highest levels of Quarterback “Performance”:
Code
If we restrict it to Quarterbacks who played at least 10 games in the season, here are the players and seasons that showed the highest levels of Quarterback “Performance”:
Code
Here are the players and seasons that showed the lowest levels of Quarterback “Performance”:
Code
If we restrict it to Quarterbacks who played at least 10 games in the season, here are the players and seasons that showed the lowest levels of Quarterback “Performance”:
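Rankings like the ones in this section can be produced by sorting player-seasons on a factor score, optionally after filtering on games played. The sketch below uses made-up values; the data frame toyScores and its columns player, season, games, and F3 are hypothetical stand-ins for the factor scores merged with player identifiers. It uses base R so that it runs without additional packages; the same steps could be written with dplyr's filter() and arrange().

```r
# Hypothetical factor scores merged with player identifiers (made-up values)
toyScores <- data.frame(
  player = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  season = c(2021, 2021, 2022, 2022, 2023),
  games = c(16, 3, 12, 17, 9),
  F3 = c(0.42, 2.54, -0.29, 1.73, -1.24)
)

# Highest "Performance" (F3) scores first
highestPerformance <- toyScores[order(toyScores$F3, decreasing = TRUE), ]

# Lowest "Performance" (F3) scores first
lowestPerformance <- toyScores[order(toyScores$F3), ]

# Restrict to player-seasons with at least 10 games, then rank again
qualified <- toyScores[toyScores$games >= 10, ]
highestQualified <- qualified[order(qualified$F3, decreasing = TRUE), ]
lowestQualified <- qualified[order(qualified$F3), ]

head(highestPerformance, 3)
head(highestQualified, 3)
```

In the toy data, the top unrestricted score belongs to a player with only 3 games, which illustrates why a minimum-games filter can change who appears at the top of the ranking.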
22.7 Conclusion
22.8 Session Info
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.1.0 readr_2.1.5 tidyr_1.3.1 tibble_3.3.0
[9] ggplot2_3.5.2 tidyverse_2.0.0 lavaanPlot_0.8.1 lavaan_0.6-19
[13] nFactors_2.4.1.2 psych_2.5.6
loaded via a namespace (and not attached):
[1] GPArotation_2025.3-1 DiagrammeR_1.0.11 generics_0.1.4
[4] stringi_1.8.7 lattice_0.22-7 hms_1.1.3
[7] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.4
[10] grid_4.5.1 timechange_0.3.0 RColorBrewer_1.1-3
[13] fastmap_1.2.0 jsonlite_2.0.0 scales_1.4.0
[16] pbivnorm_0.6.0 numDeriv_2016.8-1.1 mnormt_2.1.1
[19] cli_3.6.5 rlang_1.1.6 visNetwork_2.1.2
[22] withr_3.0.2 yaml_2.3.10 tools_4.5.1
[25] parallel_4.5.1 tzdb_0.5.0 vctrs_0.6.5
[28] R6_2.6.1 stats4_4.5.1 lifecycle_1.0.4
[31] htmlwidgets_1.6.4 MASS_7.3-65 pkgconfig_2.0.3
[34] pillar_1.11.0 gtable_0.3.6 glue_1.8.0
[37] xfun_0.52 tidyselect_1.2.1 rstudioapi_0.17.1
[40] knitr_1.50 farver_2.1.2 htmltools_0.5.8.1
[43] nlme_3.1-168 rmarkdown_2.29 compiler_4.5.1
[46] quadprog_1.5-8