I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
Alternatively, you can leave an annotation using hypothes.is.
To add an annotation, select some text and then click the
symbol on the pop-up menu.
To see the annotations of others, click the
symbol in the upper right-hand corner of the page.
23 Data Reduction: Principal Component Analysis
This chapter provides an overview of principal component analysis as a useful technique for data reduction.
23.1 Getting Started
23.1.1 Load Packages
23.1.2 Load Data
23.1.3 Prepare Data
23.1.3.1 Merge Data
23.1.3.2 Specify Variables
Code
pcaVars <- c(
"completions","attempts","passing_yards","passing_tds","passing_interceptions",
"sacks_suffered","sack_yards_lost","sack_fumbles","sack_fumbles_lost",
"passing_air_yards","passing_yards_after_catch","passing_first_downs",
"passing_epa","passing_cpoe","passing_2pt_conversions","pacr","pass_40_yds",
"pass_inc","pass_comp_pct","fumbles","two_pts",
"avg_time_to_throw","avg_completed_air_yards","avg_intended_air_yards",
"avg_air_yards_differential","aggressiveness","max_completed_air_distance",
"avg_air_yards_to_sticks","passer_rating", #,"completion_percentage"
"expected_completion_percentage","completion_percentage_above_expectation",
"avg_air_distance","max_air_distance")
23.1.3.3 Standardize Variables
23.2 Overview of Principal Component Analysis
Principal component analysis (PCA) is used if you want to reduce your data matrix. PCA composites represent the variances of an observed measure in as economical a fashion as possible, with no latent underlying variables. The goal of PCA is to identify a smaller number of components that explain as much variance in a set of variables as possible. It is an atheoretical way to decompose a matrix. PCA involves decomposition of a data matrix into a set of eigenvectors, which are transformations of the old variables.
The eigenvectors attempt to simplify the data in the matrix. PCA takes the data matrix and identifies the weighted sum of all variables that does the best job at explaining variance: these are the principal components, also called eigenvectors. Principal components reflect optimally weighted sums.
PCA decomposes the data matrix into any number of components—as many as the number of variables, which will always account for all variance. After the PCA is performed, you can look at the results and discard the components which likely reflect error variance. Judgments about which components to retain are based on empirical criteria in conjunction with theory to select a parsimonious number of components that account for the majority of variance.
The eigenvalue reflects the amount of variance explained by the component (eigenvector). When using a varimax (orthogonal) rotation, an eigenvalue for a component is calculated as the sum of squared standardized component loadings on that component. When using oblique rotation, however, the items explain more variance than is attributable to their factor loadings because the factors are correlated.
PCA pulls the first principal component out (i.e., the eigenvector that explains the most variance) and makes a new data matrix: i.e., new correlation matrix. Then the PCA pulls out the component that explains the next most variance—i.e., the eigenvector with the next largest eigenvalue, and it does this for all components, equal to the same number of variables. For instance, if there are six variables, it will iteratively extract an additional component up to six components. You can extract as many eigenvectors as there are variables. If you extract all six components, the data matrix left over will be the same as the correlation matrix in Figure 23.1. That is, the remaining variables (as part of the leftover data matrix) will be entirely uncorrelated with each other, because six components explain 100% of the variance from six variables. In other words, you can explain (6) variables with (6) new things!
However, it does no good if you have to use all (6) components because there is no data reduction from the original number of variables. When the goal is data reduction (as in PCA), the hope is that the first few components will explain most of the variance, so we can explain the variability in the data with fewer components than there are variables.
The sum of all eigenvalues is equal to the number of variables in the analysis. PCA does not have the same assumptions as factor analysis, which assumes that measures are partly from common variance and error. But if you estimate (6) eigenvectors and only keep (2), the model is a two-component model and whatever left becomes error. Therefore, PCA does not have the same assumptions as factor analysis, but it often ends up in the same place.
23.3 Decisions in Principal Component Analysis
There are four primary decisions to make in PCA:
- what variables to include in the model and how to scale them
- whether and how to rotate components
- how many components to retain
- how to interpret and use the components
As in factor analysis, the answer you get can differ highly depending on the decisions you make. We provide guidance on each of these decisions below and in Section 22.5.
23.3.1 1. Variables to Include and their Scaling
As in factor analysis. The first decision when conducting a factor analysis is which variables to include and the scaling of those variables. Guidance on which variables to include is in Section 22.5.1.
In addition, before performing a PCA, it is important to ensure that the variables included in the PCA are on the same scale. PCA seeks to identify components that explain variance in the data, so if the variables are not on the same scale, some variables may contribute considerably more variance than others. A common way of ensuring that variables are on the same scale is to standardize them using, for example, z-scores.
23.3.2 2. Component Rotation
Similar considerations as in factor analysis can be used to determine whether and how to rotate components in PCA. The considerations for determining whether and how to rotate factors in factor analysis are described in Section 22.5.3.
23.3.3 3. Determining the Number of Components to Retain
Similar criteria as in factor analysis can be used to determine the number of components to retain in PCA. The criteria for determining the number of factors to retain in factor analysis are described in Section 22.5.4.
23.3.4 4. Interpreting and Using PCA Components
The next step is interpreting the PCA components. Use theory to interpret and label the components.
23.4 PCA Versus Factor Analysis
Both factor analysis and PCA can be used for data reduction. The key distinction between factor analysis and PCA is depicted in Figure 23.2.
There are several differences between factor analysis and PCA. Factor analysis has greater sophistication than PCA, but greater sophistication often results in greater assumptions. Factor analysis does not always work; the data may not always fit to a factor analysis model. However, PCA can decompose any data matrix; it always works. PCA is okay if you are not interested in the factor structure. PCA uses all variance of variables and assumes variables have no error, so it does not account for measurement error. PCA is good if you just want to form a linear composite and perform data reduction. However, if you are interested in the factor structure, use factor analysis, which estimates a latent variable that accounts for the common variance and discards error variance. Factor analysis better handles error than PCA—factor analysis assumes that what is in the variable is the combination of common construct variance and error. By contrast, PCA assumes that the measures have no measurement error. Factor analysis is useful for the identification of latent constructs—i.e., underlying dimensions or factors that explain (cause) observed scores.
23.5 Example of Principal Component Analysis
We generated the scree plot in Figure 23.3 using the fa.parallel()
function of the psych
package (Revelle, 2025).
The number of components to keep would depend on which criteria one uses. Based on the rule to keep factors whose eigenvalues are greater than one and based on the parallel test, we would keep nine components. However, based on the Cattell scree test (the “elbow” of the screen plot minus one), we would keep three components. If using the optimal coordinates, we would keep four components; if using the acceleration factor, we would keep one component. Therefore, interpretability of the components would be important for deciding how many components to keep.
Parallel analysis suggests that the number of factors = NA and the number of components = 9
We generated the scree plot in Figure 23.4 using the nScree()
and plotnScree()
functions of the nFactors
package (Raiche & Magis, 2025).
Code
We generated the very simple structure (VSS) plots in Figures 23.5 and 23.6 using the vss()
and nfactors()
functions of the psych
package (Revelle, 2025). The optimal number of components based on the VSS criterion is three components.
Very Simple Structure
Call: psych::vss(x = dataForPCA[pcaVars], rotate = "oblimin", fm = "pc")
VSS complexity 1 achieves a maximimum of 0.76 with 3 factors
VSS complexity 2 achieves a maximimum of 0.9 with 3 factors
The Velicer MAP achieves a minimum of
Inf with factors
BIC achieves a minimum of Inf with factors
Sample Size adjusted BIC achieves a minimum of Inf with factors
Statistics by number of factors
vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq SRMR
1 0.56 0.00 NaN 0 NA NA 77.3 0.56 NA NA NA NA NA NA
2 0.70 0.76 NaN 0 NA NA 42.5 0.76 NA NA NA NA NA NA
3 0.76 0.89 NaN 0 NA NA 16.5 0.90 NA NA NA NA NA NA
4 0.73 0.89 NaN 0 NA NA 13.3 0.92 NA NA NA NA NA NA
5 0.75 0.89 NaN 0 NA NA 10.9 0.94 NA NA NA NA NA NA
6 0.75 0.90 NaN 0 NA NA 8.6 0.95 NA NA NA NA NA NA
7 0.72 0.88 NaN 0 NA NA 7.0 0.96 NA NA NA NA NA NA
8 0.74 0.88 NaN 0 NA NA 5.7 0.97 NA NA NA NA NA NA
eCRMS eBIC
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
7 NA NA
8 NA NA
Error in plot.window(...): need finite 'ylim' values
Code
pca1ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 1,
rotate = "oblimin")
pca2ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 2,
rotate = "oblimin")
pca3ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 3,
rotate = "oblimin")
pca4ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 4,
rotate = "oblimin")
pca5ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 5,
rotate = "oblimin")
pca6ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 6,
rotate = "oblimin")
pca7ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 7,
rotate = "oblimin")
pca8ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 8,
rotate = "oblimin")
pca9ComponentOblique <- psych::principal(
dataForPCA[pcaVars],
nfactors = 9,
rotate = "oblimin")
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 1, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 h2 u2 com
completions 0.98 0.95693 0.0431 1
attempts 0.96 0.92856 0.0714 1
passing_yards 1.00 1.00562 -0.0056 1
passing_tds 0.83 0.69384 0.3062 1
passing_interceptions 0.65 0.42148 0.5785 1
sacks_suffered 0.81 0.64819 0.3518 1
sack_yards_lost -0.78 0.60956 0.3904 1
sack_fumbles 0.57 0.32564 0.6744 1
sack_fumbles_lost 0.46 0.21414 0.7859 1
passing_air_yards 0.84 0.70339 0.2966 1
passing_yards_after_catch 0.90 0.80549 0.1945 1
passing_first_downs 0.98 0.96525 0.0348 1
passing_epa 0.15 0.02383 0.9762 1
passing_cpoe 0.17 0.02992 0.9701 1
passing_2pt_conversions 0.26 0.06876 0.9312 1
pacr 0.09 0.00786 0.9921 1
pass_40_yds 0.30 0.09241 0.9076 1
pass_inc 0.89 0.79671 0.2033 1
pass_comp_pct 0.23 0.05423 0.9458 1
fumbles 0.50 0.25041 0.7496 1
two_pts 0.20 0.03958 0.9604 1
avg_time_to_throw -0.06 0.00332 0.9967 1
avg_completed_air_yards 0.11 0.01190 0.9881 1
avg_intended_air_yards -0.20 0.04121 0.9588 1
avg_air_yards_differential 0.08 0.00645 0.9936 1
aggressiveness -0.02 0.00060 0.9994 1
max_completed_air_distance 0.20 0.03903 0.9610 1
avg_air_yards_to_sticks 0.05 0.00287 0.9971 1
passer_rating 0.16 0.02504 0.9750 1
expected_completion_percentage 0.03 0.00067 0.9993 1
completion_percentage_above_expectation 0.16 0.02598 0.9740 1
avg_air_distance 0.03 0.00067 0.9993 1
max_air_distance 0.15 0.02139 0.9786 1
PC1
SS loadings 9.82
Proportion Var 0.30
Mean item complexity = 1
Test of the hypothesis that 1 component is sufficient.
The root mean square of the residuals (RMSR) is 0.23
with the empirical chi square 24992582 with prob < 0
Fit based upon off diagonal values = 0.6
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 2, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 h2 u2 com
completions 0.96 0.23 0.992 0.0082 1.1
attempts 0.97 -0.02 0.932 0.0676 1.0
passing_yards 0.99 0.14 1.014 -0.0136 1.0
passing_tds 0.82 0.18 0.714 0.2861 1.1
passing_interceptions 0.67 -0.23 0.490 0.5096 1.2
sacks_suffered 0.81 -0.10 0.667 0.3333 1.0
sack_yards_lost -0.79 0.10 0.627 0.3729 1.0
sack_fumbles 0.58 -0.08 0.336 0.6641 1.0
sack_fumbles_lost 0.47 -0.07 0.222 0.7780 1.0
passing_air_yards 0.87 -0.40 0.900 0.1001 1.4
passing_yards_after_catch 0.87 0.30 0.872 0.1277 1.2
passing_first_downs 0.97 0.17 0.980 0.0197 1.1
passing_epa 0.12 0.40 0.178 0.8221 1.2
passing_cpoe 0.13 0.55 0.328 0.6724 1.1
passing_2pt_conversions 0.26 -0.02 0.070 0.9300 1.0
pacr 0.04 0.66 0.445 0.5554 1.0
pass_40_yds 0.30 0.01 0.092 0.9076 1.0
pass_inc 0.92 -0.37 0.969 0.0311 1.3
pass_comp_pct 0.18 0.70 0.531 0.4685 1.1
fumbles 0.51 -0.07 0.258 0.7418 1.0
two_pts 0.20 -0.03 0.041 0.9590 1.0
avg_time_to_throw -0.03 -0.36 0.134 0.8661 1.0
avg_completed_air_yards 0.14 -0.36 0.144 0.8564 1.3
avg_intended_air_yards -0.15 -0.69 0.503 0.4968 1.1
avg_air_yards_differential 0.03 0.61 0.380 0.6199 1.0
aggressiveness 0.01 -0.47 0.216 0.7839 1.0
max_completed_air_distance 0.21 -0.12 0.054 0.9455 1.6
avg_air_yards_to_sticks 0.11 -0.71 0.517 0.4830 1.0
passer_rating 0.11 0.65 0.443 0.5566 1.1
expected_completion_percentage -0.03 0.75 0.563 0.4371 1.0
completion_percentage_above_expectation 0.12 0.51 0.277 0.7229 1.1
avg_air_distance 0.09 -0.76 0.576 0.4238 1.0
max_air_distance 0.18 -0.47 0.253 0.7471 1.3
TC1 TC2
SS loadings 9.81 5.91
Proportion Var 0.30 0.18
Cumulative Var 0.30 0.48
Proportion Explained 0.62 0.38
Cumulative Proportion 0.62 1.00
With component correlations of
TC1 TC2
TC1 1.00 0.03
TC2 0.03 1.00
Mean item complexity = 1.1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.17
with the empirical chi square 13393660 with prob < 0
Fit based upon off diagonal values = 0.78
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 3, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 TC3 h2 u2 com
completions 0.95 -0.16 0.19 0.993 0.0068 1.1
attempts 0.98 -0.04 -0.04 0.953 0.0467 1.0
passing_yards 0.94 0.10 0.36 1.076 -0.0763 1.3
passing_tds 0.75 0.12 0.45 0.834 0.1660 1.7
passing_interceptions 0.72 -0.02 -0.36 0.612 0.3885 1.5
sacks_suffered 0.86 -0.10 -0.25 0.775 0.2252 1.2
sack_yards_lost -0.84 0.11 0.25 0.735 0.2651 1.2
sack_fumbles 0.62 -0.10 -0.22 0.418 0.5824 1.3
sack_fumbles_lost 0.50 -0.08 -0.18 0.275 0.7248 1.3
passing_air_yards 0.84 0.46 -0.04 0.931 0.0687 1.5
passing_yards_after_catch 0.87 -0.25 0.19 0.879 0.1210 1.3
passing_first_downs 0.93 0.01 0.30 1.003 -0.0027 1.2
passing_epa -0.01 0.17 0.83 0.700 0.3002 1.1
passing_cpoe 0.03 -0.07 0.77 0.608 0.3921 1.0
passing_2pt_conversions 0.26 0.02 0.00 0.070 0.9299 1.0
pacr 0.04 -0.58 0.33 0.457 0.5428 1.6
pass_40_yds 0.22 0.30 0.40 0.307 0.6933 2.5
pass_inc 0.97 0.13 -0.39 1.058 -0.0581 1.4
pass_comp_pct 0.10 -0.32 0.71 0.636 0.3637 1.4
fumbles 0.54 -0.08 -0.18 0.311 0.6892 1.3
two_pts 0.20 0.04 0.01 0.041 0.9589 1.1
avg_time_to_throw -0.04 0.37 -0.12 0.153 0.8467 1.2
avg_completed_air_yards 0.02 0.76 0.37 0.689 0.3108 1.5
avg_intended_air_yards -0.23 0.89 0.01 0.830 0.1698 1.1
avg_air_yards_differential -0.02 -0.32 0.56 0.432 0.5685 1.6
aggressiveness 0.01 0.42 -0.21 0.227 0.7732 1.5
max_completed_air_distance 0.11 0.49 0.42 0.425 0.5754 2.1
avg_air_yards_to_sticks 0.03 0.93 0.03 0.872 0.1278 1.0
passer_rating -0.03 -0.02 0.99 0.973 0.0272 1.0
expected_completion_percentage -0.01 -0.74 0.27 0.628 0.3725 1.3
completion_percentage_above_expectation 0.00 0.03 0.83 0.688 0.3115 1.0
avg_air_distance 0.02 0.91 -0.06 0.844 0.1565 1.0
max_air_distance 0.13 0.61 0.01 0.387 0.6129 1.1
TC1 TC2 TC3
SS loadings 9.73 5.58 5.51
Proportion Var 0.29 0.17 0.17
Cumulative Var 0.29 0.46 0.63
Proportion Explained 0.47 0.27 0.26
Cumulative Proportion 0.47 0.74 1.00
With component correlations of
TC1 TC2 TC3
TC1 1.00 0.02 0.08
TC2 0.02 1.00 -0.04
TC3 0.08 -0.04 1.00
Mean item complexity = 1.3
Test of the hypothesis that 3 components are sufficient.
The root mean square of the residuals (RMSR) is 0.09
with the empirical chi square 4035534 with prob < 0
Fit based upon off diagonal values = 0.93
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 4, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 TC3 TC4 h2 u2
completions 0.95 -0.18 0.12 0.06 1.00 -0.0044
attempts 0.97 -0.07 -0.13 0.07 0.98 0.0205
passing_yards 0.96 0.08 0.27 0.02 1.08 -0.0818
passing_tds 0.83 0.09 0.32 -0.10 0.85 0.1510
passing_interceptions 0.65 -0.04 -0.37 0.17 0.62 0.3791
sacks_suffered 0.63 -0.06 -0.11 0.46 0.81 0.1910
sack_yards_lost -0.61 0.06 0.10 -0.47 0.77 0.2276
sack_fumbles 0.21 0.02 0.13 0.78 0.73 0.2699
sack_fumbles_lost 0.11 0.05 0.17 0.75 0.60 0.4026
passing_air_yards 0.83 0.44 -0.11 0.08 0.94 0.0628
passing_yards_after_catch 0.91 -0.28 0.08 -0.02 0.91 0.0885
passing_first_downs 0.94 -0.01 0.22 0.04 1.01 -0.0089
passing_epa 0.31 0.10 0.54 -0.56 0.79 0.2058
passing_cpoe -0.03 0.00 0.88 0.14 0.72 0.2796
passing_2pt_conversions 0.41 -0.04 -0.17 -0.25 0.16 0.8405
pacr 0.03 -0.56 0.35 0.02 0.46 0.5411
pass_40_yds 0.39 0.25 0.22 -0.29 0.34 0.6565
pass_inc 0.94 0.08 -0.47 0.09 1.10 -0.1040
pass_comp_pct 0.08 -0.27 0.76 0.07 0.68 0.3206
fumbles 0.20 0.03 0.11 0.65 0.52 0.4793
two_pts 0.34 -0.03 -0.16 -0.25 0.12 0.8782
avg_time_to_throw -0.14 0.40 -0.03 0.17 0.18 0.8153
avg_completed_air_yards 0.04 0.78 0.37 -0.01 0.71 0.2870
avg_intended_air_yards -0.24 0.91 0.04 0.03 0.86 0.1439
avg_air_yards_differential 0.00 -0.29 0.57 -0.03 0.44 0.5581
aggressiveness -0.06 0.44 -0.15 0.12 0.24 0.7642
max_completed_air_distance 0.23 0.47 0.31 -0.20 0.43 0.5716
avg_air_yards_to_sticks 0.07 0.92 -0.02 -0.07 0.87 0.1274
passer_rating 0.10 -0.01 0.91 -0.20 0.98 0.0216
expected_completion_percentage 0.08 -0.76 0.19 -0.15 0.65 0.3508
completion_percentage_above_expectation -0.04 0.10 0.92 0.11 0.80 0.2019
avg_air_distance 0.02 0.92 -0.06 0.01 0.85 0.1506
max_air_distance 0.22 0.57 -0.10 -0.16 0.40 0.5981
com
completions 1.1
attempts 1.1
passing_yards 1.2
passing_tds 1.3
passing_interceptions 1.8
sacks_suffered 1.9
sack_yards_lost 2.0
sack_fumbles 1.2
sack_fumbles_lost 1.2
passing_air_yards 1.6
passing_yards_after_catch 1.2
passing_first_downs 1.1
passing_epa 2.6
passing_cpoe 1.1
passing_2pt_conversions 2.1
pacr 1.7
pass_40_yds 3.3
pass_inc 1.5
pass_comp_pct 1.3
fumbles 1.3
two_pts 2.3
avg_time_to_throw 1.6
avg_completed_air_yards 1.4
avg_intended_air_yards 1.1
avg_air_yards_differential 1.5
aggressiveness 1.4
max_completed_air_distance 2.7
avg_air_yards_to_sticks 1.0
passer_rating 1.1
expected_completion_percentage 1.2
completion_percentage_above_expectation 1.1
avg_air_distance 1.0
max_air_distance 1.5
TC1 TC2 TC3 TC4
SS loadings 9.04 5.50 4.99 3.09
Proportion Var 0.27 0.17 0.15 0.09
Cumulative Var 0.27 0.44 0.59 0.69
Proportion Explained 0.40 0.24 0.22 0.14
Cumulative Proportion 0.40 0.64 0.86 1.00
With component correlations of
TC1 TC2 TC3 TC4
TC1 1.00 0.03 0.12 0.29
TC2 0.03 1.00 -0.06 -0.07
TC3 0.12 -0.06 1.00 -0.23
TC4 0.29 -0.07 -0.23 1.00
Mean item complexity = 1.5
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.08
with the empirical chi square 3333582 with prob < 0
Fit based upon off diagonal values = 0.95
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 5, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 TC3 TC4 TC5 h2
completions 0.94 -0.18 0.12 0.07 0.05 1.01
attempts 0.96 -0.08 -0.13 0.07 0.04 0.99
passing_yards 0.97 0.08 0.26 0.02 0.01 1.09
passing_tds 0.82 0.09 0.32 -0.09 0.06 0.85
passing_interceptions 0.66 -0.04 -0.38 0.16 -0.01 0.63
sacks_suffered 0.62 -0.06 -0.11 0.47 0.02 0.81
sack_yards_lost -0.60 0.06 0.10 -0.47 -0.02 0.77
sack_fumbles 0.20 0.03 0.14 0.80 0.01 0.75
sack_fumbles_lost 0.09 0.06 0.18 0.77 0.01 0.62
passing_air_yards 0.82 0.43 -0.11 0.08 0.05 0.94
passing_yards_after_catch 0.91 -0.28 0.08 -0.02 0.03 0.92
passing_first_downs 0.93 -0.01 0.22 0.05 0.05 1.01
passing_epa 0.29 0.09 0.55 -0.55 0.07 0.79
passing_cpoe -0.04 0.00 0.89 0.16 0.01 0.73
passing_2pt_conversions 0.03 0.00 0.00 -0.02 0.89 0.80
pacr 0.06 -0.56 0.34 0.00 -0.06 0.46
pass_40_yds 0.45 0.24 0.19 -0.33 -0.11 0.38
pass_inc 0.94 0.08 -0.48 0.09 0.04 1.11
pass_comp_pct 0.07 -0.26 0.77 0.09 0.03 0.69
fumbles 0.19 0.03 0.12 0.67 0.02 0.53
two_pts -0.05 0.02 0.02 -0.02 0.91 0.80
avg_time_to_throw -0.17 0.41 -0.01 0.19 0.06 0.20
avg_completed_air_yards 0.06 0.77 0.36 -0.02 -0.03 0.71
avg_intended_air_yards -0.24 0.91 0.04 0.03 -0.01 0.86
avg_air_yards_differential 0.03 -0.30 0.56 -0.04 -0.07 0.44
aggressiveness -0.06 0.44 -0.15 0.12 -0.01 0.24
max_completed_air_distance 0.30 0.46 0.28 -0.24 -0.13 0.45
avg_air_yards_to_sticks 0.06 0.92 -0.01 -0.07 0.04 0.87
passer_rating 0.10 -0.01 0.92 -0.20 0.01 0.98
expected_completion_percentage 0.08 -0.76 0.19 -0.16 -0.01 0.65
completion_percentage_above_expectation -0.04 0.10 0.92 0.11 -0.01 0.81
avg_air_distance 0.01 0.92 -0.06 0.01 0.01 0.85
max_air_distance 0.25 0.56 -0.12 -0.18 -0.05 0.41
u2 com
completions -0.007 1.1
attempts 0.014 1.1
passing_yards -0.088 1.2
passing_tds 0.150 1.4
passing_interceptions 0.370 1.8
sacks_suffered 0.191 2.0
sack_yards_lost 0.227 2.0
sack_fumbles 0.254 1.2
sack_fumbles_lost 0.383 1.2
passing_air_yards 0.061 1.6
passing_yards_after_catch 0.081 1.2
passing_first_downs -0.010 1.1
passing_epa 0.206 2.6
passing_cpoe 0.265 1.1
passing_2pt_conversions 0.198 1.0
pacr 0.539 1.7
pass_40_yds 0.622 3.0
pass_inc -0.114 1.5
pass_comp_pct 0.312 1.3
fumbles 0.466 1.2
two_pts 0.195 1.0
avg_time_to_throw 0.803 1.8
avg_completed_air_yards 0.287 1.4
avg_intended_air_yards 0.142 1.1
avg_air_yards_differential 0.557 1.6
aggressiveness 0.764 1.4
max_completed_air_distance 0.547 3.3
avg_air_yards_to_sticks 0.126 1.0
passer_rating 0.020 1.1
expected_completion_percentage 0.349 1.2
completion_percentage_above_expectation 0.193 1.1
avg_air_distance 0.150 1.0
max_air_distance 0.588 1.7
TC1 TC2 TC3 TC4 TC5
SS loadings 8.81 5.49 4.98 3.15 1.74
Proportion Var 0.27 0.17 0.15 0.10 0.05
Cumulative Var 0.27 0.43 0.58 0.68 0.73
Proportion Explained 0.36 0.23 0.21 0.13 0.07
Cumulative Proportion 0.36 0.59 0.80 0.93 1.00
With component correlations of
TC1 TC2 TC3 TC4 TC5
TC1 1.00 0.03 0.12 0.29 0.21
TC2 0.03 1.00 -0.06 -0.08 0.01
TC3 0.12 -0.06 1.00 -0.22 0.00
TC4 0.29 -0.08 -0.22 1.00 0.07
TC5 0.21 0.01 0.00 0.07 1.00
Mean item complexity = 1.5
Test of the hypothesis that 5 components are sufficient.
The root mean square of the residuals (RMSR) is 0.08
with the empirical chi square 2975002 with prob < 0
Fit based upon off diagonal values = 0.95
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 6, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 TC3 TC4 TC6 TC5
completions 0.95 -0.17 0.19 0.04 -0.07 0.03
attempts 0.98 -0.07 -0.06 0.05 -0.09 0.03
passing_yards 0.93 0.07 0.14 0.05 0.26 0.03
passing_tds 0.79 0.09 0.22 -0.07 0.22 0.08
passing_interceptions 0.67 -0.05 -0.31 0.15 -0.13 -0.02
sacks_suffered 0.63 -0.05 -0.05 0.45 -0.10 0.01
sack_yards_lost -0.61 0.05 0.04 -0.46 0.11 -0.01
sack_fumbles 0.17 0.03 0.08 0.82 0.07 0.02
sack_fumbles_lost 0.06 0.06 0.10 0.79 0.10 0.02
passing_air_yards 0.84 0.44 -0.01 0.05 -0.13 0.03
passing_yards_after_catch 0.90 -0.28 0.07 -0.02 0.07 0.03
passing_first_downs 0.92 -0.01 0.20 0.05 0.09 0.06
passing_epa 0.26 0.09 0.41 -0.53 0.30 0.09
passing_cpoe -0.02 0.05 0.94 0.09 -0.03 -0.01
passing_2pt_conversions 0.02 0.00 -0.04 -0.01 -0.03 0.90
pacr -0.03 -0.59 0.01 0.11 0.54 0.00
pass_40_yds 0.37 0.21 -0.12 -0.22 0.56 -0.05
pass_inc 0.95 0.07 -0.42 0.09 -0.10 0.03
pass_comp_pct 0.11 -0.22 0.93 -0.01 -0.19 -0.01
fumbles 0.16 0.03 0.05 0.69 0.09 0.03
two_pts -0.05 0.01 -0.03 0.00 -0.02 0.91
avg_time_to_throw -0.19 0.40 -0.08 0.22 0.08 0.07
avg_completed_air_yards -0.03 0.75 0.05 0.08 0.53 0.03
avg_intended_air_yards -0.24 0.91 0.06 0.02 -0.02 -0.01
avg_air_yards_differential -0.09 -0.34 0.09 0.10 0.79 0.01
aggressiveness -0.04 0.44 -0.09 0.10 -0.12 -0.02
max_completed_air_distance 0.22 0.43 -0.01 -0.15 0.52 -0.08
avg_air_yards_to_sticks 0.07 0.93 0.02 -0.08 -0.04 0.03
passer_rating 0.05 0.00 0.72 -0.17 0.40 0.04
expected_completion_percentage 0.08 -0.76 0.16 -0.16 0.08 -0.01
completion_percentage_above_expectation -0.03 0.14 0.96 0.05 0.02 -0.03
avg_air_distance 0.02 0.92 -0.02 0.00 -0.07 0.00
max_air_distance 0.28 0.57 -0.01 -0.21 -0.13 -0.07
h2 u2 com
completions 1.03 -0.0330 1.2
attempts 1.00 0.0047 1.0
passing_yards 1.10 -0.0969 1.2
passing_tds 0.85 0.1492 1.4
passing_interceptions 0.63 0.3703 1.6
sacks_suffered 0.81 0.1901 1.9
sack_yards_lost 0.77 0.2267 2.0
sack_fumbles 0.77 0.2313 1.1
sack_fumbles_lost 0.65 0.3549 1.1
passing_air_yards 0.96 0.0399 1.6
passing_yards_after_catch 0.92 0.0802 1.2
passing_first_downs 1.01 -0.0123 1.1
passing_epa 0.79 0.2054 3.2
passing_cpoe 0.84 0.1610 1.0
passing_2pt_conversions 0.81 0.1895 1.0
pacr 0.64 0.3579 2.1
pass_40_yds 0.52 0.4794 2.6
pass_inc 1.11 -0.1148 1.4
pass_comp_pct 0.89 0.1090 1.2
fumbles 0.56 0.4420 1.2
two_pts 0.81 0.1852 1.0
avg_time_to_throw 0.21 0.7852 2.3
avg_completed_air_yards 0.85 0.1482 1.9
avg_intended_air_yards 0.86 0.1404 1.1
avg_air_yards_differential 0.78 0.2207 1.5
aggressiveness 0.24 0.7617 1.4
max_completed_air_distance 0.56 0.4447 2.6
avg_air_yards_to_sticks 0.88 0.1210 1.0
passer_rating 0.98 0.0190 1.7
expected_completion_percentage 0.65 0.3485 1.2
completion_percentage_above_expectation 0.90 0.1028 1.1
avg_air_distance 0.86 0.1448 1.0
max_air_distance 0.44 0.5601 1.9
TC1 TC2 TC3 TC4 TC6 TC5
SS loadings 8.70 5.48 4.11 3.09 2.57 1.74
Proportion Var 0.26 0.17 0.12 0.09 0.08 0.05
Cumulative Var 0.26 0.43 0.55 0.65 0.73 0.78
Proportion Explained 0.34 0.21 0.16 0.12 0.10 0.07
Cumulative Proportion 0.34 0.55 0.71 0.83 0.93 1.00
With component correlations of
TC1 TC2 TC3 TC4 TC6 TC5
TC1 1.00 0.03 0.11 0.31 0.08 0.21
TC2 0.03 1.00 -0.10 -0.08 0.02 0.01
TC3 0.11 -0.10 1.00 -0.16 0.35 0.04
TC4 0.31 -0.08 -0.16 1.00 -0.14 0.06
TC6 0.08 0.02 0.35 -0.14 1.00 0.04
TC5 0.21 0.01 0.04 0.06 0.04 1.00
Mean item complexity = 1.5
Test of the hypothesis that 6 components are sufficient.
The root mean square of the residuals (RMSR) is 0.07
with the empirical chi square 2403218 with prob < 0
Fit based upon off diagonal values = 0.96
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 7, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC2 TC3 TC4 TC6 TC7
completions 0.94 -0.19 0.20 0.05 -0.05 0.04
attempts 0.98 -0.10 -0.04 0.05 -0.07 -0.01
passing_yards 0.91 0.12 0.14 0.06 0.20 0.15
passing_tds 0.78 0.13 0.22 -0.06 0.16 0.13
passing_interceptions 0.71 -0.06 -0.28 0.10 -0.05 -0.18
sacks_suffered 0.63 -0.06 -0.04 0.43 -0.04 -0.13
sack_yards_lost -0.61 0.07 0.03 -0.44 0.05 0.12
sack_fumbles 0.11 0.05 0.05 0.86 0.05 0.05
sack_fumbles_lost 0.00 0.08 0.07 0.85 0.05 0.08
passing_air_yards 0.85 0.39 0.01 0.03 -0.18 -0.05
passing_yards_after_catch 0.86 -0.29 0.06 0.04 0.02 0.21
passing_first_downs 0.93 0.02 0.21 0.03 0.10 0.01
passing_epa 0.25 0.14 0.40 -0.50 0.18 0.27
passing_cpoe -0.03 0.03 0.93 0.09 -0.02 0.00
passing_2pt_conversions 0.03 0.00 -0.04 0.00 -0.01 -0.03
pacr 0.01 -0.41 0.01 0.06 0.66 0.01
pass_40_yds 0.28 0.30 -0.16 -0.10 0.29 0.53
pass_inc 0.97 0.05 -0.39 0.05 -0.07 -0.12
pass_comp_pct 0.08 -0.27 0.91 0.02 -0.16 0.05
fumbles 0.11 0.06 0.03 0.72 0.07 0.05
two_pts -0.05 0.02 -0.03 0.01 -0.01 -0.02
avg_time_to_throw -0.21 0.41 -0.09 0.24 -0.01 0.05
avg_completed_air_yards 0.00 0.88 0.05 0.03 0.41 0.05
avg_intended_air_yards -0.27 0.86 0.04 0.06 -0.23 0.12
avg_air_yards_differential -0.04 -0.09 0.09 0.03 0.87 0.03
aggressiveness 0.12 0.47 -0.03 -0.12 0.09 -0.61
max_completed_air_distance 0.12 0.50 -0.05 -0.03 0.21 0.53
avg_air_yards_to_sticks 0.07 0.88 0.02 -0.08 -0.21 0.02
passer_rating 0.01 0.08 0.69 -0.12 0.30 0.30
expected_completion_percentage -0.05 -0.77 0.11 0.02 -0.01 0.48
completion_percentage_above_expectation 0.03 0.17 0.97 -0.03 0.11 -0.19
avg_air_distance 0.01 0.86 -0.02 0.01 -0.25 0.03
max_air_distance 0.15 0.45 -0.05 -0.04 -0.45 0.44
TC5 h2 u2 com
completions 0.03 1.03 -0.033 1.2
attempts 0.03 1.00 0.003 1.0
passing_yards 0.03 1.10 -0.098 1.3
passing_tds 0.08 0.85 0.147 1.4
passing_interceptions -0.03 0.65 0.352 1.5
sacks_suffered 0.01 0.81 0.189 1.9
sack_yards_lost 0.00 0.77 0.226 2.0
sack_fumbles 0.02 0.81 0.195 1.1
sack_fumbles_lost 0.02 0.69 0.305 1.1
passing_air_yards 0.03 0.97 0.035 1.5
passing_yards_after_catch 0.04 0.93 0.067 1.4
passing_first_downs 0.05 1.02 -0.023 1.1
passing_epa 0.10 0.79 0.205 3.8
passing_cpoe -0.01 0.84 0.161 1.0
passing_2pt_conversions 0.90 0.81 0.189 1.0
pacr -0.01 0.67 0.327 1.7
pass_40_yds -0.04 0.58 0.420 3.2
pass_inc 0.03 1.13 -0.129 1.4
pass_comp_pct 0.00 0.90 0.104 1.3
fumbles 0.03 0.58 0.417 1.1
two_pts 0.91 0.82 0.184 1.0
avg_time_to_throw 0.07 0.22 0.778 2.4
avg_completed_air_yards 0.02 0.88 0.119 1.4
avg_intended_air_yards 0.00 0.88 0.123 1.4
avg_air_yards_differential 0.00 0.85 0.149 1.1
aggressiveness -0.05 0.57 0.425 2.1
max_completed_air_distance -0.06 0.63 0.370 2.5
avg_air_yards_to_sticks 0.03 0.88 0.121 1.1
passer_rating 0.04 0.98 0.015 1.9
expected_completion_percentage 0.02 0.84 0.159 1.7
completion_percentage_above_expectation -0.04 0.96 0.035 1.2
avg_air_distance 0.01 0.86 0.143 1.2
max_air_distance -0.04 0.63 0.366 3.3
TC1 TC2 TC3 TC4 TC6 TC7 TC5
SS loadings 8.56 5.27 4.00 3.07 2.39 1.93 1.74
Proportion Var 0.26 0.16 0.12 0.09 0.07 0.06 0.05
Cumulative Var 0.26 0.42 0.54 0.63 0.71 0.76 0.82
Proportion Explained 0.32 0.20 0.15 0.11 0.09 0.07 0.06
Cumulative Proportion 0.32 0.51 0.66 0.78 0.86 0.94 1.00
With component correlations of
TC1 TC2 TC3 TC4 TC6 TC7 TC5
TC1 1.00 0.05 0.10 0.35 0.05 0.10 0.21
TC2 0.05 1.00 -0.04 -0.10 -0.10 0.04 0.01
TC3 0.10 -0.04 1.00 -0.13 0.31 0.27 0.05
TC4 0.35 -0.10 -0.13 1.00 -0.08 -0.12 0.07
TC6 0.05 -0.10 0.31 -0.08 1.00 0.17 0.03
TC7 0.10 0.04 0.27 -0.12 0.17 1.00 0.05
TC5 0.21 0.01 0.05 0.07 0.03 0.05 1.00
Mean item complexity = 1.6
Test of the hypothesis that 7 components are sufficient.
The root mean square of the residuals (RMSR) is 0.07
with the empirical chi square 2111929 with prob < 0
Fit based upon off diagonal values = 0.97
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 8, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC3 TC2 TC7 TC6 TC4
completions 0.92 0.17 -0.10 -0.13 -0.02 0.08
attempts 0.97 -0.02 -0.05 -0.04 -0.05 0.05
passing_yards 0.87 0.12 0.23 0.01 0.15 0.11
passing_tds 0.72 0.09 0.09 0.07 0.05 0.09
passing_interceptions 0.77 -0.06 0.04 0.03 0.08 -0.08
sacks_suffered 0.65 0.05 -0.06 0.02 0.02 0.34
sack_yards_lost -0.62 -0.06 0.06 -0.01 -0.02 -0.35
sack_fumbles 0.07 -0.03 -0.02 -0.01 -0.01 0.89
sack_fumbles_lost -0.05 -0.02 0.02 -0.02 -0.02 0.88
passing_air_yards 0.84 0.09 0.21 0.23 -0.23 0.00
passing_yards_after_catch 0.82 -0.01 -0.05 -0.28 0.03 0.12
passing_first_downs 0.89 0.16 0.00 0.06 0.07 0.09
passing_epa 0.15 0.12 0.07 0.03 -0.01 -0.21
passing_cpoe -0.02 0.98 0.10 -0.06 0.04 -0.01
passing_2pt_conversions 0.02 -0.02 -0.05 -0.02 0.01 -0.01
pacr 0.04 0.12 0.07 -0.17 0.82 -0.07
pass_40_yds 0.23 -0.20 0.62 -0.15 0.16 -0.02
pass_inc 0.98 -0.31 -0.01 0.12 -0.05 0.03
pass_comp_pct 0.09 0.88 -0.14 -0.26 -0.08 -0.01
fumbles 0.07 -0.05 -0.01 0.00 0.01 0.75
two_pts -0.05 -0.01 -0.03 -0.03 0.01 -0.01
avg_time_to_throw -0.19 0.05 0.38 0.12 -0.03 0.12
avg_completed_air_yards -0.04 0.07 0.63 0.54 0.22 0.05
avg_intended_air_yards -0.31 0.03 0.49 0.36 -0.44 0.09
avg_air_yards_differential -0.06 0.07 0.17 0.09 0.89 0.04
aggressiveness 0.10 -0.09 -0.29 0.82 -0.02 -0.04
max_completed_air_distance 0.10 0.03 0.84 -0.11 0.11 -0.06
avg_air_yards_to_sticks 0.03 0.02 0.44 0.46 -0.41 -0.03
passer_rating -0.07 0.46 0.14 -0.07 0.16 0.07
expected_completion_percentage -0.04 0.07 -0.06 -0.82 0.12 0.01
completion_percentage_above_expectation 0.02 0.92 -0.05 0.25 0.10 -0.04
avg_air_distance -0.01 0.03 0.48 0.41 -0.43 0.01
max_air_distance 0.15 0.03 0.58 -0.20 -0.54 -0.08
TC8 TC5 h2 u2 com
completions 0.11 0.03 1.03 -0.034 1.2
attempts 0.00 0.03 1.00 0.003 1.0
passing_yards 0.18 0.04 1.10 -0.098 1.4
passing_tds 0.44 0.06 0.90 0.100 1.8
passing_interceptions -0.51 0.02 0.78 0.218 1.8
sacks_suffered -0.30 0.02 0.82 0.176 2.0
sack_yards_lost 0.30 -0.02 0.79 0.213 2.1
sack_fumbles -0.03 0.00 0.85 0.147 1.0
sack_fumbles_lost -0.01 0.00 0.74 0.258 1.0
passing_air_yards -0.12 0.05 0.98 0.020 1.5
passing_yards_after_catch 0.24 0.02 0.95 0.052 1.5
passing_first_downs 0.20 0.05 1.03 -0.028 1.2
passing_epa 0.87 0.05 0.97 0.030 1.3
passing_cpoe -0.11 0.01 0.95 0.045 1.1
passing_2pt_conversions 0.01 0.90 0.81 0.187 1.0
pacr -0.20 0.01 0.79 0.214 1.3
pass_40_yds 0.32 -0.04 0.58 0.420 2.4
pass_inc -0.16 0.04 1.13 -0.130 1.3
pass_comp_pct 0.05 0.00 0.92 0.078 1.3
fumbles 0.00 0.01 0.62 0.378 1.0
two_pts 0.00 0.92 0.82 0.180 1.0
avg_time_to_throw -0.32 0.11 0.29 0.709 3.2
avg_completed_air_yards 0.10 0.05 0.89 0.110 2.3
avg_intended_air_yards 0.06 0.01 0.88 0.122 3.7
avg_air_yards_differential 0.14 0.00 0.86 0.140 1.2
aggressiveness 0.08 -0.05 0.63 0.370 1.3
max_completed_air_distance 0.04 -0.03 0.70 0.297 1.1
avg_air_yards_to_sticks 0.09 0.05 0.88 0.120 3.1
passer_rating 0.66 0.01 1.04 -0.041 2.1
expected_completion_percentage 0.12 0.00 0.84 0.158 1.1
completion_percentage_above_expectation 0.11 -0.04 0.98 0.020 1.2
avg_air_distance -0.05 0.03 0.86 0.140 3.0
max_air_distance -0.06 -0.02 0.66 0.338 2.4
TC1 TC3 TC2 TC7 TC6 TC4 TC8 TC5
SS loadings 8.33 3.40 3.28 2.94 2.84 2.89 2.66 1.75
Proportion Var 0.25 0.10 0.10 0.09 0.09 0.09 0.08 0.05
Cumulative Var 0.25 0.36 0.45 0.54 0.63 0.72 0.80 0.85
Proportion Explained 0.30 0.12 0.12 0.10 0.10 0.10 0.09 0.06
Cumulative Proportion 0.30 0.42 0.53 0.64 0.74 0.84 0.94 1.00
With component correlations of
TC1 TC3 TC2 TC7 TC6 TC4 TC8 TC5
TC1 1.00 0.08 0.09 -0.03 0.03 0.41 0.03 0.21
TC3 0.08 1.00 0.12 -0.11 0.24 0.01 0.38 0.02
TC2 0.09 0.12 1.00 0.31 -0.15 -0.02 0.19 0.06
TC7 -0.03 -0.11 0.31 1.00 -0.28 -0.01 -0.04 0.04
TC6 0.03 0.24 -0.15 -0.28 1.00 0.03 0.18 -0.02
TC4 0.41 0.01 -0.02 -0.01 0.03 1.00 -0.10 0.11
TC8 0.03 0.38 0.19 -0.04 0.18 -0.10 1.00 0.03
TC5 0.21 0.02 0.06 0.04 -0.02 0.11 0.03 1.00
Mean item complexity = 1.7
Test of the hypothesis that 8 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 1779393 with prob < 0
Fit based upon off diagonal values = 0.97
Principal Components Analysis
Call: psych::principal(r = dataForPCA[pcaVars], nfactors = 9, rotate = "oblimin")
Standardized loadings (pattern matrix) based upon correlation matrix
TC1 TC3 TC4 TC2 TC6 TC8
completions 0.90 0.18 0.09 -0.03 -0.01 0.10
attempts 0.95 -0.01 0.06 -0.05 0.02 -0.01
passing_yards 0.86 0.11 0.11 0.15 0.21 0.19
passing_tds 0.73 0.06 0.06 0.05 0.01 0.50
passing_interceptions 0.79 -0.05 -0.09 0.09 0.02 -0.50
sacks_suffered 0.70 0.04 0.31 0.01 -0.14 -0.24
sack_yards_lost -0.67 -0.04 -0.31 -0.01 0.14 0.25
sack_fumbles 0.06 -0.02 0.90 -0.01 0.00 -0.04
sack_fumbles_lost -0.06 -0.01 0.89 -0.02 0.04 -0.02
passing_air_yards 0.85 0.09 -0.01 -0.21 0.17 -0.09
passing_yards_after_catch 0.81 -0.02 0.12 0.02 0.02 0.23
passing_first_downs 0.90 0.15 0.08 0.06 -0.02 0.23
passing_epa 0.14 0.10 -0.23 -0.01 0.04 0.90
passing_cpoe -0.01 0.98 -0.02 0.04 0.06 -0.10
passing_2pt_conversions 0.01 -0.01 0.00 0.01 -0.02 0.00
pacr 0.03 0.13 -0.05 0.81 0.08 -0.23
pass_40_yds 0.16 -0.19 0.03 0.19 0.68 0.26
pass_inc 0.99 -0.31 0.02 -0.05 -0.01 -0.14
pass_comp_pct 0.07 0.89 0.00 -0.09 -0.06 0.03
fumbles 0.06 -0.04 0.76 0.01 0.00 0.00
two_pts -0.06 0.00 0.00 0.01 0.00 -0.01
avg_time_to_throw -0.01 -0.05 -0.02 -0.02 -0.14 -0.07
avg_completed_air_yards 0.00 0.04 0.01 0.25 0.35 0.19
avg_intended_air_yards -0.28 0.01 0.08 -0.40 0.32 0.11
avg_air_yards_differential -0.06 0.07 0.04 0.88 0.09 0.16
aggressiveness 0.02 -0.02 0.03 -0.01 -0.06 -0.04
max_completed_air_distance 0.01 0.06 0.00 0.16 0.91 -0.06
avg_air_yards_to_sticks 0.06 0.00 -0.06 -0.38 0.26 0.16
passer_rating -0.08 0.44 0.06 0.16 0.09 0.69
expected_completion_percentage -0.06 0.06 0.02 0.10 0.03 0.08
completion_percentage_above_expectation 0.00 0.93 -0.02 0.10 -0.03 0.10
avg_air_distance 0.02 0.01 -0.02 -0.39 0.30 0.02
max_air_distance 0.07 0.05 -0.01 -0.50 0.72 -0.15
TC7 TC9 TC5 h2 u2 com
completions -0.08 -0.17 0.04 1.03 -0.0347 1.2
attempts 0.00 -0.13 0.04 1.00 0.0015 1.1
passing_yards 0.00 0.04 0.03 1.10 -0.0978 1.4
passing_tds 0.00 0.10 0.04 0.93 0.0704 1.9
passing_interceptions 0.02 0.06 0.02 0.78 0.2176 1.8
sacks_suffered -0.04 0.14 0.00 0.84 0.1636 1.9
sack_yards_lost 0.05 -0.13 0.00 0.80 0.2022 2.0
sack_fumbles 0.02 -0.02 0.00 0.86 0.1376 1.0
sack_fumbles_lost 0.01 -0.02 0.01 0.76 0.2445 1.0
passing_air_yards 0.18 0.16 0.05 0.98 0.0201 1.4
passing_yards_after_catch -0.24 -0.18 0.02 0.95 0.0503 1.5
passing_first_downs 0.02 0.01 0.04 1.03 -0.0338 1.2
passing_epa -0.03 -0.01 0.04 1.00 0.0014 1.2
passing_cpoe -0.07 0.09 0.01 0.96 0.0438 1.1
passing_2pt_conversions 0.00 -0.01 0.90 0.82 0.1802 1.0
pacr -0.14 -0.09 0.02 0.79 0.2074 1.3
pass_40_yds -0.09 -0.05 -0.02 0.61 0.3861 1.8
pass_inc 0.11 0.02 0.04 1.13 -0.1305 1.3
pass_comp_pct -0.21 -0.16 0.01 0.93 0.0744 1.2
fumbles 0.02 0.01 0.02 0.63 0.3737 1.0
two_pts 0.00 0.00 0.92 0.83 0.1720 1.0
avg_time_to_throw -0.25 0.89 0.01 0.71 0.2868 1.2
avg_completed_air_yards 0.35 0.57 0.01 0.91 0.0942 3.2
avg_intended_air_yards 0.24 0.43 -0.01 0.88 0.1185 4.6
avg_air_yards_differential 0.05 0.06 -0.01 0.86 0.1402 1.1
aggressiveness 0.95 -0.26 0.00 0.82 0.1831 1.2
max_completed_air_distance -0.01 0.00 0.00 0.83 0.1733 1.1
avg_air_yards_to_sticks 0.32 0.44 0.02 0.89 0.1107 4.0
passer_rating -0.11 0.03 0.00 1.05 -0.0520 2.0
expected_completion_percentage -0.74 -0.29 0.00 0.84 0.1554 1.4
completion_percentage_above_expectation 0.26 0.00 -0.02 0.99 0.0069 1.2
avg_air_distance 0.29 0.44 0.01 0.86 0.1356 3.6
max_air_distance -0.07 -0.08 0.02 0.76 0.2353 2.0
TC1 TC3 TC4 TC2 TC6 TC8 TC7 TC9 TC5
SS loadings 8.31 3.37 2.85 2.70 2.68 2.68 2.46 2.38 1.74
Proportion Var 0.25 0.10 0.09 0.08 0.08 0.08 0.07 0.07 0.05
Cumulative Var 0.25 0.35 0.44 0.52 0.60 0.68 0.76 0.83 0.88
Proportion Explained 0.28 0.12 0.10 0.09 0.09 0.09 0.08 0.08 0.06
Cumulative Proportion 0.28 0.40 0.50 0.59 0.68 0.77 0.86 0.94 1.00
With component correlations of
TC1 TC3 TC4 TC2 TC6 TC8 TC7 TC9 TC5
TC1 1.00 0.07 0.43 0.02 0.13 0.04 -0.02 -0.02 0.22
TC3 0.07 1.00 0.01 0.24 0.13 0.37 -0.13 -0.05 0.01
TC4 0.43 0.01 1.00 0.04 -0.04 -0.09 -0.03 0.00 0.11
TC2 0.02 0.24 0.04 1.00 -0.10 0.17 -0.25 -0.19 -0.01
TC6 0.13 0.13 -0.04 -0.10 1.00 0.24 0.23 0.28 0.05
TC8 0.04 0.37 -0.09 0.17 0.24 1.00 -0.02 0.02 0.04
TC7 -0.02 -0.13 -0.03 -0.25 0.23 -0.02 1.00 0.31 0.00
TC9 -0.02 -0.05 0.00 -0.19 0.28 0.02 0.31 1.00 0.02
TC5 0.22 0.01 0.11 -0.01 0.05 0.04 0.00 0.02 1.00
Mean item complexity = 1.7
Test of the hypothesis that 9 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 1584360 with prob < 0
Fit based upon off diagonal values = 0.97
Based on the component solutions, the three-component solution maps onto our three-factor solution from factor analysis. The three-component solution explains more than half of the variance in the variables. The fourth component in a four-component solution is not particularly interpretable, explains relatively little variance, and seems to be related to sacks: sacks suffered, sack yards lost, sack fumbles, and sack fumbles lost. However, two of the sack-related variables (sacks suffered and sack yards lost) load more strongly onto the first component, suggesting that these sack-related variables are better captured by another component. Moreover, sacks might be considered a scoring methodological component rather than a particular concept of interest. For these resources we prefer the three-component solution and choose to retain three components.
Here are the variables that had a standardized component loading of 0.4 or greater on each component:
Code
component1vars <- c(
"completions","attempts","passing_yards","passing_tds","passing_interceptions",
"sacks_suffered","sack_yards_lost","sack_fumbles","sack_fumbles_lost",
"passing_air_yards","passing_yards_after_catch","passing_first_downs",
#"passing_epa","passing_cpoe","passing_2pt_conversions","pacr","pass_40_yds",
"pass_inc","fumbles")#,"two_pts","pass_comp_pct",
#"avg_time_to_throw","avg_completed_air_yards","avg_intended_air_yards",
#"avg_air_yards_differential","aggressiveness","max_completed_air_distance",
#"avg_air_yards_to_sticks","passer_rating", #,"completion_percentage"
#"expected_completion_percentage","completion_percentage_above_expectation",
#"avg_air_distance","max_air_distance")
component2vars <- c(
#"completions","attempts","passing_yards","passing_tds","passing_interceptions",
#"sacks_suffered","sack_yards_lost","sack_fumbles","sack_fumbles_lost",
"passing_air_yards",#"passing_yards_after_catch","passing_first_downs",
"pacr",#"passing_epa","passing_cpoe","passing_2pt_conversions","pass_40_yds",
#"pass_inc","pass_comp_pct","fumbles","two_pts",
"avg_completed_air_yards","avg_intended_air_yards",#"avg_time_to_throw",
"aggressiveness","max_completed_air_distance",#"avg_air_yards_differential",
"avg_air_yards_to_sticks",#"passer_rating", #,"completion_percentage"
"expected_completion_percentage",#"completion_percentage_above_expectation",
"avg_air_distance","max_air_distance")
component3vars <- c(
"passing_tds",#"completions","attempts","passing_yards","passing_interceptions",
#"sacks_suffered","sack_yards_lost","sack_fumbles","sack_fumbles_lost",
#"passing_air_yards","passing_yards_after_catch","passing_first_downs",
"passing_epa","passing_cpoe","pass_40_yds",#"passing_2pt_conversions","pacr",
"pass_comp_pct",#"fumbles","two_pts","pass_inc",
"avg_air_yards_differential","max_completed_air_distance",
"avg_time_to_throw","avg_completed_air_yards","avg_intended_air_yards",
#"aggressiveness",
"passer_rating",#"avg_air_yards_to_sticks", #,"completion_percentage"
"completion_percentage_above_expectation")#,"expected_completion_percentage",
#"avg_air_distance","max_air_distance")
The variables that loaded most strongly onto component 1 appear to reflect Quarterback usage: completions, incompletions, passing attempts, passing yards, passing touchdowns, interceptions thrown, fumbles, times sacked, sack yards lost (“reversed”—i.e., negatively associated with the component), sack fumbles, sack fumbles lost, passing air yards (total horizontal distance the ball travels on all pass attempts), passing yards after the catch, and first downs gained by passing. Quarterbacks who tend to throw more tend to have higher levels on those variables. Thus, we label component 1 as “Usage”, which reflects total Quarterback involvement, regardless of efficiency or outcome.
The variables that loaded most strongly onto component 2 appear to reflect Quarterback aggressiveness: passing air yards, passing air conversion ratio (reversed; ratio of passing yards to passing air yards), average air yards on completed passes, average air yards on all attempted passes, aggressiveness (percentage of passing attempts thrown into tight windows, where there is a defender within one yard or less of the receiver at the time of the completion or incompletion), expected completion percentage (reversed; based on air distance, receiver separation, Quarterback/Wide Receiver movement, pass location, whether there was pressure on the Quarterback when throwing, the throw angle and trajectory, receiver and defender positioning at the catch point, and defensive coverage scheme), average amount of air yards ahead of or behind the first down marker on passing attempts, average air distance (the true three-dimensional distance the ball travels in the air), maximum air distance, and maximum air distance on completed passes. Quarterbacks who throw the ball farther and into tighter windows tend to have higher values on those variables. Thus, we label component 2 as “Aggressiveness”, which reflects throwing longer, more difficult passes with a tight window.
The variables that loaded most strongly onto component 3 appear to reflect Quarterback performance: passing touchdowns, passing expected points added, passing completion percentage above expectation, passes completed of 40 yards or more, pass completion percentage, air yards differential (intended air yards \(-\) completed air yards; attempting deeper passes than he on average completes), maximum completed air distance, average time to throw, average completed air yards, average intended air yards, and passer rating. Quarterbacks who perform better tend to have higher values on those variables. Thus, we label component 3 as “Performance”.
Below are component scores from the PCA for the first six players:
TC1 TC2 TC3
[1,] 5.623830 0.1212148 -1.3399587
[2,] 4.009919 1.9548287 0.8406664
[3,] 8.418577 -0.3938434 -2.8524482
[4,] 4.189431 1.0714847 1.7692927
[5,] 5.804268 1.3782986 -0.5757511
[6,] 6.157528 -0.3560422 0.3905272
Here are the players and weeks that showed the highest levels of Quarterback “Usage”:
Code
Here are the players and weeks that showed the lowest levels of Quarterback “Usage”:
Code
Here are the players and weeks that showed the highest levels of Quarterback “Aggressiveness”:
Code
Here are the players and weeks that showed the lowest levels of Quarterback “Aggressiveness”:
Code
Here are the players and weeks that showed the highest levels of Quarterback “Performance”:
Code
Here are the players and weeks that showed the lowest levels of Quarterback “Performance”:
23.6 Conclusion
Principal component analysis (PCA) is a technique used for data reduction—reducing a large set of a variables down to a smaller set of components that capture most of the variance in the larger set. There are many decisions to make in factor analysis. These decisions can have important impacts on the resulting solution. Thus, it can be helpful for theory and interpretability to help guide decision-making when conducting factor analysis. There are several differences between factor analysis and PCA. Unlike factor analysis, which estimates the latent factors as the common variance among the variables that load onto that factor and discards the remaining variance as “error”, PCA uses all variance of variables and assumes variables have no error. Thus, PCA does not account for measurement error. Using PCA, we were able to identify three PCA components that accounted for considerable variance in the variables we examined, pertaining to Quarterbacks: 1) usage; 2) aggressiveness; 3) performance. We were then able to determine which players were highest and lowest on each of these components.
23.7 Session Info
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.1.0 readr_2.1.5 tidyr_1.3.1 tibble_3.3.0
[9] ggplot2_3.5.2 tidyverse_2.0.0 nFactors_2.4.1.2 psych_2.5.6
loaded via a namespace (and not attached):
[1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.1
[4] tidyselect_1.2.1 parallel_4.5.1 scales_1.4.0
[7] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-7
[10] R6_2.6.1 generics_0.1.4 knitr_1.50
[13] htmlwidgets_1.6.4 pillar_1.11.0 RColorBrewer_1.1-3
[16] tzdb_0.5.0 rlang_1.1.6 stringi_1.8.7
[19] xfun_0.52 timechange_0.3.0 cli_3.6.5
[22] withr_3.0.2 magrittr_2.0.3 digest_0.6.37
[25] grid_4.5.1 hms_1.1.3 lifecycle_1.0.4
[28] nlme_3.1-168 vctrs_0.6.5 mnormt_2.1.1
[31] evaluate_1.0.4 glue_1.8.0 GPArotation_2025.3-1
[34] farver_2.1.2 rmarkdown_2.29 tools_4.5.1
[37] pkgconfig_2.0.3 htmltools_0.5.8.1