I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
8 Correlation Analysis
8.1 Getting Started
8.1.1 Load Packages
8.2 Overview of Correlation
Correlation is a metric of the association between variables. Covariance indexes the association between variables in an unstandardized metric that differs for variables with different scales. By contrast, correlation is in a standardized metric that does not differ for variables with different scales. When examining the association between variables that are at the interval or ratio level of measurement, Pearson correlation is used. When examining the association between variables that are at the ordinal level of measurement, Spearman correlation is used. Pearson correlation is an index of the linear association between variables. If a nonlinear association is present, other indices like xi [\(\xi\); Chatterjee (2021)] and distance correlation coefficients are better suited to detect the association.
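The difference between an unstandardized and a standardized metric can be illustrated with a short sketch. The data below are simulated and hypothetical (the variable names `height_in`, `height_cm`, and `weight_lb` are illustrative and do not come from the book's data); the sketch only uses `cov()` and `cor()` from base R. Converting one variable to different units changes the covariance but leaves the correlation unchanged.

```r
# Simulated (hypothetical) data: heights in inches and weights in pounds
set.seed(1)
height_in <- rnorm(100, mean = 70, sd = 3)
weight_lb <- 2 * height_in + rnorm(100, mean = 60, sd = 10)

# The same heights expressed on a different scale (centimeters)
height_cm <- height_in * 2.54

# Covariance depends on the units of measurement
cov_in <- cov(height_in, weight_lb)
cov_cm <- cov(height_cm, weight_lb)

# Correlation does not depend on the units of measurement
cor_in <- cor(height_in, weight_lb)
cor_cm <- cor(height_cm, weight_lb)

c(cov_in = cov_in, cov_cm = cov_cm, cor_in = cor_in, cor_cm = cor_cm)
```

In this sketch, the covariance in centimeters is 2.54 times the covariance in inches, whereas the two correlations are identical.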
8.3 The Correlation Coefficient (\(r\))
The formula for the correlation coefficient is in Equation 7.21.
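Equation 7.21 is not reproduced in this chapter. As a minimal sketch, assuming it is the standard Pearson product-moment formula, \(r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}\), the coefficient can be computed by hand and compared against base R's `cor()`. The data are simulated and the variable names are illustrative.

```r
# Simulated (hypothetical) data
set.seed(2)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Deviations of each score from its variable's mean
x_dev <- x - mean(x)
y_dev <- y - mean(y)

# Pearson r: sum of cross-products divided by the square root of the
# product of the sums of squared deviations
r_by_hand <- sum(x_dev * y_dev) / sqrt(sum(x_dev^2) * sum(y_dev^2))

# Built-in calculation for comparison
r_builtin <- cor(x, y)

all.equal(r_by_hand, r_builtin)
```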
The correlation coefficient ranges from −1.0 to +1.0. The correlation coefficient (\(r\)) tells you two things: (1) the direction (sign) of the association (positive or negative) and (2) the magnitude of the association. If the correlation coefficient is positive, the association is positive. If the correlation coefficient is negative, the association is negative. If the association is positive, as \(X\) increases, \(Y\) increases (or conversely, as \(X\) decreases, \(Y\) decreases). If the association is negative, as \(X\) increases, \(Y\) decreases (or conversely, as \(X\) decreases, \(Y\) increases). The smaller the absolute value of the correlation coefficient (i.e., the closer the \(r\) value is to zero), the weaker the association and, when the variables are on the same scale (e.g., standardized scores), the flatter the slope of the best-fit line in a scatterplot. The larger the absolute value of the correlation coefficient (i.e., the closer the absolute value of the \(r\) value is to one), the stronger the association and, on the same scale, the steeper the slope of the best-fit line in a scatterplot. See Figure 8.1 for a range of different correlation coefficients and what some example data may look like for each direction and strength of association.
Code
library("petersenlab") # provides complement()
set.seed(52242)
correlations <- data.frame(criterion = rnorm(1000))
correlations$v1 <- complement(correlations$criterion, -1)
correlations$v2 <- complement(correlations$criterion, -.9)
correlations$v3 <- complement(correlations$criterion, -.8)
correlations$v4 <- complement(correlations$criterion, -.7)
correlations$v5 <- complement(correlations$criterion, -.6)
correlations$v6 <- complement(correlations$criterion, -.5)
correlations$v7 <- complement(correlations$criterion, -.4)
correlations$v8 <- complement(correlations$criterion, -.3)
correlations$v9 <- complement(correlations$criterion, -.2)
correlations$v10 <- complement(correlations$criterion, -.1)
correlations$v11 <- complement(correlations$criterion, 0)
correlations$v12 <- complement(correlations$criterion, .1)
correlations$v13 <- complement(correlations$criterion, .2)
correlations$v14 <- complement(correlations$criterion, .3)
correlations$v15 <- complement(correlations$criterion, .4)
correlations$v16 <- complement(correlations$criterion, .5)
correlations$v17 <- complement(correlations$criterion, .6)
correlations$v18 <- complement(correlations$criterion, .7)
correlations$v19 <- complement(correlations$criterion, .8)
correlations$v20 <- complement(correlations$criterion, .9)
correlations$v21 <- complement(correlations$criterion, 1)
# Arrange the 21 panels in a 7 x 3 grid with tight margins
par(mfrow = c(7, 3), mar = c(1, 0, 1, 0))

# Draw one panel per simulated variable (v1 to v21; target r from -1.0 to 1.0)
for (variable in paste0("v", 1:21)) {
  # Observed correlation with the criterion, shown in the panel title
  r_observed <- round(cor.test(x = correlations$criterion, y = correlations[[variable]])$estimate, 2)

  # Scatterplot without axes or axis labels
  plot(
    correlations$criterion, correlations[[variable]],
    xaxt = "n", yaxt = "n", xlab = "", ylab = "",
    main = substitute(paste(italic(r), " = ", x, sep = ""), list(x = r_observed)))

  # Best-fit line
  abline(lm(reformulate("criterion", response = variable), data = correlations), col = "black")
}

invisible(dev.off()) # closes the graphics device; alternative: par(mfrow = c(1, 1))
![](correlation_files/figure-html/fig-rangeOfCorrelations-1.png)
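The link between the correlation coefficient and the slope of the best-fit line can be checked directly. The sketch below uses simulated (hypothetical) data and relies on a standard result for simple linear regression: the slope equals \(r\) multiplied by the ratio of the standard deviations of \(Y\) and \(X\), so the slope equals \(r\) itself when both variables are standardized. The variable names are illustrative.

```r
# Simulated (hypothetical) data on two different scales
set.seed(3)
x <- rnorm(200, mean = 50, sd = 10)
y <- 3 * x + rnorm(200, sd = 40)

r <- cor(x, y)

# Slope of the best-fit line in the raw metric
slope_raw <- unname(coef(lm(y ~ x))["x"])

# Slope of the best-fit line after standardizing both variables
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
slope_std <- unname(coef(lm(zy ~ zx))["zx"])

# The raw slope depends on the scales; the standardized slope equals r
c(r = r, slope_raw = slope_raw, slope_std = slope_std)
all.equal(slope_raw, r * sd(y) / sd(x))
all.equal(slope_std, r)
```

The sign of the slope always matches the sign of \(r\); the steepness of the line matches the magnitude of \(r\) only when the two variables are on comparable scales.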
See Figure 8.2 for the interpretation of the magnitude and direction (sign) of various correlation coefficients.
Code
library("petersenlab") # provides complement()
library("ggplot2")
library("patchwork")
set.seed(52242)
correlations2 <- data.frame(criterion = rnorm(15))
correlations2$v1 <- complement(correlations2$criterion, -1)
correlations2$v2 <- complement(correlations2$criterion, -.9)
correlations2$v3 <- complement(correlations2$criterion, -.8)
correlations2$v4 <- complement(correlations2$criterion, -.7)
correlations2$v5 <- complement(correlations2$criterion, -.6)
correlations2$v6 <- complement(correlations2$criterion, -.5)
correlations2$v7 <- complement(correlations2$criterion, -.4)
correlations2$v8 <- complement(correlations2$criterion, -.3)
correlations2$v9 <- complement(correlations2$criterion, -.2)
correlations2$v10 <- complement(correlations2$criterion, -.1)
correlations2$v11 <- complement(correlations2$criterion, 0)
correlations2$v12 <- complement(correlations2$criterion, .1)
correlations2$v13 <- complement(correlations2$criterion, .2)
correlations2$v14 <- complement(correlations2$criterion, .3)
correlations2$v15 <- complement(correlations2$criterion, .4)
correlations2$v16 <- complement(correlations2$criterion, .5)
correlations2$v17 <- complement(correlations2$criterion, .6)
correlations2$v18 <- complement(correlations2$criterion, .7)
correlations2$v19 <- complement(correlations2$criterion, .8)
correlations2$v20 <- complement(correlations2$criterion, .9)
correlations2$v21 <- complement(correlations2$criterion, 1)
# Helper that draws one scatterplot panel with a best-fit line.
# y_var: name of the column in correlations2 to plot against criterion
# title: panel title; r_label: correlation value shown in the subtitle
plot_association <- function(y_var, title, r_label) {
  ggplot(
    data = correlations2,
    mapping = aes(
      x = criterion,
      y = .data[[y_var]]
    )
  ) +
    geom_point() +
    geom_smooth(
      method = "lm",
      se = FALSE) +
    labs(
      title = title,
      subtitle = bquote(paste(italic("r"), " = ", .(r_label)))
    ) +
    theme_classic(
      base_size = 12) +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks.x = element_blank(),
      axis.title.y = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank())
}

# One panel per direction and strength of association
p1 <- plot_association("v1", "Perfect Negative Association", "−1.0")
p2 <- plot_association("v2", "Strong Negative Association", "−.9")
p3 <- plot_association("v6", "Moderate Negative Association", "−.5")
p4 <- plot_association("v9", "Weak Negative Association", "−.2")
p5 <- plot_association("v11", "No Association", ".0")
p6 <- plot_association("v13", "Weak Positive Association", ".2")
p7 <- plot_association("v16", "Moderate Positive Association", ".5")
p8 <- plot_association("v20", "Strong Positive Association", ".9")
p9 <- plot_association("v21", "Perfect Positive Association", "1.0")

# Combine the panels into a 3-column grid with patchwork
p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9 +
  plot_layout(
    ncol = 3,
    heights = 1,
    widths = 1)
![](correlation_files/figure-html/fig-interepretingCorrelationCoefficients-1.png)
8.4 Examples
8.4.1 Covariance
8.4.2 Pearson Correlation
8.4.3 Spearman Correlation
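As a minimal sketch with simulated (hypothetical) data rather than the book's football data, the example below shows what Spearman correlation computes: it is the Pearson correlation of the rank-ordered scores, which is why it suits ordinal data and monotonic (but not necessarily linear) associations. The variable names are illustrative; only base R's `cor()` and `rank()` are used.

```r
# Simulated (hypothetical) data with a monotonic but nonlinear association
set.seed(4)
x <- rnorm(100)
y <- exp(x) + rnorm(100, sd = 0.1)

# Pearson correlation of the raw scores
pearson_r <- cor(x, y, method = "pearson")

# Spearman correlation, computed two equivalent ways
spearman_rho <- cor(x, y, method = "spearman")
spearman_by_ranks <- cor(rank(x), rank(y))

c(pearson_r = pearson_r, spearman_rho = spearman_rho)
all.equal(spearman_rho, spearman_by_ranks)
```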
8.4.4 Nonlinear Correlation
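As noted in the overview, Pearson correlation indexes only the linear association between variables. The sketch below, which uses constructed (hypothetical) data and base R only, shows a perfect but nonlinear (U-shaped) association for which Pearson \(r\) is essentially zero. Indices such as xi and distance correlation are designed to detect this kind of association; they are not computed here to avoid assuming any particular package interface.

```r
# Constructed (hypothetical) data: y is fully determined by x,
# but the association is U-shaped rather than linear
x <- seq(-3, 3, by = 0.1)
y <- x^2

# Pearson correlation misses the association (essentially zero)
pearson_r <- cor(x, y)

# Once the form of the association is modeled, it is captured perfectly
r_with_squared_term <- cor(x^2, y)

c(pearson_r = pearson_r, r_with_squared_term = r_with_squared_term)
```

A Pearson correlation near zero therefore indicates the absence of a linear association, not necessarily the absence of any association.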
8.4.5 Correlation Matrix
8.4.6 Correlogram
8.5 Session Info
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchwork_1.2.0 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[5] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[9] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0 XICOR_0.4.1
[13] petersenlab_1.0.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 psych_2.4.6.26 viridisLite_0.4.2 farver_2.1.2
[5] fastmap_1.2.0 digest_0.6.36 rpart_4.1.23 timechange_0.3.0
[9] lifecycle_1.0.4 cluster_2.1.6 magrittr_2.0.3 compiler_4.4.1
[13] rlang_1.1.4 Hmisc_5.1-3 tools_4.4.1 utf8_1.2.4
[17] yaml_2.3.8 data.table_1.15.4 knitr_1.47 labeling_0.4.3
[21] htmlwidgets_1.6.4 mnormt_2.1.1 plyr_1.8.9 RColorBrewer_1.1-3
[25] foreign_0.8-86 withr_3.0.0 R.oo_1.26.0 nnet_7.3-19
[29] grid_4.4.1 stats4_4.4.1 fansi_1.0.6 lavaan_0.6-18
[33] xtable_1.8-4 colorspace_2.1-0 scales_1.3.0 cli_3.6.3
[37] mvtnorm_1.2-5 rmarkdown_2.27 generics_0.1.3 rstudioapi_0.16.0
[41] reshape2_1.4.4 tzdb_0.4.0 DBI_1.2.3 rtf_0.4-14.1
[45] splines_4.4.1 parallel_4.4.1 base64enc_0.1-3 mitools_2.4
[49] vctrs_0.6.5 Matrix_1.7-0 jsonlite_1.8.8 hms_1.1.3
[53] Formula_1.2-5 htmlTable_2.4.2 glue_1.7.0 stringi_1.8.4
[57] gtable_0.3.5 quadprog_1.5-8 munsell_0.5.1 pillar_1.9.0
[61] psychTools_2.4.3 htmltools_0.5.8.1 R6_2.5.1 mix_1.0-12
[65] evaluate_0.24.0 pbivnorm_0.6.0 lattice_0.22-6 R.methodsS3_1.8.2
[69] backports_1.5.0 Rcpp_1.0.12 gridExtra_2.3 nlme_3.1-164
[73] checkmate_2.3.1 mgcv_1.9-1 xfun_0.45 pkgconfig_2.0.3