21 Time Series Analysis
21.1 Getting Started
21.1.1 Load Packages
21.1.2 Load Data
21.2 Overview of Time Series Analysis
Time series analysis is useful when trying to generate forecasts from longitudinal data. That is, time series analysis seeks to evaluate change over time to predict future values.
There are many different types of time series analyses. For simplicity, in this chapter, we use autoregressive integrated moving average (ARIMA) models to demonstrate one approach to time series analysis. We also leverage Bayesian mixed models to generate forecasts of future performance and plots of individuals' model-implied performance by age and position.
21.3 Autoregressive Integrated Moving Average (ARIMA) Models
Hyndman & Athanasopoulos (2021) provide a nice overview of ARIMA models. As they note, ARIMA models aim to describe how a variable is correlated with itself over time (autocorrelation), i.e., how earlier levels of a variable are correlated with later levels of the same variable. ARIMA models perform best when there is a clear pattern in which later values are influenced by earlier values. ARIMA models incorporate autoregression effects, moving average effects, and differencing.
ARIMA models can have various numbers of terms and model complexity. They are specified in the following form: \(\text{ARIMA}(p,d,q)\), where:
- \(p =\) the number of autoregressive terms
- \(d =\) the number of times the series is differenced, where differencing replaces each score with its difference from the preceding score (to make the time series stationary by reducing trends and seasonality)
- \(q =\) the number of moving average terms
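As a minimal sketch of how the \((p,d,q)\) specification maps onto code, the following uses base R's `arima()` on a simulated series rather than the chapter's data. The simulated series, the seed, and the object names (`simulatedSeries`, `arimaFit`) are assumptions for illustration only.

```r
# Simulate a series for illustration only (not the fantasy football data)
set.seed(1)
simulatedSeries <- arima.sim(
  model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.3),
  n = 300)

# order = c(p, d, q): 1 autoregressive term, 1 difference, 1 moving average term
arimaFit <- arima(
  simulatedSeries,
  order = c(1, 1, 1),
  method = "ML")

# One autoregressive and one moving average coefficient are estimated;
# no mean is estimated because the series is differenced (d = 1)
coef(arimaFit)
```

Changing the three numbers in `order` changes how many autoregressive terms, differences, and moving average terms the model includes.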
ARIMA models assume that the data are stationary after any differencing (i.e., there are no long-term trends), are non-seasonal (i.e., peaks and troughs do not recur at consistent times), and that earlier values influence later values. These assumptions may not hold well in fantasy football, so ARIMA models may not be particularly useful for forecasting fantasy football performance. Other approaches, such as exponential smoothing, may be useful for data that show longer-term trends and seasonality (Hyndman & Athanasopoulos, 2021). Nevertheless, ARIMA models are widely used in forecasting financial markets and economic indicators, so they are a useful technique to learn.
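To illustrate what differencing does to a trending series, here is a small base R sketch. The series is made up for illustration (a linear trend plus random noise), and the object names are hypothetical.

```r
# Illustrative series with a linear trend (assumed values, not real data)
set.seed(2)
timeIndex <- 1:200
trendingSeries <- 0.5 * timeIndex + rnorm(200)

# First-order differencing (d = 1): each value minus the preceding value
differencedSeries <- diff(trendingSeries, differences = 1)

# The original series rises steadily over time; the differenced series
# fluctuates around the slope of the trend (about 0.5) with no remaining trend
mean(differencedSeries)
```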
Adapted from: https://rc2e.com/timeseriesanalysis (archived at https://perma.cc/U5P6-2VWC).
21.3.1 Create the Time Series Objects
# Subset the weekly data for each player
weeklyFantasyPoints_tomBrady <- player_stats_weekly %>%
  filter(
    player_id == "00-0019596" | player_display_name == "Tom Brady")

weeklyFantasyPoints_peytonManning <- player_stats_weekly %>%
  filter(
    player_id == "00-0010346" | player_display_name == "Peyton Manning")

# Create time series objects ordered by game date
ts_tomBrady <- xts::xts(
  x = weeklyFantasyPoints_tomBrady["fantasyPoints"],
  order.by = weeklyFantasyPoints_tomBrady$gameday)

ts_peytonManning <- xts::xts(
  x = weeklyFantasyPoints_peytonManning["fantasyPoints"],
  order.by = weeklyFantasyPoints_peytonManning$gameday)

# Print both series
ts_tomBrady
ts_peytonManning
fantasyPoints
2000-11-23 0.24
2001-09-23 2.74
2001-09-30 6.92
2001-10-07 4.34
2001-10-14 22.56
2001-10-21 19.88
2001-10-28 6.02
2001-11-04 22.00
2001-11-11 7.18
2001-11-18 7.00
...
2022-10-27 17.10
2022-11-06 15.20
2022-11-13 16.02
2022-11-27 18.04
2022-12-05 16.14
2022-12-11 8.12
2022-12-18 18.58
2022-12-25 9.34
2023-01-01 37.68
2023-01-08 7.36
fantasyPoints
1999-09-12 13.06
1999-09-19 15.22
1999-09-26 28.56
1999-10-10 19.66
1999-10-17 9.10
1999-10-24 16.86
1999-10-31 18.52
1999-11-07 19.60
1999-11-14 14.18
1999-11-21 22.80
...
2015-09-13 3.90
2015-09-17 19.24
2015-09-27 17.86
2015-10-04 6.32
2015-10-11 4.64
2015-10-18 6.60
2015-11-01 10.60
2015-11-08 13.24
2015-11-15 -10.60
2016-01-03 2.56
21.3.2 Plot the Time Series
21.3.3 Rolling Mean/Median
fantasyPoints
2001-09-30 7.360
2001-10-07 11.288
2001-10-14 11.944
2001-10-21 14.960
2001-10-28 15.528
2001-11-04 12.416
2001-11-11 13.984
2001-11-18 14.484
2001-11-25 10.568
2001-12-02 10.888
...
2022-10-16 18.332
2022-10-23 15.892
2022-10-27 15.148
2022-11-06 16.012
2022-11-13 16.500
2022-11-27 14.704
2022-12-05 15.380
2022-12-11 14.044
2022-12-18 17.972
2022-12-25 16.216
fantasyPoints
2001-09-30 4.34
2001-10-07 6.92
2001-10-14 6.92
2001-10-21 19.88
2001-10-28 19.88
2001-11-04 7.18
2001-11-11 7.18
2001-11-18 8.52
2001-11-25 7.18
2001-12-02 8.52
...
2022-10-16 17.10
2022-10-23 15.20
2022-10-27 15.20
2022-11-06 16.02
2022-11-13 16.14
2022-11-27 16.02
2022-12-05 16.14
2022-12-11 16.14
2022-12-18 16.14
2022-12-25 9.34
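The code that produced the rolling values is not shown here. The first printed values are consistent with a centered window of five games: the value dated 2001-09-30 equals the mean (or median) of the first five games in the Tom Brady series. The following base R sketch reproduces the first six rolling values under that assumption. The use of `embed()` and the object names are illustrative choices, not necessarily the functions used in the chapter.

```r
# First ten weekly fantasy point values from the Tom Brady series printed above
fantasyPoints <- c(0.24, 2.74, 6.92, 4.34, 22.56, 19.88, 6.02, 22.00, 7.18, 7.00)

# Each row of embed() holds one window of 5 consecutive games
windows <- embed(fantasyPoints, 5)

# Summarize each window; the result is aligned with the middle game of the window
rollingMean <- rowMeans(windows)
rollingMedian <- apply(windows, 1, median)

rollingMean   # 7.360 11.288 11.944 14.960 15.528 12.416
rollingMedian # 4.34 6.92 6.92 19.88 19.88 7.18
```

The rolling median is less sensitive than the rolling mean to a single unusually high or low game within the window.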
21.3.4 Autocorrelation
The autocorrelation function (ACF) plot depicts the autocorrelation of scores as a function of the length of the lag. Significant autocorrelation is detected when the autocorrelation exceeds the dashed blue lines, as is depicted in Figure 21.3.
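As a rough sketch of what the ACF plot summarizes, the following base R code computes autocorrelations by lag and the approximate 95% bounds that base R's ACF plot draws as dashed lines. The series is simulated for illustration, and the object names are hypothetical; the chapter's figure may have been produced with a different plotting function.

```r
# Simulated autocorrelated series for illustration only (not real data)
set.seed(3)
simulatedScores <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))

# Autocorrelation at lags 0 through 10, without drawing the plot
acfResult <- acf(simulatedScores, lag.max = 10, plot = FALSE)
autocorrelations <- as.numeric(acfResult$acf)

# Approximate 95% bounds: +/- 1.96 divided by the square root of the sample size
significanceBound <- qnorm(0.975) / sqrt(length(simulatedScores))

# Lags (excluding lag 0) whose autocorrelation falls outside the bounds
which(abs(autocorrelations[-1]) > significanceBound)
```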
21.3.5 Fit an Autoregressive Integrated Moving Average Model
Series: ts_tomBrady
ARIMA(1,1,2)
Coefficients:
ar1 ma1 ma2
0.8007 -1.7041 0.7111
s.e. 0.1381 0.1527 0.1479
sigma^2 = 67.3: log likelihood = -1176.57
AIC=2361.13 AICc=2361.25 BIC=2376.38
Series: ts_peytonManning
ARIMA(1,0,1) with non-zero mean
Coefficients:
ar1 ma1 mean
0.8795 -0.7465 17.2154
s.e. 0.0861 0.1152 1.0481
sigma^2 = 62.99: log likelihood = -871.18
AIC=1750.36 AICc=1750.53 BIC=1764.45
Call:
arima(x = ts_tomBrady, order = c(5, 1, 4))
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3
0.6376 -0.0319 -0.4470 -0.1250 0.2078 -1.5598 0.6323 0.5117
s.e. 0.2066 0.2223 0.1676 0.0706 0.0667 0.2078 0.3149 0.3352
ma4
-0.5574
s.e. 0.1461
sigma^2 estimated as 63.92: log likelihood = -1169.59, aic = 2359.17
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.6798849 7.983157 6.455929 -9.382029 62.84247 0.7287981
ACF1
Training set 0.001251287
2.5 % 97.5 %
ar1 0.2326528 1.04261876
ar2 -0.4676647 0.40383415
ar3 -0.7755181 -0.11841000
ar4 -0.2633125 0.01336914
ar5 0.0771301 0.33839489
ma1 -1.9670637 -1.15248737
ma2 0.0150989 1.24940961
ma3 -0.1454023 1.16874068
ma4 -0.8438263 -0.27099009
Ljung-Box test
data: Residuals from ARIMA(5,1,4)
Q* = 3.1415, df = 3, p-value = 0.3703
Model df: 9. Total lags used: 12
Call:
arima(x = ts_tomBrady, order = c(5, 1, 4), fixed = c(NA, NA, 0, NA, NA, NA,
NA, NA, NA))
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3 ma4
-0.8409 -0.7345 0 0.0804 0.1507 -0.0637 -0.0167 -0.6063 -0.2275
s.e. 0.3120 0.2164 0 0.1024 0.0840 0.3197 0.2207 0.1873 0.1068
sigma^2 estimated as 63.88: log likelihood = -1169.47, aic = 2356.93
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.6574145 7.980413 6.385955 -9.143247 62.73514 0.7208988
ACF1
Training set -0.007136381
2.5 % 97.5 %
ar1 -1.45240974 -0.22948037
ar2 -1.15869850 -0.31030272
ar3 NA NA
ar4 -0.12027200 0.28102794
ar5 -0.01390496 0.31531240
ma1 -0.69025865 0.56278078
ma2 -0.44934933 0.41594092
ma3 -0.97335654 -0.23917503
ma4 -0.43672063 -0.01826288
Ljung-Box test
data: Residuals from ARIMA(5,1,4)
Q* = 3.1606, df = 3, p-value = 0.3675
Model df: 9. Total lags used: 12
Call:
arima(x = ts_peytonManning, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
0.8795 -0.7465 17.2154
s.e. 0.0861 0.1152 1.0481
sigma^2 estimated as 62.24: log likelihood = -871.18, aic = 1750.36
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.004669414 7.888962 6.28244 -89.05348 136.3438 0.7615486
ACF1
Training set 0.02492534
2.5 % 97.5 %
ar1 0.7106760 1.0482992
ma1 -0.9722609 -0.5206973
intercept 15.1612132 19.2695648
Ljung-Box test
data: Residuals from ARIMA(1,0,1) with non-zero mean
Q* = 8.4306, df = 8, p-value = 0.3926
Model df: 2. Total lags used: 10
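Two pieces of the output above can be read with simple arithmetic. The confidence intervals are the estimate plus or minus 1.96 standard errors (e.g., for the Peyton Manning model, 0.8795 ± 1.96 × 0.0861 gives roughly 0.71 to 1.05 for ar1), and the Ljung-Box degrees of freedom are the number of lags minus the model degrees of freedom (10 − 2 = 8). The sketch below shows both with base R functions on a simulated series. The data, seed, and object names are assumptions, and the chapter's own output appears to come from a different helper function than `Box.test()`.

```r
# Simulated series for illustration only (not the fantasy football data)
set.seed(4)
simulatedSeries <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300) + 15

# ARIMA(1,0,1) with a mean (labeled "intercept" by arima())
arimaFit <- arima(simulatedSeries, order = c(1, 0, 1), method = "ML")

# 95% confidence intervals: estimate +/- 1.96 * standard error
waldIntervals <- confint(arimaFit)
standardErrors <- sqrt(diag(vcov(arimaFit)))
manualLower <- coef(arimaFit) - qnorm(0.975) * standardErrors
manualUpper <- coef(arimaFit) + qnorm(0.975) * standardErrors

# Ljung-Box test of residual autocorrelation;
# fitdf = number of estimated AR and MA coefficients (p + q = 2)
ljungBox <- Box.test(
  residuals(arimaFit),
  lag = 10,
  type = "Ljung-Box",
  fitdf = 2)

ljungBox$parameter # degrees of freedom: 10 lags - 2 = 8
```

A large p-value from the Ljung-Box test, as in the output above, gives no evidence that autocorrelation remains in the residuals.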
21.3.6 Generate the Model Forecasts
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
336 16.79157 6.545441 27.03770 1.121467 32.46168
337 21.31309 11.035944 31.59023 5.595554 37.03062
338 13.69041 3.371785 24.00903 -2.090564 29.47138
339 21.48471 10.979805 31.98961 5.418846 37.55057
340 17.19892 6.693126 27.70471 1.131696 33.26614
341 19.01892 8.370031 29.66780 2.732851 35.30498
342 18.72444 8.060966 29.38792 2.416061 35.03283
343 17.83641 7.157114 28.51571 1.503835 34.16899
344 18.62106 7.938214 29.30391 2.283057 34.95907
345 18.16347 7.480427 28.84651 1.825168 34.50177
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
251 10.55320 0.4430843 20.66331 -4.9088854 26.01528
252 11.35607 1.1569232 21.55522 -4.2421805 26.95433
253 12.06219 1.7947024 22.32968 -3.6405790 27.76497
254 12.68322 2.3631750 23.00326 -3.0999254 28.46636
255 13.22940 2.8688924 23.58991 -2.6156293 29.07443
256 13.70976 3.3180615 24.10146 -2.1829722 29.60250
257 14.13223 3.7164703 24.54800 -1.7973016 30.06177
258 14.50379 4.0694543 24.93813 -1.4541504 30.46173
259 14.83057 4.3818907 25.27926 -1.1493076 30.81045
260 15.11797 4.6582086 25.57774 -0.8788563 31.11480
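The interval columns in the tables above appear to have the usual normal-theory form: the point forecast plus or minus a multiple of the forecast standard error (about 1.28 for 80% intervals and 1.96 for 95% intervals). As a minimal sketch, the following base R code builds a table with the same columns from `predict()` on a simulated series. The data, seed, and object names are assumptions for illustration, not the chapter's code.

```r
# Simulated series for illustration only (not the fantasy football data)
set.seed(5)
simulatedSeries <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300) + 15
arimaFit <- arima(simulatedSeries, order = c(1, 0, 1), method = "ML")

# Point forecasts and forecast standard errors for the next 10 time points
forecasts <- predict(arimaFit, n.ahead = 10)

# 80% and 95% prediction intervals from the normal distribution
forecastTable <- data.frame(
  pointForecast = as.numeric(forecasts$pred),
  lo80 = as.numeric(forecasts$pred - qnorm(0.90) * forecasts$se),
  hi80 = as.numeric(forecasts$pred + qnorm(0.90) * forecasts$se),
  lo95 = as.numeric(forecasts$pred - qnorm(0.975) * forecasts$se),
  hi95 = as.numeric(forecasts$pred + qnorm(0.975) * forecasts$se))

forecastTable
```

The forecast standard errors do not shrink as the horizon lengthens, so the intervals stay at least as wide further into the future.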
21.3.7 Plot the Model Forecasts
21.4 Bayesian Mixed Models
The Bayesian longitudinal mixed models were estimated in Section 12.3.5.
21.4.1 Prepare New Data Object
# Subset to offensive players and kickers; create factor versions of the grouping variables
player_stats_seasonal_offense_subset <- player_stats_seasonal %>%
  dplyr::filter(position_group %in% c("QB","RB","WR","TE") | position %in% c("K"))

player_stats_seasonal_offense_subset$position[which(player_stats_seasonal_offense_subset$position == "HB")] <- "RB"

player_stats_seasonal_offense_subset$player_idFactor <- factor(player_stats_seasonal_offense_subset$player_id)
player_stats_seasonal_offense_subset$positionFactor <- factor(player_stats_seasonal_offense_subset$position)

# Keep complete cases on the model variables
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subset %>%
  filter(
    !is.na(player_idFactor),
    !is.na(fantasyPoints),
    !is.na(positionFactor),
    !is.na(ageCentered20),
    !is.na(ageCentered20Quadratic),
    !is.na(years_of_experience))

# Keep only players who were included in the fitted model
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  filter(player_id %in% bayesianMixedModelFit$data$player_idFactor) %>%
  mutate(positionFactor = droplevels(positionFactor))

# Add a row for the upcoming season for each player
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  group_by(player_id) %>%
  group_modify(~ add_row(.x, season = max(player_stats_seasonal_offense_subsetCC$season) + 1)) %>%
  fill(player_display_name, player_idFactor, position, position_group, positionFactor, team, .direction = "downup") %>%
  ungroup()

# Carry forward age and experience from the most recent observed season
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  left_join(
    player_stats_seasonal_offense_subsetCC %>%
      filter(season == max(player_stats_seasonal_offense_subsetCC$season) - 1) %>%
      select(player_id, age_lastYear = age, years_of_experience_lastYear = years_of_experience),
    by = "player_id") %>%
  mutate(
    age = if_else(season == max(player_stats_seasonal_offense_subsetCC$season), age_lastYear + 1, age), # increment age by 1
    ageCentered20 = age - 20,
    years_of_experience = if_else(season == max(player_stats_seasonal_offense_subsetCC$season), years_of_experience_lastYear + 1, years_of_experience)) # increment experience by 1

# Active players are those with data in the most recent observed season
activePlayers <- unique(player_stats_seasonal_offense_subsetCC[c("player_id","season")]) %>%
  filter(season == max(player_stats_seasonal_offense_subsetCC$season) - 1) %>%
  select(player_id) %>%
  pull()

inactivePlayers <- player_stats_seasonal_offense_subsetCC$player_id[which(player_stats_seasonal_offense_subsetCC$player_id %ni% activePlayers)]

# Drop the added upcoming-season row for inactive players
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  filter(player_id %in% activePlayers | (player_id %in% inactivePlayers & season < max(player_stats_seasonal_offense_subsetCC$season) - 1)) %>%
  mutate(
    player_idFactor = droplevels(player_idFactor)
  )
21.4.2 Generate Predictions
21.4.3 Table of Next Season Predictions
21.4.4 Plot of Individuals’ Model-Implied Predictions
21.4.4.1 Quarterbacks
plot_individualFantasyPointsByAgeQB <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "QB"),
  mapping = aes(
    x = round(age, 2),
    y = round(fantasyPoints_bayesian, 2),
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Quarterbacks"
  ) +
  theme_classic()

ggplotly(
  plot_individualFantasyPointsByAgeQB,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.2 Running Backs
plot_individualFantasyPointsByAgeRB <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "RB"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Running Backs"
  ) +
  theme_classic()

ggplotly(
  plot_individualFantasyPointsByAgeRB,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.3 Wide Receivers
plot_individualFantasyPointsByAgeWR <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "WR"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Wide Receivers"
  ) +
  theme_classic()

ggplotly(
  plot_individualFantasyPointsByAgeWR,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.4 Tight Ends
plot_individualFantasyPointsByAgeTE <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "TE"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Tight Ends"
  ) +
  theme_classic()

ggplotly(
  plot_individualFantasyPointsByAgeTE,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.5 Conclusion
21.6 Session Info
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[4] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[7] tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0
[10] plotly_4.10.4 ggplot2_3.5.1 rstan_2.32.6
[13] StanHeaders_2.32.10 brms_2.22.0 Rcpp_1.0.13-1
[16] forecast_8.23.0 xts_0.14.1 zoo_1.8-12
[19] petersenlab_1.1.0
loaded via a namespace (and not attached):
[1] DBI_1.2.3 mnormt_2.1.1 gridExtra_2.3
[4] inline_0.3.20 rlang_1.1.4 magrittr_2.0.3
[7] tseries_0.10-58 matrixStats_1.4.1 compiler_4.4.2
[10] mgcv_1.9-1 loo_2.8.0 vctrs_0.6.5
[13] reshape2_1.4.4 quadprog_1.5-8 pkgconfig_2.0.3
[16] fastmap_1.2.0 backports_1.5.0 labeling_0.4.3
[19] pbivnorm_0.6.0 cmdstanr_0.8.1.9000 rmarkdown_2.29
[22] tzdb_0.4.0 ps_1.8.1 xfun_0.49
[25] jsonlite_1.8.9 psych_2.4.12 parallel_4.4.2
[28] lavaan_0.6-19 cluster_2.1.6 R6_2.5.1
[31] stringi_1.8.4 RColorBrewer_1.1-3 rpart_4.1.23
[34] lmtest_0.9-40 estimability_1.5.1 knitr_1.49
[37] base64enc_0.1-3 bayesplot_1.11.1 splines_4.4.2
[40] timechange_0.3.0 Matrix_1.7-1 nnet_7.3-19
[43] tidyselect_1.2.1 rstudioapi_0.17.1 abind_1.4-8
[46] yaml_2.3.10 timeDate_4041.110 codetools_0.2-20
[49] processx_3.8.4 curl_6.0.1 pkgbuild_1.4.5
[52] lattice_0.22-6 plyr_1.8.9 quantmod_0.4.26
[55] withr_3.0.2 bridgesampling_1.1-2 urca_1.3-4
[58] posterior_1.6.0 coda_0.19-4.1 evaluate_1.0.1
[61] foreign_0.8-87 RcppParallel_5.1.9 pillar_1.10.0
[64] tensorA_0.36.2.1 checkmate_2.3.2 stats4_4.4.2
[67] distributional_0.5.0 generics_0.1.3 TTR_0.24.4
[70] hms_1.1.3 mix_1.0-13 rstantools_2.4.0
[73] munsell_0.5.1 scales_1.3.0 xtable_1.8-4
[76] glue_1.8.0 emmeans_1.10.6 Hmisc_5.2-1
[79] lazyeval_0.2.2 tools_4.4.2 data.table_1.16.4
[82] mvtnorm_1.3-2 grid_4.4.2 mitools_2.4
[85] crosstalk_1.2.1 QuickJSR_1.4.0 colorspace_2.1-1
[88] nlme_3.1-166 fracdiff_1.5-3 htmlTable_2.4.3
[91] Formula_1.2-5 cli_3.6.3 viridisLite_0.4.2
[94] Brobdingnag_1.2-9 V8_6.0.0 gtable_0.3.6
[97] digest_0.6.37 farver_2.1.2 htmlwidgets_1.6.4
[100] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7