21 Time Series Analysis
21.1 Getting Started
21.1.1 Load Packages
21.1.2 Load Data
21.2 Overview of Time Series Analysis
Time series analysis is useful when trying to generate forecasts from longitudinal data. That is, time series analysis seeks to evaluate change over time to predict future values.
There are many different types of time series analyses. For simplicity, in this chapter, we use autoregressive integrated moving average (ARIMA) models to demonstrate one approach to time series analysis. We also leverage Bayesian mixed models to generate forecasts of future performance and plots of individuals' model-implied performance by age and position.
21.3 Autoregressive Integrated Moving Average (ARIMA) Models
Hyndman & Athanasopoulos (2021) provide a nice overview of ARIMA models. As they note, ARIMA models aim to describe how a variable is correlated with itself over time (autocorrelation), that is, how earlier levels of a variable are correlated with later levels of the same variable. ARIMA models perform best when there is a clear pattern in which later values are influenced by earlier values. ARIMA models incorporate autoregression effects, moving average effects, and differencing.
ARIMA models can have various numbers of terms and model complexity. They are specified in the following form: \(\text{ARIMA}(p,d,q)\), where:
- \(p =\) the number of autoregressive terms
- \(d =\) the number of differences between consecutive scores (to make the time series stationary by reducing trends and seasonality)
- \(q =\) the number of moving average terms
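As a sketch of what these three terms mean, the general \(\text{ARIMA}(p,d,q)\) model can be written in the notation of Hyndman & Athanasopoulos (2021). The symbols \(y'_t\), \(c\), \(\phi\), \(\theta\), and \(\varepsilon_t\) are notation introduced here for illustration:

\[
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t
\]

where \(y'_t\) is the series after differencing \(d\) times (e.g., \(y'_t = y_t - y_{t-1}\) when \(d = 1\)), \(c\) is a constant, \(\phi_1, \ldots, \phi_p\) are the autoregressive coefficients, \(\theta_1, \ldots, \theta_q\) are the moving average coefficients, and \(\varepsilon_t\) is the error at time \(t\). For example, an \(\text{ARIMA}(1,0,1)\) model reduces to \(y_t = c + \phi_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t\).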
ARIMA models assume that the data are stationary after differencing (i.e., there are no remaining long-term trends), are non-seasonal (i.e., the peaks and troughs in the line do not recur at consistent intervals), and that earlier values influence later values. These assumptions may not hold strongly in fantasy football, so ARIMA models may not be particularly useful for forecasting fantasy football performance. Other approaches, such as exponential smoothing, may be useful for data that show longer-term trends and seasonality (Hyndman & Athanasopoulos, 2021). Nevertheless, ARIMA models are widely used in forecasting financial markets and economic indicators. Thus, they are a useful technique to learn.
Adapted from: https://rc2e.com/timeseriesanalysis (archived at https://perma.cc/U5P6-2VWC).
21.3.1 Create the Time Series Objects
Code
# subset the weekly data to each player
weeklyFantasyPoints_tomBrady <- player_stats_weekly_offense %>%
  filter(
    player_id == "00-0019596" | player_display_name == "Tom Brady")
weeklyFantasyPoints_peytonManning <- player_stats_weekly_offense %>%
  filter(
    player_id == "00-0010346" | player_display_name == "Peyton Manning")
# create time series objects of fantasy points, ordered by game date
ts_tomBrady <- xts::xts(
  x = weeklyFantasyPoints_tomBrady["fantasy_points"],
  order.by = weeklyFantasyPoints_tomBrady$gameday)
ts_peytonManning <- xts::xts(
  x = weeklyFantasyPoints_peytonManning["fantasy_points"],
  order.by = weeklyFantasyPoints_peytonManning$gameday)
ts_tomBrady
fantasy_points
2000-11-23 0.24
2001-09-23 2.74
2001-09-30 6.92
2001-10-07 0.34
2001-10-14 22.56
2001-10-21 19.88
2001-10-28 8.02
2001-11-04 22.00
2001-11-11 4.18
2001-11-18 8.00
...
2022-11-06 15.20
2022-11-13 16.02
2022-11-27 18.04
2022-12-05 17.14
2022-12-11 10.12
2022-12-18 16.58
2022-12-25 11.34
2023-01-01 37.68
2023-01-08 7.36
2023-01-16 22.04
ts_peytonManning
fantasy_points
1999-09-12 15.06
1999-09-19 16.40
1999-09-26 29.56
1999-10-10 19.66
1999-10-17 10.10
1999-10-24 17.86
1999-10-31 17.76
1999-11-07 17.76
1999-11-14 15.18
1999-11-21 22.00
...
2015-10-04 8.32
2015-10-11 6.64
2015-10-18 9.60
2015-11-01 11.60
2015-11-08 15.24
2015-11-15 -6.60
2016-01-03 2.56
2016-01-17 10.78
2016-01-24 14.14
2016-02-07 3.64
21.3.2 Plot the Time Series
21.3.3 Rolling Mean/Median
fantasy_points
2001-09-30 6.560
2001-10-07 10.488
2001-10-14 11.544
2001-10-21 14.560
2001-10-28 15.328
2001-11-04 12.416
2001-11-11 13.984
2001-11-18 14.084
2001-11-25 10.568
2001-12-02 11.488
...
2022-10-23 15.492
2022-10-27 14.748
2022-11-06 15.612
2022-11-13 16.700
2022-11-27 15.304
2022-12-05 15.580
2022-12-11 14.644
2022-12-18 18.572
2022-12-25 16.616
2023-01-01 19.000
fantasy_points
2001-09-30 2.74
2001-10-07 6.92
2001-10-14 8.02
2001-10-21 19.88
2001-10-28 19.88
2001-11-04 8.02
2001-11-11 8.02
2001-11-18 8.52
2001-11-25 8.00
2001-12-02 8.52
...
2022-10-23 15.20
2022-10-27 15.20
2022-11-06 16.02
2022-11-13 17.10
2022-11-27 16.02
2022-12-05 16.58
2022-12-11 16.58
2022-12-18 16.58
2022-12-25 11.34
2023-01-01 16.58
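The two outputs above are consistent with a centered, five-game rolling mean and rolling median of Tom Brady's weekly fantasy points (for instance, the mean of his first five games is 6.56). The following is a minimal sketch of that calculation using only the first ten values printed in Section 21.3.1. The window width of 5 is inferred from the printed values, and `stats::filter()` and `stats::runmed()` are base R stand-ins; the chapter may have used different functions to produce the output.

```r
# First ten weekly fantasy point totals for Tom Brady, copied from the output in Section 21.3.1
points <- c(0.24, 2.74, 6.92, 0.34, 22.56, 19.88, 8.02, 22.00, 4.18, 8.00)

k <- 5 # window width (an assumption inferred from the printed values)

# Centered rolling mean: equal weights of 1/k across the window
# (the first and last two values are NA because the window is incomplete)
rollingMean <- as.numeric(stats::filter(points, rep(1 / k, k), sides = 2))

# Centered rolling median; endrule = "keep" leaves the first and last two values unsmoothed
rollingMedian <- as.numeric(stats::runmed(points, k = k, endrule = "keep"))

round(rollingMean[3:5], 3) # 6.560 10.488 11.544
rollingMedian[3:6]         # 2.74 6.92 8.02 19.88
```

The rolling median is less sensitive than the rolling mean to a single unusually high or low game, which is why the two smoothed series can differ noticeably from week to week.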
21.3.4 Autocorrelation
The autocorrelation function (ACF) plot depicts the autocorrelation of scores as a function of the length of the lag. Significant autocorrelation is detected when the autocorrelation exceeds the dashed blue lines, as is depicted in Figure 21.3.
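To make the logic of the ACF plot concrete, here is a minimal sketch using base R's `acf()` on a simulated series. The simulated data, the lag of 10, and the object names are assumptions for illustration and are not the player data used in this chapter. The bounds are the approximate 95% limits expected if there were no autocorrelation, which is what the dashed lines in an ACF plot represent.

```r
set.seed(123) # for reproducibility

# Simulate a hypothetical AR(1) series (not the player data) with an autoregressive coefficient of 0.6
n <- 200
simulatedSeries <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))

# Compute the autocorrelations without plotting; set plot = TRUE to draw the ACF plot
acfResult <- acf(simulatedSeries, lag.max = 10, plot = FALSE)

# Approximate 95% bounds under the null hypothesis of no autocorrelation
bound <- qnorm(0.975) / sqrt(n)

# Autocorrelations at lags 1 to 10 (lag 0 is dropped because it is always 1)
autocorrelations <- as.numeric(acfResult$acf)[-1]

# Lags whose autocorrelation falls outside the bounds
which(abs(autocorrelations) > bound)
```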
21.3.5 Fit an Autoregressive Integrated Moving Average Model
Series: ts_tomBrady
ARIMA(5,1,4)
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3
0.4516 -0.4604 -0.1763 -0.0835 0.2286 -1.3632 0.8840 -0.1503
s.e. 0.2437 0.2276 0.3552 0.0585 0.0535 0.2497 0.4107 0.5232
ma4
-0.3315
s.e. 0.3413
sigma^2 = 59.26: log likelihood = -1318.42
AIC=2656.85 AICc=2657.44 BIC=2696.3
Series: ts_peytonManning
ARIMA(1,0,1) with non-zero mean
Coefficients:
ar1 ma1 mean
0.9122 -0.8173 17.4824
s.e. 0.0617 0.0801 0.9618
sigma^2 = 60.49: log likelihood = -959.81
AIC=1927.62 AICc=1927.76 BIC=1942.11
Call:
arima(x = ts_tomBrady, order = c(5, 1, 4))
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3
0.4516 -0.4604 -0.1763 -0.0835 0.2286 -1.3632 0.8840 -0.1503
s.e. 0.2437 0.2276 0.3552 0.0585 0.0535 0.2497 0.4107 0.5232
ma4
-0.3315
s.e. 0.3413
sigma^2 estimated as 57.87: log likelihood = -1318.42, aic = 2656.85
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.6139671 7.597044 6.060271 -26.43586 54.72858 0.7239426
ACF1
Training set -0.005813919
2.5 % 97.5 %
ar1 -0.02608874 0.92929800
ar2 -0.90644893 -0.01428465
ar3 -0.87244587 0.51981406
ar4 -0.19810559 0.03107464
ar5 0.12370355 0.33351988
ma1 -1.85261459 -0.87375464
ma2 0.07904646 1.68893613
ma3 -1.17576451 0.87522599
ma4 -1.00046565 0.33737979
Ljung-Box test
data: Residuals from ARIMA(5,1,4)
Q* = 2.2789, df = 3, p-value = 0.5166
Model df: 9. Total lags used: 12
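The output above reflects three steps: fitting the model, computing confidence intervals for the coefficients, and testing the residuals for remaining autocorrelation. The following is a minimal sketch of those steps in base R on simulated data. The simulated series, the manually specified order, and the use of `Box.test()` for the Ljung-Box test are assumptions for illustration; they are not necessarily the functions or settings that produced the output above.

```r
set.seed(123) # for reproducibility

# Simulate a hypothetical ARMA(1,1) series with a mean of 17 (not the player data)
simulatedSeries <- 17 + arima.sim(model = list(ar = 0.5, ma = 0.3), n = 300)

# Fit an ARIMA(1,0,1) model; the order is specified manually here
fit <- arima(simulatedSeries, order = c(1, 0, 1))

coef(fit)    # estimates for ar1, ma1, and the intercept (mean)
confint(fit) # 95% confidence intervals for each coefficient

# Ljung-Box test of residual autocorrelation
# fitdf = the number of estimated ARMA coefficients (here: ar1 and ma1)
ljungBox <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)

ljungBox$parameter # degrees of freedom = 10 lags - 2 model df = 8
ljungBox$p.value   # a large p-value suggests little remaining autocorrelation in the residuals
```

A confidence interval that includes zero, such as the intervals for `ar3` and `ma4` in the output above, suggests that the corresponding term may not be needed.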
Call:
arima(x = ts_tomBrady, order = c(5, 1, 4), fixed = c(NA, NA, 0, NA, NA, NA,
NA, NA, NA))
Coefficients:
ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3 ma4
0.5193 -0.5543 0 -0.0923 0.2250 -1.4318 1.0416 -0.4116 -0.1650
s.e. 0.2241 0.2086 0 0.0579 0.0559 0.2309 0.3702 0.2138 0.0657
sigma^2 estimated as 57.92: log likelihood = -1318.6, aic = 2655.2
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 0.6227877 7.60082 6.07869 -26.33863 54.9103 0.7261428 -0.005557294
2.5 % 97.5 %
ar1 0.07998006 0.958521477
ar2 -0.96317880 -0.145398938
ar3 NA NA
ar4 -0.20567506 0.021095338
ar5 0.11546585 0.334451330
ma1 -1.88434856 -0.979298817
ma2 0.31604537 1.767195595
ma3 -0.83059876 0.007494026
ma4 -0.29370838 -0.036243035
Ljung-Box test
data: Residuals from ARIMA(5,1,4)
Q* = 2.3778, df = 3, p-value = 0.4978
Model df: 9. Total lags used: 12
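In the output above, the `ar3` coefficient is constrained to zero through the `fixed` argument of `arima()`, where `NA` means "estimate this coefficient freely". The following is a minimal sketch of the same idea on simulated data with a simpler model. The simulated series and object names are assumptions for illustration.

```r
set.seed(123) # for reproducibility

# Simulate a hypothetical AR(1) series (not the player data)
simulatedSeries <- arima.sim(model = list(ar = 0.5), n = 300)

# Unconstrained AR(2) model: ar1, ar2, and the intercept are all estimated
fitFull <- arima(simulatedSeries, order = c(2, 0, 0))

# Constrained AR(2) model: fix ar2 at 0; NA entries are estimated freely
# transform.pars = FALSE is set explicitly because arima() otherwise
# sets it itself (with a warning) when AR coefficients are fixed
fitConstrained <- arima(
  simulatedSeries,
  order = c(2, 0, 0),
  fixed = c(NA, 0, NA), # same order as coef(fitFull): ar1, ar2, intercept
  transform.pars = FALSE)

coef(fitConstrained) # ar2 is exactly 0

# Compare the two models; a lower AIC is preferred
AIC(fitFull)
AIC(fitConstrained)
```

In the Tom Brady models above, the constrained model has a slightly lower AIC (2655.2) than the unconstrained model (2656.85).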
Call:
arima(x = ts_peytonManning, order = c(1, 0, 1))
Coefficients:
ar1 ma1 intercept
0.9122 -0.8173 17.4824
s.e. 0.0617 0.0801 0.9618
sigma^2 estimated as 59.84: log likelihood = -959.81, aic = 1927.62
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.004427934 7.735376 6.029051 -99.44356 124.3119 0.7310316
ACF1
Training set -0.008441419
2.5 % 97.5 %
ar1 0.7913291 1.0330331
ma1 -0.9744194 -0.6602769
intercept 15.5973175 19.3675413
Ljung-Box test
data: Residuals from ARIMA(1,0,1) with non-zero mean
Q* = 3.8482, df = 8, p-value = 0.8705
Model df: 2. Total lags used: 10
21.3.6 Generate the Model Forecasts
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
384 21.04468 11.295940 30.79342 6.1352694 35.95409
385 15.83560 6.048829 25.62238 0.8680246 30.80318
386 22.88606 13.050490 32.72162 7.8438557 37.92826
387 18.55059 8.525819 28.57536 3.2190255 33.88216
388 17.70447 7.678403 27.73053 2.3709241 33.03801
389 18.28265 8.159424 28.40587 2.8005125 33.76479
390 17.91802 7.758353 28.07768 2.3801515 33.45588
391 19.61025 9.448326 29.77218 4.0689260 35.15158
392 19.51992 9.353826 29.68602 3.9722189 35.06763
393 18.52265 8.355983 28.68932 2.9740740 34.07123
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
278 12.73528 2.821995 22.64856 -2.42577939 27.89634
279 13.15217 3.194408 23.10993 -2.07691104 28.38125
280 13.53245 3.537830 23.52706 -1.75300026 28.81789
281 13.87933 3.854149 23.90451 -1.45286154 29.21152
282 14.19575 4.145208 24.24629 -1.17522778 29.56673
283 14.48438 4.412787 24.55598 -0.91879401 29.88756
284 14.74767 4.658587 24.83675 -0.68224942 30.17759
285 14.98783 4.884225 25.09144 -0.46430054 30.43996
286 15.20690 5.091228 25.32258 -0.26368781 30.67750
287 15.40674 5.281029 25.53245 -0.07919712 30.89267
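The tables above report a point forecast and 80% and 95% prediction intervals for each of the next ten games. The following is a minimal sketch of how such a table can be built with base R's `predict()` on a fitted `arima()` model, using simulated data. The simulated series, the column names, and the normal-theory construction of the intervals (point forecast plus or minus a quantile times the standard error) are assumptions for illustration, not necessarily how the tables above were produced.

```r
set.seed(123) # for reproducibility

# Simulate a hypothetical ARMA(1,1) series with a mean of 17 (not the player data)
simulatedSeries <- 17 + arima.sim(model = list(ar = 0.5, ma = 0.3), n = 300)
fit <- arima(simulatedSeries, order = c(1, 0, 1))

# Forecast the next 10 time points
# predict() returns point forecasts ($pred) and their standard errors ($se)
forecasts <- predict(fit, n.ahead = 10)

# Build normal-theory prediction intervals around each point forecast
forecastTable <- data.frame(
  pointForecast = as.numeric(forecasts$pred),
  lo80 = as.numeric(forecasts$pred - qnorm(0.90) * forecasts$se),
  hi80 = as.numeric(forecasts$pred + qnorm(0.90) * forecasts$se),
  lo95 = as.numeric(forecasts$pred - qnorm(0.975) * forecasts$se),
  hi95 = as.numeric(forecasts$pred + qnorm(0.975) * forecasts$se))

forecastTable
```

For a stationary model, the point forecasts move toward the estimated mean as the horizon increases. This pattern is visible in the Peyton Manning forecasts above, which rise gradually toward the estimated mean of 17.48.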
21.3.7 Plot the Model Forecasts
21.4 Bayesian Mixed Models
The Bayesian longitudinal mixed models were estimated in Section 12.3.5.
21.4.1 Prepare New Data Object
Code
# keep offensive skill positions and recode halfbacks as running backs
player_stats_seasonal_offense_subset <- player_stats_seasonal_offense %>%
  filter(position_group %in% c("QB","RB","WR","TE"))
player_stats_seasonal_offense_subset$position[which(player_stats_seasonal_offense_subset$position == "HB")] <- "RB"
# add kickers
player_stats_seasonal_kicking_subset <- player_stats_seasonal_kicking %>%
  filter(position == "K")
player_stats_seasonal_offense_subset <- bind_rows(
  player_stats_seasonal_offense_subset,
  player_stats_seasonal_kicking_subset
)
# create factor versions of the player and position variables
player_stats_seasonal_offense_subset$player_idFactor <- factor(player_stats_seasonal_offense_subset$player_id)
player_stats_seasonal_offense_subset$positionFactor <- factor(player_stats_seasonal_offense_subset$position)
Code
# keep complete cases on the model variables
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subset %>%
  filter(
    !is.na(player_idFactor),
    !is.na(fantasy_points),
    !is.na(positionFactor),
    !is.na(ageCentered20),
    !is.na(ageCentered20Quadratic),
    !is.na(years_of_experience))
# keep only players who were included in the fitted model
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  filter(player_id %in% bayesianMixedModelFit$data$player_idFactor) %>%
  mutate(positionFactor = droplevels(positionFactor))
# add a row for the upcoming season for each player
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  group_by(player_id) %>%
  group_modify(~ add_row(.x, season = max(player_stats_seasonal_offense_subsetCC$season) + 1)) %>%
  fill(player_display_name, player_idFactor, position, position_group, positionFactor, team, .direction = "downup") %>%
  ungroup()
# fill in age and experience for the upcoming season from the most recent season
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  left_join(
    player_stats_seasonal_offense_subsetCC %>%
      filter(season == max(player_stats_seasonal_offense_subsetCC$season) - 1) %>%
      select(player_id, age_lastYear = age, years_of_experience_lastYear = years_of_experience),
    by = "player_id") %>%
  mutate(
    age = if_else(season == max(player_stats_seasonal_offense_subsetCC$season), age_lastYear + 1, age), # increment age by 1
    ageCentered20 = age - 20,
    ageCentered20Quadratic = ageCentered20 ^ 2, # recompute so it is not missing for the new rows (assumes this variable is the square of ageCentered20)
    years_of_experience = if_else(season == max(player_stats_seasonal_offense_subsetCC$season), years_of_experience_lastYear + 1, years_of_experience)) # increment experience by 1
# active players are those with data in the most recent completed season
activePlayers <- unique(player_stats_seasonal_offense_subsetCC[c("player_id","season")]) %>%
  filter(season == max(player_stats_seasonal_offense_subsetCC$season) - 1) %>%
  select(player_id) %>%
  pull()
inactivePlayers <- player_stats_seasonal_offense_subsetCC$player_id[which(player_stats_seasonal_offense_subsetCC$player_id %ni% activePlayers)]
# drop the upcoming-season row for inactive players
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subsetCC %>%
  filter(player_id %in% activePlayers | (player_id %in% inactivePlayers & season < max(player_stats_seasonal_offense_subsetCC$season) - 1)) %>%
  mutate(
    player_idFactor = droplevels(player_idFactor)
  )
21.4.2 Generate Predictions
21.4.3 Table of Next Season Predictions
21.4.4 Plot of Individuals’ Model-Implied Predictions
21.4.4.1 Quarterbacks
Code
plot_individualFantasyPointsByAgeQB <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "QB"),
  mapping = aes(
    x = round(age, 2),
    y = round(fantasyPoints_bayesian, 2),
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Quarterbacks"
  ) +
  theme_classic()
ggplotly(
  plot_individualFantasyPointsByAgeQB,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.2 Running Backs
Code
plot_individualFantasyPointsByAgeRB <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "RB"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Running Backs"
  ) +
  theme_classic()
ggplotly(
  plot_individualFantasyPointsByAgeRB,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.3 Wide Receivers
Code
plot_individualFantasyPointsByAgeWR <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "WR"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Wide Receivers"
  ) +
  theme_classic()
ggplotly(
  plot_individualFantasyPointsByAgeWR,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.4.4.4 Tight Ends
Code
plot_individualFantasyPointsByAgeTE <- ggplot(
  data = player_stats_seasonal_offense_subsetCC %>% filter(position == "TE"),
  mapping = aes(
    x = age,
    y = fantasyPoints_bayesian,
    group = player_id)) +
  geom_smooth(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    se = FALSE,
    linewidth = 0.5,
    color = "black") +
  geom_point(
    aes(
      x = age,
      y = fantasyPoints_bayesian,
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
    ),
    size = 1,
    color = "transparent" # make points invisible but keep tooltips
  ) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Tight Ends"
  ) +
  theme_classic()
ggplotly(
  plot_individualFantasyPointsByAgeTE,
  tooltip = c("age","fantasyPoints_bayesian","text","label")
)
21.5 Conclusion
21.6 Session Info
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.32.6 StanHeaders_2.32.10 lubridate_1.9.3
[4] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[7] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[10] tibble_3.2.1 tidyverse_2.0.0 plotly_4.10.4
[13] ggplot2_3.5.1 brms_2.22.0 Rcpp_1.0.13-1
[16] forecast_8.23.0 xts_0.14.1 zoo_1.8-12
[19] petersenlab_1.1.0
loaded via a namespace (and not attached):
[1] DBI_1.2.3 mnormt_2.1.1 gridExtra_2.3
[4] inline_0.3.20 rlang_1.1.4 magrittr_2.0.3
[7] tseries_0.10-58 matrixStats_1.4.1 compiler_4.4.2
[10] mgcv_1.9-1 loo_2.8.0 vctrs_0.6.5
[13] reshape2_1.4.4 quadprog_1.5-8 pkgconfig_2.0.3
[16] fastmap_1.2.0 backports_1.5.0 labeling_0.4.3
[19] pbivnorm_0.6.0 utf8_1.2.4 cmdstanr_0.8.1.9000
[22] rmarkdown_2.29 tzdb_0.4.0 ps_1.8.1
[25] xfun_0.49 jsonlite_1.8.9 psych_2.4.6.26
[28] parallel_4.4.2 lavaan_0.6-19 cluster_2.1.6
[31] R6_2.5.1 stringi_1.8.4 RColorBrewer_1.1-3
[34] rpart_4.1.23 lmtest_0.9-40 estimability_1.5.1
[37] knitr_1.49 base64enc_0.1-3 bayesplot_1.11.1
[40] splines_4.4.2 timechange_0.3.0 Matrix_1.7-1
[43] nnet_7.3-19 tidyselect_1.2.1 rstudioapi_0.17.1
[46] abind_1.4-8 yaml_2.3.10 timeDate_4041.110
[49] codetools_0.2-20 processx_3.8.4 curl_6.0.1
[52] pkgbuild_1.4.5 lattice_0.22-6 plyr_1.8.9
[55] quantmod_0.4.26 withr_3.0.2 bridgesampling_1.1-2
[58] urca_1.3-4 posterior_1.6.0 coda_0.19-4.1
[61] evaluate_1.0.1 foreign_0.8-87 RcppParallel_5.1.9
[64] pillar_1.9.0 tensorA_0.36.2.1 checkmate_2.3.2
[67] stats4_4.4.2 distributional_0.5.0 generics_0.1.3
[70] TTR_0.24.4 mix_1.0-12 hms_1.1.3
[73] rstantools_2.4.0 munsell_0.5.1 scales_1.3.0
[76] xtable_1.8-4 glue_1.8.0 emmeans_1.10.5
[79] Hmisc_5.2-0 lazyeval_0.2.2 tools_4.4.2
[82] data.table_1.16.2 mvtnorm_1.3-2 grid_4.4.2
[85] mitools_2.4 crosstalk_1.2.1 QuickJSR_1.4.0
[88] colorspace_2.1-1 nlme_3.1-166 fracdiff_1.5-3
[91] htmlTable_2.4.3 Formula_1.2-5 cli_3.6.3
[94] fansi_1.0.6 viridisLite_0.4.2 Brobdingnag_1.2-9
[97] V8_6.0.0 gtable_0.3.6 digest_0.6.37
[100] farver_2.1.2 htmlwidgets_1.6.4 htmltools_0.5.8.1
[103] lifecycle_1.0.4 httr_1.4.7