I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.

Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook

Hypothesis Adding an annotation using hypothes.is. To add an annotation, select some text and then click the symbol on the pop-up menu. To see the annotations of others, click the symbol in the upper right-hand corner of the page.

12  Mixed Models

12.1 Getting Started

12.1.1 Load Packages

Code
library("lme4")
library("lmerTest")
library("MuMIn")
library("emmeans")
library("sjstats")
library("mgcv")
library("AICcmodavg")
library("bbmle")
library("rstan")
library("brms")
library("cmdstanr") # todo: install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos"))); check_cmdstan_toolchain(); cmdstanr::install_cmdstan()
library("fitdistrplus")
library("parallel")
library("parallelly")
library("plotly")
library("viridis")
library("tidyverse")

12.1.2 Specify Package Options

Code
emm_options(lmerTest.limit = 100000)
emm_options(pbkrtest.limit = 100000)

12.1.3 Load Data

Code
load(file = "./data/nfl_depthCharts.RData")
load(file = "./data/player_stats_weekly.RData")
load(file = "./data/player_stats_seasonal.RData")

We created the player_stats_weekly.RData and player_stats_seasonal.RData objects in Section 4.4.3.

12.2 Overview of Mixed Models

We will discuss a modeling framework that goes by many terms, including mixed models, mixed-effects models, multilevel models, hierarchical linear models. They are sometimes called multilevel models and hierarchical linear models, whose name emphasizes the hierarchical structure of the data because the data are nonindependent. When observations (i.e., data points) are collected from multiple lower-level units (e.g., people) in an upper-level unit (e.g., married couple, family, classroom, school, neighborhood, team), the data from the lower-level units are considered “nested” within the upper-level unit. In this context, nested data refers to multiple observations from the same upper-level unit. For instance, longitudinal data are nested within the same participant. Students are nested within classrooms, which are nested within schools. Players are nested within teams.

When data are nested, the data from the lower-level unit are likely to be correlated, to some degree, because they come from the same upper-level unit. For example, multiple players may come from the same team, and the players’ performance on that team is likely interrelated because they share common experiences and influence one another. Thus, data from multiple players on a given team are considered nested within that team. Longitudinal data can also be considered nested data, in which time points are nested within the person (i.e., the same player provides an observation across multiple time points). As we will discuss, it is important to account for levels of nesting when the observations are nonindependent.

These models are also sometimes called mixed models or mixed-effects models because the models can include a mix of fixed and random effects. Fixed effects are effects that are constant across individuals (i.e., upper-level units). Random effects are effects that vary across individuals (i.e., upper-level units). For instance, consider a longitudinal study of fantasy performance as a function of age. If we have longitudinal data for multiple players, the time points are nested within players. Examining the association between age as a fixed effect in relation to fantasy performace would examine the assocication between a player’s age and their fantasy performance while holding the association between age and fantasy performance constant across all players. That is, it would assume that all players show the same trajectory such as increase for 4 years then decrease. Examining the association between age as a random effect in relation to fantasy performace would examine the assocication between a player’s age and their fantasy performance while allowing the association between age and fantasy performance to vary across players. That is, it would allow the possibility that some players improve with age, whereas other players decline in performance with age.

When including random effects of a variable (e.g., age) in a mixed model, it is also important to include fixed effects of that variable in the model. This is because random effects have a mean of zero. Fixed effects allow the mean to differ from zero. Thus, inclusion of random effects without the corresponding fixed effect can lead to bias in estimation of the association between the predictor variables and the outcome variable.

12.2.1 Ecological Fallacy

As described in Section 14.5.9, the ecological fallacy is the error of drawing inferences about an individual from group-level data. An type of ecological fallacy is Simpson’s paradox.

12.2.2 Simpson’s Paradox

Examples of Simpson’s Paradox are depicted in Figures 13.5 and 12.1. In the example below, there is a positive association between the predictor variable and outcome variable for each group. However, when the groups are combined, there is a negative association between predictor variable and outcome variable. That is, the sign of an association can differ at different levels of analysis (e.g., group level versus person level).

An Example of Simpson's Paradox. In this example, there is a positive association between `x` and `y` within each group (i.e., red or blue), but there is a negative association between `x` and `y` when the groups are combined. (Figure retrieved from <https://en.wikipedia.org/wiki/File:Simpson%27s_paradox_continuous.svg>)
Figure 12.1: An Example of Simpson’s Paradox. In this example, there is a positive association between x and y within each group (i.e., red or blue), but there is a negative association between x and y when the groups are combined. (Figure retrieved from https://en.wikipedia.org/wiki/File:Simpson%27s_paradox_continuous.svg)

Consider that we observe a between-person association between a predictor, for example, how much sports drink a player drinks and their performance, such that players who drink more sports drink before games tend to perform better during games. However, based on this association, if we draw the inference that sports drink consumption leads to better performance, this could be a faulty inference. It is possible that, at the within-person level, there is no association or even a negative association between sports drink consumption and performance. For helping to approximate causality, a much stronger test than relying on the between-person association would be to examine the association within the individual. That is, examining the association within the individual would examine: when a player consumes more sports drink, whether they perform better than when the same player consumes less sports drink. We describe within-person analyses to approximate causal inferences in Section 13.4.2.2.

In short, we often need to account for the groups or “levels” within which people are nested. When multiple observations are nested within the same upper-level unit, they are often correlated. That is, some variance is attributable to the lower-level unit (e.g., the player) and some variance is attributable to the upper-level unit (e.g., the team). It is important to evaluate how much variance is attributable to the upper-level unit. If substantial variance is attributable to the upper-level unit, it is important to account for the nested structure of the data. Mixed models are a useful approach to account for data with a nested structure so you can avoid committing the ecological fallacy.

12.3 Fantasy Points Per Season by Position, Age, and Experience

In this chapter, we use mixed models to demonstrate how to model trajectories (growth curves) of players’ fantasy points as a function of age. One of the challenges when modeling players’ fantasy points as a function of age is that fantasy points are a function of both ability and opportunity. With age, the player’s ability and opportunity may both decline, but it is difficult to disentangle the extent to which a player’s decline is related to declines in ability versus opportunity. As players get older, they may be supplanted by younger, more talented players. With age, despite still having latent (unobserved) ability, players will get fewer and, eventually, no opportunity. And we do not know the counterfactual—how many fantasy points they would have scored had they been given the full opportunity each season. Thus, players may go from scoring 100+ points in a season to all of a sudden scoring way fewer points, with little in the way of intermediate steps. Thus, this should inform how you interpret the resulting trajectories. For instance, the steep declines can make it look like the model-implied fantasy points go well below zero at later ages, even though it is obviously not likely for players to obtain such a magnitude of negative fantasy points. The negative model-implied values of fantasy points at later ages are likely an artifact of the decreasing opportunity that players tend to get as they age. The crossover point, where the positive values becomes negative might indicate, for instance, that players at that age tend to have retired and thus have no opportunity. Regardless, the form/shape/steepness of the decline is still potentially meaningful even if the precise negative number at a given age is not.

Code
player_stats_seasonal_offense_subset <- player_stats_seasonal %>% 
  dplyr::filter(position_group %in% c("QB","RB","WR","TE") | position %in% c("K"))

player_stats_seasonal_offense_subset$position[which(player_stats_seasonal_offense_subset$position == "HB")] <- "RB"

player_stats_seasonal_offense_subset$player_idFactor <- factor(player_stats_seasonal_offense_subset$player_id)
player_stats_seasonal_offense_subset$positionFactor <- factor(player_stats_seasonal_offense_subset$position)
Code
seasons17week <- 2001:2020
seasons18week <- 2021:max(nfl_depthCharts$season, na.rm = TRUE)

endOfSeasonDepthCharts <- nfl_depthCharts %>% 
  filter((season %in% seasons17week & week == 18) | (season %in% seasons18week & week == 19)) # get end-of-season depth charts

qb1s <- endOfSeasonDepthCharts %>% 
  filter(position == "QB", depth_team == 1)

fb1s <- endOfSeasonDepthCharts %>% 
  filter(position == "FB", depth_team == 1)

k1s <- endOfSeasonDepthCharts %>% 
  filter(position == "K", depth_team == 1)

rb1s <- endOfSeasonDepthCharts %>% 
  filter(position == "RB", depth_team == 1)

wr1s <- endOfSeasonDepthCharts %>% 
  filter(position == "WR", depth_team == 1)

te1s <- endOfSeasonDepthCharts %>% 
  filter(position == "TE", depth_team == 1)

player_stats_seasonal_offense_subsetDepth <- player_stats_seasonal_offense_subset %>% 
  filter(player_id %in% c(
    qb1s$gsis_id,
    fb1s$gsis_id,
    k1s$gsis_id,
    rb1s$gsis_id,
    wr1s$gsis_id,
    te1s$gsis_id
    ))

Create a newdata object for generating the plots of model-implied fantasy points by age and position:

Code
pointsPerSeason_positionAge_newData <- expand.grid(
  positionFactor = factor(c("FB","QB","RB","TE","WR")), #,"K"
  age = seq(from = 20, to = 40, length.out = 10000)
)

pointsPerSeason_positionAge_newData$ageCentered20 <- pointsPerSeason_positionAge_newData$age - 20
pointsPerSeason_positionAge_newData$ageCentered20Quadratic <- pointsPerSeason_positionAge_newData$ageCentered20 ^ 2
pointsPerSeason_positionAge_newData$years_of_experience <- floor(pointsPerSeason_positionAge_newData$age - 22) # assuming that most players start at age 22 (i.e., rookie year) and thus have 1 year of experience at age 23
pointsPerSeason_positionAge_newData$years_of_experience[which(pointsPerSeason_positionAge_newData$years_of_experience < 0)] <- 0

Create an object with complete cases for generating the plots of individuals’ model-implied fantasy points by age and position:

Code
player_stats_seasonal_offense_subsetCC <- player_stats_seasonal_offense_subset %>%
  filter(
    !is.na(player_idFactor),
    !is.na(fantasyPoints),
    !is.na(positionFactor),
    !is.na(ageCentered20),
    !is.na(ageCentered20Quadratic),
    !is.na(years_of_experience))

12.3.1 Scatterplots of Fantasy Points by Age and Position

Scatterplots are a helpful tool for quickly examining the association between two variables. However, scatterplots—as well as correlation and multiple regression—can hide meaningful associations that differ across units of analysis.

12.3.1.1 Quarterbacks

A scatterplot of Quarterbacks’ fantasy points by age is in Figure 12.2.

Code
plot_scatterplotFantasyPointsByAgeQB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "QB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_point(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  geom_smooth(
    mapping = aes(
    x = age,
    y = fantasyPoints),
    inherit.aes = FALSE
  ) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Quarterbacks"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_scatterplotFantasyPointsByAgeQB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.2: Scatterplot of Fantasy Points by Age for Quarterbacks.

Based on the scatterplot (and the bivariate association below), Quarterbacks’ fantasy points appear to increase with age.

Code
cor.test(
  formula = ~ age + fantasyPoints,
  data = player_stats_seasonal_offense_subset %>% filter(position == "QB")
)

    Pearson's product-moment correlation

data:  age and fantasyPoints
t = 2.4333, df = 1922, p-value = 0.01505
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.01075572 0.09985864
sample estimates:
       cor 
0.05541751 

12.3.1.2 Fullbacks

A scatterplot of Fullbacks’ fantasy points by age is in Figure 12.3.

Code
plot_scatterplotFantasyPointsByAgeFB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "FB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_point(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  geom_smooth(
    mapping = aes(
    x = age,
    y = fantasyPoints),
    inherit.aes = FALSE
  ) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Fullbacks"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_scatterplotFantasyPointsByAgeFB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.3: Scatterplot of Fantasy Points by Age for Fullbacks.

Based on the scatterplot, Fullbacks’ fantasy points appear to be relatively stable across ages.

12.3.1.3 Running Backs

A scatterplot of Running Backs’ fantasy points by age is in Figure 12.4.

Code
plot_scatterplotFantasyPointsByAgeRB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "RB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_point(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  geom_smooth(
    mapping = aes(
    x = age,
    y = fantasyPoints),
    inherit.aes = FALSE
  ) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Running Backs"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_scatterplotFantasyPointsByAgeRB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.4: Scatterplot of Fantasy Points by Age for Running Backs.

Based on the scatterplot, Running Backs’ fantasy points appear to be relatively stable across ages.

12.3.1.4 Wide Receivers

A scatterplot of Wide Receivers’ fantasy points by age is in Figure 12.5.

Code
plot_scatterplotFantasyPointsByAgeWR <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "WR") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_point(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  geom_smooth(
    mapping = aes(
    x = age,
    y = fantasyPoints),
    inherit.aes = FALSE
  ) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Wide Receivers"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_scatterplotFantasyPointsByAgeWR,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.5: Scatterplot of Fantasy Points by Age for Wide Receivers.

Based on the scatterplot, Wide Receivers’ fantasy points appear to be relatively stable across ages.

12.3.1.5 Tight Ends

A scatterplot of Tight Ends’ fantasy points by age is in Figure 12.6.

Code
plot_scatterplotFantasyPointsByAgeTE <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "TE") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_point(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  geom_smooth(
    mapping = aes(
    x = age,
    y = fantasyPoints),
    inherit.aes = FALSE
  ) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Tight Ends"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_scatterplotFantasyPointsByAgeTE,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.6: Scatterplot of Fantasy Points by Age for Tight Ends.

Based on the scatterplot (and the bivariate association below), Tight Ends’ fantasy points appear to increase with age.

Code
cor.test(
  formula = ~ age + fantasyPoints,
  data = player_stats_seasonal_offense_subset %>% filter(position == "TE")
)

    Pearson's product-moment correlation

data:  age and fantasyPoints
t = 5.6935, df = 2874, p-value = 1.37e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.06932659 0.14161227
sample estimates:
      cor 
0.1056089 

12.3.2 Plots of Raw Trajectories of Fantasy Points By Age and Player

Scatterplots can be helpful for quickly visualizing the association between two variables. However, as mentioned earlier, scatterplots can hide the association between variables at different units of analysis. For instance, consider that we are trying to predict how a player will perform based on their age. We are interested not only in what the association is between age and fantasy points between players (i.e., a between-person association). We are also interested in what the association is between age and fantasy points within a given player (and within each player; i.e., a within-individual association). Arguably, the within-individual association between age and fantasy points is more relevant to the prediction of performance than the association between age and fantasy points between players. Assuming that the between-player association between age and fantasy points is the same as the within-player association when it is not is an example of the ecological fallacy.

Below, we depict players’ raw trajectories of fantasy points as a function of age. These are known as spaghetti plots. By examining the trajectory for each player, we can get a better understanding of hor performance changes (within an individual) as a function of age.

12.3.2.1 Quarterbacks

A plot of Quarterbacks’ raw fantasy points data by age is in Figure 12.7.

Code
plot_rawFantasyPointsByAgeQB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "QB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_line(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Quarterbacks"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_rawFantasyPointsByAgeQB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.7: Plot of Raw Trajectories of Fantasy Points by Age for Quarterbacks.

12.3.2.2 Fullbacks

A plot of Fullbacks’ raw fantasy points data by age is in Figure 12.8.

Code
plot_rawFantasyPointsByAgeFB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "FB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_line(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Fullbacks"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_rawFantasyPointsByAgeFB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.8: Plot of Raw Trajectories of Fantasy Points by Age for Fullbacks.

12.3.2.3 Running Backs

A plot of Running Backs’ raw fantasy points data by age is in Figure 12.9.

Code
plot_rawFantasyPointsByAgeRB <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "RB") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_line(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Running Backs"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_rawFantasyPointsByAgeRB,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.9: Plot of Raw Trajectories of Fantasy Points by Age for Running Backs.

12.3.2.4 Wide Receivers

A plot of Wide Receivers’ raw fantasy points data by age is in Figure 12.10.

Code
plot_rawFantasyPointsByAgeWR <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "WR") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_line(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Wide Receivers"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_rawFantasyPointsByAgeWR,
  tooltip = c("age","fantasyPoints","text","label"))
Figure 12.10: Plot of Raw Trajectories of Fantasy Points by Age for Wide Receivers.

12.3.2.5 Tight Ends

A plot of Tight Ends’ raw fantasy points data by age is in Figure 12.11.

Code
plot_rawFantasyPointsByAgeTE <- ggplot(
  data = player_stats_seasonal_offense_subset %>% 
    filter(position == "TE") %>% 
    mutate(
      age = round(age, 2),
      fantasyPoints = round(fantasyPoints, 2)
    ),
  mapping = aes(
    x = age,
    y = fantasyPoints,
    color = player_id)) +
  geom_line(
    aes(
      text = player_display_name, # add player name for mouse over tooltip
      label = season # add season for mouse over tooltip
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = "Player Age (years)",
    y = "Fantasy Points (Season)",
    title = "Fantasy Points (Season) by Player Age: Tight Ends"
  ) +
  theme_classic() +
  theme(legend.position = "none")

ggplotly(
  plot_rawFantasyPointsByAgeTE,
  tooltip = c("age","fantasyPoints","text","label"))