I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.

You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook

Alternatively, you can leave an annotation using hypothes.is. To add an annotation, select some text and then click the annotate symbol on the pop-up menu. To see the annotations of others, click the annotations symbol in the upper right-hand corner of the page.

19  Machine Learning

19.1 Getting Started

19.1.1 Load Packages

Code
library("petersenlab")
library("parallel")
library("future")
library("missRanger")
library("powerjoin")
library("tidymodels")
library("LongituRF")
library("gpboost")
library("tidyverse")

19.1.2 Load Data

Code
# Downloaded Data - Processed
load(file = "./data/nfl_players.RData")
load(file = "./data/nfl_teams.RData")
load(file = "./data/nfl_rosters.RData")
load(file = "./data/nfl_rosters_weekly.RData")
load(file = "./data/nfl_schedules.RData")
load(file = "./data/nfl_combine.RData")
load(file = "./data/nfl_draftPicks.RData")
load(file = "./data/nfl_depthCharts.RData")
load(file = "./data/nfl_pbp.RData")
load(file = "./data/nfl_4thdown.RData")
load(file = "./data/nfl_participation.RData")
#load(file = "./data/nfl_actualFantasyPoints_weekly.RData")
load(file = "./data/nfl_injuries.RData")
load(file = "./data/nfl_snapCounts.RData")
load(file = "./data/nfl_espnQBR_seasonal.RData")
load(file = "./data/nfl_espnQBR_weekly.RData")
load(file = "./data/nfl_nextGenStats_weekly.RData")
load(file = "./data/nfl_advancedStatsPFR_seasonal.RData")
load(file = "./data/nfl_advancedStatsPFR_weekly.RData")
load(file = "./data/nfl_playerContracts.RData")
load(file = "./data/nfl_ftnCharting.RData")
load(file = "./data/nfl_playerIDs.RData")
load(file = "./data/nfl_rankings_draft.RData")
load(file = "./data/nfl_rankings_weekly.RData")
load(file = "./data/nfl_expectedFantasyPoints_weekly.RData")
load(file = "./data/nfl_expectedFantasyPoints_pbp.RData")

# Calculated Data - Processed
load(file = "./data/nfl_actualStats_career.RData")
load(file = "./data/nfl_actualStats_seasonal.RData")
load(file = "./data/player_stats_weekly.RData")
load(file = "./data/player_stats_seasonal.RData")

19.1.3 Specify Options

Code
options(scipen = 999) # prevent scientific notation

19.2 Overview of Machine Learning

Machine learning takes us away from focusing on causal inference. Machine learning does not care about which processes are causal—i.e., which processes influence the outcome. Instead, machine learning cares about prediction—it cares about a predictor variable to the extent that it increases predictive accuracy regardless of whether it is causally related to the outcome.

Machine learning can be useful for leveraging big data and many predictor variables to develop predictive models with greater accuracy. However, many machine learning techniques are black boxes—it is often unclear how or why certain predictions are made, which can make it difficult to interpret the model’s decisions and understand the underlying relationships between variables. Machine learning tends to be a data-driven, atheoretical technique. This can result in overfitting. Thus, when estimating machine learning models, it is common to keep a hold-out sample for use in cross-validation to evaluate the extent of shrinkage of model coefficients. The data that the model is trained on is known as the “training data”. The data that the model was not trained on but is then independently tested on—i.e., the hold-out sample—is the “test data”. Shrinkage occurs when predictor variables explain some random error variance in the original model. When the model is applied to an independent sample (i.e., the test data), the predictive model will likely not perform quite as well, and the regression coefficients will tend to get smaller (i.e., shrink).

If the test data were collected as part of the same processes as the original data and were merely held out for purposes of analysis, this is called internal cross-validation. If the test data were collected separately from the original data used to train the model, this is called external cross-validation.
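As a minimal sketch of internal cross-validation, the following splits a dataset into training and test sets, fits a model on the training data only, and evaluates it on the hold-out sample. It uses the rsample package (loaded as part of tidymodels above); the mtcars data, the 80/20 split, and the particular model are illustrative assumptions, not the chapter's analysis:

```r
library("rsample")

set.seed(52242) # for reproducibility

# Hold out 20% of rows as test data (internal cross-validation)
dataSplit <- rsample::initial_split(mtcars, prop = 0.8)

trainData <- rsample::training(dataSplit) # data the model is trained on
testData <- rsample::testing(dataSplit) # hold-out sample

# Fit on the training data only; evaluate on the test data
fit <- lm(mpg ~ wt + hp, data = trainData)

cor(predict(fit, newdata = testData), testData$mpg)^2 # R-squared in the hold-out sample
```

Comparing the hold-out R-squared to the training R-squared gives a sense of how much the model's performance shrinks in independent data.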

Most machine learning methods were developed with cross-sectional data in mind. That is, they assume that each person has only one observation on the outcome variable. However, with longitudinal data, each person has multiple observations on the outcome variable.

When performing machine learning with longitudinal data, various approaches may help address this:

  • transform data from long to wide form, so that each person has only one row
  • when designing the training and test sets, keep all measurements from the same person in the same data object (either the training or test set); do not have some measurements from a given person in the training set and other measurements from the same person in the test set
  • use a machine learning approach that accounts for the clustered/nested nature of the data
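The second approach above—keeping all of a person's measurements in the same data object—can be sketched with a grouped split. This sketch assumes a recent version of rsample that provides group_initial_split(); the toy player data are illustrative:

```r
library("rsample")
library("dplyr")

set.seed(52242) # for reproducibility

# Illustrative longitudinal data: multiple seasons per player
playerData <- data.frame(
  player_id = rep(c("A","B","C","D","E"), each = 3),
  season = rep(2021:2023, times = 5),
  fantasyPoints = rnorm(15, mean = 150, sd = 50))

# Split by player so all of a player's seasons land in the same set
dataSplit <- rsample::group_initial_split(playerData, group = player_id, prop = 0.8)

trainData <- rsample::training(dataSplit)
testData <- rsample::testing(dataSplit)

intersect(trainData$player_id, testData$player_id) # empty: no player appears in both sets
```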

19.3 Types of Machine Learning

There are many approaches to machine learning. This chapter discusses several key ones:

  • supervised learning
    • continuous outcome (i.e., regression)
      • linear regression
      • lasso regression
      • ridge regression
      • elastic net regression
    • categorical outcome (i.e., classification)
      • logistic regression
      • support vector machine
      • random forest
      • extreme gradient boosting
  • unsupervised learning
    • clustering
    • principal component analysis
  • semi-supervised learning
  • reinforcement learning
    • deep learning
  • ensemble

Ensemble machine learning methods combine multiple machine learning approaches, with the goal that the combination might lead to more accurate predictions than any one method could achieve on its own.
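As a toy illustration of the ensemble idea, the following averages the predictions of two simple models. This is only a sketch: in practice, ensembles typically combine diverse learners (e.g., a random forest and a boosted model), often with weighted or stacked combinations, and are evaluated on hold-out data. The models and data here are illustrative assumptions:

```r
# Fit two different (deliberately simple) models
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(mpg ~ wt + qsec, data = mtcars)

# A simple ensemble: average the two models' predictions
ensemblePred <- (predict(fit1) + predict(fit2)) / 2

head(ensemblePred)
```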

19.3.1 Supervised Learning

[DEFINE SUPERVISED LEARNING]

Unlike linear and logistic regression, various machine learning techniques can handle multicollinearity, including LASSO regression, ridge regression, and elastic net regression. Least absolute shrinkage and selection operator (LASSO) regression performs selection of which predictor variables to keep in the model by shrinking some coefficients all the way to zero. Ridge regression shrinks the coefficients of predictor variables toward zero, but not to zero, so it does not perform selection of which predictor variables to retain; as a result, it can retain nonzero coefficients for multiple correlated predictor variables in the context of multicollinearity. Elastic net regression combines LASSO and ridge regression: it performs selection of which predictor variables to keep by shrinking the coefficients of some predictor variables to zero, and it addresses multicollinearity by shrinking the coefficients of other predictor variables toward zero.
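The three penalized approaches above can be sketched with the glmnet package (an assumption—it is not among the packages loaded above), where the alpha argument controls the mix: alpha = 1 is LASSO, alpha = 0 is ridge, and intermediate values give the elastic net. The simulated data with two highly correlated predictors are illustrative:

```r
library("glmnet")

set.seed(52242) # for reproducibility

# Illustrative data with multicollinearity
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1) # highly correlated with x1
x3 <- rnorm(n)
X <- cbind(x1, x2, x3)
y <- 2 * x1 + x3 + rnorm(n)

# alpha controls the penalty mix: 1 = LASSO, 0 = ridge, 0.5 = elastic net
lassoFit <- glmnet::cv.glmnet(X, y, alpha = 1)
ridgeFit <- glmnet::cv.glmnet(X, y, alpha = 0)
elasticNetFit <- glmnet::cv.glmnet(X, y, alpha = 0.5)

coef(lassoFit, s = "lambda.min") # LASSO may shrink one of x1/x2 to exactly zero
coef(ridgeFit, s = "lambda.min") # ridge keeps shrunken, nonzero coefficients for both
```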

Unless interactions or nonlinear terms are specified, linear, logistic, LASSO, ridge, and elastic net regression do not account for interactions among the predictor variables or for nonlinear associations between the predictor variables and the outcome variable. By contrast, random forests and extreme gradient boosting do account for interactions among the predictor variables and for nonlinear associations between the predictor variables and the outcome variable.
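A minimal random forest sketch using the parsnip interface from tidymodels (loaded above) follows; it assumes the ranger package is installed as the engine, and uses illustrative data rather than the chapter's. No interaction or nonlinear terms are specified—the trees discover them automatically:

```r
library("tidymodels")

set.seed(52242) # for reproducibility

# Specify a regression random forest with the ranger engine
rfSpec <- parsnip::rand_forest(trees = 500, mode = "regression") %>% 
  parsnip::set_engine("ranger")

# Fit; trees capture interactions and nonlinearities without explicit terms
rfFit <- parsnip::fit(rfSpec, mpg ~ ., data = mtcars)

predict(rfFit, new_data = head(mtcars)) # predictions as a tibble with column .pred
```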

19.3.2 Unsupervised Learning

[DEFINE UNSUPERVISED LEARNING]

We describe cluster analysis in Chapter 21. We describe principal component analysis in Chapter 23.

19.3.3 Semi-supervised Learning

[DEFINE SEMI-SUPERVISED LEARNING]

19.3.4 Reinforcement Learning

[DEFINE REINFORCEMENT LEARNING]

19.4 Data Processing

19.4.1 Prepare Data for Merging

Code
# Prepare data for merging
#-todo: calculate years_of_experience
## Use common name for the same (gsis_id) ID variable

#nfl_actualFantasyPoints_player_weekly <- nfl_actualFantasyPoints_player_weekly %>% 
#  rename(gsis_id = player_id)
#
#nfl_actualFantasyPoints_player_seasonal <- nfl_actualFantasyPoints_player_seasonal %>% 
#  rename(gsis_id = player_id)

player_stats_seasonal_offense <- player_stats_seasonal %>% 
  filter(position_group %in% c("QB","RB","WR","TE")) %>% 
  rename(gsis_id = player_id)

player_stats_weekly_offense <- player_stats_weekly %>% 
  filter(position_group %in% c("QB","RB","WR","TE")) %>% 
  rename(gsis_id = player_id)

nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>% 
  rename(gsis_id = player_id)

## Rename other variables to ensure common names

## Ensure variables with the same name have the same type
nfl_players <- nfl_players %>% 
  mutate(
    birth_date = as.Date(birth_date),
    jersey_number = as.character(jersey_number),
    gsis_it_id = as.character(gsis_it_id),
    years_of_experience = as.integer(years_of_experience))

player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  mutate(
    birth_date = as.Date(birth_date),
    jersey_number = as.character(jersey_number),
    gsis_it_id = as.character(gsis_it_id))

nfl_rosters <- nfl_rosters %>% 
  mutate(
    draft_number = as.integer(draft_number))

nfl_rosters_weekly <- nfl_rosters_weekly %>% 
  mutate(
    draft_number = as.integer(draft_number))

nfl_depthCharts <- nfl_depthCharts %>% 
  mutate(
    season = as.integer(season))

nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>% 
  mutate(
    season = as.integer(season),
    receptions = as.integer(receptions)) %>% 
  distinct(gsis_id, season, week, .keep_all = TRUE) # drop duplicated rows

## Rename variables
nfl_draftPicks <- nfl_draftPicks %>%
  rename(
    games_career = games,
    pass_completions_career = pass_completions,
    pass_attempts_career = pass_attempts,
    pass_yards_career = pass_yards,
    pass_tds_career = pass_tds,
    pass_ints_career = pass_ints,
    rush_atts_career = rush_atts,
    rush_yards_career = rush_yards,
    rush_tds_career = rush_tds,
    receptions_career = receptions,
    rec_yards_career = rec_yards,
    rec_tds_career = rec_tds,
    def_solo_tackles_career = def_solo_tackles,
    def_ints_career = def_ints,
    def_sacks_career = def_sacks
  )

## Subset variables
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>% 
  select(gsis_id:position, contains("_exp"), contains("_diff"), contains("_team")) #drop "raw stats" variables (e.g., rec_yards_gained) so they don't get coalesced with actual stats

# Check duplicate ids
player_stats_seasonal_offense %>% 
  group_by(gsis_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_seasonal %>% 
  group_by(gsis_id, season) %>% 
  filter(n() > 1, !is.na(gsis_id)) %>% 
  select(gsis_id, pfr_id, season, team, everything()) %>% 
  head()

Identify objects with shared variable names:

Code
dplyr::intersect(
  names(nfl_players),
  names(nfl_draftPicks))
[1] "gsis_id"  "position"
Code
length(na.omit(nfl_players$position)) # use by default (more cases)
[1] 21360
Code
length(na.omit(nfl_draftPicks$position))
[1] 2855
Code
dplyr::intersect(
  names(player_stats_seasonal_offense),
  names(nfl_advancedStatsPFR_seasonal))
[1] "gsis_id" "season"  "team"    "age"    
Code
length(na.omit(player_stats_seasonal_offense$season)) # use by default (more cases)
[1] 14859
Code
length(na.omit(nfl_advancedStatsPFR_seasonal$season))
[1] 10395
Code
length(na.omit(player_stats_seasonal_offense$team)) # use by default (more cases)
[1] 14858
Code
length(na.omit(nfl_advancedStatsPFR_seasonal$team))
[1] 10395
Code
length(na.omit(player_stats_seasonal_offense$age)) # use by default (more cases)
[1] 14859
Code
length(na.omit(nfl_advancedStatsPFR_seasonal$age))
[1] 10325
Code
dplyr::intersect(
  names(nfl_rosters_weekly),
  names(nfl_expectedFantasyPoints_weekly))
[1] "gsis_id"   "season"    "week"      "position"  "full_name"
Code
length(na.omit(nfl_rosters_weekly$season)) # use by default (more cases)
[1] 845134
Code
length(na.omit(nfl_expectedFantasyPoints_weekly$season))
[1] 100272
Code
length(na.omit(nfl_rosters_weekly$week)) # use by default (more cases)
[1] 841942
Code
length(na.omit(nfl_expectedFantasyPoints_weekly$week))
[1] 100272
Code
length(na.omit(nfl_rosters_weekly$position)) # use by default (more cases)
[1] 845101
Code
length(na.omit(nfl_expectedFantasyPoints_weekly$position))
[1] 97815
Code
length(na.omit(nfl_rosters_weekly$full_name)) # use by default (more cases)
[1] 845118
Code
length(na.omit(nfl_expectedFantasyPoints_weekly$full_name))
[1] 97815

19.4.2 Merge Data

To merge data, we use the powerjoin package (Fabri, 2022):

Code
# Create lists of objects to merge, depending on data structure: id; or id-season; or id-season-week
#-todo: remove redundant variables
playerListToMerge <- list(
  nfl_players %>% filter(!is.na(gsis_id)),
  nfl_draftPicks %>% filter(!is.na(gsis_id)) %>% select(-season)
)

playerSeasonListToMerge <- list(
  player_stats_seasonal_offense %>% filter(!is.na(gsis_id), !is.na(season)),
  nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season))
)

playerSeasonWeekListToMerge <- list(
  nfl_rosters_weekly %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week)),
  #nfl_actualStats_offense_weekly,
  nfl_expectedFantasyPoints_weekly %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week))
  #nfl_advancedStatsPFR_weekly,
)

playerSeasonWeekPositionListToMerge <- list(
  nfl_depthCharts %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week))
)

# Merge data
playerMerged <- playerListToMerge %>% 
  reduce(
    powerjoin::power_full_join,
    by = c("gsis_id"),
    conflict = coalesce_xy) # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2

playerSeasonMerged <- playerSeasonListToMerge %>% 
  reduce(
    powerjoin::power_full_join,
    by = c("gsis_id","season"),
    conflict = coalesce_xy) # where the objects have the same variable name (e.g., team), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2

playerSeasonWeekMerged <- playerSeasonWeekListToMerge %>% 
  reduce(
    powerjoin::power_full_join,
    by = c("gsis_id","season","week"),
    conflict = coalesce_xy) # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
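The conflict = coalesce_xy behavior noted in the comments above can be seen on a small example. This toy sketch (the IDs and values are made up) shows how, for a variable shared by both objects, the value from the first object is kept unless it is NA, in which case the value from the second object fills in:

```r
library("powerjoin")

x <- data.frame(gsis_id = c("00-001", "00-002"), position = c("QB", NA))
y <- data.frame(gsis_id = c("00-001", "00-002"), position = c("WR", "RB"))

# For 00-001, x's "QB" wins over y's "WR"; for 00-002, y's "RB" fills x's NA
powerjoin::power_full_join(x, y, by = "gsis_id", conflict = coalesce_xy)
```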

Identify objects with shared variable names:

Code
dplyr::intersect(
  names(playerSeasonMerged),
  names(playerMerged))
 [1] "gsis_id"                  "position"                
 [3] "position_group"           "first_name"              
 [5] "last_name"                "esb_id"                  
 [7] "display_name"             "rookie_year"             
 [9] "college_conference"       "current_team_id"         
[11] "draft_club"               "draft_number"            
[13] "draftround"               "entry_year"              
[15] "football_name"            "gsis_it_id"              
[17] "headshot"                 "jersey_number"           
[19] "short_name"               "smart_id"                
[21] "status"                   "status_description_abbr" 
[23] "status_short_description" "uniform_number"          
[25] "height"                   "weight"                  
[27] "college_name"             "birth_date"              
[29] "suffix"                   "years_of_experience"     
[31] "pfr_player_name"          "team"                    
[33] "age"                     
Code
seasonalData <- powerjoin::power_full_join(
  playerSeasonMerged,
  playerMerged %>% select(-age, -years_of_experience, -team, -team_abbr, -team_seq, -current_team_id), # drop variables from id objects that change from year to year (and thus are not necessarily accurate for a given season)
  by = "gsis_id",
  conflict = coalesce_xy # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
) %>% 
  filter(!is.na(season)) %>% 
  select(gsis_id, season, player_display_name, position, team, games, everything())
Code
dplyr::intersect(
  names(playerSeasonWeekMerged),
  names(seasonalData))
 [1] "gsis_id"                 "season"                 
 [3] "week"                    "team"                   
 [5] "jersey_number"           "status"                 
 [7] "first_name"              "last_name"              
 [9] "birth_date"              "height"                 
[11] "weight"                  "college"                
[13] "pfr_id"                  "headshot_url"           
[15] "status_description_abbr" "football_name"          
[17] "esb_id"                  "gsis_it_id"             
[19] "smart_id"                "entry_year"             
[21] "rookie_year"             "draft_club"             
[23] "draft_number"            "position"               
Code
seasonalAndWeeklyData <- powerjoin::power_full_join(
  playerSeasonWeekMerged,
  seasonalData,
  by = c("gsis_id","season"),
  conflict = coalesce_xy # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
) %>% 
  filter(!is.na(week)) %>% 
  select(gsis_id, season, week, full_name, position, team, everything())
Code
# Duplicate cases
seasonalData %>% 
  group_by(gsis_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
seasonalAndWeeklyData %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

19.4.3 Additional Processing

Code
# Convert character and logical variables to factors
seasonalData <- seasonalData %>% 
  mutate(
    across(
      where(is.character),
      as.factor
    ),
    across(
      where(is.logical),
      as.factor
    )
  )

19.4.4 Fill in Missing Data for Static Variables

Code
seasonalData <- seasonalData %>% 
  arrange(gsis_id, season) %>% 
  group_by(gsis_id) %>% 
  fill(
    player_name, player_display_name, pos, position, position_group,
    .direction = "downup") %>% 
  ungroup()

19.4.5 Create New Data Object for Merging with Later Predictions

Code
newData_seasonal <- seasonalData %>% 
  filter(season == max(season, na.rm = TRUE))

19.4.6 Lag Fantasy Points

Code
seasonalData_lag <- seasonalData %>% 
  arrange(gsis_id, season) %>% 
  group_by(gsis_id) %>% 
  mutate(
    fantasyPoints_lag = lead(fantasyPoints)
  ) %>% 
  ungroup()

seasonalData_lag %>% 
  select(gsis_id, player_display_name, season, fantasyPoints, fantasyPoints_lag) # verify that lagging worked as expected
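Note that the code above uses lead() rather than lag(): the fantasyPoints_lag variable holds each player's fantasy points from the following season, so that a given season's predictors are paired with the next season's outcome. A toy example (with made-up data) makes the behavior concrete:

```r
library("dplyr")

# Illustrative data: one player across three seasons
toy <- data.frame(
  gsis_id = "00-001",
  season = 2021:2023,
  fantasyPoints = c(100, 150, 200))

# lead() pulls the NEXT season's points into the current season's row;
# the final season gets NA because next season's outcome is not yet observed
toy %>% 
  group_by(gsis_id) %>% 
  mutate(fantasyPoints_lag = lead(fantasyPoints)) %>% 
  ungroup()
```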

19.4.7 Subset to Predictor Variables and Outcome Variable

Code
seasonalData_lag %>% select_if(~class(.) == "Date")
Code
seasonalData_lag %>% select_if(is.character)
Code
seasonalData_lag %>% select_if(is.factor)
Code
seasonalData_lag %>% select_if(is.logical)
Code
dropVars <- c(
  "birth_date", "loaded", "full_name", "player_name", "player_display_name", "display_name", "suffix", "headshot_url", "player", "pos",
  "espn_id", "sportradar_id", "yahoo_id", "rotowire_id", "pff_id", "fantasy_data_id", "sleeper_id", "pfr_id",
  "pfr_player_id", "cfb_player_id", "pfr_player_name", "esb_id", "gsis_it_id", "smart_id",
  "college", "college_name", "team_abbr", "current_team_id", "college_conference", "draft_club", "status_description_abbr",
  "status_short_description", "short_name", "headshot", "uniform_number", "jersey_number", "first_name", "last_name",
  "football_name", "team")

seasonalData_lag_subset <- seasonalData_lag %>% 
  dplyr::select(-any_of(dropVars))

19.4.8 Separate by Position

Code
seasonalData_lag_subsetQB <- seasonalData_lag_subset %>% 
  filter(position == "QB") %>% 
  select(
    gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
    height, weight, rookie_year, draft_number,
    fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
    completions:rushing_2pt_conversions, special_teams_tds, contains(".pass"), contains(".rush"))

seasonalData_lag_subsetRB <- seasonalData_lag_subset %>% 
  filter(position == "RB") %>% 
  select(
    gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
    height, weight, rookie_year, draft_number,
    fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
    carries:special_teams_tds, contains(".rush"), contains(".rec"))

seasonalData_lag_subsetWR <- seasonalData_lag_subset %>% 
  filter(position == "WR") %>% 
  select(
    gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
    height, weight, rookie_year, draft_number,
    fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
    carries:special_teams_tds, contains(".rush"), contains(".rec"))

seasonalData_lag_subsetTE <- seasonalData_lag_subset %>% 
  filter(position == "TE") %>% 
  select(
    gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
    height, weight, rookie_year, draft_number,
    fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
    carries:special_teams_tds, contains(".rush"), contains(".rec"))

19.4.9 Split into Test and Training Data

Code
seasonalData_lag_qb_all <- seasonalData_lag_subsetQB
seasonalData_lag_rb_all <- seasonalData_lag_subsetRB
seasonalData_lag_wr_all <- seasonalData_lag_subsetWR
seasonalData_lag_te_all <- seasonalData_lag_subsetTE

set.seed(52242) # for reproducibility (to keep the same train/holdout players)

activeQBs <- unique(seasonalData_lag_qb_all$gsis_id[which(seasonalData_lag_qb_all$season == max(seasonalData_lag_qb_all$season, na.rm = TRUE))])
retiredQBs <- unique(seasonalData_lag_qb_all$gsis_id[which(seasonalData_lag_qb_all$gsis_id %ni% activeQBs)])
numQBs <- length(unique(seasonalData_lag_qb_all$gsis_id))
qbHoldoutIDs <- sample(retiredQBs, size = ceiling(.2 * numQBs)) # holdout 20% of players

activeRBs <- unique(seasonalData_lag_rb_all$gsis_id[which(seasonalData_lag_rb_all$season == max(seasonalData_lag_rb_all$season, na.rm = TRUE))])
retiredRBs <- unique(seasonalData_lag_rb_all$gsis_id[which(seasonalData_lag_rb_all$gsis_id %ni% activeRBs)])
numRBs <- length(unique(seasonalData_lag_rb_all$gsis_id))
rbHoldoutIDs <- sample(retiredRBs, size = ceiling(.2 * numRBs)) # holdout 20% of players

set.seed(52242) # for reproducibility (to keep the same train/holdout players); added here to prevent a downstream error with predict.missRanger() due to missingness; this suggests that an error can arise from including a player in the holdout sample who has missingness in particular variables; would be good to identify which player(s) in the holdout sample evoke that error to identify the kinds of missingness that yield the error

activeWRs <- unique(seasonalData_lag_wr_all$gsis_id[which(seasonalData_lag_wr_all$season == max(seasonalData_lag_wr_all$season, na.rm = TRUE))])
retiredWRs <- unique(seasonalData_lag_wr_all$gsis_id[which(seasonalData_lag_wr_all$gsis_id %ni% activeWRs)])
numWRs <- length(unique(seasonalData_lag_wr_all$gsis_id))
wrHoldoutIDs <- sample(retiredWRs, size = ceiling(.2 * numWRs)) # holdout 20% of players

activeTEs <- unique(seasonalData_lag_te_all$gsis_id[which(seasonalData_lag_te_all$season == max(seasonalData_lag_te_all$season, na.rm = TRUE))])
retiredTEs <- unique(seasonalData_lag_te_all$gsis_id[which(seasonalData_lag_te_all$gsis_id %ni% activeTEs)])
numTEs <- length(unique(seasonalData_lag_te_all$gsis_id))
teHoldoutIDs <- sample(retiredTEs, size = ceiling(.2 * numTEs)) # holdout 20% of players
  
seasonalData_lag_qb_train <- seasonalData_lag_qb_all %>% 
  filter(gsis_id %ni% qbHoldoutIDs)
seasonalData_lag_qb_test <- seasonalData_lag_qb_all %>% 
  filter(gsis_id %in% qbHoldoutIDs)

seasonalData_lag_rb_train <- seasonalData_lag_rb_all %>% 
  filter(gsis_id %ni% rbHoldoutIDs)
seasonalData_lag_rb_test <- seasonalData_lag_rb_all %>% 
  filter(gsis_id %in% rbHoldoutIDs)

seasonalData_lag_wr_train <- seasonalData_lag_wr_all %>% 
  filter(gsis_id %ni% wrHoldoutIDs)
seasonalData_lag_wr_test <- seasonalData_lag_wr_all %>% 
  filter(gsis_id %in% wrHoldoutIDs)

seasonalData_lag_te_train <- seasonalData_lag_te_all %>% 
  filter(gsis_id %ni% teHoldoutIDs)
seasonalData_lag_te_test <- seasonalData_lag_te_all %>% 
  filter(gsis_id %in% teHoldoutIDs)

19.4.10 Impute the Missing Data

Here is a vignette demonstrating how to impute missing data using missForest(): https://rpubs.com/lmorgan95/MissForest (archived at: https://perma.cc/6GB4-2E22). Below, we impute the training data (and all data) separately by position. We then use the imputed training data to make out-of-sample predictions to fill in the missing data for the testing data. We do not want to impute the training and testing data together so that we can keep them separate for the purposes of cross-validation. However, we impute all data (training and test data together) for purposes of making out-of-sample predictions from the machine learning models to predict players’ performance next season (when actuals are not yet available for evaluating their accuracy). To impute data, we use the missRanger package (Mayer, 2024).

Note 19.1: Impute missing data for machine learning

Note: the following code takes a while to run.

Code
# QBs
seasonalData_lag_qb_all_imp <- missRanger::missRanger(
  seasonalData_lag_qb_all,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        fantasy_points, fantasy_points_ppr, special_teams_tds, passing_epa, pacr, rushing_epa, fantasyPoints_lag, passing_cpoe, rookie_year, draft_number, gs, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, pocket_time.pass
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, completions, attempts, passing_yards, passing_tds, passing_interceptions, sacks_suffered, sack_yards_lost, sack_fumbles, sack_fumbles_lost, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, passing_cpoe, passing_2pt_conversions, pacr, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, special_teams_tds, pocket_time.pass, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush

    fntsy_  fnts__  spcl__  pssng_p pacr    rshng_  fntsP_  pssng_c rok_yr  drft_n  gs  pss_t.  thrww.  spks.p  drps.p  bd_th.  tms_b.  tms_hr. tms_ht. tms_p.  bttd_.  on_tgt_t.   rp_pl.  rp_yr.  rp_pss_t.   rp_pss_y.   rp_rsh_t.   rp_rsh_y.   p_pss_t.    p_pss_y.    att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_t.  att_b.  drp_p.  bd_t_.  on_tgt_p.   prss_.  ybc_t.  yc_tt.  pckt_.
iter 1: 0.0054  0.0024  0.7924  0.1919  0.7612  0.3628  0.4789  0.4133  0.0224  0.5216  0.0271  0.0134  0.3024  0.7659  0.1304  0.0541  0.0758  0.1759  0.1820  0.0370  0.3238  0.0291  0.2952  0.1812  0.0885  0.0867  0.2627  0.2563  0.1093  0.0902  0.0580  0.0645  0.1732  0.0524  0.0578  0.1795  0.3524  0.3428  0.7447  0.5158  0.0824  0.6803  0.3529  0.5758  0.8111  
iter 2: 0.0044  0.0048  0.8304  0.2002  0.7926  0.3736  0.4801  0.4289  0.0488  0.6139  0.0188  0.0090  0.2883  0.7481  0.0764  0.0385  0.0718  0.1231  0.1329  0.0337  0.2760  0.0113  0.0548  0.0814  0.0765  0.0990  0.1989  0.2841  0.0707  0.0952  0.0396  0.0386  0.1606  0.0492  0.0525  0.1220  0.2541  0.3556  0.7468  0.4937  0.0827  0.6610  0.3465  0.5796  0.8134  
iter 3: 0.0049  0.0046  0.8690  0.1986  0.7810  0.3641  0.4774  0.4360  0.0528  0.6123  0.0188  0.0088  0.2867  0.7538  0.0767  0.0393  0.0734  0.1261  0.1374  0.0343  0.2741  0.0119  0.0524  0.0816  0.0748  0.1008  0.2184  0.2811  0.0691  0.0926  0.0389  0.0413  0.1640  0.0511  0.0585  0.1255  0.2510  0.3609  0.7477  0.5108  0.0858  0.6426  0.3588  0.5734  0.8300  
Code
seasonalData_lag_qb_all_imp
missRanger object. Extract imputed data via $data
- best iteration: 2 
- best average OOB imputation error: 0.2524825 
Code
data_all_qb <- seasonalData_lag_qb_all_imp$data
data_all_qb$fantasyPointsMC_lag <- scale(data_all_qb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_qb_matrix <- data_all_qb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
newData_qb <- data_all_qb %>% 
  filter(season == max(season, na.rm = TRUE)) %>% 
  select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_qb_matrix <- data_all_qb_matrix[
  data_all_qb_matrix[, "season"] == max(data_all_qb_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
  , # all columns
  drop = FALSE]

dropCol_qb <- which(colnames(newData_qb_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_qb_matrix <- newData_qb_matrix[, -dropCol_qb, drop = FALSE]

seasonalData_lag_qb_train_imp <- missRanger::missRanger(
  seasonalData_lag_qb_train,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        fantasy_points, fantasy_points_ppr, special_teams_tds, passing_epa, pacr, rushing_epa, fantasyPoints_lag, passing_cpoe, rookie_year, draft_number, gs, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, pocket_time.pass
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, completions, attempts, passing_yards, passing_tds, passing_interceptions, sacks_suffered, sack_yards_lost, sack_fumbles, sack_fumbles_lost, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, passing_cpoe, passing_2pt_conversions, pacr, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, special_teams_tds, pocket_time.pass, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush

    fntsy_  fnts__  spcl__  pssng_p pacr    rshng_  fntsP_  pssng_c rok_yr  drft_n  gs  pss_t.  thrww.  spks.p  drps.p  bd_th.  tms_b.  tms_hr. tms_ht. tms_p.  bttd_.  on_tgt_t.   rp_pl.  rp_yr.  rp_pss_t.   rp_pss_y.   rp_rsh_t.   rp_rsh_y.   p_pss_t.    p_pss_y.    att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_t.  att_b.  drp_p.  bd_t_.  on_tgt_p.   prss_.  ybc_t.  yc_tt.  pckt_.
iter 1: 0.0061  0.0028  0.8162  0.1897  0.5083  0.3633  0.4726  0.4456  0.0242  0.4723  0.0283  0.0141  0.2939  0.7728  0.1343  0.0558  0.0744  0.1757  0.1818  0.0381  0.3288  0.0351  0.2921  0.1846  0.0860  0.0894  0.2737  0.2661  0.1127  0.0900  0.0586  0.0644  0.1800  0.0574  0.0639  0.1792  0.3570  0.3486  0.7646  0.5313  0.0868  0.7084  0.3533  0.5933  0.8466  
iter 2: 0.0052  0.0052  0.8304  0.1937  0.5621  0.3715  0.4614  0.4586  0.0505  0.5647  0.0192  0.0092  0.2953  0.7530  0.0800  0.0393  0.0725  0.1170  0.1355  0.0343  0.2771  0.0121  0.0555  0.0731  0.0713  0.0979  0.2073  0.2943  0.0698  0.0911  0.0416  0.0399  0.1683  0.0527  0.0577  0.1262  0.2474  0.3582  0.7719  0.5165  0.0900  0.6862  0.3642  0.5926  0.8400  
iter 3: 0.0053  0.0051  0.8261  0.2008  0.5551  0.3571  0.4727  0.4410  0.0551  0.5658  0.0188  0.0092  0.2859  0.7460  0.0807  0.0402  0.0739  0.1202  0.1393  0.0351  0.2808  0.0114  0.0595  0.0705  0.0775  0.1051  0.2163  0.2935  0.0718  0.0921  0.0426  0.0400  0.1719  0.0535  0.0534  0.1225  0.2498  0.3484  0.7502  0.5100  0.0884  0.6609  0.3672  0.5852  0.8440  
iter 4: 0.0054  0.0051  0.6928  0.1979  0.5598  0.3732  0.4771  0.4349  0.0506  0.5691  0.0189  0.0085  0.2891  0.7456  0.0785  0.0395  0.0737  0.1210  0.1353  0.0335  0.2836  0.0117  0.0566  0.0778  0.0743  0.1055  0.2131  0.2964  0.0697  0.0912  0.0396  0.0395  0.1611  0.0531  0.0597  0.1258  0.2600  0.3560  0.8062  0.5032  0.0973  0.6739  0.3698  0.5875  0.8485  
iter 5: 0.0052  0.0055  0.8355  0.1965  0.5664  0.3710  0.4743  0.4604  0.0520  0.5598  0.0193  0.0091  0.2852  0.7474  0.0800  0.0405  0.0722  0.1213  0.1366  0.0344  0.2788  0.0118  0.0555  0.0756  0.0746  0.0986  0.2190  0.2765  0.0695  0.0932  0.0390  0.0425  0.1650  0.0509  0.0576  0.1305  0.2556  0.3509  0.7738  0.5051  0.0969  0.6902  0.3640  0.6007  0.8326  
Code
seasonalData_lag_qb_train_imp
missRanger object. Extract imputed data via $data
- best iteration: 4 
- best average OOB imputation error: 0.2482278 
Code
data_train_qb <- seasonalData_lag_qb_train_imp$data
data_train_qb$fantasyPointsMC_lag <- scale(data_train_qb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_qb_matrix <- data_train_qb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()

seasonalData_lag_qb_test_imp <- predict(
  object = seasonalData_lag_qb_train_imp,
  newdata = seasonalData_lag_qb_test,
  seed = 52242)

data_test_qb <- seasonalData_lag_qb_test_imp
data_test_qb_matrix <- data_test_qb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
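Before passing imputed training and test sets to a model, it is worth confirming that no missing values remain. A minimal base-R sketch using a toy data frame (the object and column names here are illustrative, not the book's objects):

```r
# Toy data frame standing in for an imputed dataset
df_imputed <- data.frame(
  fantasyPoints = c(12.4, 8.1, 20.3),
  age = c(24, 27, 31)
)

# Count remaining missing values per column; all zeros indicate
# the imputation filled every cell
colSums(is.na(df_imputed))

# Stop early if any missingness slipped through
stopifnot(!anyNA(df_imputed))
```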
Code
# RBs
seasonalData_lag_rb_all_imp <- missRanger::missRanger(
  seasonalData_lag_rb_all,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, rushing_epa, air_yards_share, receiving_epa, racr, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, ybc_att.rush, yac_att.rush, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    games   agCn20  agC20Q  fntsy_  fnts__  fntsyP  carris  rshng_y rshng_t rshng_f rshng_fm_   rshng_fr_   rsh_2_  rcptns  targts  rcvng_y rcvng_t rcvng_f rcvng_fm_   rcvng_r_    rcv___  rcvng_fr_   rcv_2_  spcl__  yrs_f_  rshng_p ar_yr_  rcvng_p racr    trgt_s  wopr    fntsP_  rok_yr  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  ybc_t.  yc_tt.  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r
iter 1: 0.8865  0.0057  0.0031  0.4544  0.0178  0.0032  0.0745  0.0233  0.1462  0.4895  0.2594  0.0295  0.9849  0.0690  0.0666  0.0534  0.4327  0.8626  0.4824  0.6841  0.0322  0.0614  1.0171  0.8263  0.1817  0.4512  0.3321  0.3894  0.5211  0.4549  0.1817  0.5440  0.0197  0.5999  0.1700  0.0244  0.0222  0.0792  0.0297  0.0527  0.0520  0.2134  0.3431  0.0252  0.0180  0.0257  0.1634  0.0437  0.3108  0.0217  0.3925  0.4610  0.6941  0.4880  0.5402  0.2670  0.2026  0.3482  0.1596  0.2698  0.3637  
iter 2: 0.2755  0.0162  0.0207  0.0063  0.0037  0.0044  0.0161  0.0148  0.0912  0.2524  0.2898  0.0248  0.9832  0.0273  0.0444  0.0233  0.2065  0.4600  0.4891  0.1285  0.0332  0.0457  1.0175  0.8566  0.1824  0.4212  0.2329  0.3099  0.5605  0.2569  0.1742  0.5373  0.0424  0.6377  0.1653  0.0167  0.0123  0.0859  0.0302  0.0367  0.0349  0.1030  0.3689  0.0144  0.0159  0.0195  0.1403  0.0434  0.1549  0.0190  0.3840  0.1050  0.5453  0.4882  0.5616  0.2472  0.1953  0.1525  0.1687  0.2595  0.3619  
iter 3: 0.2744  0.0163  0.0231  0.0062  0.0038  0.0047  0.0152  0.0137  0.0980  0.2601  0.2906  0.0244  0.9800  0.0265  0.0347  0.0231  0.2101  0.4638  0.4954  0.1284  0.0283  0.0458  1.0114  0.8731  0.1818  0.4117  0.2278  0.3037  0.5699  0.2052  0.1800  0.5389  0.0400  0.6423  0.1624  0.0166  0.0124  0.0893  0.0306  0.0374  0.0356  0.1074  0.3628  0.0144  0.0163  0.0187  0.1390  0.0463  0.1583  0.0190  0.3882  0.1062  0.5642  0.4796  0.5570  0.2380  0.1935  0.1586  0.1588  0.2625  0.3648  
iter 4: 0.2776  0.0169  0.0220  0.0063  0.0038  0.0045  0.0151  0.0138  0.0979  0.2584  0.2846  0.0243  0.9782  0.0263  0.0281  0.0221  0.1968  0.4594  0.4817  0.1267  0.0290  0.0462  1.0104  0.8614  0.1854  0.4216  0.2333  0.3004  0.5467  0.1917  0.1815  0.5353  0.0443  0.6503  0.1657  0.0166  0.0121  0.0905  0.0313  0.0378  0.0357  0.1041  0.3437  0.0155  0.0159  0.0185  0.1405  0.0441  0.1613  0.0196  0.3816  0.1117  0.5682  0.5011  0.5585  0.2421  0.1975  0.1520  0.1770  0.2650  0.3647  
iter 5: 0.2752  0.0163  0.0226  0.0063  0.0038  0.0045  0.0158  0.0138  0.1015  0.2614  0.2857  0.0242  0.9740  0.0250  0.0303  0.0218  0.2004  0.4607  0.4810  0.1167  0.0285  0.0449  1.0077  0.8658  0.1835  0.4182  0.2170  0.2995  0.5690  0.2010  0.1794  0.5375  0.0385  0.6487  0.1652  0.0166  0.0124  0.0878  0.0306  0.0368  0.0353  0.1069  0.3539  0.0154  0.0159  0.0193  0.1409  0.0447  0.1598  0.0205  0.3873  0.1062  0.5583  0.4895  0.5501  0.2418  0.1979  0.1713  0.1726  0.2625  0.3596  
iter 6: 0.2760  0.0158  0.0223  0.0063  0.0037  0.0046  0.0150  0.0144  0.0982  0.2568  0.2816  0.0238  0.9810  0.0253  0.0273  0.0223  0.2141  0.4606  0.4881  0.1386  0.0300  0.0457  1.0174  0.8605  0.1821  0.4188  0.2263  0.2985  0.5497  0.1779  0.1536  0.5388  0.0389  0.6422  0.1668  0.0162  0.0119  0.0897  0.0305  0.0376  0.0356  0.1066  0.3529  0.0149  0.0159  0.0196  0.1446  0.0450  0.1585  0.0197  0.3857  0.1001  0.5607  0.4948  0.5478  0.2487  0.1945  0.1438  0.1543  0.2568  0.3612  
iter 7: 0.2748  0.0158  0.0212  0.0064  0.0039  0.0047  0.0149  0.0141  0.0986  0.2611  0.2877  0.0241  0.9755  0.0253  0.0310  0.0223  0.2163  0.4553  0.4885  0.1335  0.0293  0.0456  1.0096  0.8575  0.1821  0.4236  0.2203  0.2998  0.5510  0.2107  0.1797  0.5354  0.0416  0.6395  0.1646  0.0166  0.0117  0.0895  0.0310  0.0371  0.0361  0.1073  0.3547  0.0154  0.0156  0.0193  0.1410  0.0449  0.1628  0.0201  0.3892  0.1076  0.5609  0.4946  0.5643  0.2392  0.1899  0.1540  0.1423  0.2667  0.3603  
Code
seasonalData_lag_rb_all_imp
missRanger object. Extract imputed data via $data
- best iteration: 6 
- best average OOB imputation error: 0.2175486 
Code
data_all_rb <- seasonalData_lag_rb_all_imp$data
data_all_rb$fantasyPointsMC_lag <- scale(data_all_rb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_rb_matrix <- data_all_rb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
newData_rb <- data_all_rb %>% 
  filter(season == max(season, na.rm = TRUE)) %>% 
  select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_rb_matrix <- data_all_rb_matrix[
  data_all_rb_matrix[, "season"] == max(data_all_rb_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
  , # all columns
  drop = FALSE]

dropCol_rb <- which(colnames(newData_rb_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_rb_matrix <- newData_rb_matrix[, -dropCol_rb, drop = FALSE]

seasonalData_lag_rb_train_imp <- missRanger::missRanger(
  seasonalData_lag_rb_train,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, rushing_epa, air_yards_share, receiving_epa, racr, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, ybc_att.rush, yac_att.rush, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    games   agCn20  agC20Q  fntsy_  fnts__  fntsyP  carris  rshng_y rshng_t rshng_f rshng_fm_   rshng_fr_   rsh_2_  rcptns  targts  rcvng_y rcvng_t rcvng_f rcvng_fm_   rcvng_r_    rcv___  rcvng_fr_   rcv_2_  spcl__  yrs_f_  rshng_p ar_yr_  rcvng_p racr    trgt_s  wopr    fntsP_  rok_yr  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  ybc_t.  yc_tt.  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r
iter 1: 0.8759  0.0072  0.0036  0.4578  0.0178  0.0035  0.0736  0.0229  0.1524  0.4776  0.2679  0.0288  0.9965  0.0749  0.0744  0.0553  0.4578  0.8604  0.4998  0.6821  0.0360  0.0639  1.0042  0.8380  0.1806  0.4662  0.3419  0.3961  0.5595  0.4715  0.1968  0.5338  0.0246  0.5882  0.1726  0.0265  0.0235  0.0849  0.0311  0.0550  0.0521  0.2131  0.3689  0.0281  0.0197  0.0286  0.1742  0.0463  0.3114  0.0229  0.3942  0.4806  0.7239  0.5199  0.5631  0.2865  0.2195  0.3630  0.2052  0.2596  0.4091  
iter 2: 0.2745  0.0177  0.0266  0.0067  0.0041  0.0049  0.0169  0.0154  0.1017  0.2590  0.2956  0.0237  0.9814  0.0286  0.0522  0.0240  0.2187  0.4582  0.4919  0.1541  0.0362  0.0475  1.0075  0.8811  0.1822  0.4481  0.2377  0.3184  0.6116  0.2628  0.2007  0.5254  0.0473  0.6411  0.1653  0.0179  0.0132  0.0940  0.0325  0.0392  0.0375  0.1057  0.3678  0.0161  0.0169  0.0196  0.1510  0.0484  0.1521  0.0202  0.3941  0.1035  0.5567  0.5120  0.5666  0.2466  0.2087  0.1733  0.1699  0.2524  0.4066  
iter 3: 0.2766  0.0190  0.0273  0.0067  0.0041  0.0048  0.0159  0.0149  0.0971  0.2615  0.2952  0.0245  0.9668  0.0278  0.0409  0.0240  0.2180  0.4648  0.4931  0.1319  0.0350  0.0495  1.0128  0.8907  0.1820  0.4366  0.2459  0.3124  0.6236  0.2555  0.2114  0.5276  0.0438  0.6314  0.1658  0.0175  0.0122  0.0899  0.0319  0.0386  0.0386  0.1100  0.3783  0.0155  0.0165  0.0194  0.1477  0.0474  0.1499  0.0194  0.3929  0.1121  0.5761  0.5245  0.5651  0.2490  0.2103  0.1767  0.1817  0.2658  0.4106  
Code
seasonalData_lag_rb_train_imp
missRanger object. Extract imputed data via $data
- best iteration: 2 
- best average OOB imputation error: 0.226086 
Code
data_train_rb <- seasonalData_lag_rb_train_imp$data
data_train_rb$fantasyPointsMC_lag <- scale(data_train_rb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_rb_matrix <- data_train_rb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()

seasonalData_lag_rb_test_imp <- predict(
  object = seasonalData_lag_rb_train_imp,
  newdata = seasonalData_lag_rb_test,
  seed = 52242)

data_test_rb <- seasonalData_lag_rb_test_imp
data_test_rb_matrix <- data_test_rb %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
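The `mutate(across(where(is.factor), ...))` step above matters because `as.matrix()` on a data frame with any factor (or other non-numeric) column coerces the entire result to a character matrix. A small self-contained illustration with toy data:

```r
library(dplyr)

df <- data.frame(
  position = factor(c("RB", "WR", "RB")),
  yards = c(85, 112, 47)
)

# Without conversion, as.matrix() yields a character matrix,
# so even the numeric column becomes character
class(as.matrix(df)[1, "yards"]) # "character"

# Converting factors to their integer codes first keeps the matrix numeric
df_num <- df %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.))))
m <- as.matrix(df_num)
is.numeric(m) # TRUE
```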
Code
# WRs
seasonalData_lag_wr_all_imp <- missRanger::missRanger(
  seasonalData_lag_wr_all,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        fantasy_points, fantasy_points_ppr, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, rushing_epa, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, ybc_att.rush, yac_att.rush
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    fntsy_  fnts__  spcl__  yrs_f_  rcvng_  racr    ar_yr_  trgt_s  wopr    fntsP_  rok_yr  rshng_  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r  ybc_t.  yc_tt.
iter 1: 0.0061  0.0010  0.7104  0.1566  0.1040  0.8131  0.1013  0.1722  0.0402  0.4890  0.0150  0.3811  0.6654  0.1459  0.1184  0.0898  0.2353  0.1234  0.0966  0.2670  0.6383  0.3084  0.0198  0.0136  0.0151  0.0671  0.0135  0.0268  0.0442  0.4465  0.4320  0.4674  0.2961  0.1410  0.3819  0.1840  0.2251  0.3929  0.2568  0.4760  
iter 2: 0.0058  0.0019  0.7826  0.1601  0.0835  0.7518  0.0607  0.0930  0.0452  0.4939  0.0296  0.3301  0.6843  0.1440  0.0851  0.0600  0.2638  0.1161  0.0708  0.1804  0.3103  0.3223  0.0109  0.0108  0.0096  0.0719  0.0139  0.0200  0.0318  0.4476  0.0778  0.3692  0.2401  0.1448  0.1629  0.1601  0.2261  0.3793  0.2536  0.4775  
iter 3: 0.0061  0.0019  0.7857  0.1593  0.0829  0.7421  0.0580  0.0986  0.0481  0.4946  0.0318  0.3334  0.6890  0.1430  0.0823  0.0604  0.2595  0.1177  0.0728  0.1802  0.3077  0.3194  0.0109  0.0114  0.0095  0.0724  0.0133  0.0199  0.0312  0.4411  0.0767  0.3687  0.2369  0.1455  0.1530  0.1660  0.2169  0.3878  0.2466  0.4716  
iter 4: 0.0060  0.0018  0.7874  0.1604  0.0832  0.7394  0.0591  0.0940  0.0479  0.4926  0.0301  0.3317  0.6896  0.1434  0.0863  0.0601  0.2562  0.1227  0.0711  0.1900  0.3089  0.3194  0.0105  0.0112  0.0095  0.0707  0.0140  0.0202  0.0318  0.4447  0.0784  0.3674  0.2339  0.1423  0.1592  0.1700  0.2254  0.3886  0.2552  0.4662  
Code
seasonalData_lag_wr_all_imp
missRanger object. Extract imputed data via $data
- best iteration: 3 
- best average OOB imputation error: 0.203846 
Code
data_all_wr <- seasonalData_lag_wr_all_imp$data
data_all_wr$fantasyPointsMC_lag <- scale(data_all_wr$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_wr_matrix <- data_all_wr %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
newData_wr <- data_all_wr %>% 
  filter(season == max(season, na.rm = TRUE)) %>% 
  select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_wr_matrix <- data_all_wr_matrix[
  data_all_wr_matrix[, "season"] == max(data_all_wr_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
  , # all columns
  drop = FALSE]

dropCol_wr <- which(colnames(newData_wr_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_wr_matrix <- newData_wr_matrix[, -dropCol_wr, drop = FALSE]

seasonalData_lag_wr_train_imp <- missRanger::missRanger(
  seasonalData_lag_wr_train,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        fantasy_points, fantasy_points_ppr, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, rushing_epa, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, ybc_att.rush, yac_att.rush
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    fntsy_  fnts__  spcl__  yrs_f_  rcvng_  racr    ar_yr_  trgt_s  wopr    fntsP_  rok_yr  rshng_  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r  ybc_t.  yc_tt.
iter 1: 0.0064  0.0010  0.7029  0.1611  0.1089  0.8443  0.1021  0.1643  0.0427  0.4935  0.0173  0.3461  0.6788  0.1427  0.1364  0.0993  0.2403  0.1243  0.0979  0.2745  0.6190  0.3171  0.0201  0.0147  0.0159  0.0734  0.0140  0.0280  0.0454  0.4502  0.4439  0.4733  0.3088  0.1641  0.4547  0.2192  0.2439  0.4227  0.2921  0.5068  
iter 2: 0.0063  0.0020  0.7835  0.1630  0.0901  0.8044  0.0674  0.0936  0.0479  0.4930  0.0331  0.3235  0.7090  0.1417  0.0896  0.0659  0.2659  0.1273  0.0752  0.1920  0.3068  0.3225  0.0112  0.0116  0.0101  0.0753  0.0141  0.0210  0.0333  0.4431  0.0809  0.3676  0.2571  0.1617  0.1735  0.1797  0.2441  0.3996  0.2911  0.4923  
iter 3: 0.0063  0.0020  0.7710  0.1639  0.0881  0.7954  0.0646  0.0982  0.0515  0.4956  0.0338  0.3200  0.7088  0.1413  0.0900  0.0640  0.2565  0.1250  0.0735  0.1989  0.3082  0.3280  0.0114  0.0119  0.0096  0.0763  0.0141  0.0216  0.0326  0.4388  0.0807  0.3703  0.2582  0.1623  0.1657  0.2018  0.2375  0.4016  0.2838  0.4794  
iter 4: 0.0062  0.0020  0.7792  0.1625  0.0877  0.8043  0.0632  0.0919  0.0477  0.4963  0.0341  0.3239  0.7038  0.1420  0.0950  0.0653  0.2664  0.1309  0.0767  0.2025  0.2933  0.3076  0.0109  0.0119  0.0097  0.0745  0.0143  0.0217  0.0326  0.4432  0.0804  0.3688  0.2584  0.1605  0.1931  0.2013  0.2378  0.4119  0.2824  0.4860  
Code
seasonalData_lag_wr_train_imp
missRanger object. Extract imputed data via $data
- best iteration: 3 
- best average OOB imputation error: 0.2110456 
Code
data_train_wr <- seasonalData_lag_wr_train_imp$data
data_train_wr$fantasyPointsMC_lag <- scale(data_train_wr$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_wr_matrix <- data_train_wr %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()

seasonalData_lag_wr_test_imp <- predict(
  object = seasonalData_lag_wr_train_imp,
  newdata = seasonalData_lag_wr_test,
  seed = 52242)

data_test_wr <- seasonalData_lag_wr_test_imp
data_test_wr_matrix <- data_test_wr %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
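As in the chunks above, `scale(x, scale = FALSE)` mean-centers a variable without dividing by its standard deviation. Note that `scale()` returns a one-column matrix with a `"scaled:center"` attribute; wrapping it in `as.numeric()` gives a plain vector if preferred. A toy sketch:

```r
x <- c(10, 20, 30)

# Mean-center only (no division by the SD)
x_mc <- scale(x, scale = FALSE)

# scale() returns a matrix and records the mean it subtracted
dim(x_mc)                   # 3 1
attr(x_mc, "scaled:center") # 20

# A mean-centered variable has mean zero
mean(as.numeric(x_mc))      # 0
```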
Code
# TEs
seasonalData_lag_te_all_imp <- missRanger::missRanger(
  seasonalData_lag_te_all,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, rushing_epa, ybc_att.rush, yac_att.rush
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    games   agCn20  agC20Q  fntsy_  fnts__  fntsyP  carris  rshng_y rshng_t rshng_f rshng_fm_   rshng_fr_   rsh_2_  rcptns  targts  rcvng_y rcvng_t rcvng_f rcvng_fm_   rcvng_r_    rcv___  rcvng_fr_   rcv_2_  spcl__  yrs_f_  rcvng_p racr    ar_yr_  trgt_s  wopr    fntsP_  rok_yr  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r  rshng_p ybc_t.  yc_tt.
iter 1: 0.8157  0.0061  0.0030  0.3406  0.0194  0.0039  0.5253  0.2259  0.2452  0.7083  0.6874  0.0802  1.1303  0.0281  0.0558  0.0255  0.0845  0.8134  0.4317  0.0655  0.0784  0.0253  0.9716  1.0271  0.1530  0.1689  0.6899  0.1092  0.4432  0.1004  0.4764  0.0180  0.6054  0.3846  0.0762  0.0832  0.1564  0.0618  0.0704  0.2123  0.3921  0.6733  0.0290  0.0207  0.0226  0.1012  0.0212  0.0420  0.0603  0.4332  0.4640  0.4996  0.2804  0.1667  0.3542  0.1652  0.2843  0.3948  0.3270  0.6620  0.7439  
iter 2: 0.1712  0.0175  0.0256  0.0106  0.0037  0.0055  0.1140  0.1113  0.0990  0.5369  0.7422  0.0852  1.1286  0.0193  0.0200  0.0128  0.0862  0.4248  0.4659  0.0206  0.0529  0.0217  0.9711  1.0114  0.1561  0.1397  0.6715  0.0766  0.1819  0.1085  0.4649  0.0366  0.6346  0.3880  0.0722  0.0728  0.1592  0.0680  0.0759  0.2034  0.3651  0.6811  0.0164  0.0158  0.0161  0.1080  0.0211  0.0327  0.0475  0.4342  0.1149  0.4173  0.2589  0.1742  0.1467  0.1531  0.2941  0.3851  0.3357  0.6846  0.7397  
iter 3: 0.1689  0.0170  0.0261  0.0114  0.0040  0.0056  0.1190  0.1155  0.0978  0.6088  0.7899  0.0945  1.1731  0.0195  0.0203  0.0132  0.0964  0.4270  0.4608  0.0202  0.0525  0.0214  0.9694  1.0265  0.1560  0.1380  0.6453  0.0751  0.1794  0.1204  0.4642  0.0364  0.6369  0.3853  0.0779  0.0786  0.1497  0.0569  0.0932  0.2027  0.4003  0.6633  0.0171  0.0164  0.0167  0.1049  0.0220  0.0335  0.0466  0.4371  0.1141  0.4304  0.2665  0.1775  0.1464  0.1537  0.2916  0.3720  0.3137  0.6640  0.7770  
Code
seasonalData_lag_te_all_imp
missRanger object. Extract imputed data via $data
- best iteration: 2 
- best average OOB imputation error: 0.2477098 
Code
data_all_te <- seasonalData_lag_te_all_imp$data
data_all_te$fantasyPointsMC_lag <- scale(data_all_te$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_te_matrix <- data_all_te %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
newData_te <- data_all_te %>% 
  filter(season == max(season, na.rm = TRUE)) %>% 
  select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_te_matrix <- data_all_te_matrix[
  data_all_te_matrix[, "season"] == max(data_all_te_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
  , # all columns
  drop = FALSE]

dropCol_te <- which(colnames(newData_te_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_te_matrix <- newData_te_matrix[, -dropCol_te, drop = FALSE]

seasonalData_lag_te_train_imp <- missRanger::missRanger(
  seasonalData_lag_te_train,
  pmm.k = 5,
  verbose = 2,
  seed = 52242,
  keep_forests = TRUE)

Variables to impute:        games, years_of_experience, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, rushing_epa, ybc_att.rush, yac_att.rush
Variables used to impute:   gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec

    games   yrs_f_  agCn20  agC20Q  fntsy_  fnts__  fntsyP  carris  rshng_y rshng_t rshng_f rshng_fm_   rshng_fr_   rsh_2_  rcptns  targts  rcvng_y rcvng_t rcvng_f rcvng_fm_   rcvng_r_    rcv___  rcvng_fr_   rcv_2_  spcl__  rcvng_p racr    ar_yr_  trgt_s  wopr    fntsP_  rok_yr  drft_n  gs  att.rs  yds.rs  td.rsh  x1d.rs  ybc.rs  yc.rsh  brk_tkl.rs  att_b.  tgt.rc  rec.rc  yds.rc  td.rec  x1d.rc  ybc.rc  yac.rc  brk_tkl.rc  drp.rc  int.rc  adt.rc  rat.rc  drp_p.  rc_br.  ybc_r.  yc_r.r  rshng_p ybc_t.  yc_tt.
iter 1: 0.8094  0.1093  0.0070  0.0035  0.3272  0.0235  0.0052  0.2840  0.1426  0.2634  0.8628  0.7885  0.0924  1.1067  0.0298  0.0611  0.0249  0.0969  0.8177  0.4537  0.0650  0.0804  0.0249  0.9680  1.0235  0.1738  0.5438  0.0868  0.4123  0.1172  0.4597  0.0189  0.6057  0.3973  0.0877  0.0886  0.1516  0.0467  0.0593  0.2086  0.4018  0.6464  0.0296  0.0223  0.0237  0.1062  0.0207  0.0428  0.0579  0.4367  0.4700  0.4818  0.3045  0.1724  0.4722  0.2410  0.2693  0.4025  0.3943  0.4791  0.7521  
iter 2: 0.1728  0.1469  0.0179  0.0289  0.0104  0.0039  0.0051  0.0863  0.0763  0.1528  0.7880  0.9474  0.0849  1.0234  0.0193  0.0198  0.0137  0.0915  0.4327  0.4835  0.0238  0.0558  0.0221  0.9617  1.0376  0.1464  0.5141  0.0630  0.1827  0.1062  0.4562  0.0379  0.6361  0.3908  0.0641  0.0767  0.1425  0.0603  0.0747  0.1970  0.3903  0.6647  0.0182  0.0171  0.0165  0.1074  0.0226  0.0332  0.0503  0.4361  0.1170  0.4096  0.2673  0.1862  0.2255  0.2386  0.2793  0.4004  0.3648  0.5070  0.7621  
iter 3: 0.1713  0.1447  0.0195  0.0276  0.0104  0.0036  0.0051  0.0796  0.0889  0.1611  0.8505  0.9348  0.0904  1.0447  0.0196  0.0205  0.0134  0.0901  0.4465  0.4867  0.0233  0.0569  0.0222  0.9519  1.0103  0.1457  0.5062  0.0617  0.1698  0.1115  0.4530  0.0382  0.6521  0.3899  0.0665  0.0681  0.1457  0.0647  0.0866  0.2055  0.3919  0.6791  0.0169  0.0169  0.0168  0.1107  0.0213  0.0339  0.0500  0.4315  0.1200  0.4148  0.2745  0.1822  0.1947  0.2205  0.2778  0.4032  0.3639  0.4933  0.7741  
Code
seasonalData_lag_te_train_imp
missRanger object. Extract imputed data via $data
- best iteration: 2 
- best average OOB imputation error: 0.2519559 
Code
data_train_te <- seasonalData_lag_te_train_imp$data
data_train_te$fantasyPointsMC_lag <- scale(data_train_te$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_te_matrix <- data_train_te %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()

seasonalData_lag_te_test_imp <- predict(
  object = seasonalData_lag_te_train_imp,
  newdata = seasonalData_lag_te_test,
  seed = 52242)

data_test_te <- seasonalData_lag_te_test_imp
data_test_te_matrix <- data_test_te %>%
  mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>% 
  as.matrix()
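The matrix subsetting used to build the `newData_*_matrix` objects above keeps only rows from the most recent season; `drop = FALSE` ensures the result stays a matrix even when a single row matches. A minimal self-contained sketch with toy values:

```r
# Toy matrix with a season column
m <- cbind(season = c(2022, 2023, 2023), yards = c(900, 1100, 700))

# Keep only rows from the most recent season; drop = FALSE preserves
# the matrix structure even if only one row remains
latest <- m[m[, "season"] == max(m[, "season"], na.rm = TRUE), , drop = FALSE]

nrow(latest) # 2
```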

19.5 Identify Cores for Parallel Processing

Code
num_cores <- parallel::detectCores() - 1
num_true_cores <- parallel::detectCores(logical = FALSE) - 1
Code
num_cores
[1] 4

We use the future (Bengtsson, 2025) package for parallel (faster) processing.

Code
future::plan(future::multisession, workers = num_cores)
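To confirm that the parallel backend is active, an illustrative check is to ask the future package how many workers the current plan provides; it should match the `num_cores` value requested above:

```r
# Report the number of parallel workers in the current plan
future::nbrOfWorkers()
```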

19.6 Fitting the Traditional Regression Models

19.6.1 Regression with One Predictor

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ ageCentered20,
  data = data_train_qb)

# Define Model
lm_spec <- parsnip::linear_reg() %>%
  parsnip::set_engine("lm") %>%
  parsnip::set_mode("regression")

# Workflow
lm_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(lm_spec)

# Fit Model with Cross-Validation
cv_results <- tune::fit_resamples(
  lm_wf,
  resamples = folds,
  metrics = metric_set(rmse, mae, rsq),
  control = control_resamples(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Fit Final Model on Training Data
final_model <- workflows::fit(
  lm_wf,
  data = data_train_qb)

# View Coefficients
final_model %>% 
  workflows::extract_fit_parsnip() %>% 
  broom::tidy()
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.1: Predicted Versus Actual Fantasy Points for Regression Model with One Predictor (Player Age).
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

19.6.2 Regression with Multiple Predictors

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ .,
  data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))

# Define Model
lm_spec <- parsnip::linear_reg() %>%
  parsnip::set_engine("lm") %>%
  parsnip::set_mode("regression")

# Workflow
lm_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(lm_spec)

# Fit Model with Cross-Validation
cv_results <- tune::fit_resamples(
  lm_wf,
  resamples = folds,
  metrics = metric_set(rmse, mae, rsq),
  control = control_resamples(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Fit Final Model on Training Data
final_model <- workflows::fit(
  lm_wf,
  data = data_train_qb)

# View Coefficients
final_model %>% 
  workflows::extract_fit_parsnip() %>% 
  broom::tidy()
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.2: Predicted Versus Actual Fantasy Points for Regression Model with Multiple Predictors.
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

19.7 Fitting the Machine Learning Models

19.7.1 Least Absolute Shrinkage and Selection Operator (LASSO)

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ .,
  data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))

# Define Model
lasso_spec <- 
  parsnip::linear_reg(
    penalty = tune(),
    mixture = 1) %>%
  set_engine("glmnet")

# Workflow
lasso_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(lasso_spec)

# Define grid of penalties to try; dials::penalty() is specified on the
# log10 scale, so range = c(-4, -1) spans penalties from 0.0001 to 0.1
penalty_grid <- dials::grid_regular(
  dials::penalty(range = c(-4, -1)),
  levels = 20)

# Tune the Penalty Parameter
cv_results <- tune::tune_grid(
  lasso_wf,
  resamples = folds,
  grid = penalty_grid,
  metrics = metric_set(rmse, mae, rsq),
  control = control_grid(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Identify best penalty
select_best(cv_results, metric = "rmse")
Code
select_best(cv_results, metric = "mae")
Code
select_best(cv_results, metric = "rsq")
Code
best_penalty <- select_best(cv_results, metric = "mae")

# Finalize Workflow with Best Penalty
final_wf <- finalize_workflow(
  lasso_wf,
  best_penalty)

# Fit Final Model on Training Data
final_model <- workflows::fit(
  final_wf,
  data = data_train_qb)

# View Coefficients
final_model %>% 
  workflows::extract_fit_parsnip() %>% 
  broom::tidy()
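Because the LASSO penalty can shrink coefficients exactly to zero, it performs variable selection. One way to see how many predictors survive at the selected penalty (an illustrative snippet, not part of the original workflow):

```r
# Count predictors with nonzero coefficients in the final LASSO fit
final_model %>%
  workflows::extract_fit_parsnip() %>%
  broom::tidy() %>%
  dplyr::filter(term != "(Intercept)", estimate != 0) %>%
  nrow()
```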
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.3: Predicted Versus Actual Fantasy Points for Least Absolute Shrinkage and Selection Operator (LASSO) Model.
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

19.7.2 Ridge Regression

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ .,
  data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))

# Define Model
ridge_spec <- 
  linear_reg(
    penalty = tune(),
    mixture = 0) %>%
  set_engine("glmnet")

# Workflow
ridge_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(ridge_spec)

# Define grid of penalties to try; dials::penalty() is specified on the
# log10 scale, so range = c(-4, -1) spans penalties from 0.0001 to 0.1
penalty_grid <- dials::grid_regular(
  dials::penalty(range = c(-4, -1)),
  levels = 20)

# Tune the Penalty Parameter
cv_results <- tune::tune_grid(
  ridge_wf,
  resamples = folds,
  grid = penalty_grid,
  metrics = metric_set(rmse, mae, rsq),
  control = control_grid(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Identify best penalty
select_best(cv_results, metric = "rmse")
Code
select_best(cv_results, metric = "mae")
Code
select_best(cv_results, metric = "rsq")
Code
best_penalty <- select_best(cv_results, metric = "mae")

# Finalize Workflow with Best Penalty
final_wf <- finalize_workflow(
  ridge_wf,
  best_penalty)

# Fit Final Model on Training Data
final_model <- workflows::fit(
  final_wf,
  data = data_train_qb)

# View Coefficients
final_model %>% 
  workflows::extract_fit_parsnip() %>% 
  broom::tidy()
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.4: Predicted Versus Actual Fantasy Points for Ridge Regression Model.
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

19.7.3 Elastic Net

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ .,
  data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))

# Define Model
enet_spec <- 
  linear_reg(
    penalty = tune(),
    mixture = tune()) %>%
  set_engine("glmnet")

# Workflow
enet_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(enet_spec)

# Define a regular grid for both penalty (log10 scale) and mixture
grid_enet <- dials::grid_regular(
  dials::penalty(range = c(-4, -1)),
  dials::mixture(range = c(0, 1)),
  levels = c(20, 5) # 20 penalty values × 5 mixture values
)

# Tune the Grid
cv_results <- tune::tune_grid(
  enet_wf,
  resamples = folds,
  grid = grid_enet,
  metrics = metric_set(rmse, mae, rsq),
  control = control_grid(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Identify best combination of penalty and mixture
select_best(cv_results, metric = "rmse")
Code
select_best(cv_results, metric = "mae")
Code
select_best(cv_results, metric = "rsq")
Code
best_params <- select_best(cv_results, metric = "mae")

# Finalize Workflow with Best Penalty and Mixture
final_wf <- finalize_workflow(
  enet_wf,
  best_params)

# Fit Final Model on Training Data
final_model <- workflows::fit(
  final_wf,
  data = data_train_qb)

# View Coefficients
final_model %>% 
  workflows::extract_fit_parsnip() %>% 
  broom::tidy()
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.5: Predicted Versus Actual Fantasy Points for Elastic Net Model.
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

19.7.4 Random Forest Machine Learning

19.7.4.1 Cross-Sectional Data

Code
# Set seed for reproducibility
set.seed(52242)

# Set up cross-validation
folds <- rsample::group_vfold_cv(
  data_train_qb,
  group = gsis_id,
  v = 10) # 10-fold cross-validation

# Define Recipe (Formula)
rec <- recipes::recipe(
  fantasyPoints_lag ~ .,
  data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))

# Define Model
rf_spec <- 
  parsnip::rand_forest(
    mtry = tune::tune(),
    min_n = tune::tune(),
    trees = 500) %>%
  parsnip::set_mode("regression") %>%
  parsnip::set_engine("ranger", importance = "impurity")

# Workflow
rf_wf <- workflows::workflow() %>%
  workflows::add_recipe(rec) %>%
  workflows::add_model(rf_spec)

# Create Grid
n_predictors <- recipes::prep(rec) %>%
  recipes::juice() %>%
  dplyr::select(-fantasyPoints_lag) %>%
  ncol()

# Dynamically define ranges based on data
rf_params <- hardhat::extract_parameter_set_dials(rf_spec) %>%
  update(
    mtry = dials::mtry(range = c(1L, n_predictors)),
    min_n = dials::min_n(range = c(2L, 10L))
  )

rf_grid <- dials::grid_random(rf_params, size = 15) # alternative: dials::grid_regular(rf_params, levels = 5)

# Tune the Grid
cv_results <- tune::tune_grid(
  rf_wf,
  resamples = folds,
  grid = rf_grid,
  metrics = metric_set(rmse, mae, rsq),
  control = control_grid(save_pred = TRUE)
)

# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
# Identify best hyperparameters
select_best(cv_results, metric = "rmse")
Code
select_best(cv_results, metric = "mae")
Code
select_best(cv_results, metric = "rsq")
Code
best_params <- select_best(cv_results, metric = "mae")

# Finalize Workflow with Best Hyperparameters
final_wf <- finalize_workflow(
  rf_wf,
  best_params)

# Fit Final Model on Training Data
final_model <- workflows::fit(
  final_wf,
  data = data_train_qb)

# View Feature Importance
rf_fit <- final_model %>% 
  workflows::extract_fit_parsnip()

rf_fit
parsnip model object

Ranger result

Call:
 ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~3L,      x), num.trees = ~500, min.node.size = min_rows(~9L, x), importance = ~"impurity",      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1)) 

Type:                             Regression 
Number of trees:                  500 
Sample size:                      1582 
Number of independent variables:  73 
Mtry:                             3 
Target node size:                 9 
Variable importance mode:         impurity 
Splitrule:                        variance 
OOB prediction error (MSE):       6315.315 
R squared (OOB):                  0.5148104 
Code
ranger_obj <- rf_fit$fit

ranger_obj$variable.importance
                   season                     games                        gs 
              126359.5422               319578.3505               382535.6517 
      years_of_experience                       age             ageCentered20 
              124783.8817               199342.1467               207939.2802 
   ageCentered20Quadratic                    height                    weight 
              205495.8993                73752.5223               135823.1306 
              rookie_year              draft_number            fantasy_points 
              121270.5272               235944.0545               641663.0852 
       fantasy_points_ppr             fantasyPoints               completions 
              753816.3235               673276.9667               577883.0044 
                 attempts             passing_yards               passing_tds 
              469528.4582               692904.1588               535369.7048 
    passing_interceptions            sacks_suffered           sack_yards_lost 
              154679.5322               294698.8048               237255.5206 
             sack_fumbles         sack_fumbles_lost         passing_air_yards 
               92927.1076                52731.7059               389664.7190 
passing_yards_after_catch       passing_first_downs               passing_epa 
              371266.0798               533911.4945               469375.8940 
             passing_cpoe   passing_2pt_conversions                      pacr 
              159111.5776                43285.8990               136591.0157 
                  carries             rushing_yards               rushing_tds 
              191763.4732               227786.6905                83784.6521 
          rushing_fumbles      rushing_fumbles_lost       rushing_first_downs 
               62378.0789                31328.7581               218593.7055 
              rushing_epa   rushing_2pt_conversions         special_teams_tds 
              174856.7482                22130.6589                  432.8602 
         pocket_time.pass        pass_attempts.pass           throwaways.pass 
               88617.4566               481207.3858               330967.1992 
              spikes.pass                drops.pass           bad_throws.pass 
               74008.1352               364123.0798               411395.3664 
       times_blitzed.pass        times_hurried.pass            times_hit.pass 
              481435.4648               374116.1093               266521.6649 
     times_pressured.pass         batted_balls.pass        on_tgt_throws.pass 
              424836.7389               151484.9543               402761.4106 
           rpo_plays.pass            rpo_yards.pass         rpo_pass_att.pass 
              192637.5465               225659.8819               243476.9956 
      rpo_pass_yards.pass         rpo_rush_att.pass       rpo_rush_yards.pass 
              205064.4104                77907.9318               117492.6716 
         pa_pass_att.pass        pa_pass_yards.pass             drop_pct.pass 
              347266.3891               454836.7853               129912.7438 
       bad_throw_pct.pass           on_tgt_pct.pass         pressure_pct.pass 
              145508.9586               106607.2086               126197.3464 
             ybc_att.rush              yac_att.rush                  att.rush 
              139772.9950               120705.2307               367870.7638 
                 yds.rush                   td.rush                  x1d.rush 
              208163.2897               125274.5721               221642.2967 
                 ybc.rush                  yac.rush              brk_tkl.rush 
              188152.9223               173028.6991                52392.4196 
              att_br.rush 
               86667.3297 
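The raw importance vector above is easier to digest graphically. A sketch (assuming the `ranger_obj` object extracted above) for plotting the 15 most important predictors:

```r
# Plot the 15 predictors with the highest impurity-based importance
tibble::enframe(
  ranger_obj$variable.importance,
  name = "predictor",
  value = "importance") %>%
  dplyr::slice_max(importance, n = 15) %>%
  ggplot(aes(x = importance, y = reorder(predictor, importance))) +
  geom_col() +
  labs(
    x = "Variable Importance (Impurity)",
    y = NULL) +
  theme_classic()
```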
Code
# Predict on Test Data
predict(final_model, data_test_qb)
Code
df <- data_test_qb %>%
  mutate(pred = predict(final_model, new_data = data_test_qb)$.pred)

# Evaluate Accuracy of Predictions
petersenlab::accuracyOverall(
  predicted = df$pred,
  actual = df$fantasyPoints_lag,
  dropUndefined = TRUE
)
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)

ggplot(
  df,
  aes(
    x = pred,
    y = fantasyPoints_lag)) +
  geom_point(
    size = 2,
    alpha = 0.6) +
  geom_abline(
    slope = 1,
    intercept = 0,
    color = "blue",
    linetype = "dashed") +
  coord_equal(
    xlim = axis_limits,
    ylim = axis_limits) +
  labs(
    title = "Predicted vs Actual Fantasy Points (Test Data)",
    x = "Predicted Fantasy Points",
    y = "Actual Fantasy Points"
  ) +
  theme_classic()
Figure 19.6: Predicted Versus Actual Fantasy Points for Random Forest Model.
Code
newData_qb %>%
  mutate(fantasyPoints_lag = predict(final_model, new_data = newData_qb)$.pred) %>% 
  left_join(
    .,
    nfl_playerIDs %>% select(gsis_id, name),
    by = "gsis_id"
  ) %>% 
  select(name, fantasyPoints_lag) %>% 
  arrange(-fantasyPoints_lag)

Now we can stop the parallel backend:

Code
future::plan(future::sequential)

19.7.4.2 Longitudinal Data

Approaches to estimating random forest models with longitudinal data are described by Hu and Szymczak (2023). Below, we fit mixed-effects random forest (MERF) models using the MERF() function of the LongituRF package (Capitaine, 2020). In the call below, X is the matrix of fixed-effects predictors, Y is the outcome, Z is the design matrix for the random effects, id identifies the player, and time is the time variable (here, mean-centered age); sto = "BM" specifies a Brownian-motion stochastic process.

Code
smerf <- LongituRF::MERF(
  X = data_train_qb_matrix %>% as_tibble() %>% dplyr::select(season:att_br.rush) %>% as.matrix(),
  Y = data_train_qb_matrix[,c("fantasyPoints_lag")] %>% as.matrix(),
  Z = data_train_qb_matrix[,c("pacr")] %>% as.matrix(),
  id = data_train_qb_matrix[,c("gsis_id")] %>% as.matrix(),
  time = data_train_qb_matrix[,c("ageCentered20")] %>% as.matrix(),
  ntree = 500,
  sto = "BM")
[1] "stopped after 7 iterations."
Code
smerf$forest # the fitted random forest (obtained at the last iteration)

Call:
 randomForest(x = X, y = ystar, ntree = ntree, mtry = mtry, importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 25

          Mean of squared residuals: 169.7337
                    % Var explained: 98.52
Code
smerf$random_effects # the predicted random effects for each player
                [,1]
  [1,] -0.2888613187
  [2,]  0.0808783165
  [3,] -0.3744136176
  [4,] -0.0058870803
  [5,] -0.0393494084
  [6,] -0.5659093504
  [7,] -0.0743022476
  [8,]  1.0283239780
  [9,] -0.3673934691
 [10,] -0.0630583207
 [11,] -0.5414705679
 [12,]  0.1017234109
 [13,]  0.3420394432
 [14,] -0.1363133409
 [15,]  0.0000000000
 [16,]  0.0000000000
 [17,] -0.0998732993
 [18,] -0.7019422920
 [19,]  0.1330643560
 [20,] -0.2041909176
 [21,]  0.2415654691
 [22,] -0.1674386429
 [23,]  0.0000000000
 [24,]  0.1968118235
 [25,]  0.5273468124
 [26,]  0.0896266074
 [27,] -0.2628429684
 [28,] -0.6742935540
 [29,] -0.2152977433
 [30,]  0.0000000000
 [31,] -0.1263935068
 [32,] -0.9390210241
 [33,] -0.0062962015
 [34,]  0.0000000000
 [35,] -0.0723451093
 [36,] -0.1762272416
 [37,]  0.3438144806
 [38,]  0.3545495863
 [39,]  0.0000000000
 [40,] -0.1589880010
 [41,] -0.3607526050
 [42,]  0.3772846485
 [43,] -0.0432659094
 [44,]  0.0000000000
 [45,] -0.2589126065
 [46,] -0.5236477402
 [47,]  0.7236810758
 [48,] -0.2049018673
 [49,] -0.7008137889
 [50,]  0.0000000000
 [51,] -1.0315047170
 [52,] -0.0525755297
 [53,]  0.0623901784
 [54,] -0.5037765373
 [55,]  0.6081956896
 [56,] -0.7068703936
 [57,] -0.2709923931
 [58,]  0.1176946626
 [59,] -0.7704637674
 [60,] -0.0820481301
 [61,] -0.5117325553
 [62,] -1.1168740086
 [63,]  0.0000000000
 [64,]  0.0058136970
 [65,] -0.2073675421
 [66,] -0.8032834148
 [67,] -0.3835048598
 [68,]  0.1985783531
 [69,]  0.0000000000
 [70,] -0.0616215564
 [71,] -0.4097112095
 [72,] -0.4874830616
 [73,] -0.1557869719
 [74,]  0.0000000000
 [75,] -0.0548501165
 [76,] -0.1640681418
 [77,] -0.3571451462
 [78,]  0.0000000000
 [79,] -0.1568558413
 [80,] -0.7074326051
 [81,] -0.1196436451
 [82,]  0.3742585560
 [83,] -0.1140468176
 [84,]  0.0000000000
 [85,] -0.0304912566
 [86,]  0.0000000000
 [87,] -0.1401624416
 [88,] -0.0056426972
 [89,] -0.0922167126
 [90,] -0.1686558039
 [91,]  0.0273782580
 [92,] -0.4221551188
 [93,]  0.6879985329
 [94,] -2.1740962380
 [95,] -0.2191074290
 [96,]  0.0000000000
 [97,] -0.1950516362
 [98,] -0.1738432756
 [99,]  0.1355646693
[100,] -0.0664723600
[101,] -0.2172179376
[102,] -0.4911641723
[103,] -0.3849648370
[104,]  0.0000000000
[105,] -0.0389591762
[106,] -0.1225083999
[107,] -0.9409084854
[108,] -0.2764812233
[109,]  0.0280409544
[110,]  0.0113088602
[111,] -0.0212477742
[112,] -0.1297398117
[113,]  0.0000000000
[114,] -0.0364581717
[115,]  0.0000000000
[116,] -0.1358905528
[117,] -0.0339585483
[118,]  0.0289559387
[119,] -0.0780591097
[120,]  0.0814471670
[121,] -0.1230061929
[122,]  0.0000000000
[123,]  0.0000000000
[124,]  0.0000000000
[125,] -0.1678915810
[126,] -0.0616035284
[127,]  0.2075364291
[128,] -0.0453085070
[129,]  0.7101641500
[130,]  0.0002488762
[131,] -0.1459063106
[132,] -0.0779694648
[133,]  0.0253412662
[134,] -0.0891190294
[135,] -0.1157799493
[136,]  0.1463396794
[137,]  0.0346808209
[138,] -0.0811197578
[139,] -0.1877274195
[140,]  0.0680455050
[141,] -0.1237211515
[142,]  0.0320504490
[143,] -0.1051331862
[144,] -0.0578672214
[145,] -0.0411682413
[146,] -0.2170529649
[147,] -0.0484276259
[148,] -0.3245280372
[149,] -0.3029682208
[150,] -0.0590264989
[151,] -0.0722641195
[152,] -0.1121671714
[153,] -0.2952587545
[154,]  0.0309180091
[155,]  0.0008123435
[156,]  0.0028539354
[157,]  0.0277568942
[158,] -0.0983737888
[159,] -0.1323479878
[160,] -0.2920793771
[161,] -0.1702230761
[162,]  0.4197044884
[163,] -0.2536662440
[164,] -0.0705536478
[165,]  0.0000000000
[166,]  0.0000000000
[167,] -0.0540725840
[168,] -0.0881455927
[169,] -0.1143525171
[170,] -0.1495833663
[171,] -0.0368658083
[172,] -0.1972330595
[173,] -0.0348944107
[174,] -0.0619706566
[175,] -0.2914716872
[176,] -0.1072489335
[177,]  0.0834439417
[178,] -0.0840815999
[179,] -0.6446942228
[180,] -0.0470353951
[181,] -0.1110894639
[182,] -0.0459517509
[183,] -0.1227639243
[184,] -0.0346360599
[185,] -0.0656105739
[186,]  0.3067521544
[187,]  0.1015629861
[188,]  0.2124552576
[189,] -0.2712598474
[190,]  0.3040509120
[191,] -0.2845852061
[192,] -0.0672822660
[193,] -0.1363483749
[194,] -0.1514932215
[195,] -0.0781322626
[196,] -0.2091918814
[197,] -0.1513288869
[198,] -0.1447832982
[199,] -0.1108976971
[200,] -0.0426150255
[201,] -0.4414353674
[202,] -0.1608418544
[203,]  0.0685016352
[204,] -0.1128642341
[205,] -0.2641431153
[206,] -0.0995623880
[207,]  0.3419404051
[208,] -0.2387375995
[209,] -0.0487325131
[210,] -0.2339632288
[211,] -0.0847638603
[212,] -0.1281424661
[213,] -0.1810291459
[214,] -0.0685577820
[215,] -0.0797967498
[216,] -0.2440177365
[217,] -0.1185505564
[218,] -0.0162147583
[219,] -0.0506618094
[220,] -0.1446682421
[221,]  0.0837719276
[222,]  0.3440242089
[223,] -0.0280589282
[224,] -0.2110092354
[225,]  0.3416017109
[226,] -0.6231208231
[227,] -0.0811446708
[228,] -0.5857962908
[229,]  0.4451250358
[230,] -0.0520991480
[231,] -0.0651970639
[232,] -0.0882738104
[233,] -0.1173651951
[234,]  2.0603249254
[235,] -0.4972618003
[236,] -0.1036548536
[237,] -0.0908128118
[238,] -0.1327611613
[239,]  0.0000000000
[240,] -0.1356394026
[241,]  0.0000000000
[242,] -0.2135824344
[243,] -0.1241929944
[244,]  0.6821125712
[245,] -0.2754187578
[246,] -0.1855253738
[247,]  1.5222791834
[248,]  0.2208413393
[249,]  1.0562361717
[250,] -0.0048449806
[251,] -0.1189017421
[252,] -0.4895602600
[253,] -0.1135719040
[254,] -0.0676320823
[255,]  0.1005739634
[256,] -0.2419745702
[257,] -0.0801348600
[258,] -0.1102444586
[259,] -0.0879023798
[260,] -0.1375754176
[261,] -0.1584847702
[262,] -0.0595301949
[263,] -0.1823720474
[264,] -0.3042325724
[265,] -0.0842451390
[266,] -0.0744563202
[267,] -0.1511682239
[268,]  0.2334425949
[269,] -0.1461719197
[270,] -0.0238821889
[271,]  0.6275300583
[272,] -0.0048609116
[273,]  0.6164579347
[274,] -0.0096025503
[275,]  0.7755123430
[276,]  0.5307152387
[277,] -0.0761037951
[278,] -0.1927386373
[279,] -0.0075903290
[280,] -0.1807851463
[281,]  0.1755096293
[282,]  0.0000000000
[283,]  0.2408096923
[284,] -0.0164246827
[285,] -0.9962304993
[286,] -0.1438171959
[287,] -0.8472877077
[288,] -0.0568512443
[289,] -0.0479584283
[290,] -0.0629269778
[291,]  0.3797721495
[292,] -0.1126645387
[293,] -0.1560998845
[294,] -0.1582869444
[295,] -0.1703760580
[296,] -0.0661093907
[297,] -0.1157941971
[298,] -0.1176459069
[299,] -0.1976240830
[300,] -0.0588367025
[301,]  0.0155135863
[302,] -0.2473955545
[303,] -0.1322487407
[304,] -0.0689979890
[305,] -0.0572836415
[306,] -0.3117814318
[307,] -0.1636116833
[308,] -0.1269969733
[309,] -0.2330664276
[310,] -0.0804619858
[311,] -0.0527538119
[312,]  0.1562319192
[313,] -0.1585361796
[314,]  0.0112082915
[315,]  0.0231350127
[316,]  0.0375446047
Code
smerf$omega # the predicted stochastic processes
   [1]  -6.24230182  -6.83557345  -7.74629285  -9.39912475  -9.80626999
   [6] -10.17763529  -9.83733058  -9.68053304  -5.46755523  -6.19466854
  [11]  -6.70500045  -6.93478231  -0.11403419  -6.38025816  -6.93047546
  [16]  -7.39190360  -7.74734238  -9.05357904 -10.71682476 -10.25849512
  [21] -10.42186361   3.03366171   2.46257813   1.66454175   1.18827166
  [26]  -0.38672230  -1.68057708  -2.55647448  -2.79103191  -2.81142716
  [31]   3.75707694   3.36281923   1.41078324  -2.23552264  -3.18683105
  [36]  -3.58856922  -5.23270510  -8.33094455 -11.24384133 -12.34123979
  [41] -13.21866132  -2.91742381  -3.49130516  -4.15386561  -5.52902706
  [46]  -7.03120125  -8.34016470  -9.24312629  -9.82446698  -4.36372971
 [output truncated: 1,582 fitted values in total]
Code
smerf$OOB # OOB error at each iteration
[1] 171.1873 131.2674 140.9046 142.3326 147.4578 167.9198 169.7337
Code
plot(smerf$Vraisemblance)
Figure 19.7: Evolution of the Log-Likelihood.

19.7.5 k-Fold Cross-Validation
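
As a minimal sketch of how k-fold cross-validation could be set up with the rsample package (the data frame name data_train_qb is an assumption, and the number of folds is illustrative):

Code
library("rsample")

# Split the training data into 10 folds; each fold holds out ~10% of rows
# as an assessment set and uses the remaining ~90% for analysis
set.seed(52242)
folds_qb_cv <- rsample::vfold_cv(
  data_train_qb,
  v = 10)

folds_qb_cv # a tibble with one row per fold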

19.7.6 Leave-One-Out (LOO) Cross-Validation
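
As a minimal sketch of leave-one-out cross-validation with the rsample package (again assuming a data frame named data_train_qb), each observation serves once as the held-out assessment set:

Code
library("rsample")

# One resample per row: train on n - 1 observations, assess on the remaining one
loo_folds_qb <- rsample::loo_cv(data_train_qb)

loo_folds_qb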

19.7.7 Combining Tree-Boosting with Mixed Models

To combine tree-boosting with mixed models, we use the gpboost package (Sigrist et al., 2025).

This example is adapted from the following tutorial: https://towardsdatascience.com/mixed-effects-machine-learning-for-longitudinal-panel-data-with-gpboost-part-iii-523bb38effc

19.7.7.1 Process Data

The gamma distribution requires strictly positive values, so, if using a gamma distribution, we replace any non-positive values of the outcome with a small positive constant:

Code
data_train_qb_matrix[,"fantasyPoints_lag"][data_train_qb_matrix[,"fantasyPoints_lag"] <= 0] <- 0.01

19.7.7.2 Specify Predictor Variables

Code
pred_vars_qb <- data_train_qb_matrix %>% 
  as_tibble() %>% 
  select(-fantasyPoints_lag, -fantasyPointsMC_lag, -ageCentered20, -ageCentered20Quadratic) %>% # -gsis_id
  names()

pred_vars_qb_categorical <- "gsis_id" # to specify categorical predictors

19.7.7.3 Specify General Model Options

Code
model_likelihood <- "gamma" # alternative: "gaussian"
nrounds <- 2000 # maximum number of boosting iterations (i.e., number of trees built sequentially); more rounds = potentially better learning, but also greater risk of overfitting

19.7.7.4 Identify Optimal Tuning Parameters

For identifying the optimal tuning parameters for boosting, we partition the training data into inner training data and validation data. We randomly split the training data into 80% inner training data and 20% held-out validation data. We then use the mean absolute error as our index of prediction accuracy on the held-out validation data.

Code
# Partition training data into inner training data and validation data
ntrain_qb <- dim(data_train_qb_matrix)[1]

set.seed(52242)
valid_tune_idx_qb <- sample.int(ntrain_qb, as.integer(0.2*ntrain_qb)) # randomly sample 20% of rows as validation indices

folds_qb <- list(valid_tune_idx_qb)

# Specify parameter grid, gp_model, and gpb.Dataset
param_grid_qb <- list(
  "learning_rate" = c(0.2, 0.1, 0.05, 0.01), # the step size used when updating predictions after each boosting round (high values make big updates, which can speed up learning but risk overshooting; low values are usually more accurate but require more rounds)
  "max_depth" = c(3, 5, 7), # maximum depth (levels) of each decision tree; deeper trees capture more complex patterns and interactions but risk overfitting; shallower trees tend to generalize better
  "min_data_in_leaf" = c(10, 50, 100), # minimum number of training examples in a leaf node; higher values = more regularization (simpler trees)
  "lambda_l2" = c(0, 1, 5)) # L2 regularization penalty for large weights in tree splits; adds a "cost" for complexity; helps prevent overfitting by shrinking the contribution of each tree

other_params_qb <- list(
  num_leaves = 2^6) # maximum number of leaves per tree; controls the maximum complexity of each tree (along with max_depth); more leaves = more expressive models, but can overfit if min_data_in_leaf is too small; num_leaves must be consistent with max_depth, because deeper trees naturally support more leaves; max is: 2^n, where n is the largest max_depth

gp_model_qb <- gpboost::GPModel(
  group_data = data_train_qb_matrix[,"gsis_id"],
  likelihood = model_likelihood,
  group_rand_coef_data = cbind(
    data_train_qb_matrix[,"ageCentered20"],
    data_train_qb_matrix[,"ageCentered20Quadratic"]),
  ind_effect_group_rand_coef = c(1,1))

gp_data_qb <- gpboost::gpb.Dataset(
  data = data_train_qb_matrix[,pred_vars_qb],
  categorical_feature = pred_vars_qb_categorical,
  label = data_train_qb_matrix[,"fantasyPoints_lag"]) # could instead use mean-centered variable (fantasyPointsMC_lag) and add mean back afterward

# Find optimal tuning parameters
opt_params_qb <- gpboost::gpb.grid.search.tune.parameters(
  param_grid = param_grid_qb,
  params = other_params_qb,
  num_try_random = NULL,
  folds = folds_qb,
  data = gp_data_qb,
  gp_model = gp_model_qb,
  nrounds = nrounds,
  early_stopping_rounds = 50, # stops training early if the model hasn’t improved on the validation set in 50 rounds; prevents overfitting and saves time
  verbose_eval = 1,
  metric = "mae")
Error in fd$booster$update(fobj = fobj): [GPBoost] [Fatal] Inf occured in gradient wrt covariance / auxiliary parameter number 3 (counting starts at 1, total nb. par. = 4) 
Code
opt_params_qb
Error: object 'opt_params_qb' not found

Because the tuning procedure did not complete successfully, I specify the tuning parameters manually. A high learning rate can speed up learning but risks overfitting, so I use a relatively low learning rate (0.1). I also add some light regularization (lambda_l2) for better generalization. I set the maximum tree depth (max_depth) to 5 to capture complex (up to 5-way) interactions, set the maximum number of terminal nodes (num_leaves) per tree to 2^5 (32), and set the minimum number of samples in any leaf (min_data_in_leaf) to 10.

19.7.7.5 Specify Model and Tuning Parameters

Code
gp_model_qb <- gpboost::GPModel(
  group_data = data_train_qb_matrix[,"gsis_id"],
  likelihood = model_likelihood,
  group_rand_coef_data = cbind(
    data_train_qb_matrix[,"ageCentered20"],
    data_train_qb_matrix[,"ageCentered20Quadratic"]),
  ind_effect_group_rand_coef = c(1,1))

gp_data_qb <- gpboost::gpb.Dataset(
  data = data_train_qb_matrix[,pred_vars_qb],
  categorical_feature = pred_vars_qb_categorical,
  label = data_train_qb_matrix[,"fantasyPoints_lag"])

params_qb <- list(
  learning_rate = 0.1,
  max_depth = 5,
  min_data_in_leaf = 10,
  lambda_l2 = 1,
  num_leaves = 2^5,
  num_threads = num_cores)

nrounds_qb <- 123 # number of boosting iterations; ideally identified through iteration and cross-validation

#gp_model_qb$set_optim_params(params = list(optimizer_cov = "nelder_mead")) # to speed up model estimation
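
The optimal number of boosting rounds could, in principle, be identified with gpboost's built-in cross-validation. A minimal sketch, assuming gp_data_qb, gp_model_qb, and params_qb are defined as above (the number of folds and evaluation metric are illustrative choices):

Code
cv_results_qb <- gpboost::gpb.cv(
  params = params_qb,
  data = gp_data_qb,
  gp_model = gp_model_qb,
  nrounds = 2000, # upper bound; early stopping halts training sooner
  nfold = 5,
  early_stopping_rounds = 50,
  eval = "l1") # mean absolute error

cv_results_qb$best_iter # candidate value for nrounds_qb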

19.7.7.6 Fit Model

Code
gp_model_fit_qb <- gpboost::gpb.train(
  data = gp_data_qb,
  gp_model = gp_model_qb,
  nrounds = nrounds_qb,
  params = params_qb) # verbose = 0
[GPBoost] [Info] Total Bins 8709
[GPBoost] [Info] Number of data points in the train set: 1582, number of used features: 73
[GPBoost] [Info] [GPBoost with gamma likelihood]: initscore=4.805531
[GPBoost] [Info] Start training from score 4.805531

19.7.7.7 Model Results

Code
summary(gp_model_qb) # estimated random effects model
=====================================================
Covariance parameters (random effects):
                       Param.
Group_1                     0
Group_1_rand_coef_nb_1      0
Group_1_rand_coef_nb_2      0
-----------------------------------------------------
Additional parameters:
      Param.
shape 0.8186
=====================================================
Code
gp_model_qb_importance <- gpboost::gpb.importance(gp_model_fit_qb)
gp_model_qb_importance
Code
gpboost::gpb.plot.importance(gp_model_qb_importance)
Figure 19.8: Importance of Features (Predictors) in Tree Boosting Machine Learning Model.

19.7.7.8 Evaluate Accuracy of Model on Test Data

Code
# Test Model on Test Data
pred_test_qb <- predict(
  gp_model_fit_qb,
  data = data_test_qb_matrix[,pred_vars_qb],
  group_data_pred = data_test_qb_matrix[,"gsis_id"],
  group_rand_coef_data_pred = cbind(
    data_test_qb_matrix[,"ageCentered20"],
    data_test_qb_matrix[,"ageCentered20Quadratic"]),
  predict_var = FALSE,
  pred_latent = FALSE)

y_pred_test_qb <- pred_test_qb[["response_mean"]] # if outcome is mean-centered, add mean(data_train_qb_matrix[,"fantasyPoints_lag"])

predictedVsActual <- data.frame(
  predictedPoints = y_pred_test_qb,
  actualPoints = data_test_qb_matrix[,"fantasyPoints_lag"]
)

predictedVsActual
Code
petersenlab::accuracyOverall(
  predicted = predictedVsActual$predictedPoints,
  actual = predictedVsActual$actualPoints,
  dropUndefined = TRUE
)
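
If the petersenlab package is not installed, the core overall accuracy indices can be approximated directly in base R (using the predictedVsActual data frame from above):

Code
errors <- predictedVsActual$predictedPoints - predictedVsActual$actualPoints

mae  <- mean(abs(errors), na.rm = TRUE)    # mean absolute error
rmse <- sqrt(mean(errors^2, na.rm = TRUE)) # root mean squared error

c(MAE = mae, RMSE = rmse)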

19.7.7.9 Generate Predictions for Next Season

Code
# Generate model predictions for next season
pred_nextYear_qb <- predict(
  gp_model_fit_qb,
  data = newData_qb_matrix[,pred_vars_qb],
  group_data_pred = newData_qb_matrix[,"gsis_id"],
  group_rand_coef_data_pred = cbind(
    newData_qb_matrix[,"ageCentered20"],
    newData_qb_matrix[,"ageCentered20Quadratic"]),
  predict_var = FALSE,
  pred_latent = FALSE)

newData_qb$fantasyPoints_lag <- pred_nextYear_qb$response_mean

# Merge with player names
newData_qb <- left_join(
  newData_qb,
  nfl_playerIDs %>% select(gsis_id, name),
  by = "gsis_id"
)

newData_qb %>% 
  arrange(-fantasyPoints_lag) %>% 
  select(name, fantasyPoints_lag, fantasyPoints)

19.8 Conclusion

19.9 Session Info

Code
sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] glmnet_4.1-9       Matrix_1.7-3       lubridate_1.9.4    forcats_1.0.0     
 [5] stringr_1.5.1      readr_2.1.5        tidyverse_2.0.0    gpboost_1.5.8     
 [9] R6_2.6.1           LongituRF_0.9      yardstick_1.3.2    workflowsets_1.1.1
[13] workflows_1.2.0    tune_1.3.0         tidyr_1.3.1        tibble_3.3.0      
[17] rsample_1.3.0      recipes_1.3.1      purrr_1.0.4        parsnip_1.3.2     
[21] modeldata_1.4.0    infer_1.0.9        ggplot2_3.5.2      dplyr_1.1.4       
[25] dials_1.4.0        scales_1.4.0       broom_1.0.8        tidymodels_1.3.0  
[29] powerjoin_0.1.0    missRanger_2.6.1   future_1.58.0      petersenlab_1.1.6 

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-3   shape_1.4.6.1        rstudioapi_0.17.1   
 [4] jsonlite_2.0.0       magrittr_2.0.3       farver_2.1.2        
 [7] nloptr_2.2.1         rmarkdown_2.29       vctrs_0.6.5         
[10] minqa_1.2.8          base64enc_0.1-3      sparsevctrs_0.3.4   
[13] htmltools_0.5.8.1    Formula_1.2-5        parallelly_1.45.0   
[16] htmlwidgets_1.6.4    plyr_1.8.9           lifecycle_1.0.4     
[19] iterators_1.0.14     pkgconfig_2.0.3      fastmap_1.2.0       
[22] rbibutils_2.3        digest_0.6.37        colorspace_2.1-1    
[25] furrr_0.3.1          Hmisc_5.2-3          labeling_0.4.3      
[28] latex2exp_0.9.6      randomForest_4.7-1.2 RJSONIO_2.0.0       
[31] timechange_0.3.0     compiler_4.5.1       withr_3.0.2         
[34] htmlTable_2.4.3      backports_1.5.0      DBI_1.2.3           
[37] psych_2.5.6          MASS_7.3-65          lava_1.8.1          
[40] tools_4.5.1          pbivnorm_0.6.0       foreign_0.8-90      
[43] ranger_0.17.0        future.apply_1.20.0  nnet_7.3-20         
[46] doFuture_1.1.1       glue_1.8.0           quadprog_1.5-8      
[49] nlme_3.1-168         grid_4.5.1           checkmate_2.3.2     
[52] cluster_2.1.8.1      reshape2_1.4.4       generics_0.1.4      
[55] gtable_0.3.6         tzdb_0.5.0           class_7.3-23        
[58] data.table_1.17.6    hms_1.1.3            foreach_1.5.2       
[61] pillar_1.10.2        mitools_2.4          splines_4.5.1       
[64] lhs_1.2.0            lattice_0.22-7       survival_3.8-3      
[67] FNN_1.1.4.1          tidyselect_1.2.1     mix_1.0-13          
[70] knitr_1.50           reformulas_0.4.1     gridExtra_2.3       
[73] stats4_4.5.1         xfun_0.52            hardhat_1.4.1       
[76] timeDate_4041.110    stringi_1.8.7        DiceDesign_1.10     
[79] yaml_2.3.10          boot_1.3-31          evaluate_1.0.4      
[82] codetools_0.2-20     cli_3.6.5            rpart_4.1.24        
[85] xtable_1.8-4         Rdpack_2.6.4         lavaan_0.6-19       
[88] Rcpp_1.0.14          globals_0.18.0       gower_1.0.2         
[91] GPfit_1.0-9          lme4_1.1-37          listenv_0.9.1       
[94] viridisLite_0.4.2    mvtnorm_1.3-3        ipred_0.9-15        
[97] prodlim_2025.04.28   rlang_1.1.6          mnormt_2.1.1        
