I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
Alternatively, you can leave an annotation using hypothes.is.
To add an annotation, select some text and then click the annotation symbol in the pop-up menu.
To see the annotations of others, click the annotations symbol in the upper right-hand corner of the page.
19 Machine Learning
19.1 Getting Started
19.1.1 Load Packages
19.1.2 Load Data
Code
# Downloaded Data - Processed
load(file = "./data/nfl_players.RData")
load(file = "./data/nfl_teams.RData")
load(file = "./data/nfl_rosters.RData")
load(file = "./data/nfl_rosters_weekly.RData")
load(file = "./data/nfl_schedules.RData")
load(file = "./data/nfl_combine.RData")
load(file = "./data/nfl_draftPicks.RData")
load(file = "./data/nfl_depthCharts.RData")
load(file = "./data/nfl_pbp.RData")
load(file = "./data/nfl_4thdown.RData")
load(file = "./data/nfl_participation.RData")
#load(file = "./data/nfl_actualFantasyPoints_weekly.RData")
load(file = "./data/nfl_injuries.RData")
load(file = "./data/nfl_snapCounts.RData")
load(file = "./data/nfl_espnQBR_seasonal.RData")
load(file = "./data/nfl_espnQBR_weekly.RData")
load(file = "./data/nfl_nextGenStats_weekly.RData")
load(file = "./data/nfl_advancedStatsPFR_seasonal.RData")
load(file = "./data/nfl_advancedStatsPFR_weekly.RData")
load(file = "./data/nfl_playerContracts.RData")
load(file = "./data/nfl_ftnCharting.RData")
load(file = "./data/nfl_playerIDs.RData")
load(file = "./data/nfl_rankings_draft.RData")
load(file = "./data/nfl_rankings_weekly.RData")
load(file = "./data/nfl_expectedFantasyPoints_weekly.RData")
load(file = "./data/nfl_expectedFantasyPoints_pbp.RData")
# Calculated Data - Processed
load(file = "./data/nfl_actualStats_career.RData")
load(file = "./data/nfl_actualStats_seasonal.RData")
load(file = "./data/player_stats_weekly.RData")
load(file = "./data/player_stats_seasonal.RData")
19.1.3 Specify Options
19.2 Overview of Machine Learning
Machine learning takes us away from a focus on causal inference. Machine learning does not care which processes are causal (i.e., which processes influence the outcome). Instead, it cares about prediction: it values a predictor variable to the extent that the variable increases predictive accuracy, regardless of whether the variable is causally related to the outcome.
Machine learning can be useful for leveraging big data and many predictor variables to develop predictive models with greater accuracy. However, many machine learning techniques are black boxes: it is often unclear how or why particular predictions are made, which can make it difficult to interpret the model’s decisions and to understand the underlying relations between variables. Machine learning also tends to be a data-driven, atheoretical approach, which can result in overfitting. Thus, when estimating machine learning models, it is common to keep a hold-out sample for cross-validation, so that we can evaluate the extent of shrinkage of the model coefficients. The data the model is trained on are known as the “training data”. The data the model was not trained on but is then independently tested on (i.e., the hold-out sample) are the “test data”. Shrinkage occurs because, in the original sample, the predictor variables capitalize on chance and explain some random error variance. When the model is applied to an independent sample (i.e., the test data), the predictive model will likely not perform quite as well, and the regression coefficients will tend to get smaller (i.e., shrink).
If the test data were collected as part of the same processes as the original data and were merely held out for purposes of analysis, this is called internal cross-validation. If the test data were collected separately from the original data used to train the model, this is called external cross-validation.
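For example, here is a minimal sketch of internal cross-validation using simulated (not NFL) data: we hold out 20% of observations as test data, fit a regression model with many noisy predictors on the training data, and compare in-sample with out-of-sample accuracy. The drop in R-squared from training to test data reflects shrinkage.
Code
set.seed(52242)

# Simulated data: 500 observations, 20 noisy predictors, only X1 relates to the outcome
n <- 500
simData <- data.frame(matrix(rnorm(n * 20), ncol = 20))
simData$y <- 0.5 * simData$X1 + rnorm(n)

# Hold out 20% of observations as test data
trainRows <- sample(seq_len(n), size = 0.8 * n)
trainData <- simData[trainRows, ]
testData <- simData[-trainRows, ]

fit <- lm(y ~ ., data = trainData)

rsq <- function(observed, predicted) cor(observed, predicted)^2

rsq(trainData$y, predict(fit, newdata = trainData)) # in-sample R-squared (training data)
rsq(testData$y, predict(fit, newdata = testData))   # out-of-sample R-squared (test data): typically smaller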
Most machine learning methods were developed with cross-sectional data in mind. That is, they assume that each person has only one observation on the outcome variable. However, with longitudinal data, each person has multiple observations on the outcome variable.
When performing machine learning, various approaches may help address this:
- transform data from long to wide form, so that each person has only one row
- when designing the training and test sets, keep all measurements from the same person in the same data object (either the training set or the test set); do not put some measurements from a given person in the training set and other measurements from the same person in the test set (see the sketch after this list)
- use a machine learning approach that accounts for the clustered/nested nature of the data
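As a minimal sketch of the second approach, consider a hypothetical long-format data frame called playerSeasons with one row per player-season and a gsis_id column; the grouped split functions in the rsample package keep all of a player’s rows in the same partition:
Code
library(rsample)

set.seed(52242)

# playerSeasons is a hypothetical long-format data frame with one row per
# player-season and a player ID column (gsis_id)
playerSplit <- rsample::group_initial_split(
  playerSeasons,
  group = gsis_id,  # all rows with the same gsis_id go to the same partition
  prop = 0.8)       # ~80% of players for training, ~20% held out

trainData <- rsample::training(playerSplit)
testData  <- rsample::testing(playerSplit)

# Grouped cross-validation folds follow the same logic (and are used later in this chapter):
folds <- rsample::group_vfold_cv(trainData, group = gsis_id, v = 10)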
19.3 Types of Machine Learning
There are many approaches to machine learning. This chapter discusses several key ones:
- supervised learning
  - continuous outcome (i.e., regression)
    - linear regression
    - LASSO regression
    - ridge regression
    - elastic net regression
  - categorical outcome (i.e., classification)
    - logistic regression
    - support vector machine
    - random forest
    - extreme gradient boosting
  (support vector machines, random forests, and extreme gradient boosting can also be used for a continuous outcome, i.e., regression)
- unsupervised learning
  - clustering
  - principal component analysis
- semi-supervised learning
- reinforcement learning
- deep learning
- ensemble
Ensemble machine learning methods combine multiple machine learning approaches, with the goal that combining multiple approaches may yield more accurate predictions than any one method could achieve on its own.
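As a minimal sketch (with made-up prediction vectors, since no models have been fit at this point in the chapter), the simplest ensemble just averages the out-of-sample predictions from two models; more sophisticated ensembles (e.g., stacking) learn weights for each model instead:
Code
# Hypothetical out-of-sample predictions (fantasy points) from two models
lassoPredictions <- c(250, 180, 310, 95)
randomForestPredictions <- c(270, 160, 330, 120)

# Simple (unweighted) ensemble: average the two sets of predictions
ensemblePredictions <- (lassoPredictions + randomForestPredictions) / 2
ensemblePredictions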
19.3.1 Supervised Learning
Supervised learning trains a model on labeled data: each observation includes both the predictor variables and a known value of the outcome variable, and the model learns the association between them so that it can predict the outcome for new observations.
Unlike linear and logistic regression, various machine learning techniques can handle multicollinearity, including LASSO regression, ridge regression, and elastic net regression. Least absolute shrinkage and selection operator (LASSO) regression performs selection of which predictor variables to keep in the model by shrinking some coefficients all the way to zero. Ridge regression shrinks the coefficients of predictor variables toward zero, but not to zero, so it does not perform selection of which predictor variables to retain; this allows it to retain nonzero coefficients for multiple correlated predictor variables in the context of multicollinearity. Elastic net regression combines the LASSO and ridge penalties: it performs selection of which predictor variables to keep by shrinking the coefficients of some predictor variables to zero, and it shrinks the coefficients of other predictor variables toward zero to address multicollinearity.
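To make the distinction concrete, here is a minimal sketch of the corresponding model specifications using the parsnip package with the glmnet engine. The mixture argument controls the blend of the LASSO (L1) and ridge (L2) penalties; the penalty value of 0.1 is arbitrary and for illustration only (in practice, the penalty is tuned, as we do later in this chapter).
Code
library(parsnip)
library(magrittr)

# mixture = 1: pure LASSO (L1) penalty; shrinks some coefficients to exactly zero
lassoSpec <- parsnip::linear_reg(penalty = 0.1, mixture = 1) %>%
  parsnip::set_engine("glmnet") %>%
  parsnip::set_mode("regression")

# mixture = 0: pure ridge (L2) penalty; shrinks coefficients toward (but not to) zero
ridgeSpec <- parsnip::linear_reg(penalty = 0.1, mixture = 0) %>%
  parsnip::set_engine("glmnet") %>%
  parsnip::set_mode("regression")

# 0 < mixture < 1: elastic net; a blend of the LASSO and ridge penalties
elasticNetSpec <- parsnip::linear_reg(penalty = 0.1, mixture = 0.5) %>%
  parsnip::set_engine("glmnet") %>%
  parsnip::set_mode("regression")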
Unless interaction or nonlinear terms are explicitly specified, linear, logistic, LASSO, ridge, and elastic net regression do not account for interactions among the predictor variables or for nonlinear associations between the predictor variables and the outcome variable. By contrast, random forests and extreme gradient boosting do account for interactions among the predictor variables and for nonlinear associations between the predictor variables and the outcome variable.
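For comparison, here is a minimal sketch of tree-based model specifications in parsnip (using the ranger and xgboost engines; the parameter values are illustrative, not tuned). Because these models are built from decision trees, they capture interactions and nonlinear associations without our specifying them:
Code
library(parsnip)
library(magrittr)

# Random forest (ranger engine)
randomForestSpec <- parsnip::rand_forest(trees = 500) %>%
  parsnip::set_engine("ranger") %>%
  parsnip::set_mode("regression")

# Extreme gradient boosting (xgboost engine)
xgboostSpec <- parsnip::boost_tree(trees = 500, learn_rate = 0.05) %>%
  parsnip::set_engine("xgboost") %>%
  parsnip::set_mode("regression")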
19.3.2 Unsupervised Learning
Unsupervised learning trains a model on unlabeled data: there is no outcome variable, and the model instead identifies structure in the data, such as clusters of similar observations or dimensions that summarize many variables.
We describe cluster analysis in Chapter 21. We describe principal component analysis in Chapter 23.
19.3.3 Semi-supervised Learning
Semi-supervised learning falls between supervised and unsupervised learning: the model is trained on data in which only some observations have a labeled outcome, and the unlabeled observations are used to help improve the predictions.
19.3.4 Reinforcement Learning
Reinforcement learning trains an agent through trial and error: the agent takes actions, receives rewards or penalties, and gradually learns a strategy (policy) that maximizes its cumulative reward, such as an algorithm that learns whether a team should go for it on fourth down.
19.4 Data Processing
19.4.1 Prepare Data for Merging
Code
# Prepare data for merging
#-todo: calculate years_of_experience
## Use common name for the same (gsis_id) ID variable
#nfl_actualFantasyPoints_player_weekly <- nfl_actualFantasyPoints_player_weekly %>%
# rename(gsis_id = player_id)
#
#nfl_actualFantasyPoints_player_seasonal <- nfl_actualFantasyPoints_player_seasonal %>%
# rename(gsis_id = player_id)
player_stats_seasonal_offense <- player_stats_seasonal %>%
filter(position_group %in% c("QB","RB","WR","TE")) %>%
rename(gsis_id = player_id)
player_stats_weekly_offense <- player_stats_weekly %>%
filter(position_group %in% c("QB","RB","WR","TE")) %>%
rename(gsis_id = player_id)
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>%
rename(gsis_id = player_id)
## Rename other variables to ensure common names
## Ensure variables with the same name have the same type
nfl_players <- nfl_players %>%
mutate(
birth_date = as.Date(birth_date),
jersey_number = as.character(jersey_number),
gsis_it_id = as.character(gsis_it_id),
years_of_experience = as.integer(years_of_experience))
player_stats_seasonal_offense <- player_stats_seasonal_offense %>%
mutate(
birth_date = as.Date(birth_date),
jersey_number = as.character(jersey_number),
gsis_it_id = as.character(gsis_it_id))
nfl_rosters <- nfl_rosters %>%
mutate(
draft_number = as.integer(draft_number))
nfl_rosters_weekly <- nfl_rosters_weekly %>%
mutate(
draft_number = as.integer(draft_number))
nfl_depthCharts <- nfl_depthCharts %>%
mutate(
season = as.integer(season))
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>%
mutate(
season = as.integer(season),
receptions = as.integer(receptions)) %>%
distinct(gsis_id, season, week, .keep_all = TRUE) # drop duplicated rows
## Rename variables
nfl_draftPicks <- nfl_draftPicks %>%
rename(
games_career = games,
pass_completions_career = pass_completions,
pass_attempts_career = pass_attempts,
pass_yards_career = pass_yards,
pass_tds_career = pass_tds,
pass_ints_career = pass_ints,
rush_atts_career = rush_atts,
rush_yards_career = rush_yards,
rush_tds_career = rush_tds,
receptions_career = receptions,
rec_yards_career = rec_yards,
rec_tds_career = rec_tds,
def_solo_tackles_career = def_solo_tackles,
def_ints_career = def_ints,
def_sacks_career = def_sacks
)
## Subset variables
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>%
select(gsis_id:position, contains("_exp"), contains("_diff"), contains("_team")) #drop "raw stats" variables (e.g., rec_yards_gained) so they don't get coalesced with actual stats
# Check duplicate ids
player_stats_seasonal_offense %>%
group_by(gsis_id, season) %>%
filter(n() > 1) %>%
head()
Code
Identify objects with shared variable names:
[1] "gsis_id" "position"
[1] 21360
[1] 2855
[1] "gsis_id" "season" "team" "age"
[1] 14859
[1] 10395
[1] 14858
[1] 10395
[1] 14859
[1] 10325
[1] "gsis_id" "season" "week" "position" "full_name"
[1] 845134
[1] 100272
[1] 841942
[1] 100272
[1] 845101
[1] 97815
[1] 845118
[1] 97815
19.4.2 Merge Data
To merge data, we use the powerjoin package (Fabri, 2022):
Code
# Create lists of objects to merge, depending on data structure: id; or id-season; or id-season-week
#-todo: remove redundant variables
playerListToMerge <- list(
nfl_players %>% filter(!is.na(gsis_id)),
nfl_draftPicks %>% filter(!is.na(gsis_id)) %>% select(-season)
)
playerSeasonListToMerge <- list(
player_stats_seasonal_offense %>% filter(!is.na(gsis_id), !is.na(season)),
nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season))
)
playerSeasonWeekListToMerge <- list(
nfl_rosters_weekly %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week)),
#nfl_actualStats_offense_weekly,
nfl_expectedFantasyPoints_weekly %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week))
#nfl_advancedStatsPFR_weekly,
)
playerSeasonWeekPositionListToMerge <- list(
nfl_depthCharts %>% filter(!is.na(gsis_id), !is.na(season), !is.na(week))
)
# Merge data
playerMerged <- playerListToMerge %>%
reduce(
powerjoin::power_full_join,
by = c("gsis_id"),
conflict = coalesce_xy) # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
playerSeasonMerged <- playerSeasonListToMerge %>%
reduce(
powerjoin::power_full_join,
by = c("gsis_id","season"),
conflict = coalesce_xy) # where the objects have the same variable name (e.g., team), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
playerSeasonWeekMerged <- playerSeasonWeekListToMerge %>%
reduce(
powerjoin::power_full_join,
by = c("gsis_id","season","week"),
conflict = coalesce_xy) # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
Identify objects with shared variable names:
[1] "gsis_id" "position"
[3] "position_group" "first_name"
[5] "last_name" "esb_id"
[7] "display_name" "rookie_year"
[9] "college_conference" "current_team_id"
[11] "draft_club" "draft_number"
[13] "draftround" "entry_year"
[15] "football_name" "gsis_it_id"
[17] "headshot" "jersey_number"
[19] "short_name" "smart_id"
[21] "status" "status_description_abbr"
[23] "status_short_description" "uniform_number"
[25] "height" "weight"
[27] "college_name" "birth_date"
[29] "suffix" "years_of_experience"
[31] "pfr_player_name" "team"
[33] "age"
Code
seasonalData <- powerjoin::power_full_join(
playerSeasonMerged,
playerMerged %>% select(-age, -years_of_experience, -team, -team_abbr, -team_seq, -current_team_id), # drop variables from id objects that change from year to year (and thus are not necessarily accurate for a given season)
by = "gsis_id",
conflict = coalesce_xy # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
) %>%
filter(!is.na(season)) %>%
select(gsis_id, season, player_display_name, position, team, games, everything())
[1] "gsis_id" "season"
[3] "week" "team"
[5] "jersey_number" "status"
[7] "first_name" "last_name"
[9] "birth_date" "height"
[11] "weight" "college"
[13] "pfr_id" "headshot_url"
[15] "status_description_abbr" "football_name"
[17] "esb_id" "gsis_it_id"
[19] "smart_id" "entry_year"
[21] "rookie_year" "draft_club"
[23] "draft_number" "position"
Code
seasonalAndWeeklyData <- powerjoin::power_full_join(
playerSeasonWeekMerged,
seasonalData,
by = c("gsis_id","season"),
conflict = coalesce_xy # where the objects have the same variable name (e.g., position), keep the values from object 1, unless it's NA, in which case use the relevant value from object 2
) %>%
filter(!is.na(week)) %>%
select(gsis_id, season, week, full_name, position, team, everything())
19.4.3 Additional Processing
19.4.4 Fill in Missing Data for Static Variables
19.4.5 Create New Data Object for Merging with Later Predictions
19.4.6 Lag Fantasy Points
19.4.7 Subset to Predictor Variables and Outcome Variable
Code
dropVars <- c(
"birth_date", "loaded", "full_name", "player_name", "player_display_name", "display_name", "suffix", "headshot_url", "player", "pos",
"espn_id", "sportradar_id", "yahoo_id", "rotowire_id", "pff_id", "fantasy_data_id", "sleeper_id", "pfr_id",
"pfr_player_id", "cfb_player_id", "pfr_player_name", "esb_id", "gsis_it_id", "smart_id",
"college", "college_name", "team_abbr", "current_team_id", "college_conference", "draft_club", "status_description_abbr",
"status_short_description", "short_name", "headshot", "uniform_number", "jersey_number", "first_name", "last_name",
"football_name", "team")
seasonalData_lag_subset <- seasonalData_lag %>%
dplyr::select(-any_of(dropVars))
19.4.8 Separate by Position
Code
seasonalData_lag_subsetQB <- seasonalData_lag_subset %>%
filter(position == "QB") %>%
select(
gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
height, weight, rookie_year, draft_number,
fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
completions:rushing_2pt_conversions, special_teams_tds, contains(".pass"), contains(".rush"))
seasonalData_lag_subsetRB <- seasonalData_lag_subset %>%
filter(position == "RB") %>%
select(
gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
height, weight, rookie_year, draft_number,
fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
carries:special_teams_tds, contains(".rush"), contains(".rec"))
seasonalData_lag_subsetWR <- seasonalData_lag_subset %>%
filter(position == "WR") %>%
select(
gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
height, weight, rookie_year, draft_number,
fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
carries:special_teams_tds, contains(".rush"), contains(".rec"))
seasonalData_lag_subsetTE <- seasonalData_lag_subset %>%
filter(position == "TE") %>%
select(
gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic,
height, weight, rookie_year, draft_number,
fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag,
carries:special_teams_tds, contains(".rush"), contains(".rec"))
19.4.9 Split into Test and Training Data
Code
seasonalData_lag_qb_all <- seasonalData_lag_subsetQB
seasonalData_lag_rb_all <- seasonalData_lag_subsetRB
seasonalData_lag_wr_all <- seasonalData_lag_subsetWR
seasonalData_lag_te_all <- seasonalData_lag_subsetTE
set.seed(52242) # for reproducibility (to keep the same train/holdout players)
activeQBs <- unique(seasonalData_lag_qb_all$gsis_id[which(seasonalData_lag_qb_all$season == max(seasonalData_lag_qb_all$season, na.rm = TRUE))])
retiredQBs <- unique(seasonalData_lag_qb_all$gsis_id[which(seasonalData_lag_qb_all$gsis_id %ni% activeQBs)])
numQBs <- length(unique(seasonalData_lag_qb_all$gsis_id))
qbHoldoutIDs <- sample(retiredQBs, size = ceiling(.2 * numQBs)) # holdout 20% of players
activeRBs <- unique(seasonalData_lag_rb_all$gsis_id[which(seasonalData_lag_rb_all$season == max(seasonalData_lag_rb_all$season, na.rm = TRUE))])
retiredRBs <- unique(seasonalData_lag_rb_all$gsis_id[which(seasonalData_lag_rb_all$gsis_id %ni% activeRBs)])
numRBs <- length(unique(seasonalData_lag_rb_all$gsis_id))
rbHoldoutIDs <- sample(retiredRBs, size = ceiling(.2 * numRBs)) # holdout 20% of players
set.seed(52242) # for reproducibility (to keep the same train/holdout players); added here to prevent a downstream error with predict.missRanger() due to missingness; this suggests that an error can arise from including a player in the holdout sample who has missingness in particular variables; would be good to identify which player(s) in the holdout sample evoke that error to identify the kinds of missingness that yield the error
activeWRs <- unique(seasonalData_lag_wr_all$gsis_id[which(seasonalData_lag_wr_all$season == max(seasonalData_lag_wr_all$season, na.rm = TRUE))])
retiredWRs <- unique(seasonalData_lag_wr_all$gsis_id[which(seasonalData_lag_wr_all$gsis_id %ni% activeWRs)])
numWRs <- length(unique(seasonalData_lag_wr_all$gsis_id))
wrHoldoutIDs <- sample(retiredWRs, size = ceiling(.2 * numWRs)) # holdout 20% of players
activeTEs <- unique(seasonalData_lag_te_all$gsis_id[which(seasonalData_lag_te_all$season == max(seasonalData_lag_te_all$season, na.rm = TRUE))])
retiredTEs <- unique(seasonalData_lag_te_all$gsis_id[which(seasonalData_lag_te_all$gsis_id %ni% activeTEs)])
numTEs <- length(unique(seasonalData_lag_te_all$gsis_id))
teHoldoutIDs <- sample(retiredTEs, size = ceiling(.2 * numTEs)) # holdout 20% of players
seasonalData_lag_qb_train <- seasonalData_lag_qb_all %>%
filter(gsis_id %ni% qbHoldoutIDs)
seasonalData_lag_qb_test <- seasonalData_lag_qb_all %>%
filter(gsis_id %in% qbHoldoutIDs)
seasonalData_lag_rb_train <- seasonalData_lag_rb_all %>%
filter(gsis_id %ni% rbHoldoutIDs)
seasonalData_lag_rb_test <- seasonalData_lag_rb_all %>%
filter(gsis_id %in% rbHoldoutIDs)
seasonalData_lag_wr_train <- seasonalData_lag_wr_all %>%
filter(gsis_id %ni% wrHoldoutIDs)
seasonalData_lag_wr_test <- seasonalData_lag_wr_all %>%
filter(gsis_id %in% wrHoldoutIDs)
seasonalData_lag_te_train <- seasonalData_lag_te_all %>%
filter(gsis_id %ni% teHoldoutIDs)
seasonalData_lag_te_test <- seasonalData_lag_te_all %>%
filter(gsis_id %in% teHoldoutIDs)
19.4.10 Impute the Missing Data
Here is a vignette demonstrating how to impute missing data using missForest(): https://rpubs.com/lmorgan95/MissForest (archived at: https://perma.cc/6GB4-2E22). Below, we impute the training data (and all data) separately by position. We then use the imputed training data to make out-of-sample predictions that fill in the missing data for the test data. We do not impute the training and test data together, so that they stay separate for purposes of cross-validation. However, we do impute all data (training and test data together) for purposes of making out-of-sample predictions from the machine learning models to predict players’ performance next season (when actuals are not yet available for evaluating their accuracy). To impute data, we use the missRanger package (Mayer, 2024).
Note: the following code takes a while to run.
Code
Variables to impute: fantasy_points, fantasy_points_ppr, special_teams_tds, passing_epa, pacr, rushing_epa, fantasyPoints_lag, passing_cpoe, rookie_year, draft_number, gs, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, pocket_time.pass
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, completions, attempts, passing_yards, passing_tds, passing_interceptions, sacks_suffered, sack_yards_lost, sack_fumbles, sack_fumbles_lost, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, passing_cpoe, passing_2pt_conversions, pacr, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, special_teams_tds, pocket_time.pass, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush
fntsy_ fnts__ spcl__ pssng_p pacr rshng_ fntsP_ pssng_c rok_yr drft_n gs pss_t. thrww. spks.p drps.p bd_th. tms_b. tms_hr. tms_ht. tms_p. bttd_. on_tgt_t. rp_pl. rp_yr. rp_pss_t. rp_pss_y. rp_rsh_t. rp_rsh_y. p_pss_t. p_pss_y. att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_t. att_b. drp_p. bd_t_. on_tgt_p. prss_. ybc_t. yc_tt. pckt_.
iter 1: 0.0054 0.0024 0.7924 0.1919 0.7612 0.3628 0.4789 0.4133 0.0224 0.5216 0.0271 0.0134 0.3024 0.7659 0.1304 0.0541 0.0758 0.1759 0.1820 0.0370 0.3238 0.0291 0.2952 0.1812 0.0885 0.0867 0.2627 0.2563 0.1093 0.0902 0.0580 0.0645 0.1732 0.0524 0.0578 0.1795 0.3524 0.3428 0.7447 0.5158 0.0824 0.6803 0.3529 0.5758 0.8111
iter 2: 0.0044 0.0048 0.8304 0.2002 0.7926 0.3736 0.4801 0.4289 0.0488 0.6139 0.0188 0.0090 0.2883 0.7481 0.0764 0.0385 0.0718 0.1231 0.1329 0.0337 0.2760 0.0113 0.0548 0.0814 0.0765 0.0990 0.1989 0.2841 0.0707 0.0952 0.0396 0.0386 0.1606 0.0492 0.0525 0.1220 0.2541 0.3556 0.7468 0.4937 0.0827 0.6610 0.3465 0.5796 0.8134
iter 3: 0.0049 0.0046 0.8690 0.1986 0.7810 0.3641 0.4774 0.4360 0.0528 0.6123 0.0188 0.0088 0.2867 0.7538 0.0767 0.0393 0.0734 0.1261 0.1374 0.0343 0.2741 0.0119 0.0524 0.0816 0.0748 0.1008 0.2184 0.2811 0.0691 0.0926 0.0389 0.0413 0.1640 0.0511 0.0585 0.1255 0.2510 0.3609 0.7477 0.5108 0.0858 0.6426 0.3588 0.5734 0.8300
missRanger object. Extract imputed data via $data
- best iteration: 2
- best average OOB imputation error: 0.2524825
Code
data_all_qb <- seasonalData_lag_qb_all_imp$data
data_all_qb$fantasyPointsMC_lag <- scale(data_all_qb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_qb_matrix <- data_all_qb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
newData_qb <- data_all_qb %>%
filter(season == max(season, na.rm = TRUE)) %>%
select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_qb_matrix <- data_all_qb_matrix[
data_all_qb_matrix[, "season"] == max(data_all_qb_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
, # all columns
drop = FALSE]
dropCol_qb <- which(colnames(newData_qb_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_qb_matrix <- newData_qb_matrix[, -dropCol_qb, drop = FALSE]
seasonalData_lag_qb_train_imp <- missRanger::missRanger(
seasonalData_lag_qb_train,
pmm.k = 5,
verbose = 2,
seed = 52242,
keep_forests = TRUE)
Variables to impute: fantasy_points, fantasy_points_ppr, special_teams_tds, passing_epa, pacr, rushing_epa, fantasyPoints_lag, passing_cpoe, rookie_year, draft_number, gs, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, pocket_time.pass
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, completions, attempts, passing_yards, passing_tds, passing_interceptions, sacks_suffered, sack_yards_lost, sack_fumbles, sack_fumbles_lost, passing_air_yards, passing_yards_after_catch, passing_first_downs, passing_epa, passing_cpoe, passing_2pt_conversions, pacr, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, special_teams_tds, pocket_time.pass, pass_attempts.pass, throwaways.pass, spikes.pass, drops.pass, bad_throws.pass, times_blitzed.pass, times_hurried.pass, times_hit.pass, times_pressured.pass, batted_balls.pass, on_tgt_throws.pass, rpo_plays.pass, rpo_yards.pass, rpo_pass_att.pass, rpo_pass_yards.pass, rpo_rush_att.pass, rpo_rush_yards.pass, pa_pass_att.pass, pa_pass_yards.pass, drop_pct.pass, bad_throw_pct.pass, on_tgt_pct.pass, pressure_pct.pass, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush
fntsy_ fnts__ spcl__ pssng_p pacr rshng_ fntsP_ pssng_c rok_yr drft_n gs pss_t. thrww. spks.p drps.p bd_th. tms_b. tms_hr. tms_ht. tms_p. bttd_. on_tgt_t. rp_pl. rp_yr. rp_pss_t. rp_pss_y. rp_rsh_t. rp_rsh_y. p_pss_t. p_pss_y. att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_t. att_b. drp_p. bd_t_. on_tgt_p. prss_. ybc_t. yc_tt. pckt_.
iter 1: 0.0061 0.0028 0.8162 0.1897 0.5083 0.3633 0.4726 0.4456 0.0242 0.4723 0.0283 0.0141 0.2939 0.7728 0.1343 0.0558 0.0744 0.1757 0.1818 0.0381 0.3288 0.0351 0.2921 0.1846 0.0860 0.0894 0.2737 0.2661 0.1127 0.0900 0.0586 0.0644 0.1800 0.0574 0.0639 0.1792 0.3570 0.3486 0.7646 0.5313 0.0868 0.7084 0.3533 0.5933 0.8466
iter 2: 0.0052 0.0052 0.8304 0.1937 0.5621 0.3715 0.4614 0.4586 0.0505 0.5647 0.0192 0.0092 0.2953 0.7530 0.0800 0.0393 0.0725 0.1170 0.1355 0.0343 0.2771 0.0121 0.0555 0.0731 0.0713 0.0979 0.2073 0.2943 0.0698 0.0911 0.0416 0.0399 0.1683 0.0527 0.0577 0.1262 0.2474 0.3582 0.7719 0.5165 0.0900 0.6862 0.3642 0.5926 0.8400
iter 3: 0.0053 0.0051 0.8261 0.2008 0.5551 0.3571 0.4727 0.4410 0.0551 0.5658 0.0188 0.0092 0.2859 0.7460 0.0807 0.0402 0.0739 0.1202 0.1393 0.0351 0.2808 0.0114 0.0595 0.0705 0.0775 0.1051 0.2163 0.2935 0.0718 0.0921 0.0426 0.0400 0.1719 0.0535 0.0534 0.1225 0.2498 0.3484 0.7502 0.5100 0.0884 0.6609 0.3672 0.5852 0.8440
iter 4: 0.0054 0.0051 0.6928 0.1979 0.5598 0.3732 0.4771 0.4349 0.0506 0.5691 0.0189 0.0085 0.2891 0.7456 0.0785 0.0395 0.0737 0.1210 0.1353 0.0335 0.2836 0.0117 0.0566 0.0778 0.0743 0.1055 0.2131 0.2964 0.0697 0.0912 0.0396 0.0395 0.1611 0.0531 0.0597 0.1258 0.2600 0.3560 0.8062 0.5032 0.0973 0.6739 0.3698 0.5875 0.8485
iter 5: 0.0052 0.0055 0.8355 0.1965 0.5664 0.3710 0.4743 0.4604 0.0520 0.5598 0.0193 0.0091 0.2852 0.7474 0.0800 0.0405 0.0722 0.1213 0.1366 0.0344 0.2788 0.0118 0.0555 0.0756 0.0746 0.0986 0.2190 0.2765 0.0695 0.0932 0.0390 0.0425 0.1650 0.0509 0.0576 0.1305 0.2556 0.3509 0.7738 0.5051 0.0969 0.6902 0.3640 0.6007 0.8326
missRanger object. Extract imputed data via $data
- best iteration: 4
- best average OOB imputation error: 0.2482278
Code
data_train_qb <- seasonalData_lag_qb_train_imp$data
data_train_qb$fantasyPointsMC_lag <- scale(data_train_qb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_qb_matrix <- data_train_qb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
seasonalData_lag_qb_test_imp <- predict(
object = seasonalData_lag_qb_train_imp,
newdata = seasonalData_lag_qb_test,
seed = 52242)
data_test_qb <- seasonalData_lag_qb_test_imp
data_test_qb_matrix <- data_test_qb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
Code
Variables to impute: games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, rushing_epa, air_yards_share, receiving_epa, racr, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, ybc_att.rush, yac_att.rush, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
games agCn20 agC20Q fntsy_ fnts__ fntsyP carris rshng_y rshng_t rshng_f rshng_fm_ rshng_fr_ rsh_2_ rcptns targts rcvng_y rcvng_t rcvng_f rcvng_fm_ rcvng_r_ rcv___ rcvng_fr_ rcv_2_ spcl__ yrs_f_ rshng_p ar_yr_ rcvng_p racr trgt_s wopr fntsP_ rok_yr drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc ybc_t. yc_tt. adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r
iter 1: 0.8865 0.0057 0.0031 0.4544 0.0178 0.0032 0.0745 0.0233 0.1462 0.4895 0.2594 0.0295 0.9849 0.0690 0.0666 0.0534 0.4327 0.8626 0.4824 0.6841 0.0322 0.0614 1.0171 0.8263 0.1817 0.4512 0.3321 0.3894 0.5211 0.4549 0.1817 0.5440 0.0197 0.5999 0.1700 0.0244 0.0222 0.0792 0.0297 0.0527 0.0520 0.2134 0.3431 0.0252 0.0180 0.0257 0.1634 0.0437 0.3108 0.0217 0.3925 0.4610 0.6941 0.4880 0.5402 0.2670 0.2026 0.3482 0.1596 0.2698 0.3637
iter 2: 0.2755 0.0162 0.0207 0.0063 0.0037 0.0044 0.0161 0.0148 0.0912 0.2524 0.2898 0.0248 0.9832 0.0273 0.0444 0.0233 0.2065 0.4600 0.4891 0.1285 0.0332 0.0457 1.0175 0.8566 0.1824 0.4212 0.2329 0.3099 0.5605 0.2569 0.1742 0.5373 0.0424 0.6377 0.1653 0.0167 0.0123 0.0859 0.0302 0.0367 0.0349 0.1030 0.3689 0.0144 0.0159 0.0195 0.1403 0.0434 0.1549 0.0190 0.3840 0.1050 0.5453 0.4882 0.5616 0.2472 0.1953 0.1525 0.1687 0.2595 0.3619
iter 3: 0.2744 0.0163 0.0231 0.0062 0.0038 0.0047 0.0152 0.0137 0.0980 0.2601 0.2906 0.0244 0.9800 0.0265 0.0347 0.0231 0.2101 0.4638 0.4954 0.1284 0.0283 0.0458 1.0114 0.8731 0.1818 0.4117 0.2278 0.3037 0.5699 0.2052 0.1800 0.5389 0.0400 0.6423 0.1624 0.0166 0.0124 0.0893 0.0306 0.0374 0.0356 0.1074 0.3628 0.0144 0.0163 0.0187 0.1390 0.0463 0.1583 0.0190 0.3882 0.1062 0.5642 0.4796 0.5570 0.2380 0.1935 0.1586 0.1588 0.2625 0.3648
iter 4: 0.2776 0.0169 0.0220 0.0063 0.0038 0.0045 0.0151 0.0138 0.0979 0.2584 0.2846 0.0243 0.9782 0.0263 0.0281 0.0221 0.1968 0.4594 0.4817 0.1267 0.0290 0.0462 1.0104 0.8614 0.1854 0.4216 0.2333 0.3004 0.5467 0.1917 0.1815 0.5353 0.0443 0.6503 0.1657 0.0166 0.0121 0.0905 0.0313 0.0378 0.0357 0.1041 0.3437 0.0155 0.0159 0.0185 0.1405 0.0441 0.1613 0.0196 0.3816 0.1117 0.5682 0.5011 0.5585 0.2421 0.1975 0.1520 0.1770 0.2650 0.3647
iter 5: 0.2752 0.0163 0.0226 0.0063 0.0038 0.0045 0.0158 0.0138 0.1015 0.2614 0.2857 0.0242 0.9740 0.0250 0.0303 0.0218 0.2004 0.4607 0.4810 0.1167 0.0285 0.0449 1.0077 0.8658 0.1835 0.4182 0.2170 0.2995 0.5690 0.2010 0.1794 0.5375 0.0385 0.6487 0.1652 0.0166 0.0124 0.0878 0.0306 0.0368 0.0353 0.1069 0.3539 0.0154 0.0159 0.0193 0.1409 0.0447 0.1598 0.0205 0.3873 0.1062 0.5583 0.4895 0.5501 0.2418 0.1979 0.1713 0.1726 0.2625 0.3596
iter 6: 0.2760 0.0158 0.0223 0.0063 0.0037 0.0046 0.0150 0.0144 0.0982 0.2568 0.2816 0.0238 0.9810 0.0253 0.0273 0.0223 0.2141 0.4606 0.4881 0.1386 0.0300 0.0457 1.0174 0.8605 0.1821 0.4188 0.2263 0.2985 0.5497 0.1779 0.1536 0.5388 0.0389 0.6422 0.1668 0.0162 0.0119 0.0897 0.0305 0.0376 0.0356 0.1066 0.3529 0.0149 0.0159 0.0196 0.1446 0.0450 0.1585 0.0197 0.3857 0.1001 0.5607 0.4948 0.5478 0.2487 0.1945 0.1438 0.1543 0.2568 0.3612
iter 7: 0.2748 0.0158 0.0212 0.0064 0.0039 0.0047 0.0149 0.0141 0.0986 0.2611 0.2877 0.0241 0.9755 0.0253 0.0310 0.0223 0.2163 0.4553 0.4885 0.1335 0.0293 0.0456 1.0096 0.8575 0.1821 0.4236 0.2203 0.2998 0.5510 0.2107 0.1797 0.5354 0.0416 0.6395 0.1646 0.0166 0.0117 0.0895 0.0310 0.0371 0.0361 0.1073 0.3547 0.0154 0.0156 0.0193 0.1410 0.0449 0.1628 0.0201 0.3892 0.1076 0.5609 0.4946 0.5643 0.2392 0.1899 0.1540 0.1423 0.2667 0.3603
missRanger object. Extract imputed data via $data
- best iteration: 6
- best average OOB imputation error: 0.2175486
Code
data_all_rb <- seasonalData_lag_rb_all_imp$data
data_all_rb$fantasyPointsMC_lag <- scale(data_all_rb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_rb_matrix <- data_all_rb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
newData_rb <- data_all_rb %>%
filter(season == max(season, na.rm = TRUE)) %>%
select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_rb_matrix <- data_all_rb_matrix[
data_all_rb_matrix[, "season"] == max(data_all_rb_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
, # all columns
drop = FALSE]
dropCol_rb <- which(colnames(newData_rb_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_rb_matrix <- newData_rb_matrix[, -dropCol_rb, drop = FALSE]
seasonalData_lag_rb_train_imp <- missRanger::missRanger(
seasonalData_lag_rb_train,
pmm.k = 5,
verbose = 2,
seed = 52242,
keep_forests = TRUE)
Variables to impute: games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, rushing_epa, air_yards_share, receiving_epa, racr, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, ybc_att.rush, yac_att.rush, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
games agCn20 agC20Q fntsy_ fnts__ fntsyP carris rshng_y rshng_t rshng_f rshng_fm_ rshng_fr_ rsh_2_ rcptns targts rcvng_y rcvng_t rcvng_f rcvng_fm_ rcvng_r_ rcv___ rcvng_fr_ rcv_2_ spcl__ yrs_f_ rshng_p ar_yr_ rcvng_p racr trgt_s wopr fntsP_ rok_yr drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc ybc_t. yc_tt. adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r
iter 1: 0.8759 0.0072 0.0036 0.4578 0.0178 0.0035 0.0736 0.0229 0.1524 0.4776 0.2679 0.0288 0.9965 0.0749 0.0744 0.0553 0.4578 0.8604 0.4998 0.6821 0.0360 0.0639 1.0042 0.8380 0.1806 0.4662 0.3419 0.3961 0.5595 0.4715 0.1968 0.5338 0.0246 0.5882 0.1726 0.0265 0.0235 0.0849 0.0311 0.0550 0.0521 0.2131 0.3689 0.0281 0.0197 0.0286 0.1742 0.0463 0.3114 0.0229 0.3942 0.4806 0.7239 0.5199 0.5631 0.2865 0.2195 0.3630 0.2052 0.2596 0.4091
iter 2: 0.2745 0.0177 0.0266 0.0067 0.0041 0.0049 0.0169 0.0154 0.1017 0.2590 0.2956 0.0237 0.9814 0.0286 0.0522 0.0240 0.2187 0.4582 0.4919 0.1541 0.0362 0.0475 1.0075 0.8811 0.1822 0.4481 0.2377 0.3184 0.6116 0.2628 0.2007 0.5254 0.0473 0.6411 0.1653 0.0179 0.0132 0.0940 0.0325 0.0392 0.0375 0.1057 0.3678 0.0161 0.0169 0.0196 0.1510 0.0484 0.1521 0.0202 0.3941 0.1035 0.5567 0.5120 0.5666 0.2466 0.2087 0.1733 0.1699 0.2524 0.4066
iter 3: 0.2766 0.0190 0.0273 0.0067 0.0041 0.0048 0.0159 0.0149 0.0971 0.2615 0.2952 0.0245 0.9668 0.0278 0.0409 0.0240 0.2180 0.4648 0.4931 0.1319 0.0350 0.0495 1.0128 0.8907 0.1820 0.4366 0.2459 0.3124 0.6236 0.2555 0.2114 0.5276 0.0438 0.6314 0.1658 0.0175 0.0122 0.0899 0.0319 0.0386 0.0386 0.1100 0.3783 0.0155 0.0165 0.0194 0.1477 0.0474 0.1499 0.0194 0.3929 0.1121 0.5761 0.5245 0.5651 0.2490 0.2103 0.1767 0.1817 0.2658 0.4106
missRanger object. Extract imputed data via $data
- best iteration: 2
- best average OOB imputation error: 0.226086
Code
data_train_rb <- seasonalData_lag_rb_train_imp$data
data_train_rb$fantasyPointsMC_lag <- scale(data_train_rb$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_rb_matrix <- data_train_rb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
seasonalData_lag_rb_test_imp <- predict(
object = seasonalData_lag_rb_train_imp,
newdata = seasonalData_lag_rb_test,
seed = 52242)
data_test_rb <- seasonalData_lag_rb_test_imp
data_test_rb_matrix <- data_test_rb %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
Code
Variables to impute: fantasy_points, fantasy_points_ppr, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, rushing_epa, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, ybc_att.rush, yac_att.rush
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
fntsy_ fnts__ spcl__ yrs_f_ rcvng_ racr ar_yr_ trgt_s wopr fntsP_ rok_yr rshng_ drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r ybc_t. yc_tt.
iter 1: 0.0061 0.0010 0.7104 0.1566 0.1040 0.8131 0.1013 0.1722 0.0402 0.4890 0.0150 0.3811 0.6654 0.1459 0.1184 0.0898 0.2353 0.1234 0.0966 0.2670 0.6383 0.3084 0.0198 0.0136 0.0151 0.0671 0.0135 0.0268 0.0442 0.4465 0.4320 0.4674 0.2961 0.1410 0.3819 0.1840 0.2251 0.3929 0.2568 0.4760
iter 2: 0.0058 0.0019 0.7826 0.1601 0.0835 0.7518 0.0607 0.0930 0.0452 0.4939 0.0296 0.3301 0.6843 0.1440 0.0851 0.0600 0.2638 0.1161 0.0708 0.1804 0.3103 0.3223 0.0109 0.0108 0.0096 0.0719 0.0139 0.0200 0.0318 0.4476 0.0778 0.3692 0.2401 0.1448 0.1629 0.1601 0.2261 0.3793 0.2536 0.4775
iter 3: 0.0061 0.0019 0.7857 0.1593 0.0829 0.7421 0.0580 0.0986 0.0481 0.4946 0.0318 0.3334 0.6890 0.1430 0.0823 0.0604 0.2595 0.1177 0.0728 0.1802 0.3077 0.3194 0.0109 0.0114 0.0095 0.0724 0.0133 0.0199 0.0312 0.4411 0.0767 0.3687 0.2369 0.1455 0.1530 0.1660 0.2169 0.3878 0.2466 0.4716
iter 4: 0.0060 0.0018 0.7874 0.1604 0.0832 0.7394 0.0591 0.0940 0.0479 0.4926 0.0301 0.3317 0.6896 0.1434 0.0863 0.0601 0.2562 0.1227 0.0711 0.1900 0.3089 0.3194 0.0105 0.0112 0.0095 0.0707 0.0140 0.0202 0.0318 0.4447 0.0784 0.3674 0.2339 0.1423 0.1592 0.1700 0.2254 0.3886 0.2552 0.4662
missRanger object. Extract imputed data via $data
- best iteration: 3
- best average OOB imputation error: 0.203846
Code
data_all_wr <- seasonalData_lag_wr_all_imp$data
data_all_wr$fantasyPointsMC_lag <- scale(data_all_wr$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_wr_matrix <- data_all_wr %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
newData_wr <- data_all_wr %>%
filter(season == max(season, na.rm = TRUE)) %>%
select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_wr_matrix <- data_all_wr_matrix[
data_all_wr_matrix[, "season"] == max(data_all_wr_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
, # all columns
drop = FALSE]
dropCol_wr <- which(colnames(newData_wr_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_wr_matrix <- newData_wr_matrix[, -dropCol_wr, drop = FALSE]
seasonalData_lag_wr_train_imp <- missRanger::missRanger(
seasonalData_lag_wr_train,
pmm.k = 5,
verbose = 2,
seed = 52242,
keep_forests = TRUE)
Variables to impute: fantasy_points, fantasy_points_ppr, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, rushing_epa, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, ybc_att.rush, yac_att.rush
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
fntsy_ fnts__ spcl__ yrs_f_ rcvng_ racr ar_yr_ trgt_s wopr fntsP_ rok_yr rshng_ drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r ybc_t. yc_tt.
iter 1: 0.0064 0.0010 0.7029 0.1611 0.1089 0.8443 0.1021 0.1643 0.0427 0.4935 0.0173 0.3461 0.6788 0.1427 0.1364 0.0993 0.2403 0.1243 0.0979 0.2745 0.6190 0.3171 0.0201 0.0147 0.0159 0.0734 0.0140 0.0280 0.0454 0.4502 0.4439 0.4733 0.3088 0.1641 0.4547 0.2192 0.2439 0.4227 0.2921 0.5068
iter 2: 0.0063 0.0020 0.7835 0.1630 0.0901 0.8044 0.0674 0.0936 0.0479 0.4930 0.0331 0.3235 0.7090 0.1417 0.0896 0.0659 0.2659 0.1273 0.0752 0.1920 0.3068 0.3225 0.0112 0.0116 0.0101 0.0753 0.0141 0.0210 0.0333 0.4431 0.0809 0.3676 0.2571 0.1617 0.1735 0.1797 0.2441 0.3996 0.2911 0.4923
iter 3: 0.0063 0.0020 0.7710 0.1639 0.0881 0.7954 0.0646 0.0982 0.0515 0.4956 0.0338 0.3200 0.7088 0.1413 0.0900 0.0640 0.2565 0.1250 0.0735 0.1989 0.3082 0.3280 0.0114 0.0119 0.0096 0.0763 0.0141 0.0216 0.0326 0.4388 0.0807 0.3703 0.2582 0.1623 0.1657 0.2018 0.2375 0.4016 0.2838 0.4794
iter 4: 0.0062 0.0020 0.7792 0.1625 0.0877 0.8043 0.0632 0.0919 0.0477 0.4963 0.0341 0.3239 0.7038 0.1420 0.0950 0.0653 0.2664 0.1309 0.0767 0.2025 0.2933 0.3076 0.0109 0.0119 0.0097 0.0745 0.0143 0.0217 0.0326 0.4432 0.0804 0.3688 0.2584 0.1605 0.1931 0.2013 0.2378 0.4119 0.2824 0.4860
missRanger object. Extract imputed data via $data
- best iteration: 3
- best average OOB imputation error: 0.2110456
Code
data_train_wr <- seasonalData_lag_wr_train_imp$data
data_train_wr$fantasyPointsMC_lag <- scale(data_train_wr$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_wr_matrix <- data_train_wr %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
seasonalData_lag_wr_test_imp <- predict(
object = seasonalData_lag_wr_train_imp,
newdata = seasonalData_lag_wr_test,
seed = 52242)
data_test_wr <- seasonalData_lag_wr_test_imp
data_test_wr_matrix <- data_test_wr %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
Code
Variables to impute: games, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, years_of_experience, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, rushing_epa, ybc_att.rush, yac_att.rush
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
games agCn20 agC20Q fntsy_ fnts__ fntsyP carris rshng_y rshng_t rshng_f rshng_fm_ rshng_fr_ rsh_2_ rcptns targts rcvng_y rcvng_t rcvng_f rcvng_fm_ rcvng_r_ rcv___ rcvng_fr_ rcv_2_ spcl__ yrs_f_ rcvng_p racr ar_yr_ trgt_s wopr fntsP_ rok_yr drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r rshng_p ybc_t. yc_tt.
iter 1: 0.8157 0.0061 0.0030 0.3406 0.0194 0.0039 0.5253 0.2259 0.2452 0.7083 0.6874 0.0802 1.1303 0.0281 0.0558 0.0255 0.0845 0.8134 0.4317 0.0655 0.0784 0.0253 0.9716 1.0271 0.1530 0.1689 0.6899 0.1092 0.4432 0.1004 0.4764 0.0180 0.6054 0.3846 0.0762 0.0832 0.1564 0.0618 0.0704 0.2123 0.3921 0.6733 0.0290 0.0207 0.0226 0.1012 0.0212 0.0420 0.0603 0.4332 0.4640 0.4996 0.2804 0.1667 0.3542 0.1652 0.2843 0.3948 0.3270 0.6620 0.7439
iter 2: 0.1712 0.0175 0.0256 0.0106 0.0037 0.0055 0.1140 0.1113 0.0990 0.5369 0.7422 0.0852 1.1286 0.0193 0.0200 0.0128 0.0862 0.4248 0.4659 0.0206 0.0529 0.0217 0.9711 1.0114 0.1561 0.1397 0.6715 0.0766 0.1819 0.1085 0.4649 0.0366 0.6346 0.3880 0.0722 0.0728 0.1592 0.0680 0.0759 0.2034 0.3651 0.6811 0.0164 0.0158 0.0161 0.1080 0.0211 0.0327 0.0475 0.4342 0.1149 0.4173 0.2589 0.1742 0.1467 0.1531 0.2941 0.3851 0.3357 0.6846 0.7397
iter 3: 0.1689 0.0170 0.0261 0.0114 0.0040 0.0056 0.1190 0.1155 0.0978 0.6088 0.7899 0.0945 1.1731 0.0195 0.0203 0.0132 0.0964 0.4270 0.4608 0.0202 0.0525 0.0214 0.9694 1.0265 0.1560 0.1380 0.6453 0.0751 0.1794 0.1204 0.4642 0.0364 0.6369 0.3853 0.0779 0.0786 0.1497 0.0569 0.0932 0.2027 0.4003 0.6633 0.0171 0.0164 0.0167 0.1049 0.0220 0.0335 0.0466 0.4371 0.1141 0.4304 0.2665 0.1775 0.1464 0.1537 0.2916 0.3720 0.3137 0.6640 0.7770
missRanger object. Extract imputed data via $data
- best iteration: 2
- best average OOB imputation error: 0.2477098
Code
data_all_te <- seasonalData_lag_te_all_imp$data
data_all_te$fantasyPointsMC_lag <- scale(data_all_te$fantasyPoints_lag, scale = FALSE) # mean-centered
data_all_te_matrix <- data_all_te %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
newData_te <- data_all_te %>%
filter(season == max(season, na.rm = TRUE)) %>%
select(-fantasyPoints_lag, -fantasyPointsMC_lag)
newData_te_matrix <- data_all_te_matrix[
data_all_te_matrix[, "season"] == max(data_all_te_matrix[, "season"], na.rm = TRUE), # keep only rows with the most recent season
, # all columns
drop = FALSE]
dropCol_te <- which(colnames(newData_te_matrix) %in% c("fantasyPoints_lag","fantasyPointsMC_lag"))
newData_te_matrix <- newData_te_matrix[, -dropCol_te, drop = FALSE]
seasonalData_lag_te_train_imp <- missRanger::missRanger(
seasonalData_lag_te_train,
pmm.k = 5,
verbose = 2,
seed = 52242,
keep_forests = TRUE)
Variables to impute: games, years_of_experience, ageCentered20, ageCentered20Quadratic, fantasy_points, fantasy_points_ppr, fantasyPoints, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_2pt_conversions, special_teams_tds, receiving_epa, racr, air_yards_share, target_share, wopr, fantasyPoints_lag, rookie_year, draft_number, gs, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, adot.rec, rat.rec, drop_percent.rec, rec_br.rec, ybc_r.rec, yac_r.rec, rushing_epa, ybc_att.rush, yac_att.rush
Variables used to impute: gsis_id, season, games, gs, years_of_experience, age, ageCentered20, ageCentered20Quadratic, height, weight, rookie_year, draft_number, fantasy_points, fantasy_points_ppr, fantasyPoints, fantasyPoints_lag, carries, rushing_yards, rushing_tds, rushing_fumbles, rushing_fumbles_lost, rushing_first_downs, rushing_epa, rushing_2pt_conversions, receptions, targets, receiving_yards, receiving_tds, receiving_fumbles, receiving_fumbles_lost, receiving_air_yards, receiving_yards_after_catch, receiving_first_downs, receiving_epa, receiving_2pt_conversions, racr, target_share, air_yards_share, wopr, special_teams_tds, ybc_att.rush, yac_att.rush, att.rush, yds.rush, td.rush, x1d.rush, ybc.rush, yac.rush, brk_tkl.rush, att_br.rush, ybc_r.rec, yac_r.rec, adot.rec, rat.rec, tgt.rec, rec.rec, yds.rec, td.rec, x1d.rec, ybc.rec, yac.rec, brk_tkl.rec, drop.rec, int.rec, drop_percent.rec, rec_br.rec
games yrs_f_ agCn20 agC20Q fntsy_ fnts__ fntsyP carris rshng_y rshng_t rshng_f rshng_fm_ rshng_fr_ rsh_2_ rcptns targts rcvng_y rcvng_t rcvng_f rcvng_fm_ rcvng_r_ rcv___ rcvng_fr_ rcv_2_ spcl__ rcvng_p racr ar_yr_ trgt_s wopr fntsP_ rok_yr drft_n gs att.rs yds.rs td.rsh x1d.rs ybc.rs yc.rsh brk_tkl.rs att_b. tgt.rc rec.rc yds.rc td.rec x1d.rc ybc.rc yac.rc brk_tkl.rc drp.rc int.rc adt.rc rat.rc drp_p. rc_br. ybc_r. yc_r.r rshng_p ybc_t. yc_tt.
iter 1: 0.8094 0.1093 0.0070 0.0035 0.3272 0.0235 0.0052 0.2840 0.1426 0.2634 0.8628 0.7885 0.0924 1.1067 0.0298 0.0611 0.0249 0.0969 0.8177 0.4537 0.0650 0.0804 0.0249 0.9680 1.0235 0.1738 0.5438 0.0868 0.4123 0.1172 0.4597 0.0189 0.6057 0.3973 0.0877 0.0886 0.1516 0.0467 0.0593 0.2086 0.4018 0.6464 0.0296 0.0223 0.0237 0.1062 0.0207 0.0428 0.0579 0.4367 0.4700 0.4818 0.3045 0.1724 0.4722 0.2410 0.2693 0.4025 0.3943 0.4791 0.7521
iter 2: 0.1728 0.1469 0.0179 0.0289 0.0104 0.0039 0.0051 0.0863 0.0763 0.1528 0.7880 0.9474 0.0849 1.0234 0.0193 0.0198 0.0137 0.0915 0.4327 0.4835 0.0238 0.0558 0.0221 0.9617 1.0376 0.1464 0.5141 0.0630 0.1827 0.1062 0.4562 0.0379 0.6361 0.3908 0.0641 0.0767 0.1425 0.0603 0.0747 0.1970 0.3903 0.6647 0.0182 0.0171 0.0165 0.1074 0.0226 0.0332 0.0503 0.4361 0.1170 0.4096 0.2673 0.1862 0.2255 0.2386 0.2793 0.4004 0.3648 0.5070 0.7621
iter 3: 0.1713 0.1447 0.0195 0.0276 0.0104 0.0036 0.0051 0.0796 0.0889 0.1611 0.8505 0.9348 0.0904 1.0447 0.0196 0.0205 0.0134 0.0901 0.4465 0.4867 0.0233 0.0569 0.0222 0.9519 1.0103 0.1457 0.5062 0.0617 0.1698 0.1115 0.4530 0.0382 0.6521 0.3899 0.0665 0.0681 0.1457 0.0647 0.0866 0.2055 0.3919 0.6791 0.0169 0.0169 0.0168 0.1107 0.0213 0.0339 0.0500 0.4315 0.1200 0.4148 0.2745 0.1822 0.1947 0.2205 0.2778 0.4032 0.3639 0.4933 0.7741
missRanger object. Extract imputed data via $data
- best iteration: 2
- best average OOB imputation error: 0.2519559
Code
data_train_te <- seasonalData_lag_te_train_imp$data
data_train_te$fantasyPointsMC_lag <- scale(data_train_te$fantasyPoints_lag, scale = FALSE) # mean-centered
data_train_te_matrix <- data_train_te %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
seasonalData_lag_te_test_imp <- predict(
object = seasonalData_lag_te_train_imp,
newdata = seasonalData_lag_te_test,
seed = 52242)
data_test_te <- seasonalData_lag_te_test_imp
data_test_te_matrix <- data_test_te %>%
mutate(across(where(is.factor), ~ as.numeric(as.integer(.)))) %>%
as.matrix()
19.5 Identify Cores for Parallel Processing
We use the future package (Bengtsson, 2025) for parallel (faster) processing.
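As a minimal sketch (the choice to leave one core free is an assumption; adjust to your machine), we can detect the available cores and set a parallel plan:
Code
library(future)

nCores <- future::availableCores() - 1 # leave one core free for other tasks

future::plan(future::multisession, workers = nCores) # run computations in background R sessions

# ... fit models in parallel ...

# future::plan(future::sequential) # optionally return to sequential processing when finished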
19.6 Fitting the Traditional Regression Models
19.6.1 Regression with One Predictor
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ ageCentered20,
data = data_train_qb)
# Define Model
lm_spec <- parsnip::linear_reg() %>%
parsnip::set_engine("lm") %>%
parsnip::set_mode("regression")
# Workflow
lm_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(lm_spec)
# Fit Model with Cross-Validation
cv_results <- tune::fit_resamples(
lm_wf,
resamples = folds,
metrics = metric_set(rmse, mae, rsq),
control = control_resamples(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
Code
Code
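The folded code chunks above fit the finalized workflow on the full training data and generate predictions for the held-out test data. A minimal sketch of what such chunks might contain is below; the test-data object name (data_train_qb's counterpart, data_test_qb) and the data frame df with a pred column are assumptions based on the plotting code that follows. (The later model sections follow the same pattern.)
# Fit the finalized workflow on the full training data (sketch)
final_fit <- workflows::fit(lm_wf, data = data_train_qb)
# Generate predictions for the held-out test data
df <- data_test_qb %>%
  dplyr::mutate(pred = predict(final_fit, new_data = data_test_qb)$.pred)
# Evaluate prediction accuracy on the test data
yardstick::metrics(df, truth = fantasyPoints_lag, estimate = pred)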
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
19.6.2 Regression with Multiple Predictors
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ .,
data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))
# Define Model
lm_spec <- parsnip::linear_reg() %>%
parsnip::set_engine("lm") %>%
parsnip::set_mode("regression")
# Workflow
lm_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(lm_spec)
# Fit Model with Cross-Validation
cv_results <- tune::fit_resamples(
lm_wf,
resamples = folds,
metrics = metric_set(rmse, mae, rsq),
control = control_resamples(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
Code
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
19.7 Fitting the Machine Learning Models
19.7.1 Least Absolute Shrinkage and Selection Operator (LASSO)
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ .,
data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))
# Define Model
lasso_spec <-
parsnip::linear_reg(
penalty = tune(),
mixture = 1) %>%
set_engine("glmnet")
# Workflow
lasso_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(lasso_spec)
# Define grid of penalties to try (log scale is typical)
penalty_grid <- dials::grid_regular(
dials::penalty(range = c(-4, -1)),
levels = 20)
# Tune the Penalty Parameter
cv_results <- tune::tune_grid(
lasso_wf,
resamples = folds,
grid = penalty_grid,
metrics = metric_set(rmse, mae, rsq),
control = control_grid(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
best_penalty <- select_best(cv_results, metric = "mae")
# Finalize Workflow with Best Penalty
final_wf <- finalize_workflow(
lasso_wf,
best_penalty)
# Fit Final Model on Training Data
final_model <- workflows::fit(
final_wf,
data = data_train_qb)
# View Coefficients
final_model %>%
workflows::extract_fit_parsnip() %>%
broom::tidy()
Code
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
19.7.2 Ridge Regression
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ .,
data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))
# Define Model
ridge_spec <-
linear_reg(
penalty = tune(),
mixture = 0) %>%
set_engine("glmnet")
# Workflow
ridge_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(ridge_spec)
# Define grid of penalties to try (log scale is typical)
penalty_grid <- dials::grid_regular(
dials::penalty(range = c(-4, -1)),
levels = 20)
# Tune the Penalty Parameter
cv_results <- tune::tune_grid(
ridge_wf,
resamples = folds,
grid = penalty_grid,
metrics = metric_set(rmse, mae, rsq),
control = control_grid(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
best_penalty <- select_best(cv_results, metric = "mae")
# Finalize Workflow with Best Penalty
final_wf <- finalize_workflow(
ridge_wf,
best_penalty)
# Fit Final Model on Training Data
final_model <- workflows::fit(
final_wf,
data = data_train_qb)
# View Coefficients
final_model %>%
workflows::extract_fit_parsnip() %>%
broom::tidy()
Code
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
19.7.3 Elastic Net
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ .,
data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))
# Define Model
enet_spec <-
linear_reg(
penalty = tune(),
mixture = tune()) %>%
set_engine("glmnet")
# Workflow
enet_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(enet_spec)
# Define a regular grid for both penalty and mixture
grid_enet <- dials::grid_regular(
dials::penalty(range = c(-4, -1)),
dials::mixture(range = c(0, 1)),
levels = c(20, 5) # 20 penalty values × 5 mixture values
)
# Tune the Grid
cv_results <- tune::tune_grid(
enet_wf,
resamples = folds,
grid = grid_enet,
metrics = metric_set(rmse, mae, rsq),
control = control_grid(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
best_penalty <- select_best(cv_results, metric = "mae")
# Finalize Workflow with Best Penalty
final_wf <- finalize_workflow(
enet_wf,
best_penalty)
# Fit Final Model on Training Data
final_model <- workflows::fit(
final_wf,
data = data_train_qb)
# View Coefficients
final_model %>%
workflows::extract_fit_parsnip() %>%
broom::tidy()
Code
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
19.7.4 Random Forest Machine Learning
19.7.4.1 Cross-Sectional Data
Code
# Set seed for reproducibility
set.seed(52242)
# Set up cross-validation
folds <- rsample::group_vfold_cv(
data_train_qb,
group = gsis_id,
v = 10) # 10-fold cross-validation
# Define Recipe (Formula)
rec <- recipes::recipe(
fantasyPoints_lag ~ .,
data = data_train_qb %>% select(-gsis_id, -fantasyPointsMC_lag))
# Define Model
rf_spec <-
parsnip::rand_forest(
mtry = tune::tune(),
min_n = tune::tune(),
trees = 500) %>%
parsnip::set_mode("regression") %>%
parsnip::set_engine("ranger", importance = "impurity")
# Workflow
rf_wf <- workflows::workflow() %>%
workflows::add_recipe(rec) %>%
workflows::add_model(rf_spec)
# Create Grid
n_predictors <- recipes::prep(rec) %>%
recipes::juice() %>%
dplyr::select(-fantasyPoints_lag) %>%
ncol()
# Dynamically define ranges based on data
rf_params <- hardhat::extract_parameter_set_dials(rf_spec) %>%
update(
mtry = dials::mtry(range = c(1L, n_predictors)),
min_n = dials::min_n(range = c(2L, 10L))
)
rf_grid <- dials::grid_random(rf_params, size = 15) #dials::grid_regular(rf_params, levels = 5)
# Tune the Grid
cv_results <- tune::tune_grid(
rf_wf,
resamples = folds,
grid = rf_grid,
metrics = metric_set(rmse, mae, rsq),
control = control_grid(save_pred = TRUE)
)
# View Cross-Validation metrics
tune::collect_metrics(cv_results)
Code
best_params <- select_best(cv_results, metric = "mae")
# Finalize Workflow with Best Tuning Parameters
final_wf <- finalize_workflow(
rf_wf,
best_params)
# Fit Final Model on Training Data
final_model <- workflows::fit(
final_wf,
data = data_train_qb)
# View Feature Importance
rf_fit <- final_model %>%
workflows::extract_fit_parsnip()
rf_fit
parsnip model object
Ranger result
Call:
ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~3L, x), num.trees = ~500, min.node.size = min_rows(~9L, x), importance = ~"impurity", num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1))
Type: Regression
Number of trees: 500
Sample size: 1582
Number of independent variables: 73
Mtry: 3
Target node size: 9
Variable importance mode: impurity
Splitrule: variance
OOB prediction error (MSE): 6315.315
R squared (OOB): 0.5148104
season games gs
126359.5422 319578.3505 382535.6517
years_of_experience age ageCentered20
124783.8817 199342.1467 207939.2802
ageCentered20Quadratic height weight
205495.8993 73752.5223 135823.1306
rookie_year draft_number fantasy_points
121270.5272 235944.0545 641663.0852
fantasy_points_ppr fantasyPoints completions
753816.3235 673276.9667 577883.0044
attempts passing_yards passing_tds
469528.4582 692904.1588 535369.7048
passing_interceptions sacks_suffered sack_yards_lost
154679.5322 294698.8048 237255.5206
sack_fumbles sack_fumbles_lost passing_air_yards
92927.1076 52731.7059 389664.7190
passing_yards_after_catch passing_first_downs passing_epa
371266.0798 533911.4945 469375.8940
passing_cpoe passing_2pt_conversions pacr
159111.5776 43285.8990 136591.0157
carries rushing_yards rushing_tds
191763.4732 227786.6905 83784.6521
rushing_fumbles rushing_fumbles_lost rushing_first_downs
62378.0789 31328.7581 218593.7055
rushing_epa rushing_2pt_conversions special_teams_tds
174856.7482 22130.6589 432.8602
pocket_time.pass pass_attempts.pass throwaways.pass
88617.4566 481207.3858 330967.1992
spikes.pass drops.pass bad_throws.pass
74008.1352 364123.0798 411395.3664
times_blitzed.pass times_hurried.pass times_hit.pass
481435.4648 374116.1093 266521.6649
times_pressured.pass batted_balls.pass on_tgt_throws.pass
424836.7389 151484.9543 402761.4106
rpo_plays.pass rpo_yards.pass rpo_pass_att.pass
192637.5465 225659.8819 243476.9956
rpo_pass_yards.pass rpo_rush_att.pass rpo_rush_yards.pass
205064.4104 77907.9318 117492.6716
pa_pass_att.pass pa_pass_yards.pass drop_pct.pass
347266.3891 454836.7853 129912.7438
bad_throw_pct.pass on_tgt_pct.pass pressure_pct.pass
145508.9586 106607.2086 126197.3464
ybc_att.rush yac_att.rush att.rush
139772.9950 120705.2307 367870.7638
yds.rush td.rush x1d.rush
208163.2897 125274.5721 221642.2967
ybc.rush yac.rush brk_tkl.rush
188152.9223 173028.6991 52392.4196
att_br.rush
86667.3297
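To see which predictors contribute most to the random forest, the impurity-based importance scores can be sorted; a brief sketch using the extracted ranger fit:
# Show the 10 predictors with the highest impurity-based importance
rf_fit$fit$variable.importance %>%
  sort(decreasing = TRUE) %>%
  head(10)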
Code
Code
# Calculate combined range for axes
axis_limits <- range(c(df$pred, df$fantasyPoints_lag), na.rm = TRUE)
ggplot(
df,
aes(
x = pred,
y = fantasyPoints_lag)) +
geom_point(
size = 2,
alpha = 0.6) +
geom_abline(
slope = 1,
intercept = 0,
color = "blue",
linetype = "dashed") +
coord_equal(
xlim = axis_limits,
ylim = axis_limits) +
labs(
title = "Predicted vs Actual Fantasy Points (Test Data)",
x = "Predicted Fantasy Points",
y = "Actual Fantasy Points"
) +
theme_classic()
Code
Now we can stop the parallel backend:
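A minimal sketch of what stopping the parallel backend might look like:
# Return to sequential (non-parallel) processing
future::plan(future::sequential)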
19.7.4.2 Longitudinal Data
Approaches to estimating random forest models with longitudinal data are described by Hu & Szymczak (2023). Below, we fit longitudinal random forest models using the MERF()
function of the LongituRF
package (Capitaine, 2020).
Code
smerf <- LongituRF::MERF(
X = data_train_qb_matrix %>% as_tibble() %>% dplyr::select(season:att_br.rush) %>% as.matrix(),
Y = data_train_qb_matrix[,c("fantasyPoints_lag")] %>% as.matrix(),
Z = data_train_qb_matrix[,c("pacr")] %>% as.matrix(),
id = data_train_qb_matrix[,c("gsis_id")] %>% as.matrix(),
time = data_train_qb_matrix[,c("ageCentered20")] %>% as.matrix(),
ntree = 500,
sto = "BM")
[1] "stopped after 7 iterations."
Call:
randomForest(x = X, y = ystar, ntree = ntree, mtry = mtry, importance = TRUE)
Type of random forest: regression
Number of trees: 500
No. of variables tried at each split: 25
Mean of squared residuals: 169.7337
% Var explained: 98.52
(remaining MERF output omitted for brevity: a vector of 316 estimated random-effect values; a vector of 1,582 fitted values, one per training observation; and one value per estimation iteration: 171.19, 131.27, 140.90, 142.33, 147.46, 167.92, 169.73)
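To evaluate the longitudinal random forest on the held-out test data, predictions can be generated from the fitted MERF model. A minimal sketch is below; it assumes the predict() method for LongituRF model objects accepts the same X, Z, id, and time arguments as MERF(), and that data_test_qb_matrix is structured like the training matrix.
# Predict lagged fantasy points for the test data from the fitted MERF model (sketch)
pred_merf_qb <- predict(
  smerf,
  X = data_test_qb_matrix %>% as_tibble() %>% dplyr::select(season:att_br.rush) %>% as.matrix(),
  Z = data_test_qb_matrix[,c("pacr")] %>% as.matrix(),
  id = data_test_qb_matrix[,c("gsis_id")] %>% as.matrix(),
  time = data_test_qb_matrix[,c("ageCentered20")] %>% as.matrix())
# Mean absolute error on the test data
mean(abs(pred_merf_qb - data_test_qb_matrix[,"fantasyPoints_lag"]), na.rm = TRUE)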
19.7.5 k-Fold Cross-Validation
19.7.6 Leave-One-Out (LOO) Cross-Validation
19.7.7 Combining Tree-Boosting with Mixed Models
To combine tree-boosting with mixed models, we use the gpboost
package (Sigrist et al., 2025).
Adapted from here: https://towardsdatascience.com/mixed-effects-machine-learning-for-longitudinal-panel-data-with-gpboost-part-iii-523bb38effc
19.7.7.1 Process Data
If using a gamma distribution, the outcome must contain only positive values:
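A minimal sketch of one way to check and handle this is below; restricting the data to positive values is an assumption, and the folded chunk may handle it differently.
# Check whether any outcome values are non-positive
min(data_train_qb$fantasyPoints_lag, na.rm = TRUE)
# One option (an assumption): retain only player-seasons with positive lagged fantasy points
data_train_qb <- data_train_qb %>%
  dplyr::filter(fantasyPoints_lag > 0)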
19.7.7.2 Specify Predictor Variables
19.7.7.3 Specify General Model Options
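The folded chunks specify the predictor variables and general options referenced below. A minimal sketch of the general options is provided here; the gamma likelihood matches the fitting log later in this section, whereas the number of boosting rounds used during tuning is an assumption.
model_likelihood <- "gamma" # outcome distribution for the GPModel (matches the fitting log below)
nrounds <- 1000 # assumed upper bound on boosting rounds during tuning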
19.7.7.4 Identify Optimal Tuning Parameters
For identifying the optimal tuning parameters for boosting, we partition the training data into inner training data and validation data. We randomly split the training data into 80% inner training data and 20% held-out validation data. We then use the mean absolute error as our index of prediction accuracy on the held-out validation data.
Code
# Partition training data into inner training data and validation data
ntrain_qb <- dim(data_train_qb_matrix)[1]
set.seed(52242)
valid_tune_idx_qb <- sample.int(ntrain_qb, as.integer(0.2*ntrain_qb)) # indices of the 20% held-out validation data
folds_qb <- list(valid_tune_idx_qb)
# Specify parameter grid, gp_model, and gpb.Dataset
param_grid_qb <- list(
"learning_rate" = c(0.2, 0.1, 0.05, 0.01), # the step size used when updating predictions after each boosting round (high values make big updates, which can speed up learning but risk overshooting; low values are usually more accurate but require more rounds)
"max_depth" = c(3, 5, 7), # maximum depth (levels) of each decision tree; deeper trees capture more complex patterns and interactions but risk overfitting; shallower trees tend to generalize better
"min_data_in_leaf" = c(10, 50, 100), # minimum number of training examples in a leaf node; higher values = more regularization (simpler trees)
"lambda_l2" = c(0, 1, 5)) # L2 regularization penalty for large weights in tree splits; adds a "cost" for complexity; helps prevent overfitting by shrinking the contribution of each tree
other_params_qb <- list(
num_leaves = 2^6) # maximum number of leaves per tree; controls the maximum complexity of each tree (along with max_depth); more leaves = more expressive models, but can overfit if min_data_in_leaf is too small; num_leaves must be consistent with max_depth, because deeper trees naturally support more leaves; max is: 2^n, where n is the largest max_depth
gp_model_qb <- gpboost::GPModel(
group_data = data_train_qb_matrix[,"gsis_id"],
likelihood = model_likelihood,
group_rand_coef_data = cbind(
data_train_qb_matrix[,"ageCentered20"],
data_train_qb_matrix[,"ageCentered20Quadratic"]),
ind_effect_group_rand_coef = c(1,1))
gp_data_qb <- gpboost::gpb.Dataset(
data = data_train_qb_matrix[,pred_vars_qb],
categorical_feature = pred_vars_qb_categorical,
label = data_train_qb_matrix[,"fantasyPoints_lag"]) # could instead use mean-centered variable (fantasyPointsMC_lag) and add mean back afterward
# Find optimal tuning parameters
opt_params_qb <- gpboost::gpb.grid.search.tune.parameters(
param_grid = param_grid_qb,
params = other_params_qb,
num_try_random = NULL,
folds = folds_qb,
data = gp_data_qb,
gp_model = gp_model_qb,
nrounds = nrounds,
early_stopping_rounds = 50, # stops training early if the model hasn’t improved on the validation set in 50 rounds; prevents overfitting and saves time
verbose_eval = 1,
metric = "mae")
Error in fd$booster$update(fobj = fobj): [GPBoost] [Fatal] Inf occured in gradient wrt covariance / auxiliary parameter number 3 (counting starts at 1, total nb. par. = 4)
Error: object 'opt_params_qb' not found
Because the grid search above did not complete, we specify the tuning parameters manually. A high learning rate is risky for boosting, so even if a high learning rate performs well in tuning, I use a relatively low learning rate (0.1) to avoid overfitting. I also add some light regularization (lambda_l2) for better generalization, set the maximum tree depth (max_depth) to 5 to capture complex (up to 5-way) interactions, set the maximum number of terminal nodes per tree (num_leaves) to 2^5 (32), and set the minimum number of samples in any leaf (min_data_in_leaf) to 10.
19.7.7.5 Specify Model and Tuning Parameters
Code
gp_model_qb <- gpboost::GPModel(
group_data = data_train_qb_matrix[,"gsis_id"],
likelihood = model_likelihood,
group_rand_coef_data = cbind(
data_train_qb_matrix[,"ageCentered20"],
data_train_qb_matrix[,"ageCentered20Quadratic"]),
ind_effect_group_rand_coef = c(1,1))
gp_data_qb <- gpboost::gpb.Dataset(
data = data_train_qb_matrix[,pred_vars_qb],
categorical_feature = pred_vars_qb_categorical,
label = data_train_qb_matrix[,"fantasyPoints_lag"])
params_qb <- list(
learning_rate = 0.1,
max_depth = 5,
min_data_in_leaf = 10,
lambda_l2 = 1,
num_leaves = 2^5,
num_threads = num_cores)
nrounds_qb <- 123 # number of boosting rounds (trees), identified through iteration and cross-validation
#gp_model_qb$set_optim_params(params = list(optimizer_cov = "nelder_mead")) # to speed up model estimation
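A reasonable number of boosting rounds can also be identified via cross-validation; a minimal sketch using gpboost::gpb.cv() is below (the number of folds and the upper bound on rounds are assumptions).
# Cross-validate to identify a reasonable number of boosting rounds (sketch)
cv_qb <- gpboost::gpb.cv(
  params = params_qb,
  data = gp_data_qb,
  gp_model = gp_model_qb,
  nrounds = 1000, # assumed upper bound
  nfold = 5, # assumed number of folds
  use_gp_model_for_validation = TRUE,
  early_stopping_rounds = 50)
cv_qb$best_iter # candidate value for nrounds_qb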
19.7.7.6 Fit Model
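The folded chunk fits the boosting model with the random effects specified in the GPModel. A minimal sketch of what it might contain is below; the object name gp_model_fit_qb matches the prediction code later in this section.
# Fit the GPBoost model: tree boosting for the fixed effects, random effects via the GPModel (sketch)
gp_model_fit_qb <- gpboost::gpb.train(
  params = params_qb,
  data = gp_data_qb,
  gp_model = gp_model_qb,
  nrounds = nrounds_qb,
  verbose = 1)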
Code
[GPBoost] [Info] Total Bins 8709
[GPBoost] [Info] Number of data points in the train set: 1582, number of used features: 73
[GPBoost] [Info] [GPBoost with gamma likelihood]: initscore=4.805531
[GPBoost] [Info] Start training from score 4.805531
19.7.7.7 Model Results
=====================================================
Covariance parameters (random effects):
Param.
Group_1 0
Group_1_rand_coef_nb_1 0
Group_1_rand_coef_nb_2 0
-----------------------------------------------------
Additional parameters:
Param.
shape 0.8186
=====================================================
19.7.7.8 Evaluate Accuracy of Model on Test Data
Code
# Test Model on Test Data
pred_test_qb <- predict(
gp_model_fit_qb,
data = data_test_qb_matrix[,pred_vars_qb],
group_data_pred = data_test_qb_matrix[,"gsis_id"],
group_rand_coef_data_pred = cbind(
data_test_qb_matrix[,"ageCentered20"],
data_test_qb_matrix[,"ageCentered20Quadratic"]),
predict_var = FALSE,
pred_latent = FALSE)
y_pred_test_qb <- pred_test_qb[["response_mean"]] # if outcome is mean-centered, add mean(data_train_qb_matrix[,"fantasyPoints_lag"])
predictedVsActual <- data.frame(
predictedPoints = y_pred_test_qb,
actualPoints = data_test_qb_matrix[,"fantasyPoints_lag"]
)
predictedVsActual
Code
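The folded chunk summarizes prediction accuracy on the test data; a brief sketch of how the error could be computed in base R:
# Mean absolute error (MAE) and root mean squared error (RMSE) on the test data
mean(abs(predictedVsActual$predictedPoints - predictedVsActual$actualPoints), na.rm = TRUE)
sqrt(mean((predictedVsActual$predictedPoints - predictedVsActual$actualPoints)^2, na.rm = TRUE))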
19.7.7.9 Generate Predictions for Next Season
Code
# Generate model predictions for next season
pred_nextYear_qb <- predict(
gp_model_fit_qb,
data = newData_qb_matrix[,pred_vars_qb],
group_data_pred = newData_qb_matrix[,"gsis_id"],
group_rand_coef_data_pred = cbind(
newData_qb_matrix[,"ageCentered20"],
newData_qb_matrix[,"ageCentered20Quadratic"]),
predict_var = FALSE,
pred_latent = FALSE)
newData_qb$fantasyPoints_lag <- pred_nextYear_qb$response_mean
# Merge with player names
newData_qb <- left_join(
newData_qb,
nfl_playerIDs %>% select(gsis_id, name),
by = "gsis_id"
)
newData_qb %>%
arrange(-fantasyPoints_lag) %>%
select(name, fantasyPoints_lag, fantasyPoints)
19.8 Conclusion
19.9 Session Info
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] glmnet_4.1-9 Matrix_1.7-3 lubridate_1.9.4 forcats_1.0.0
[5] stringr_1.5.1 readr_2.1.5 tidyverse_2.0.0 gpboost_1.5.8
[9] R6_2.6.1 LongituRF_0.9 yardstick_1.3.2 workflowsets_1.1.1
[13] workflows_1.2.0 tune_1.3.0 tidyr_1.3.1 tibble_3.3.0
[17] rsample_1.3.0 recipes_1.3.1 purrr_1.0.4 parsnip_1.3.2
[21] modeldata_1.4.0 infer_1.0.9 ggplot2_3.5.2 dplyr_1.1.4
[25] dials_1.4.0 scales_1.4.0 broom_1.0.8 tidymodels_1.3.0
[29] powerjoin_0.1.0 missRanger_2.6.1 future_1.58.0 petersenlab_1.1.6
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 shape_1.4.6.1 rstudioapi_0.17.1
[4] jsonlite_2.0.0 magrittr_2.0.3 farver_2.1.2
[7] nloptr_2.2.1 rmarkdown_2.29 vctrs_0.6.5
[10] minqa_1.2.8 base64enc_0.1-3 sparsevctrs_0.3.4
[13] htmltools_0.5.8.1 Formula_1.2-5 parallelly_1.45.0
[16] htmlwidgets_1.6.4 plyr_1.8.9 lifecycle_1.0.4
[19] iterators_1.0.14 pkgconfig_2.0.3 fastmap_1.2.0
[22] rbibutils_2.3 digest_0.6.37 colorspace_2.1-1
[25] furrr_0.3.1 Hmisc_5.2-3 labeling_0.4.3
[28] latex2exp_0.9.6 randomForest_4.7-1.2 RJSONIO_2.0.0
[31] timechange_0.3.0 compiler_4.5.1 withr_3.0.2
[34] htmlTable_2.4.3 backports_1.5.0 DBI_1.2.3
[37] psych_2.5.6 MASS_7.3-65 lava_1.8.1
[40] tools_4.5.1 pbivnorm_0.6.0 foreign_0.8-90
[43] ranger_0.17.0 future.apply_1.20.0 nnet_7.3-20
[46] doFuture_1.1.1 glue_1.8.0 quadprog_1.5-8
[49] nlme_3.1-168 grid_4.5.1 checkmate_2.3.2
[52] cluster_2.1.8.1 reshape2_1.4.4 generics_0.1.4
[55] gtable_0.3.6 tzdb_0.5.0 class_7.3-23
[58] data.table_1.17.6 hms_1.1.3 foreach_1.5.2
[61] pillar_1.10.2 mitools_2.4 splines_4.5.1
[64] lhs_1.2.0 lattice_0.22-7 survival_3.8-3
[67] FNN_1.1.4.1 tidyselect_1.2.1 mix_1.0-13
[70] knitr_1.50 reformulas_0.4.1 gridExtra_2.3
[73] stats4_4.5.1 xfun_0.52 hardhat_1.4.1
[76] timeDate_4041.110 stringi_1.8.7 DiceDesign_1.10
[79] yaml_2.3.10 boot_1.3-31 evaluate_1.0.4
[82] codetools_0.2-20 cli_3.6.5 rpart_4.1.24
[85] xtable_1.8-4 Rdpack_2.6.4 lavaan_0.6-19
[88] Rcpp_1.0.14 globals_0.18.0 gower_1.0.2
[91] GPfit_1.0-9 lme4_1.1-37 listenv_0.9.1
[94] viridisLite_0.4.2 mvtnorm_1.3-3 ipred_0.9-15
[97] prodlim_2025.04.28 rlang_1.1.6 mnormt_2.1.1