I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
Alternatively, you can leave an annotation using hypothes.is.
To add an annotation, select some text and then click the
symbol on the pop-up menu.
To see the annotations of others, click the
symbol in the upper right-hand corner of the page.
23 Simulation: Bootstrapping and the Monte Carlo Method
23.1 Getting Started
23.1.1 Load Packages
23.1.2 Load Data
23.2 Overview
A simulation is an “imitative representation” of a phenomenon that could exist the real world. In statistics, simulations are computer-driven investigations to better understand a phenomenon by studying its behavior under different conditions. For instance, we might want to determine the likely range of outcomes for a player, in terms of the range of fantasy points that a player might score over the course of a season. Simulations can be conducted in various ways. Two common ways of conducting simulations are via bootstrapping and via Monte Carlo simulation.
23.2.1 Bootstrapping
Bootstrapping involves repeated resampling (with replacement) from observed data. For instance, if we have 100 sources provide projections for a player, we could estimate the most likely range of fantasy points for the player by repeatedly sampling from the 100 projections.
23.2.2 Monte Carlo Simulation
Monte Carlo simulation involves repeated random sampling from a known distribution (Sigal & Chalmers, 2016). For instance, if we know the population distribution for the likely outcomes for a player—e.g., a normal distribution with a known mean (e.g., 150 points) and standard deviation (e.g., 20 points)—we can repeatedly sample randomly from this distribution.
23.3 Simulation of Projected Statistics and Points
23.3.1 Bootstrapping
Code
vars_by_pos <- list(
QB = c(
"games",
"pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"fumbles_lost",
"sacks",
"pass_09_tds", "pass_1019_tds", "pass_2029_tds", "pass_3039_tds", "pass_4049_tds", "pass_50_tds",
"pass_250_yds", "pass_300_yds", "pass_350_yds", "pass_40_yds", "pass_400_yds",
"rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds"
),
RB = c(
"games",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds",
"fumbles_lost",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
"rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
),
WR = c(
"games",
"pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds",
"fumbles_lost",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
"rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
),
TE = c(
"games",
"pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds",
"fumbles_lost",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
"rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
),
K = c(
"fg_50", "fg_50_att", "fg_49", "fg_49_att", "fg_39", "fg_att_39", "fg", "fg_att", "xp", "xp_att"
),
DST = c(
"dst_fum_recvr", "dst_sacks", "dst_fumbles", "dst_tackles", "dst_td", "dst_yds_against", "dst_pts_against"
) ### ADD IDP VARS
)
Code
bootstrapSimulation <- function(projectedStats, vars_by_pos, n_iter = 10000, seed = NULL) {
dt <- data.table::as.data.table(projectedStats) # using data.table for speed
all_ids <- unique(dt$id)
if (!is.null(seed)) set.seed(seed)
future::plan(future::multisession) # parallelize tasks across multiple background R sessions using multiple cores to speed up simulation
results <- future.apply::future_lapply(
all_ids, # apply the function below to each player using a parallelized loop
function(player_id) {
player_data <- dt[id == player_id]
player_pos <- unique(player_data$pos)
if (length(player_pos) != 1 || !player_pos %in% names(vars_by_pos)) return(NULL)
stat_vars <- vars_by_pos[[player_pos]] # pulls the relevant stat variables to simulate for this player's position
out <- data.table(iteration = seq_len(n_iter), id = player_id, pos = player_pos)
for (var in stat_vars) { # Loop over each stat variable that should be simulated for the player's position
if (var %in% names(player_data)) {
non_na_values <- player_data[[var]][!is.na(player_data[[var]])] # pulls non-missing values of the stat for the player (from all projection sources)
if (length(non_na_values) > 0) {
out[[var]] <- sample(non_na_values, n_iter, replace = TRUE) # If there are valid values, sample with replacement to simulate n_iter values
} else {
out[[var]] <- NA_real_ # specifies a numeric missing value (if all values were missing)
}
} else {
out[[var]] <- NA_real_ # specifies a numeric missing value (if the stat variable doesn't exist)
}
}
return(out)
},
future.seed = TRUE # ensures that each parallel process gets a reproducible random seed
)
data.table::rbindlist(results, use.names = TRUE, fill = TRUE) # combines all the individual player results into one large data table, aligning columns by name; fill = TRUE ensures that missing columns are filled with NA where necessary
}
Code
bootstappedStats$data_src <- bootstappedStats$iteration
bootstappedStatsByPosition <- split(
bootstappedStats,
by = "pos",
keep.by = TRUE)
bootstappedStatsByPosition <- lapply(
bootstappedStatsByPosition,
as_tibble)
attr(bootstappedStatsByPosition, "season") <- 2024
attr(bootstappedStatsByPosition, "week") <- 0
bootstrappedFantasyPoints <- ffanalytics:::source_points(
data_result = bootstappedStatsByPosition,
scoring_rules = ffanalytics::scoring)
bootstrappedFantasyPoints$iteration <- bootstrappedFantasyPoints$data_src
bootstrappedFantasyPoints$data_src <- NULL
bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>%
left_join(
nfl_playerIDs[,c("mfl_id","name","merge_name","team")],
by = c("id" = "mfl_id")
)
bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>%
rename(projectedPoints = raw_points) %>%
arrange(-projectedPoints)
Code
bootstrappedFantasyPoints_summary <- bootstrappedFantasyPoints %>%
group_by(id) %>%
summarise(
mean = mean(projectedPoints, na.rm = TRUE),
SD = sd(projectedPoints, na.rm = TRUE),
min = min(projectedPoints, na.rm = TRUE),
max = max(projectedPoints, na.rm = TRUE),
range = max(projectedPoints, na.rm = TRUE) - min(projectedPoints, na.rm = TRUE),
IQR = IQR(projectedPoints, na.rm = TRUE),
MAD = mad(projectedPoints, na.rm = TRUE),
CV = SD/mean,
median = median(projectedPoints, na.rm = TRUE),
pseudomedian = DescTools::HodgesLehmann(projectedPoints, na.rm = TRUE),
mode = petersenlab::Mode(projectedPoints, multipleModes = "mean"),
skewness = psych::skew(projectedPoints, na.rm = TRUE),
kurtosis = psych::kurtosi(projectedPoints, na.rm = TRUE)
)
bootstrappedFantasyPoints_summary <- bootstrappedFantasyPoints_summary %>%
left_join(
nfl_playerIDs[,c("mfl_id","name","merge_name","team","position")],
by = c("id" = "mfl_id")
) %>%
select(name, team, position, mean:kurtosis, everything()) %>%
arrange(-mean)
Code
Code
Code
Code
Code
An example distribution of projected fantasy points is in Figure 23.1.
Code
ggplot2::ggplot(
data = bootstrappedFantasyPoints %>%
filter(pos == "QB" & name == "Patrick Mahomes"),
mapping = aes(
x = projectedPoints)
) +
geom_histogram(
aes(y = after_stat(density)),
color = "#000000",
fill = "#0099F8"
) +
geom_density(
color = "#000000",
fill = "#F85700",
alpha = 0.6 # add transparency
) +
geom_rug() +
#coord_cartesian(
# xlim = c(0,400)) +
labs(
x = "Fantasy Points",
y = "Density",
title = "Distribution of Projected Fantasy Points for Patrick Mahomes"
) +
theme_classic() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
Projections of two players—one with relatively narrow uncertainty and one with relatively wide uncertainty—are depicted in Figure 23.2.
Code
ggplot2::ggplot(
data = bootstrappedFantasyPoints %>%
filter(pos == "QB" & (name %in% c("Dak Prescott", "Drake Maye"))),
mapping = aes(
x = projectedPoints,
group = name,
#color = name,
fill = name)
) +
geom_histogram(
aes(y = after_stat(density))
) +
geom_density(
alpha = 0.6 # add transparency
) +
coord_cartesian(
xlim = c(0,NA),
expand = FALSE) +
#geom_rug() +
labs(
x = "Fantasy Points",
y = "Density",
fill = "",
color = "",
title = "Distribution of Projected Fantasy Points"
) +
theme_classic() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
23.3.2 Monte Carlo Simulation
23.4 Conclusion
23.5 Session Info
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[4] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[7] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.2
[10] tidyverse_2.0.0 future.apply_1.20.0 future_1.58.0
[13] data.table_1.17.4 ffanalytics_3.1.5.0000
loaded via a namespace (and not attached):
[1] DBI_1.2.3 mnormt_2.1.1 gridExtra_2.3 httr2_1.1.2
[5] gld_2.6.7 readxl_1.4.5 rlang_1.1.6 magrittr_2.0.3
[9] e1071_1.7-16 compiler_4.5.0 vctrs_0.6.5 reshape2_1.4.4
[13] quadprog_1.5-8 rvest_1.0.4 pkgconfig_2.0.3 fastmap_1.2.0
[17] backports_1.5.0 labeling_0.4.3 pbivnorm_0.6.0 promises_1.3.3
[21] rmarkdown_2.29 tzdb_0.5.0 haven_2.5.5 ps_1.9.1
[25] xfun_0.52 jsonlite_2.0.0 later_1.4.2 rrapply_1.2.7
[29] psych_2.5.3 parallel_4.5.0 lavaan_0.6-19 cluster_2.1.8.1
[33] DescTools_0.99.60 R6_2.6.1 stringi_1.8.7 RColorBrewer_1.1-3
[37] parallelly_1.45.0 boot_1.3-31 rpart_4.1.24 cellranger_1.1.0
[41] Rcpp_1.0.14 knitr_1.50 base64enc_0.1-3 Matrix_1.7-3
[45] nnet_7.3-20 timechange_0.3.0 tidyselect_1.2.1 rstudioapi_0.17.1
[49] yaml_2.3.10 codetools_0.2-20 websocket_1.4.4 processx_3.8.6
[53] listenv_0.9.1 lattice_0.22-6 plyr_1.8.9 withr_3.0.2
[57] evaluate_1.0.3 foreign_0.8-90 proxy_0.4-27 xml2_1.3.8
[61] pillar_1.10.2 checkmate_2.3.2 stats4_4.5.0 generics_0.1.4
[65] chromote_0.5.1 mix_1.0-13 hms_1.1.3 petersenlab_1.1.5
[69] scales_1.4.0 rootSolve_1.8.2.4 xtable_1.8-4 globals_0.18.0
[73] class_7.3-23 glue_1.8.0 Hmisc_5.2-3 lmom_3.2
[77] tools_4.5.0 Exact_3.3 fs_1.6.6 mvtnorm_1.3-3
[81] grid_4.5.0 mitools_2.4 colorspace_2.1-1 nlme_3.1-168
[85] htmlTable_2.4.3 Formula_1.2-5 cli_3.6.5 rappdirs_0.3.3
[89] expm_1.0-0 viridisLite_0.4.2 gtable_0.3.6 digest_0.6.37
[93] htmlwidgets_1.6.4 farver_2.1.2 htmltools_0.5.8.1 lifecycle_1.0.4
[97] httr_1.4.7 MASS_7.3-65