I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
Alternatively, you can leave an annotation using hypothes.is.
To add an annotation, select some text and then click the
symbol on the pop-up menu.
To see the annotations of others, click the
symbol in the upper right-hand corner of the page.
25 Simulation: Bootstrapping and the Monte Carlo Method
25.1 Getting Started
25.1.1 Load Packages
25.1.2 Load Data
25.2 Overview
A simulation is an “imitative representation” of a phenomenon that could exist the real world. In statistics, simulations are computer-driven investigations to better understand a phenomenon by studying its behavior under different conditions. For instance, we might want to determine the likely range of outcomes for a player, in terms of the range of fantasy points that a player might score over the course of a season. Simulations can be conducted in various ways. Two common ways of conducting simulations are via bootstrapping and via Monte Carlo simulation.
25.2.1 Bootstrapping
Bootstrapping involves repeated resampling (with replacement) from observed data. For instance, if we have 100 sources provide projections for a player, we could estimate the most likely range of fantasy points for the player by repeatedly sampling from the 100 projections.
25.2.2 Monte Carlo Simulation
Monte Carlo simulation involves repeated random sampling from a known distribution (Sigal & Chalmers, 2016). For instance, if we know the population distribution for the likely outcomes for a player—e.g., a normal distribution with a known mean (e.g., 150 points) and standard deviation (e.g., 20 points)—we can repeatedly sample randomly from this distribution. The distribution could be, as examples, a normal distribution, a log-normal distribution, a binomial distribution, a chi-square distribution, etc. (Sigal & Chalmers, 2016). The distribution provides a probability density function, which indicates the probability that any particular value would be observed if the data arose from that distribution.
25.3 Simulation of Projected Statistics and Points
Below, we perform bootstrapping and Monte Carlo simulations of projected statistics and points. However, it is worth noting that—as for any simulation—the quality of the results depend on the quality of the inputs. In this case, the quality of the simulation depends on the quality of the projections. If the projections are no good, the simulation results will not be trustworthy. Garbage in, garbage out. As we evaluated in Section 18.12, projections tend to show moderate accuracy for fantasy performance, but they are not highly accurate. Thus, we should treat simulation results arising from fantasy projections with a good dose of skepticism.
25.3.1 Bootstrapping
25.3.1.1 Prepare Data
Code
vars_by_pos <- list(
QB = c(
"games",
"pass_att", "pass_comp", "pass_inc", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"fumbles_lost", "fumbles_total", "two_pts",
"sacks",
"pass_09_tds", "pass_1019_tds", "pass_2029_tds", "pass_3039_tds", "pass_4049_tds", "pass_50_tds",
"pass_40_yds", "pass_250_yds", "pass_300_yds", "pass_350_yds", "pass_400_yds",
"rush_40_yds", "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds"
),
RB = c(
"games",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds", "rec_rz_tgt",
"fumbles_lost", "fumbles_total", "two_pts",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_40_yds", "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds",
"rec_40_yds", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds"
),
WR = c(
"games",
"pass_att", "pass_comp", "pass_inc", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds", "rec_rz_tgt",
"fumbles_lost", "fumbles_total", "two_pts",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_40_yds", "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds",
"rec_40_yds", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds"
),
TE = c(
"games",
"pass_att", "pass_comp", "pass_inc", "pass_yds", "pass_tds", "pass_int",
"rush_att", "rush_yds", "rush_tds",
"rec_tgt", "rec", "rec_yds", "rec_tds", "rec_rz_tgt",
"fumbles_lost", "fumbles_total", "two_pts",
"return_yds", "return_tds",
"rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
"rush_40_yds", "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds",
"rec_40_yds", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds"
),
K = c(
"fg_0019", "fg_2029", "fg_3039", "fg_4049", "fg_50", "fg_50_att",
"fg_39", "fg_att_39", "fg_49", "fg_49_att",
"fg", "fg_att", "fg_miss", "xp", "xp_att"
),
D = c(
"idp_solo", "idp_asst", "idp_sack", "idp_int", "idp_fum_force", "idp_fum_rec", "idp_pd", "idp_td", "idp_safety"
),
DL = c(
"idp_solo", "idp_asst", "idp_sack", "idp_int", "idp_fum_force", "idp_fum_rec", "idp_pd", "idp_td", "idp_safety"
),
LB = c(
"idp_solo", "idp_asst", "idp_sack", "idp_int", "idp_fum_force", "idp_fum_rec", "idp_pd", "idp_td", "idp_safety"
),
DB = c(
"idp_solo", "idp_asst", "idp_sack", "idp_int", "idp_fum_force", "idp_fum_rec", "idp_pd", "idp_td", "idp_safety"
),
DST = c(
"dst_fum_recvr", "dst_fum_rec", "dst_int", "dst_safety", "dst_sacks", "dst_td", "dst_blk",
"dst_fumbles", "dst_tackles", "dst_yds_against", "dst_pts_against", "dst_pts_allowed", "dst_ret_yds"
)
)
25.3.1.2 Bootstrapping Function
For performing the bootstrapping, we leverage the data.table
(Barrett et al., 2025), future
(Bengtsson, 2025a), and future.apply
(Bengtsson, 2025b) packages for speed (by using parallel processing) and memory efficiency. We use the progressr
(Bengtsson, 2024) package to create a progress bar.
Code
bootstrapSimulation <- function(
projectedStats,
vars_by_pos,
n_iter = 10000,
seed = NULL,
progress = TRUE) {
dt <- data.table::as.data.table(projectedStats) # use data.table for speed
all_ids <- unique(dt$id)
if (!is.null(seed)) set.seed(seed)
future::plan(future::multisession) # parallelize tasks across multiple background R sessions using multiple cores to speed up simulation
if (progress) progressr::handlers("txtprogressbar") # specify progress-bar style
results <- progressr::with_progress({ # wrap in with_progress for progress bar
p <- if (progress) progressr::progressor(along = all_ids) else NULL # create progressor for progress bar
future.apply::future_lapply(
all_ids, # apply the function below to each player using a parallelized loop
function(player_id) {
if (!is.null(p)) p() # advance progress bar
player_data <- dt[id == player_id]
player_pos <- unique(player_data$pos)
if (length(player_pos) != 1 || !player_pos %in% names(vars_by_pos)) return(NULL)
stat_vars <- vars_by_pos[[player_pos]] # pull the relevant stat variables to simulate for this player's position
out <- data.table(iteration = seq_len(n_iter), id = player_id, pos = player_pos)
for (var in stat_vars) { # loop over each stat variable that should be simulated for the player's position
if (var %in% names(player_data)) {
non_na_values <- player_data[[var]][!is.na(player_data[[var]])] # pull non-missing values of the stat for the player (from all projection sources)
if (length(non_na_values) > 0) {
out[[var]] <- sample(non_na_values, n_iter, replace = TRUE) # if there are valid values, sample with replacement to simulate n_iter values
} else {
out[[var]] <- NA_real_ # specify a numeric missing value (if all values were missing)
}
} else {
out[[var]] <- NA_real_ # specify a numeric missing value (if the stat variable doesn't exist)
}
}
return(out)
},
future.seed = TRUE # ensures that each parallel process gets a reproducible random seed
)
})
data.table::rbindlist(results, use.names = TRUE, fill = TRUE) # combines all the individual player results into one large data table, aligning columns by name; fill = TRUE ensures that missing columns are filled with NA where necessary
}
25.3.1.3 Run the Bootstrapping Simulation
25.3.1.4 Score Fantasy Points from the Simulation
Code
bootstrappedFantasyPoints$iteration <- bootstrappedFantasyPoints$data_src
bootstrappedFantasyPoints$data_src <- NULL
bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>%
left_join(
nfl_playerIDs[,c("mfl_id","name","merge_name","team")],
by = c("id" = "mfl_id")
)
bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>%
rename(projectedPoints = raw_points)
25.3.1.5 Summarize Players’ Distribution of Projected Fantasy Points
Code
bootstrappedFantasyPoints_summary <- bootstrappedFantasyPoints %>%
group_by(id) %>%
summarise(
mean = mean(projectedPoints, na.rm = TRUE),
SD = sd(projectedPoints, na.rm = TRUE),
min = min(projectedPoints, na.rm = TRUE),
max = max(projectedPoints, na.rm = TRUE),
q10 = quantile(projectedPoints, .10, na.rm = TRUE), # 10th quantile
q90 = quantile(projectedPoints, .90, na.rm = TRUE), # 90th quantile
range = max(projectedPoints, na.rm = TRUE) - min(projectedPoints, na.rm = TRUE),
IQR = IQR(projectedPoints, na.rm = TRUE),
MAD = mad(projectedPoints, na.rm = TRUE),
CV = SD/mean,
median = median(projectedPoints, na.rm = TRUE),
pseudomedian = DescTools::HodgesLehmann(projectedPoints, na.rm = TRUE),
mode = petersenlab::Mode(projectedPoints, multipleModes = "mean"),
skewness = psych::skew(projectedPoints, na.rm = TRUE),
kurtosis = psych::kurtosi(projectedPoints, na.rm = TRUE)
)
25.3.1.6 View Players’ Distribution of Projected Fantasy Points
Code
Code
Code
Code
Code
Code
Code
Code
An example distribution of projected fantasy points is in Figure 25.1.
Code
ggplot2::ggplot(
data = bootstrappedFantasyPoints %>%
filter(pos == "QB" & name == "Patrick Mahomes"),
mapping = aes(
x = projectedPoints)
) +
geom_histogram(
aes(y = after_stat(density)),
color = "#000000",
fill = "#0099F8"
) +
geom_density(
color = "#000000",
fill = "#F85700",
alpha = 0.6 # add transparency
) +
geom_rug() +
#coord_cartesian(
# xlim = c(0,400)) +
labs(
x = "Fantasy Points",
y = "Density",
title = "Distribution of Projected Fantasy Points for Patrick Mahomes"
) +
theme_classic() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
Projections of two players—one with relatively narrow uncertainty and one with relatively wide uncertainty—are depicted in Figure 25.2.
Code
ggplot2::ggplot(
data = bootstrappedFantasyPoints %>%
filter(pos == "QB" & (name %in% c("Dak Prescott", "Drake Maye"))),
mapping = aes(
x = projectedPoints,
group = name,
#color = name,
fill = name)
) +
geom_histogram(
aes(y = after_stat(density))
) +
geom_density(
alpha = 0.6 # add transparency
) +
coord_cartesian(
xlim = c(0,NA),
expand = FALSE) +
#geom_rug() +
labs(
x = "Fantasy Points",
y = "Density",
fill = "",
color = "",
title = "Distribution of Projected Fantasy Points"
) +
theme_classic() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
25.3.2 Monte Carlo Simulation
25.3.2.1 SimDesign
Package
You can generate a template for Monte Carlo simulations in the SimDesign
(Chalmers, 2025; Chalmers & Adkins, 2020) package using the following code:
25.3.2.2 Prepare Data
25.3.2.3 Optimal Distribution for Each Player
For each player, we identify the optimal distribution as either a normal distribution , or as a skew-normal distribution. The normal distribution was fit using the fitdist()
function of the fitdistrplus
package (Delignette-Muller & Dutang, 2015; Delignette-Muller et al., 2025). The skew-normal distribution was fit using the selm()
function of the sn
package (A. Azzalini, 2023; A. A. Azzalini, 2023).
Code
# Function to identify the "best" distribution (Normal vs Skew‑Normal) for every player (tries both families and picks by AIC; uses empirical distribution if fewer than 2 unique scores)
fit_best <- function(x) {
# Basic checks
if (length(unique(x)) < 2 || all(is.na(x))) { # Use empirical distribution if there are fewer than 2 unique scores
return(list(type = "empirical", empirical = x))
}
# Try Normal Distribution
fit_norm <- tryCatch(
fitdistrplus::fitdist(x, distr = "norm"),
error = function(e) NULL
)
# Try Skew-Normal Distribution
fit_skew <- tryCatch(
sn::selm(x ~ 1),
error = function(e) NULL
)
# Handle bad fits: sd = NA, etc.
if (!is.null(fit_norm) && any(is.na(fit_norm$estimate))) {
fit_norm <- NULL
}
if (!is.null(fit_skew)) {
pars <- tryCatch(sn::coef(fit_skew, param.type = "dp"), error = function(e) NULL)
if (is.null(pars) || any(is.na(pars))) {
fit_skew <- NULL
}
}
# Choose best available
if (!is.null(fit_norm) && !is.null(fit_skew)) {
aic_norm <- AIC(fit_norm)
aic_skew <- AIC(fit_skew)
if (aic_skew + 2 < aic_norm) { # skew-normal is more complex (has more parameters) than normal distribution, so only select a skew-normal distribution if it fits substantially better than a normal distribution
pars <- sn::coef(fit_skew, param.type = "dp")
return(list(
type = "skewnorm",
xi = pars["dp.location"],
omega = pars["dp.scale"],
alpha = pars["dp.shape"]))
} else {
return(list(
type = "norm",
mean = fit_norm$estimate["mean"],
sd = fit_norm$estimate["sd"]))
}
} else if (!is.null(fit_norm)) {
return(list(
type = "norm",
mean = fit_norm$estimate["mean"],
sd = fit_norm$estimate["sd"]))
} else {
return(list(type = "empirical", empirical = x))
}
}
Code
proj_dists_tbl <- all_proj %>%
filter(!is.na(id) & id != "") %>%
group_by(id) %>%
summarise(
dist_info = list(fit_best(projectedPoints)),
n_proj = n(), # record of how many sources they have
.groups = "drop"
)
proj_dists <- proj_dists_tbl %>%
filter(!is.na(id) & id != "") %>%
distinct(id, .keep_all = TRUE) %>%
(\(x) setNames(x$dist_info, x$id))()
25.3.2.4 SimDesign
Step 1: Design Grid
Now we build the SimDesign
design grid based on the number of projections that each player had.
25.3.2.5 SimDesign
Step 2: Generate
Code
Generate <- function(condition, fixed_objects = NULL) {
dist_info <- fixed_objects$proj_dists[[as.character(condition$id)]]
n_sources <- condition$n_sources
sim_points <- switch(
dist_info$type,
empirical = sample(
dist_info$empirical,
n_sources,
replace = TRUE),
norm = rnorm(
n_sources,
mean = dist_info$mean,
sd = dist_info$sd),
skewnorm = sn::rsn(
n_sources,
xi = dist_info$xi,
omega = dist_info$omega,
alpha = dist_info$alpha),
stop("Unknown distribution type: ", dist_info$type)
)
data.frame(
id = condition$id,
sim_points = sim_points)
}
25.3.2.6 SimDesign
Step 3: Analyze
Code
Analyse <- function(condition, dat, fixed_objects = NULL) {
tibble::tibble(
id = condition$id,
mean_pts = mean(dat$sim_points, na.rm = TRUE),
sd_pts = sd(dat$sim_points, na.rm = TRUE),
q10 = quantile(dat$sim_points, 0.10, na.rm = TRUE),
q90 = quantile(dat$sim_points, 0.90, na.rm = TRUE),
p100 = mean(dat$sim_points >= 100, na.rm = TRUE),
p150 = mean(dat$sim_points >= 150, na.rm = TRUE),
p200 = mean(dat$sim_points >= 200, na.rm = TRUE),
p250 = mean(dat$sim_points >= 250, na.rm = TRUE),
p300 = mean(dat$sim_points >= 300, na.rm = TRUE),
p350 = mean(dat$sim_points >= 350, na.rm = TRUE)
)
}
25.3.2.7 SimDesign
Step 4: Summarize
25.3.2.8 SimDesign
Step 5: Run the Simulation
Now, we can run the model.
Note: the following code that runs the simulation takes a while. If you just want to save time and load the results object instead of running the simulation, you can load the results object of the simulation (which has already been run) using this code:
Code
monteCarloSim_results <- SimDesign::runSimulation(
design = Design,
replications = 1000,
generate = Generate,
analyse = Analyse,
summarise = Summarise,
fixed_objects = list(proj_dists = proj_dists),
seed = genSeeds(Design, iseed = 52242), # for reproducibility
parallel = TRUE # for faster (parallel) processing
)
25.3.2.9 Simulation Results
The pX
variable represent the probability that a player scoring more than X number of points. For example, the p300
variable represents the probability that each player scores more than 300 points. However, it is important to note that this is based on the distribution of projected points.
25.4 Conclusion
25.5 Session Info
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[4] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[7] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.2
[10] tidyverse_2.0.0 sn_2.1.1 fitdistrplus_1.2-4
[13] survival_3.8-3 MASS_7.3-65 SimDesign_2.19.2
[16] progressr_0.15.1 future.apply_1.20.0 future_1.58.0
[19] data.table_1.17.6 ffanalytics_3.1.5.0000
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.17.1 audio_0.1-11
[4] jsonlite_2.0.0 magrittr_2.0.3 farver_2.1.2
[7] nloptr_2.2.1 rmarkdown_2.29 fs_1.6.6
[10] vctrs_0.6.5 minqa_1.2.8 base64enc_0.1-3
[13] htmltools_0.5.8.1 haven_2.5.5 cellranger_1.1.0
[16] Formula_1.2-5 parallelly_1.45.0 htmlwidgets_1.6.4
[19] plyr_1.8.9 testthat_3.2.3 httr2_1.1.2
[22] rootSolve_1.8.2.4 lifecycle_1.0.4 pkgconfig_2.0.3
[25] Matrix_1.7-3 R6_2.6.1 fastmap_1.2.0
[28] rbibutils_2.3 digest_0.6.37 Exact_3.3
[31] numDeriv_2016.8-1.1 colorspace_2.1-1 ps_1.9.1
[34] Hmisc_5.2-3 labeling_0.4.3 timechange_0.3.0
[37] httr_1.4.7 compiler_4.5.1 proxy_0.4-27
[40] withr_3.0.2 htmlTable_2.4.3 backports_1.5.0
[43] DBI_1.2.3 psych_2.5.6 R.utils_2.13.0
[46] rappdirs_0.3.3 sessioninfo_1.2.3 petersenlab_1.1.6
[49] gld_2.6.7 tools_4.5.1 chromote_0.5.1
[52] pbivnorm_0.6.0 foreign_0.8-90 nnet_7.3-20
[55] R.oo_1.27.1 glue_1.8.0 quadprog_1.5-8
[58] nlme_3.1-168 promises_1.3.3 grid_4.5.1
[61] checkmate_2.3.2 reshape2_1.4.4 cluster_2.1.8.1
[64] generics_0.1.4 gtable_0.3.6 tzdb_0.5.0
[67] R.methodsS3_1.8.2 class_7.3-23 websocket_1.4.4
[70] lmom_3.2 hms_1.1.3 xml2_1.3.8
[73] pillar_1.11.0 later_1.4.2 mitools_2.4
[76] splines_4.5.1 lattice_0.22-7 tidyselect_1.2.1
[79] pbapply_1.7-2 mix_1.0-13 knitr_1.50
[82] reformulas_0.4.1 gridExtra_2.3 xfun_0.52
[85] expm_1.0-0 brio_1.1.5 stringi_1.8.7
[88] yaml_2.3.10 boot_1.3-31 evaluate_1.0.4
[91] codetools_0.2-20 beepr_2.0 cli_3.6.5
[94] rpart_4.1.24 xtable_1.8-4 DescTools_0.99.60
[97] Rdpack_2.6.4 processx_3.8.6 lavaan_0.6-19
[100] Rcpp_1.1.0 readxl_1.4.5 globals_0.18.0
[103] parallel_4.5.1 lme4_1.1-37 listenv_0.9.1
[106] viridisLite_0.4.2 mvtnorm_1.3-3 scales_1.4.0
[109] e1071_1.7-16 rrapply_1.2.7 rlang_1.1.6
[112] rvest_1.0.4 mnormt_2.1.1