I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.

You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook

Hypothesis Alternatively, you can leave an annotation using hypothes.is. To add an annotation, select some text and then click the symbol on the pop-up menu. To see the annotations of others, click the symbol in the upper right-hand corner of the page.

23  Simulation: Bootstrapping and the Monte Carlo Method

23.1 Getting Started

23.1.1 Load Packages

Code
library("ffanalytics")
library("data.table")
library("future")
library("future.apply")
library("tidyverse")

23.1.2 Load Data

Code
load(file = "./data/players_projectedPoints_seasonal.RData")
load(file = "./data/nfl_playerIDs.RData")

23.2 Overview

A simulation is an “imitative representation” of a phenomenon that could exist the real world. In statistics, simulations are computer-driven investigations to better understand a phenomenon by studying its behavior under different conditions. For instance, we might want to determine the likely range of outcomes for a player, in terms of the range of fantasy points that a player might score over the course of a season. Simulations can be conducted in various ways. Two common ways of conducting simulations are via bootstrapping and via Monte Carlo simulation.

23.2.1 Bootstrapping

Bootstrapping involves repeated resampling (with replacement) from observed data. For instance, if we have 100 sources provide projections for a player, we could estimate the most likely range of fantasy points for the player by repeatedly sampling from the 100 projections.

23.2.2 Monte Carlo Simulation

Monte Carlo simulation involves repeated random sampling from a known distribution (Sigal & Chalmers, 2016). For instance, if we know the population distribution for the likely outcomes for a player—e.g., a normal distribution with a known mean (e.g., 150 points) and standard deviation (e.g., 20 points)—we can repeatedly sample randomly from this distribution.

23.3 Simulation of Projected Statistics and Points

23.3.1 Bootstrapping

Code
all_proj <- dplyr::bind_rows(players_projectedPoints_seasonal)
Code
vars_by_pos <- list(
  QB = c(
    "games",
    "pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
    "rush_att", "rush_yds", "rush_tds",
    "fumbles_lost",
    "sacks",
    "pass_09_tds", "pass_1019_tds", "pass_2029_tds", "pass_3039_tds", "pass_4049_tds", "pass_50_tds",  
    "pass_250_yds", "pass_300_yds", "pass_350_yds", "pass_40_yds", "pass_400_yds",
    "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds"
    ),
  RB = c(
    "games",
    "rush_att", "rush_yds", "rush_tds",
    "rec_tgt", "rec", "rec_yds", "rec_tds",
    "fumbles_lost",
    "return_yds", "return_tds",
    "rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
    "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
    "rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
  ),
  WR = c(
    "games",
    "pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
    "rush_att", "rush_yds", "rush_tds",
    "rec_tgt", "rec", "rec_yds", "rec_tds",
    "fumbles_lost",
    "return_yds", "return_tds",
    "rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
    "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
    "rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
  ),
  TE = c(
    "games",
    "pass_att", "pass_comp", "pass_yds", "pass_tds", "pass_int",
    "rush_att", "rush_yds", "rush_tds",
    "rec_tgt", "rec", "rec_yds", "rec_tds",
    "fumbles_lost",
    "return_yds", "return_tds",
    "rush_09_tds", "rush_1019_tds", "rush_2029_tds", "rush_3039_tds", "rush_4049_tds", "rush_50_tds",
    "rush_50_yds", "rush_100_yds", "rush_150_yds", "rush_200_yds", "rush_40_yds",
    "rec_rz_tgt", "rec_50_yds", "rec_100_yds", "rec_150_yds", "rec_200_yds", "rec_40_yds"
  ),
  K = c(
    "fg_50", "fg_50_att", "fg_49", "fg_49_att", "fg_39", "fg_att_39", "fg", "fg_att", "xp", "xp_att"
  ),
  DST = c(
    "dst_fum_recvr", "dst_sacks", "dst_fumbles", "dst_tackles", "dst_td", "dst_yds_against", "dst_pts_against"
  ) ### ADD IDP VARS
)
Code
bootstrapSimulation <- function(projectedStats, vars_by_pos, n_iter = 10000, seed = NULL) {
  dt <- data.table::as.data.table(projectedStats) # using data.table for speed
  all_ids <- unique(dt$id)
  
  if (!is.null(seed)) set.seed(seed)
  
  future::plan(future::multisession) # parallelize tasks across multiple background R sessions using multiple cores to speed up simulation
  
  results <- future.apply::future_lapply(
    all_ids, # apply the function below to each player using a parallelized loop
    function(player_id) {
      player_data <- dt[id == player_id]
      player_pos <- unique(player_data$pos)
      
      if (length(player_pos) != 1 || !player_pos %in% names(vars_by_pos)) return(NULL)
      
      stat_vars <- vars_by_pos[[player_pos]] # pulls the relevant stat variables to simulate for this player's position
      out <- data.table(iteration = seq_len(n_iter), id = player_id, pos = player_pos)
      
      for (var in stat_vars) { # Loop over each stat variable that should be simulated for the player's position
        if (var %in% names(player_data)) {
          non_na_values <- player_data[[var]][!is.na(player_data[[var]])] # pulls non-missing values of the stat for the player (from all projection sources)
          
          if (length(non_na_values) > 0) {
            out[[var]] <- sample(non_na_values, n_iter, replace = TRUE) # If there are valid values, sample with replacement to simulate n_iter values
          } else {
            out[[var]] <- NA_real_ # specifies a numeric missing value (if all values were missing)
          }
        } else {
          out[[var]] <- NA_real_ # specifies a numeric missing value (if the stat variable doesn't exist)
        }
      }
      
      return(out)
    },
    future.seed = TRUE # ensures that each parallel process gets a reproducible random seed
  )
  
  data.table::rbindlist(results, use.names = TRUE, fill = TRUE) # combines all the individual player results into one large data table, aligning columns by name; fill = TRUE ensures that missing columns are filled with NA where necessary
}
Code
bootstappedStats <- bootstrapSimulation(
  projectedStats = all_proj,
  vars_by_pos = vars_by_pos,
  n_iter = 10000,
  seed = 52242)
Code
bootstappedStats$data_src <- bootstappedStats$iteration

bootstappedStatsByPosition <- split(
  bootstappedStats,
  by = "pos",
  keep.by = TRUE)

bootstappedStatsByPosition <- lapply(
  bootstappedStatsByPosition,
  as_tibble)

attr(bootstappedStatsByPosition, "season") <- 2024
attr(bootstappedStatsByPosition, "week") <- 0

bootstrappedFantasyPoints <- ffanalytics:::source_points(
  data_result = bootstappedStatsByPosition,
  scoring_rules = ffanalytics::scoring)

bootstrappedFantasyPoints$iteration <- bootstrappedFantasyPoints$data_src
bootstrappedFantasyPoints$data_src <- NULL

bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>% 
  left_join(
    nfl_playerIDs[,c("mfl_id","name","merge_name","team")],
    by = c("id" = "mfl_id")
  )

bootstrappedFantasyPoints <- bootstrappedFantasyPoints %>% 
  rename(projectedPoints = raw_points) %>% 
  arrange(-projectedPoints)
Code
bootstrappedFantasyPoints_summary <- bootstrappedFantasyPoints %>% 
  group_by(id) %>% 
  summarise(
    mean = mean(projectedPoints, na.rm = TRUE),
    SD = sd(projectedPoints, na.rm = TRUE),
    min = min(projectedPoints, na.rm = TRUE),
    max = max(projectedPoints, na.rm = TRUE),
    range = max(projectedPoints, na.rm = TRUE) - min(projectedPoints, na.rm = TRUE),
    IQR = IQR(projectedPoints, na.rm = TRUE),
    MAD = mad(projectedPoints, na.rm = TRUE),
    CV = SD/mean,
    median = median(projectedPoints, na.rm = TRUE),
    pseudomedian = DescTools::HodgesLehmann(projectedPoints, na.rm = TRUE),
    mode = petersenlab::Mode(projectedPoints, multipleModes = "mean"),
    skewness = psych::skew(projectedPoints, na.rm = TRUE),
    kurtosis = psych::kurtosi(projectedPoints, na.rm = TRUE)
  )

bootstrappedFantasyPoints_summary <- bootstrappedFantasyPoints_summary %>% 
  left_join(
    nfl_playerIDs[,c("mfl_id","name","merge_name","team","position")],
    by = c("id" = "mfl_id")
  ) %>% 
  select(name, team, position, mean:kurtosis, everything()) %>% 
  arrange(-mean)
Code
bootstrappedFantasyPoints_summary %>% 
  filter(position == "QB") %>% 
  mutate(
    across(
      where(is.numeric),
      \(x) round(x, digits = 2)))
Code
bootstrappedFantasyPoints_summary %>% 
  filter(position == "RB") %>% 
  mutate(
    across(
      where(is.numeric),
      \(x) round(x, digits = 2)))
Code
bootstrappedFantasyPoints_summary %>% 
  filter(position == "WR") %>% 
  mutate(
    across(
      where(is.numeric),
      \(x) round(x, digits = 2)))
Code
bootstrappedFantasyPoints_summary %>% 
  filter(position == "TE") %>% 
  mutate(
    across(
      where(is.numeric),
      \(x) round(x, digits = 2)))
Code
bootstrappedFantasyPoints_summary %>% 
  filter(position == "PK") %>% 
  mutate(
    across(
      where(is.numeric),
      \(x) round(x, digits = 2)))

An example distribution of projected fantasy points is in Figure 23.1.

Code
ggplot2::ggplot(
  data = bootstrappedFantasyPoints %>%
    filter(pos == "QB" & name == "Patrick Mahomes"),
  mapping = aes(
    x = projectedPoints)
) +
  geom_histogram(
    aes(y = after_stat(density)),
    color = "#000000",
    fill = "#0099F8"
  ) +
  geom_density(
    color = "#000000",
    fill = "#F85700",
    alpha = 0.6 # add transparency
  ) +
  geom_rug() +
  #coord_cartesian(
  #  xlim = c(0,400)) +
  labs(
    x = "Fantasy Points",
    y = "Density",
    title = "Distribution of Projected Fantasy Points for Patrick Mahomes"
  ) +
  theme_classic() +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
Distribution of Projected Fantasy Points for Patrick Mahomes from Bootstrapping.
Figure 23.1: Distribution of Projected Fantasy Points for Patrick Mahomes from Bootstrapping.

Projections of two players—one with relatively narrow uncertainty and one with relatively wide uncertainty—are depicted in Figure 23.2.

Code
ggplot2::ggplot(
  data = bootstrappedFantasyPoints %>%
    filter(pos == "QB" & (name %in% c("Dak Prescott", "Drake Maye"))),
  mapping = aes(
    x = projectedPoints,
    group = name,
    #color = name,
    fill = name)
) +
  geom_histogram(
    aes(y = after_stat(density))
  ) +
  geom_density(
    alpha = 0.6 # add transparency
  ) +
  coord_cartesian(
    xlim = c(0,NA),
    expand = FALSE) +
  #geom_rug() +
  labs(
    x = "Fantasy Points",
    y = "Density",
    fill = "",
    color = "",
    title = "Distribution of Projected Fantasy Points"
  ) +
  theme_classic() +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5)) # horizontal y-axis title
Distribution of Projected Fantasy Points for Two Players from Bootstrapping. There is relatively narrow uncertainty around projected fantasy points for Dak Prescott, whereas there is relatively wide uncertainty around the projected fantasy points for Drake Maye.
Figure 23.2: Distribution of Projected Fantasy Points for Two Players from Bootstrapping. There is relatively narrow uncertainty around projected fantasy points for Dak Prescott, whereas there is relatively wide uncertainty around the projected fantasy points for Drake Maye.

23.3.2 Monte Carlo Simulation

23.4 Conclusion

23.5 Session Info

Code
sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.4        forcats_1.0.0          stringr_1.5.1         
 [4] dplyr_1.1.4            purrr_1.0.4            readr_2.1.5           
 [7] tidyr_1.3.1            tibble_3.3.0           ggplot2_3.5.2         
[10] tidyverse_2.0.0        future.apply_1.20.0    future_1.58.0         
[13] data.table_1.17.4      ffanalytics_3.1.5.0000

loaded via a namespace (and not attached):
 [1] DBI_1.2.3          mnormt_2.1.1       gridExtra_2.3      httr2_1.1.2       
 [5] gld_2.6.7          readxl_1.4.5       rlang_1.1.6        magrittr_2.0.3    
 [9] e1071_1.7-16       compiler_4.5.0     vctrs_0.6.5        reshape2_1.4.4    
[13] quadprog_1.5-8     rvest_1.0.4        pkgconfig_2.0.3    fastmap_1.2.0     
[17] backports_1.5.0    labeling_0.4.3     pbivnorm_0.6.0     promises_1.3.3    
[21] rmarkdown_2.29     tzdb_0.5.0         haven_2.5.5        ps_1.9.1          
[25] xfun_0.52          jsonlite_2.0.0     later_1.4.2        rrapply_1.2.7     
[29] psych_2.5.3        parallel_4.5.0     lavaan_0.6-19      cluster_2.1.8.1   
[33] DescTools_0.99.60  R6_2.6.1           stringi_1.8.7      RColorBrewer_1.1-3
[37] parallelly_1.45.0  boot_1.3-31        rpart_4.1.24       cellranger_1.1.0  
[41] Rcpp_1.0.14        knitr_1.50         base64enc_0.1-3    Matrix_1.7-3      
[45] nnet_7.3-20        timechange_0.3.0   tidyselect_1.2.1   rstudioapi_0.17.1 
[49] yaml_2.3.10        codetools_0.2-20   websocket_1.4.4    processx_3.8.6    
[53] listenv_0.9.1      lattice_0.22-6     plyr_1.8.9         withr_3.0.2       
[57] evaluate_1.0.3     foreign_0.8-90     proxy_0.4-27       xml2_1.3.8        
[61] pillar_1.10.2      checkmate_2.3.2    stats4_4.5.0       generics_0.1.4    
[65] chromote_0.5.1     mix_1.0-13         hms_1.1.3          petersenlab_1.1.5 
[69] scales_1.4.0       rootSolve_1.8.2.4  xtable_1.8-4       globals_0.18.0    
[73] class_7.3-23       glue_1.8.0         Hmisc_5.2-3        lmom_3.2          
[77] tools_4.5.0        Exact_3.3          fs_1.6.6           mvtnorm_1.3-3     
[81] grid_4.5.0         mitools_2.4        colorspace_2.1-1   nlme_3.1-168      
[85] htmlTable_2.4.3    Formula_1.2-5      cli_3.6.5          rappdirs_0.3.3    
[89] expm_1.0-0         viridisLite_0.4.2  gtable_0.3.6       digest_0.6.37     
[93] htmlwidgets_1.6.4  farver_2.1.2       htmltools_0.5.8.1  lifecycle_1.0.4   
[97] httr_1.4.7         MASS_7.3-65       

Feedback

Please consider providing feedback about this textbook, so that I can make it as helpful as possible. You can provide feedback at the following link: https://forms.gle/LsnVKwqmS1VuxWD18

Email Notification

The online version of this book will remain open access. If you want to know when the print version of the book is for sale, enter your email below so I can let you know.