4 Download and Process NFL Football Data

4.1 Load Packages

Code

library("ffanalytics") # to install: install.packages("remotes"); remotes::install_github("FantasyFootballAnalytics/ffanalytics")
library("petersenlab") # to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("nflreadr")
library("nflfastR")
library("nflseedR")
library("nfl4th")
library("nflplotR")
library("progressr")
library("lubridate")
library("tidyverse")

4.2 Data Dictionaries of NFL Data

Data Dictionaries are metadata that describe the meaning of the variables in a dataset. Ho & Carl (2025a) provide Data Dictionaries for the various National Football League (NFL) datasets at the following link: https://nflreadr.nflverse.com/articles/index.html.

4.3 Types of NFL Data

Below, we provide examples of how to download various types of NFL data using the nflreadr (Ho & Carl, 2024) and nflfastR (Carl & Baldwin, 2024) packages. For additional resources, Congelio (2023) provides a helpful introductory text for working with NFL data in R. We save each data file after downloading it so that we can use the data in subsequent chapters. If you have difficulty downloading the data files using the nflreadr (Ho & Carl, 2024) or nflfastR (Carl & Baldwin, 2024) packages, the data files are also publicly available on the Open Science Framework: https://osf.io/z6pg4.

This chapter extensively uses merging to process the data for later use. See Section 3.22 for a reminder of how to perform merging, the types of merges, and what you can expect when you merge data objects with different formats. Guidance for how to merge the various NFL-related data files is provided by Sharpe (2020a): https://github.com/nflverse/nfldata/blob/master/DATASETS.md.

4.3.1 Players

Code

nfl_players_raw <- progressr::with_progress(
  nflreadr::load_players())

Code

save(
  nfl_players_raw,
  file = "./data/nfl_players_raw.RData"
)
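
Because later chapters rely on these saved files, it may help to see the save-and-restore round trip in miniature. The following is a minimal sketch that uses a hypothetical toy object (toy_data) and a temporary file path rather than the ./data/ folder used in this chapter:

```r
# Hypothetical toy object standing in for a downloaded data file
toy_data <- data.frame(
  gsis_id = c("00-001", "00-002"),
  points = c(12.5, 8)
)

# Temporary location used only for this sketch
toy_path <- file.path(tempdir(), "toy_data.RData")

# Save the object to disk
save(
  toy_data,
  file = toy_path
)

# Simulate a new session by removing the object, then restore it from disk
rm(toy_data)
loaded_names <- load(toy_path)

# load() restores objects under their original names and returns those names
loaded_names
head(toy_data)
```

Note that load() restores each object under the name it had when it was saved, so it can silently overwrite an object of the same name in your workspace.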

The nfl_players object is in player form. That is, each row should be uniquely identified by gsis_id. Let’s rearrange the data accordingly by player.

Code

nfl_players <- nfl_players_raw %>% 
  select(gsis_id, everything()) %>% 
  arrange(display_name)

In this case, I arrange the data by player using display_name. You could arrange by gsis_id (or any other variable, for that matter), but gsis_id is fairly unintelligible by itself. For instance, here are the gsis_id values of the first six players in the sorted data:

Code

head(nfl_players$gsis_id)

[1] "00-0004866" "00-0032889" "00-0037845" "00-0039793" "00-0030228"
[6] "00-0035676"

Note how it is unclear who these players are until you combine this column with other, more relevant information. Thus, I prefer to sort by a variable that is more interpretable, such as a player’s name. Here are the first six players in the sorted data according to display_name:

Code

head(nfl_players$display_name)

[1] "'Omar Ellison"    "A'Shawn Robinson" "A.J. Arcuri"      "A.J. Barner"     
[5] "A.J. Bouye"       "A.J. Brown"

Note how it is clearer, at a glance and without additional information, who these players are.

In data analysis, the variable(s) used to sort a dataframe are chosen primarily for aesthetic or usability purposes. Many data analysis approaches do not depend on the order of the rows in the data. However, the row order can influence some data wrangling operations. Thus, before performing such operations, it may be prudent to re-sort the data into the desired order.

Regardless of which variable(s) are used to sort the dataframe, it is important to know which variable(s) uniquely identify the rows because that will influence how you merge the data with other dataframes. For instance, it is preferable to merge two data objects based on the player ID (gsis_id) rather than the player name (display_name) because multiple players can share the same name. When two players have the same name (despite having different player IDs), the merge operation cannot tell which player from the first dataframe goes with which player from the second dataframe.
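
To make the problem with merging by name concrete, here is a minimal sketch using hypothetical toy data (the players, IDs, and points are made up for illustration). Two different players share the name "Alex Smith":

```r
library(dplyr)

# Hypothetical toy data: two different players share a display name
toy_players <- tibble(
  gsis_id = c("00-001", "00-002", "00-003"),
  display_name = c("Alex Smith", "Alex Smith", "Pat Jones"),
  position = c("QB", "TE", "WR")
)

toy_stats <- tibble(
  gsis_id = c("00-001", "00-002", "00-003"),
  display_name = c("Alex Smith", "Alex Smith", "Pat Jones"),
  points = c(250, 40, 120)
)

# Merging by player ID keeps exactly one row per player
merged_by_id <- left_join(
  toy_players,
  toy_stats %>% select(gsis_id, points),
  by = "gsis_id"
)

# Merging by name pairs each "Alex Smith" with both stat lines;
# recent versions of dplyr warn about this many-to-many relationship
merged_by_name <- left_join(
  toy_players,
  toy_stats %>% select(display_name, points),
  by = "display_name"
)

nrow(merged_by_id)   # 3 rows: one per player
nrow(merged_by_name) # 5 rows: each "Alex Smith" appears twice
```

The merge by name does not fail; it quietly produces extra rows with the wrong statistics attached, which is why it is worth checking the number of rows before and after a merge.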

Let’s check for duplicate player instances:

Code

nfl_players %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()
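
The duplicate check above prints any offending rows. If you would rather have a TRUE/FALSE answer that you can use in an automated check, one option is a small helper such as the following sketch. The function is_unique_key() is hypothetical (it is not part of any package used in this chapter), and the data are toy data:

```r
library(dplyr)

# Hypothetical helper: returns TRUE when the combination of key columns
# identifies each row exactly once
is_unique_key <- function(data, ...) {
  n_duplicated_keys <- data %>%
    count(...) %>%
    filter(n > 1) %>%
    nrow()

  n_duplicated_keys == 0
}

# Toy data: one player appears in two seasons
toy_players <- tibble(
  gsis_id = c("00-001", "00-002", "00-002"),
  season = c(2023, 2023, 2024)
)

is_unique_key(toy_players, gsis_id)         # FALSE: "00-002" appears twice
is_unique_key(toy_players, gsis_id, season) # TRUE: player-season is unique
```

The same helper works for the composite keys used later in this chapter (e.g., player-season-team or player-season-week form).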

4.3.1.1 Processing

Let’s do some data cleanup:

Code

# Convert missing values to NA
nfl_players[nfl_players == ""] <- NA

# Drop players with missing values for gsis_id
nfl_players <- nfl_players %>% 
  filter(!is.na(gsis_id))
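
The cleanup above compares every column to an empty string. As an alternative, the following sketch restricts the replacement to character columns using dplyr. It uses hypothetical toy data and is one possible approach rather than the approach used in the rest of this book:

```r
library(dplyr)

# Hypothetical toy data with empty strings standing in for missing values
toy_players <- tibble(
  gsis_id = c("00-001", "", "00-003"),
  college = c("Iowa", "", "Ohio State"),
  weight = c(210, 225, 198)
)

toy_players_clean <- toy_players %>%
  # Convert empty strings to NA, but only in character columns
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Drop players with missing values for gsis_id
  filter(!is.na(gsis_id))

toy_players_clean
```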

Code

save(
  nfl_players,
  file = "./data/nfl_players.RData"
)

4.3.2 Teams

Code

nfl_teams_raw <- progressr::with_progress(
  nflreadr::load_teams(current = TRUE))

Code

save(
  nfl_teams_raw,
  file = "./data/nfl_teams_raw.RData"
)

The nfl_teams object is in team form. That is, each row should be uniquely identified by team_id. Let’s rearrange the data accordingly:

Code

nfl_teams <- nfl_teams_raw %>% 
  select(team_id, everything())

Let’s check for duplicate team instances:

Code

nfl_teams %>% 
  group_by(team_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_teams,
  file = "./data/nfl_teams.RData"
)

4.3.3 Fantasy Player IDs

Ho & Carl (2025h) provide a Data Dictionary for fantasy player ID data at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_playerids.html

Code

nfl_playerIDs_raw <- progressr::with_progress(
  nflreadr::load_ff_playerids())

Code

save(
  nfl_playerIDs_raw,
  file = "./data/nfl_playerIDs_raw.RData"
)

The nfl_playerIDs object is in player form. That is, each row should be uniquely identified by mfl_id.

Code

nfl_playerIDs <- nfl_playerIDs_raw %>% 
  arrange(name, mfl_id)

Let’s check for duplicate player instances:

Code

nfl_playerIDs %>% 
  group_by(mfl_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_playerIDs %>% 
  filter(!is.na(gsis_id)) %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data processing to help with merging the dataset with other datasets:

Code

nfl_playerIDs$name[which(nfl_playerIDs$name == "Bennett,Michael")] <- "Michael Bennett"

nfl_playerIDs <- nfl_playerIDs %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(name, lowercase = TRUE)
  )
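
To illustrate why a cleaned name is useful as a merge key, here is a deliberately simplified sketch of name normalization in base R. The function simplify_name() is hypothetical and is not the implementation used by nflreadr::clean_player_names(); it only shows the general idea of removing the formatting differences that cause the same player's name to be written differently across data sources:

```r
# Hypothetical, simplified normalizer for illustration only
simplify_name <- function(x) {
  x <- tolower(x)                              # ignore capitalization
  x <- gsub("[.']", "", x)                     # drop periods and apostrophes
  x <- gsub("\\s+(jr|sr|ii|iii|iv|v)$", "", x) # drop common suffixes
  x <- gsub("\\s+", " ", x)                    # collapse repeated spaces
  trimws(x)
}

simplify_name(c("A.J. Brown", "Odell Beckham Jr.", "D'Andre  Swift"))
# "aj brown" "odell beckham" "dandre swift"
```

A normalized name is still a weaker merge key than a player ID because different players can share a name, so it is best treated as a fallback when no common ID is available.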

Code

save(
  nfl_playerIDs,
  file = "./data/nfl_playerIDs.RData"
)

4.3.4 Player Info

4.3.5 Rosters

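Data Dictionaries for the roster data are available from the index of nflreadr Data Dictionaries described in Section 4.2: https://nflreadr.nflverse.com/articles/index.html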

Code

nfl_rosters_raw <- progressr::with_progress(
  nflreadr::load_rosters(seasons = TRUE))

nfl_rosters_weekly_raw <- progressr::with_progress(
  nflreadr::load_rosters_weekly(seasons = TRUE))

rosters <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/rosters.csv")

Code

save(
  nfl_rosters_raw,
  rosters,
  file = "./data/nfl_rosters_raw.RData"
)

save(
  nfl_rosters_weekly_raw,
  file = "./data/nfl_rosters_weekly_raw.RData"
)

The nfl_rosters object is in player-season-team form. That is, each row should be uniquely identified by the combination of gsis_id, season, and team. Let’s rearrange the data accordingly:

Code

nfl_rosters <- nfl_rosters_raw %>% 
  left_join(
    rosters %>% select(playerid, season, team, side, category, games, starts, years, av),
    by = c("season","team","pfr_id" = "playerid"),
    na_matches = "never"
  ) %>% 
  select(gsis_id, season, team, week, everything()) %>% 
  arrange(full_name, gsis_id, season, team, week)
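
The merge above sets na_matches = "never". The following sketch, using hypothetical toy data, shows what that argument changes. By default (na_matches = "na"), dplyr treats a missing key in one dataframe as matching a missing key in the other, which can attach one player's data to an unrelated player who also has a missing ID:

```r
library(dplyr)

# Hypothetical toy data: Player B has no pfr_id
toy_rosters <- tibble(
  full_name = c("Player A", "Player B"),
  season = c(2023, 2023),
  team = c("KC", "KC"),
  pfr_id = c("A01", NA)
)

# The second row belongs to some other player whose ID is also missing
toy_extra <- tibble(
  playerid = c("A01", NA),
  season = c(2023, 2023),
  team = c("KC", "KC"),
  games = c(17, 3)
)

# Missing IDs are never treated as a match
joined_never <- left_join(
  toy_rosters,
  toy_extra,
  by = c("season", "team", "pfr_id" = "playerid"),
  na_matches = "never"
)

# Default behavior: a missing ID matches another missing ID
joined_na <- left_join(
  toy_rosters,
  toy_extra,
  by = c("season", "team", "pfr_id" = "playerid"),
  na_matches = "na"
)

joined_never$games # 17 NA
joined_na$games    # 17  3
```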

Let’s check for duplicate player-season-team instances:

Code

nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()

4.3.5.1 Processing

Let’s do some data cleanup:

Code

# Drop players with missing values for gsis_id
nfl_rosters <- nfl_rosters %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_rosters <- nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  fill(names(.), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()
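
Here is a minimal sketch of what the fill-then-keep-first step does, using hypothetical toy data in which one player has two partially complete rows for the same season and team. For clarity, the sketch names the columns to fill explicitly rather than filling all columns:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: player "00-001" has two partially complete rows
toy_rosters <- tibble(
  gsis_id = c("00-001", "00-001", "00-002"),
  season = c(2023, 2023, 2023),
  team = c("KC", "KC", "DET"),
  position = c(NA, "WR", "QB"),
  jersey_number = c(11, NA, 16)
)

toy_rosters_dedup <- toy_rosters %>%
  group_by(gsis_id, season, team) %>%
  # Within each group, fill missing values downward and then upward
  fill(position, jersey_number, .direction = "downup") %>%
  # Keep only the first (now complete) row in each group
  slice_head(n = 1) %>%
  ungroup()

toy_rosters_dedup
```

The retained row for player "00-001" has position "WR" and jersey number 11, combining the information from both of the original rows. Note that this approach assumes the duplicate rows do not conflict; if two rows have different non-missing values, the value in the first row is kept.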

Let’s check again for duplicate player-season-team instances:

Code

nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_rosters_weekly object is in player-season-week form. That is, each row should be uniquely identified by the combination of gsis_id, season, and week. Let’s rearrange the data accordingly:

Code

nfl_rosters_weekly <- nfl_rosters_weekly_raw %>% 
  select(gsis_id, season, week, everything()) %>% 
  arrange(full_name, gsis_id, season, week)

Let’s check for duplicate player-season-week instances:

Code

nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data cleanup:

Code

# Drop players with missing values for gsis_id
nfl_rosters_weekly <- nfl_rosters_weekly %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_rosters_weekly <- nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  fill(names(.), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()

Let’s check again for duplicate player-season-week instances:

Code

nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_rosters,
  file = "./data/nfl_rosters.RData"
)

save(
  nfl_rosters_weekly,
  file = "./data/nfl_rosters_weekly.RData"
)

4.3.6 Game Schedules

Ho & Carl (2025s) provide a Data Dictionary for game schedules data at the following link: https://nflreadr.nflverse.com/articles/dictionary_schedules.html

Code

nfl_schedules_raw <- progressr::with_progress(
  nflreadr::load_schedules(seasons = TRUE))

Code

save(
  nfl_schedules_raw,
  file = "./data/nfl_schedules_raw.RData"
)

The nfl_schedules object is in game form. That is, each row should be uniquely identified by game_id. Because multiple games are played in a given week, the combination of season and week does not by itself identify a game; however, each row should also be uniquely identified by the combination of season, week, home_team, and away_team.

Code

nfl_schedules <- nfl_schedules_raw

Let’s check for duplicate game instances:

Code

nfl_schedules %>% 
  group_by(game_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_schedules,
  file = "./data/nfl_schedules.RData"
)

4.3.7 Standings

Sharpe (2020d) provides a Data Dictionary for standings data at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#standings

We can determine NFL standings based on the nfl_schedules object using the nflseedR (Carl & Sharpe, 2025) package.

Code

nfl_standings_raw <- nflseedR::nfl_standings(nfl_schedules) #read_csv("http://www.habitatring.com/standings.csv"); archived at: https://perma.cc/5QLN-VK7V

Code

save(
  nfl_standings_raw,
  file = "./data/nfl_standings_raw.RData"
)

The nfl_standings object is in season-team form. That is, each row should be uniquely identified by the combination of season and team.

Code

nfl_standings <- data.frame(nfl_standings_raw)

Let’s check for duplicate season-team instances:

Code

nfl_standings %>% 
  group_by(season, team) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_standings,
  file = "./data/nfl_standings.RData"
)

4.3.8 The Combine

Ho & Carl (2025b) provide a Data Dictionary for data from the combine at the following link: https://nflreadr.nflverse.com/articles/dictionary_combine.html

Code

nfl_combine_raw <- progressr::with_progress(
  nflreadr::load_combine(seasons = TRUE))

Code

save(
  nfl_combine_raw,
  file = "./data/nfl_combine_raw.RData"
)

The nfl_combine object is in player form. That is, each row should be uniquely identified by the player’s id. However, there is no gsis_id variable to merge it easily with other datasets. Some of the players have other id variables, including pfr_id and cfb_id. Let’s rearrange the data accordingly:

Code

nfl_combine <- nfl_combine_raw %>% 
  select(pfr_id, cfb_id, everything()) %>% 
  arrange(season, player_name)

Let’s do some data processing to help with merging the dataset with other datasets:

Code

nfl_combine <- nfl_combine %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player_name, lowercase = TRUE)
  )

# First, merge on both pfr_id and cfb_id
merged_data1 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(pfr_id, cfbref_id, gsis_id),
  by = c("pfr_id", "cfb_id" = "cfbref_id"),
  na_matches = "never"
)

# Second, merge on pfr_id (along with name and position)
merged_data2 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(pfr_id, merge_name, position, gsis_id),
  by = c("pfr_id","merge_name","pos" = "position"),
  na_matches = "never"
)

# Third, merge on cfb_id (along with name and position)
merged_data3 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(cfbref_id, merge_name, position, gsis_id),
  by = c("cfb_id" = "cfbref_id","merge_name","pos" = "position"),
  na_matches = "never"
)

# Combine gsis_id across merges
nfl_combine$gsis_id <- coalesce(
  merged_data1$gsis_id,
  merged_data2$gsis_id,
  merged_data3$gsis_id
  )

# Rearrange the data
nfl_combine <- nfl_combine %>% 
  select(gsis_id, pfr_id, cfb_id, player_name, everything()) %>% 
  arrange(season, player_name)
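
The logic of the sequential merges above can be sketched with hypothetical toy data. The sketch is simplified (it merges on the ID variables only, without name and position), and it adds a check that each merge returns exactly one row per original row, because combining the results position by position with coalesce() relies on that assumption:

```r
library(dplyr)

# Hypothetical toy data: each player is missing a different ID
toy_combine <- tibble(
  player_name = c("Player A", "Player B", "Player C"),
  pfr_id = c("A01", NA, "C01"),
  cfb_id = c("a-1", "b-1", NA)
)

toy_ids <- tibble(
  pfr_id = c("A01", "C01", NA),
  cfbref_id = c("a-1", NA, "b-1"),
  gsis_id = c("00-001", "00-003", "00-002")
)

# Most specific match first: both IDs must agree
by_both <- left_join(
  toy_combine,
  toy_ids,
  by = c("pfr_id", "cfb_id" = "cfbref_id"),
  na_matches = "never"
)

# Fallbacks: match on one ID at a time
by_pfr <- left_join(
  toy_combine,
  toy_ids %>% select(pfr_id, gsis_id),
  by = "pfr_id",
  na_matches = "never"
)

by_cfb <- left_join(
  toy_combine,
  toy_ids %>% select(cfbref_id, gsis_id),
  by = c("cfb_id" = "cfbref_id"),
  na_matches = "never"
)

# Combining by position assumes no merge added or dropped rows
stopifnot(
  nrow(by_both) == nrow(toy_combine),
  nrow(by_pfr) == nrow(toy_combine),
  nrow(by_cfb) == nrow(toy_combine)
)

# Take the first non-missing gsis_id across the three merges
toy_combine$gsis_id <- coalesce(
  by_both$gsis_id,
  by_pfr$gsis_id,
  by_cfb$gsis_id
)

toy_combine$gsis_id # "00-001" "00-002" "00-003"
```

If one of the merges matched a row to more than one record, the row counts would differ and the stopifnot() call would stop with an error instead of silently misaligning the IDs.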

Let’s check for duplicate gsis_id, pfr_id, and cfb_id instances:

Code

nfl_combine %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1, !is.na(gsis_id)) %>% 
  arrange(gsis_id) %>% 
  head()

Code

nfl_combine %>% 
  group_by(pfr_id) %>% 
  filter(n() > 1, !is.na(pfr_id)) %>% 
  arrange(pfr_id) %>% 
  head()

Code

nfl_combine %>% 
  group_by(cfb_id) %>% 
  filter(n() > 1, !is.na(cfb_id)) %>% 
  arrange(cfb_id) %>% 
  head()

Let’s do some additional data processing:

Code

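# Set gsis_id to missing for Stanford Samuels Jr. (cfb_id "stanford-samuels-1"); the row itself is kept
nfl_combine$gsis_id[which(nfl_combine$cfb_id == "stanford-samuels-1")] <- NA

# Convert height from feet-inches text (e.g., "6-2") to total inches
nfl_combine <- nfl_combine %>% 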
  separate_wider_delim(
    ht,
    names = c("feet", "inches"),
    delim = "-") %>%
  mutate(
    feet = as.numeric(feet),
    inches = as.numeric(inches),
    ht = feet * 12 + inches
  ) %>%
  select(-feet, -inches)
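
As a small worked example of the height conversion, the following sketch applies the same steps to hypothetical toy data. A height of "6-2" (6 feet, 2 inches) becomes 6 × 12 + 2 = 74 inches:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with heights stored as "feet-inches" text
toy_heights <- tibble(
  player_name = c("Player A", "Player B"),
  ht = c("6-2", "5-11")
)

toy_heights_inches <- toy_heights %>%
  # Split "6-2" into separate feet ("6") and inches ("2") columns
  separate_wider_delim(
    ht,
    names = c("feet", "inches"),
    delim = "-") %>%
  # Convert to numbers and compute total height in inches
  mutate(ht = as.numeric(feet) * 12 + as.numeric(inches)) %>%
  select(-feet, -inches)

toy_heights_inches$ht # 74 71
```

By default, separate_wider_delim() stops with an error if a value does not contain the delimiter; its too_few argument controls that behavior if the raw heights are not consistently formatted.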

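The apparent duplicates identified in the checks of gsis_id, pfr_id, and cfb_id above appear to be different players at different positions. When we also group by season and position, the duplicates should no longer appear: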

Code

nfl_combine %>% 
  group_by(season, gsis_id, pos) %>% 
  filter(n() > 1, !is.na(gsis_id)) %>% 
  arrange(gsis_id) %>% 
  head()

Code

nfl_combine %>% 
  group_by(season, pfr_id, pos) %>% 
  filter(n() > 1, !is.na(pfr_id)) %>% 
  arrange(pfr_id) %>% 
  head()

Code

nfl_combine %>% 
  group_by(season, cfb_id, pos) %>% 
  filter(n() > 1, !is.na(cfb_id)) %>% 
  arrange(cfb_id) %>% 
  head()

Code

save(
  nfl_combine,
  file = "./data/nfl_combine.RData"
)

4.3.9 Draft Picks

Ho & Carl (2025e) provide a Data Dictionary for draft picks data at the following link: https://nflreadr.nflverse.com/articles/dictionary_draft_picks.html

Code

nfl_draftPicks_raw <- progressr::with_progress(
  nflreadr::load_draft_picks(seasons = TRUE))

Code

save(
  nfl_draftPicks_raw,
  file = "./data/nfl_draftPicks_raw.RData"
)

The nfl_draftPicks object is in player form. That is, each row should be uniquely identified by gsis_id. Let’s rearrange the data accordingly:

Code

nfl_draftPicks <- nfl_draftPicks_raw %>% 
  select(gsis_id, everything()) %>% 
  arrange(pfr_player_name)

Let’s check for duplicate player instances:

Code

nfl_draftPicks %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

4.3.9.1 Processing

Let’s do some data cleanup:

Code

# Convert missing values to NA
nfl_draftPicks[nfl_draftPicks == ""] <- NA

# Drop players with missing values for gsis_id
nfl_draftPicks <- nfl_draftPicks %>% 
  filter(!is.na(gsis_id))

Let’s check again for duplicate player instances:

Code

nfl_draftPicks %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_draftPicks,
  file = "./data/nfl_draftPicks.RData"
)

4.3.10 Draft Values

Sharpe (2020b) provides a Data Dictionary for draft values data at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#draft_values

Code

nfl_draftValues_raw <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/draft_values.csv")

Code

save(
  nfl_draftValues_raw,
  file = "./data/nfl_draftValues_raw.RData"
)

The nfl_draftValues object is in pick form. That is, each row should be uniquely identified by pick. Let’s rearrange the data accordingly:

Code

nfl_draftValues <- nfl_draftValues_raw %>% 
  select(pick, everything()) %>% 
  arrange(pick)

Let’s check for duplicate pick instances:

Code

nfl_draftValues %>% 
  group_by(pick) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_draftValues,
  file = "./data/nfl_draftValues.RData"
)

4.3.11 Depth Charts

Ho & Carl (2025d) provide a Data Dictionary for data from weekly depth charts at the following link: https://nflreadr.nflverse.com/articles/dictionary_depth_charts.html

Code

nfl_depthCharts_raw <- progressr::with_progress(
  nflreadr::load_depth_charts(seasons = TRUE))

Code

save(
  nfl_depthCharts_raw,
  file = "./data/nfl_depthCharts_raw.RData"
)

The nfl_depthCharts object is in player-season-week-position form. That is, each row should be uniquely identified by the combination of gsis_id, season, week, and depth_position. Let’s rearrange the data accordingly:

Code

nfl_depthCharts <- nfl_depthCharts_raw %>% 
  select(gsis_id, season, week, depth_position, everything()) %>% 
  arrange(full_name, gsis_id, season, week, depth_position)

Let’s check for duplicate player-season-week-position instances:

Code

nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  filter(n() > 1) %>% 
  head()

4.3.11.1 Processing

Let’s do some data cleanup:

Code

# Drop players with missing values for gsis_id
nfl_depthCharts <- nfl_depthCharts %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_depthCharts <- nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  fill(names(.), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()

Let’s check again for duplicate player-season-week-position instances:

Code

nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_depthCharts,
  file = "./data/nfl_depthCharts.RData"
)

4.3.12 Trades

Sharpe (2020e) provides a Data Dictionary for NFL trade data at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#trades

Code

nfl_trades_raw <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/trades.csv")

Code

save(
  nfl_trades_raw,
  file = "./data/nfl_trades_raw.RData"
)

The nfl_trades object is in trade-player form. That is, each row should be uniquely identified by the combination of trade_id and pfr_id.

Code

nfl_trades <- nfl_trades_raw

Let’s check for duplicate trade-player instances:

Code

nfl_trades %>%
  filter(!is.na(pfr_id)) %>% 
  group_by(trade_id, pfr_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_trades,
  file = "./data/nfl_trades.RData"
)

4.3.13 Play-By-Play Data

To download play-by-play data from prior weeks and seasons, we can use the load_pbp() function of the nflreadr package (Ho & Carl, 2024). We add a progress bar using the with_progress() function from the progressr package (Bengtsson, 2024) because it takes a while to run. Ho & Carl (2025n) provide a Data Dictionary for the play-by-play data at the following link: https://nflreadr.nflverse.com/articles/dictionary_pbp.html

Note 4.1: Downloading play-by-play data

Note: the following code takes a while to run.

Code

nfl_pbp_raw <- progressr::with_progress(
  nflreadr::load_pbp(seasons = TRUE))

Code

save(
  nfl_pbp_raw,
  file = "./data/nfl_pbp_raw.RData"
)

The nfl_pbp object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, fixed_drive, and play_id. Let’s rearrange the data accordingly:

Code

nfl_pbp <- nfl_pbp_raw %>% 
  select(game_id, fixed_drive, play_id, everything()) %>% 
  arrange(game_id, fixed_drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code

nfl_pbp %>% 
  group_by(game_id, fixed_drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_pbp,
  file = "./data/nfl_pbp.RData"
)

Now, let’s identify the unique seasons of play-by-play data for use in later calculations:

Code

seasons <- unique(nfl_pbp$season)

4.3.14 4th Down Data

We use the nfl4th package (Baldwin, 2023) to download fourth down data.

Note 4.2: Downloading 4th down data

Note: the following code takes a while to run.

Code

nfl_4thdown_raw <- nfl4th::load_4th_pbp(
  seasons = 2014:nflreadr::most_recent_season())

Code

save(
  nfl_4thdown_raw,
  file = "./data/nfl_4thdown_raw.RData"
)

The nfl_4thdown object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, drive, and play_id. Let’s rearrange the data accordingly:

Code

nfl_4thdown <- nfl_4thdown_raw %>% 
  select(game_id, drive, play_id, everything()) %>% 
  arrange(game_id, drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code

nfl_4thdown %>% 
  group_by(game_id, drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_4thdown,
  file = "./data/nfl_4thdown.RData"
)

4.3.15 Participation

Ho & Carl (2025m) provide a Data Dictionary for the participation data at the following link: https://nflreadr.nflverse.com/articles/dictionary_participation.html

Code

nfl_participation_raw <- progressr::with_progress(
  nflreadr::load_participation(
    seasons = 2016:2023, # participation data are no longer available after 2023
    include_pbp = TRUE))

Code

save(
  nfl_participation_raw,
  file = "./data/nfl_participation_raw.RData"
)

The nfl_participation object is in game-drive-play form. That is, each row should be uniquely identified by the combination of nflverse_game_id, drive, and play_id. Let’s rearrange the data accordingly:

Code

nfl_participation <- nfl_participation_raw %>% 
  select(nflverse_game_id, drive, play_id, everything()) %>% 
  arrange(nflverse_game_id, drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code

nfl_participation %>% 
  group_by(nflverse_game_id, drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_participation,
  file = "./data/nfl_participation.RData"
)

4.3.16 Historical Actual Player Statistics

4.3.16.1 Season-by-Season Statistics

We calculate historical actual season-by-season statistics in Section 4.4.1.1.

4.3.16.2 Career Statistics

We calculate historical actual career statistics in Section 4.4.1.2.

4.3.16.3 Week-by-Week Statistics

We can download historical week-by-week actual player statistics using the load_player_stats() function from the nflreadr package (Ho & Carl, 2024). Ho & Carl (2025p) provide a Data Dictionary for statistics for offensive players at the following link: https://nflreadr.nflverse.com/articles/dictionary_player_stats.html. Ho & Carl (2025q) provide a Data Dictionary for statistics for defensive players at the following link: https://nflreadr.nflverse.com/articles/dictionary_player_stats_def.html.

Code

nfl_actualStats_offense_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "offense"))

nfl_actualStats_defense_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "defense"))

nfl_actualStats_kicking_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "kicking"))

Code

save(
  nfl_actualStats_offense_weekly_raw, nfl_actualStats_defense_weekly_raw, nfl_actualStats_kicking_weekly_raw,
  file = "./data/nfl_actualStats_position_weekly_raw.RData"
)

The nfl_actualStats_weekly objects are in player-season-week form. That is, each row should be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code

nfl_actualStats_offense_weekly <- nfl_actualStats_offense_weekly_raw %>% 
  rename(team = recent_team) %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualStats_defense_weekly <- nfl_actualStats_defense_weekly_raw %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualStats_kicking_weekly <- nfl_actualStats_kicking_weekly_raw %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

Let’s check for duplicate player-season-week instances:

Code

nfl_actualStats_offense_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_defense_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_kicking_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_actualStats_offense_weekly, nfl_actualStats_defense_weekly, nfl_actualStats_kicking_weekly,
  file = "./data/nfl_actualStats_position_weekly.RData"
)

4.3.17 Injuries

Ho & Carl (2025k) provide a Data Dictionary for injury data at the following link: https://nflreadr.nflverse.com/articles/dictionary_injuries.html

Code

nfl_injuries_raw <- progressr::with_progress(
  nflreadr::load_injuries(seasons = TRUE))

Code

save(
  nfl_injuries_raw,
  file = "./data/nfl_injuries_raw.RData"
)

The nfl_injuries object is in player-season-week form. That is, each row should be uniquely identified by the combination of gsis_id, season, and week. Let’s rearrange the data accordingly:

Code

nfl_injuries <- nfl_injuries_raw %>% 
  select(gsis_id, season, week, everything()) %>% 
  arrange(full_name, gsis_id, season, week)

Let’s check for duplicate player-season-week instances:

Code

nfl_injuries %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_injuries,
  file = "./data/nfl_injuries.RData"
)

4.3.18 Snap Counts

Ho & Carl (2025t) provide a Data Dictionary for snap counts data at the following link: https://nflreadr.nflverse.com/articles/dictionary_snap_counts.html

Code

nfl_snapCounts_raw <- progressr::with_progress(
  nflreadr::load_snap_counts(seasons = TRUE))

Code

save(
  nfl_snapCounts_raw,
  file = "./data/nfl_snapCounts_raw.RData"
)

The nfl_snapCounts object is in game-player form. That is, each row should be uniquely identified by the combination of game_id and pfr_player_id. Let’s rearrange the data accordingly:

Code

nfl_snapCounts <- nfl_snapCounts_raw %>% 
  select(game_id, pfr_player_id, everything()) %>% 
  arrange(game_id, pfr_player_id)

Let’s check for duplicate game-player instances:

Code

nfl_snapCounts %>% 
  group_by(game_id, pfr_player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_snapCounts,
  file = "./data/nfl_snapCounts.RData"
)

4.3.19 ESPN QBR

Ho & Carl (2025f) provide a Data Dictionary for ESPN QBR data at the following link: https://nflreadr.nflverse.com/articles/dictionary_espn_qbr.html

Code

nfl_espnQBR_seasonal_raw <- progressr::with_progress(
  nflreadr::load_espn_qbr(
    seasons = TRUE,
    summary_type = c("season")))

nfl_espnQBR_weekly_raw <- progressr::with_progress(
  nflreadr::load_espn_qbr(
    seasons = TRUE,
    summary_type = c("week")))

Code

save(
  nfl_espnQBR_seasonal_raw,
  file = "./data/nfl_espnQBR_seasonal_raw.RData"
)

save(
  nfl_espnQBR_weekly_raw,
  file = "./data/nfl_espnQBR_weekly_raw.RData"
)

The nfl_espnQBR_seasonal object is in player-season-season type form, where season type refers to regular season versus postseason. That is, each row should be uniquely identified by the combination of player_id, season, and season_type. Let’s rearrange the data accordingly:

Code

nfl_espnQBR_seasonal <- nfl_espnQBR_seasonal_raw %>% 
  select(player_id, season, season_type, everything()) %>% 
  arrange(name_display, player_id, season, season_type)

Let’s check for duplicate player-season-season type instances:

Code

nfl_espnQBR_seasonal %>% 
  group_by(player_id, season, season_type) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_espnQBR_weekly object is in both game-player form and player-season-season type-week form, where season type refers to regular season versus postseason. That is, each row should be uniquely identified by the combination of game_id and player_id or by the combination of player_id, season, season_type, and week_num. Let’s rearrange the data accordingly:

Code

nfl_espnQBR_weekly <- nfl_espnQBR_weekly_raw %>% 
  select(player_id, season, season_type, week_num, everything()) %>% 
  arrange(name_display, player_id, season, season_type, week_num)

Let’s check for duplicate game-player or player-season-season type-week instances:

Code

nfl_espnQBR_weekly %>% 
  arrange(game_id, player_id) %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_espnQBR_weekly %>% 
  arrange(player_id, season, season_type, week_num) %>% 
  group_by(player_id, season, season_type, week_num) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_espnQBR_seasonal,
  file = "./data/nfl_espnQBR_seasonal.RData"
)

save(
  nfl_espnQBR_weekly,
  file = "./data/nfl_espnQBR_weekly.RData"
)

4.3.20 NFL Next Gen Stats

Ho & Carl (2025l) provide a Data Dictionary for NFL Next Gen Stats data at the following link: https://nflreadr.nflverse.com/articles/dictionary_nextgen_stats.html

Code

nfl_nextGenStats_pass_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("passing")))

nfl_nextGenStats_rush_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("rushing")))

nfl_nextGenStats_rec_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("receiving")))

nfl_nextGenStats_weekly_raw <- bind_rows(
  nfl_nextGenStats_pass_weekly_raw,
  nfl_nextGenStats_rush_weekly_raw,
  nfl_nextGenStats_rec_weekly_raw
)
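
Stacking the passing, rushing, and receiving data with bind_rows() has two consequences worth keeping in mind. The following sketch uses hypothetical toy data (the statistic column names attempts and rush_attempts are made up for illustration) to show them, along with one alternative:

```r
library(dplyr)

# Hypothetical toy data: the same player has passing and rushing rows
toy_pass <- tibble(
  player_gsis_id = "00-001",
  season = 2023,
  week = 1,
  attempts = 35
)

toy_rush <- tibble(
  player_gsis_id = "00-001",
  season = 2023,
  week = 1,
  rush_attempts = 6
)

# Stacking keeps one row per stat type; columns that are absent from one
# input are filled with NA, and .id records which input each row came from
toy_stacked <- bind_rows(
  pass = toy_pass,
  rush = toy_rush,
  .id = "stat_type"
)

# Alternative: a full join keeps one row per player-season-week
toy_wide <- full_join(
  toy_pass,
  toy_rush,
  by = c("player_gsis_id", "season", "week")
)

nrow(toy_stacked) # 2 rows for the same player-week
nrow(toy_wide)    # 1 row for the same player-week
```

In the stacked version, a player who has more than one type of statistic in the same week has more than one row for that week, so a duplicate check on player-season-season type-week may flag such rows even though they are not true duplicates.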

Code

save(
  nfl_nextGenStats_weekly_raw,
  file = "./data/nfl_nextGenStats_weekly_raw.RData"
)

The nfl_nextGenStats_weekly object is in player-season-season type-week form, where season type refers to regular season versus postseason. That is, each row should be uniquely identified by the combination of player_gsis_id, season, season_type, and week. Let’s rearrange the data accordingly:

Code

nfl_nextGenStats_weekly <- nfl_nextGenStats_weekly_raw %>% 
  select(player_gsis_id, season, season_type, week, everything()) %>% 
  arrange(player_display_name, player_gsis_id, season, season_type, week)

Let’s check for duplicate player-season-season type-week instances:

Code

nfl_nextGenStats_weekly %>% 
  group_by(player_gsis_id, season, season_type, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_nextGenStats_weekly,
  file = "./data/nfl_nextGenStats_weekly.RData"
)

4.3.21 Advanced Stats from Pro Football Reference

Data Dictionaries for Pro Football Reference passing data are available at the following links:

https://nflreadr.nflverse.com/articles/dictionary_pfr_passing.html (Ho & Carl, 2025o)
https://www.pro-football-reference.com/about/advanced_stats.htm [Pro Football Reference (2025); archived at https://perma.cc/LKH5-92PQ]

Pro Football Reference (2024) provides advanced stats from the 2024 season at the following link: https://www.pro-football-reference.com/years/2024/advanced.htm

Code

# Seasonal Data
nfl_advancedStatsPFR_pass_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("pass"),
    summary_level = c("season")))

nfl_advancedStatsPFR_rush_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rush"),
    summary_level = c("season")))

nfl_advancedStatsPFR_rec_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rec"),
    summary_level = c("season")))

nfl_advancedStatsPFR_def_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("def"),
    summary_level = c("season")))

# Weekly Data
nfl_advancedStatsPFR_pass_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("pass"),
    summary_level = c("week")))

nfl_advancedStatsPFR_rush_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rush"),
    summary_level = c("week")))

nfl_advancedStatsPFR_rec_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rec"),
    summary_level = c("week")))

nfl_advancedStatsPFR_def_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("def"),
    summary_level = c("week")))

Code

save(
  nfl_advancedStatsPFR_pass_seasonal_raw,
  nfl_advancedStatsPFR_rush_seasonal_raw,
  nfl_advancedStatsPFR_rec_seasonal_raw,
  nfl_advancedStatsPFR_def_seasonal_raw,
  file = "./data/nfl_advancedStatsPFR_seasonal_raw.RData"
)

save(
  nfl_advancedStatsPFR_pass_weekly_raw,
  nfl_advancedStatsPFR_rush_weekly_raw,
  nfl_advancedStatsPFR_rec_weekly_raw,
  nfl_advancedStatsPFR_def_weekly_raw,
  file = "./data/nfl_advancedStatsPFR_weekly_raw.RData"
)

4.3.21.1 Processing

Code

# Clean up player name for merging; name variables based on which data object they're from

## Seasonal Data
nfl_advancedStatsPFR_pass_seasonal <- nfl_advancedStatsPFR_pass_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".pass"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_rush_seasonal <- nfl_advancedStatsPFR_rush_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".rush"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_rec_seasonal <- nfl_advancedStatsPFR_rec_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".rec"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_def_seasonal <- nfl_advancedStatsPFR_def_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  )  %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".def"),
    -c(pfr_id, merge_name, season, team))

## Weekly Data
nfl_advancedStatsPFR_pass_weekly <- nfl_advancedStatsPFR_pass_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".pass"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_rush_weekly <- nfl_advancedStatsPFR_rush_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".rush"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_rec_weekly <- nfl_advancedStatsPFR_rec_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".rec"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_def_weekly <- nfl_advancedStatsPFR_def_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".def"),
    -c(game_id, pfr_player_id))

## Merge across positions
nfl_advancedStatsPFR_seasonal_list <- list(
  nfl_advancedStatsPFR_pass_seasonal,
  nfl_advancedStatsPFR_rush_seasonal,
  nfl_advancedStatsPFR_rec_seasonal,
  nfl_advancedStatsPFR_def_seasonal)

nfl_advancedStatsPFR_weekly_list <- list(
  nfl_advancedStatsPFR_pass_weekly,
  nfl_advancedStatsPFR_rush_weekly,
  nfl_advancedStatsPFR_rec_weekly,
  nfl_advancedStatsPFR_def_weekly)

nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonal_list %>% 
  purrr::reduce(
    full_join,
    by = c("pfr_id","merge_name","season","team"))

nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly_list %>% 
  purrr::reduce(
    full_join,
    by = c("game_id","pfr_player_id")) #merge_name

#nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly_list %>% 
#  purrr::reduce(
#    full_join,
#    by = c("pfr_player_id","merge_name","season","week"))

nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  mutate(
    pfr_player_name = coalesce(
      player.pass,
      player.rush,
      player.rec,
      player.def
    ),
    age = coalesce(
      #age.pass,
      age.rush,
      age.rec,
      age.def
    ),
    pos = coalesce(
      #pos.pass,
      pos.rush,
      pos.rec,
      pos.def
    ),
    g = coalesce(
      #g.pass,
      g.rush,
      g.rec,
      g.def
    ),
    gs = coalesce(
      #gs.pass,
      gs.rush,
      gs.rec,
      gs.def
    )
  ) %>% 
  select(-c(
    starts_with("player."),
    starts_with("age."),
    starts_with("pos."),
    starts_with("g."),
    starts_with("gs.")))

nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  mutate(
    pfr_player_name = coalesce(
      pfr_player_name.pass,
      pfr_player_name.rush,
      pfr_player_name.rec,
      pfr_player_name.def
    ),
    season = coalesce(
      season.pass,
      season.rush,
      season.rec,
      season.def
    ),
    week = coalesce(
      week.pass,
      week.rush,
      week.rec,
      week.def
    ),
    team = coalesce(
      team.pass,
      team.rush,
      team.rec,
      team.def
    ),
    merge_name = coalesce(
      merge_name.pass,
      merge_name.rush,
      merge_name.rec,
      merge_name.def
    ),
    pfr_game_id = coalesce(
      pfr_game_id.pass,
      pfr_game_id.rush,
      pfr_game_id.rec,
      pfr_game_id.def
    ),
    game_type = coalesce(
      game_type.pass,
      game_type.rush,
      game_type.rec,
      game_type.def
    ),
    opponent = coalesce(
      opponent.pass,
      opponent.rush,
      opponent.rec,
      opponent.def
    )
  ) %>% 
  select(-c(
    starts_with("pfr_player_name."),
    starts_with("season."),
    starts_with("week."),
    starts_with("team."),
    starts_with("merge_name."),
    starts_with("pfr_game_id."),
    starts_with("game_type."),
    starts_with("opponent.")))
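
The suffix-then-merge pattern used above can be sketched on toy data. The two small tables and their values below are hypothetical; only the pattern follows the code above: suffix the non-key columns with rename_with(), merge with purrr::reduce() and full_join(), then collapse the shared columns with coalesce().

```r
library("dplyr")
library("purrr")

# Hypothetical passing table: one player in one game
toy_pass <- tibble(
  game_id = c("g1"),
  pfr_player_id = c("p1"),
  team = c("KC"),
  att = c(30)
) %>% 
  rename_with(~ paste0(., ".pass"), -c(game_id, pfr_player_id))

# Hypothetical rushing table: two players in the same game
toy_rush <- tibble(
  game_id = c("g1", "g1"),
  pfr_player_id = c("p1", "p2"),
  team = c("KC", "KC"),
  att = c(4, 15)
) %>% 
  rename_with(~ paste0(., ".rush"), -c(game_id, pfr_player_id))

# Merge on the keys, then take the first non-missing team across the suffixed copies
toy_merged <- list(toy_pass, toy_rush) %>% 
  purrr::reduce(full_join, by = c("game_id", "pfr_player_id")) %>% 
  mutate(team = coalesce(team.pass, team.rush)) %>% 
  select(-starts_with("team."))

toy_merged
```

Player p2 has no passing row, so att.pass is missing for that player, while team is filled in from the rushing table.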

The nfl_advancedStatsPFR_seasonalByTeam object is in player-season-team form. That is, each row should be uniquely identified by the combination of pfr_id, season, and team. Let’s rearrange the data accordingly:

Code

nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  select(pfr_id, season, team, pfr_player_name, everything()) %>% 
  arrange(pfr_player_name, pfr_id, season, team)

Let’s check for duplicate player-season-team instances:

Code

nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()

Next, we aggregate the seasonal variables in the pass/rush/rec/def data across teams, so the seasonal data are in player-season form rather than player-season-team form. Depending on the variable, we aggregate using a sum, a weighted mean (weighted by the number of games played for each team), or a percentage recomputed from the summed components.

Code

pfrVars <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  select(pass_attempts.pass:m_tkl_percent.def, g, gs) %>% 
  names()

weightedAverageVars <- c(
  "pocket_time.pass",
  "ybc_att.rush","yac_att.rush",
  "ybc_r.rec","yac_r.rec","adot.rec","rat.rec",
  "yds_cmp.def","yds_tgt.def","dadot.def","m_tkl_percent.def","rat.def"
)

recomputeVars <- c(
  "drop_pct.pass", # drops.pass / pass_attempts.pass
  "bad_throw_pct.pass", # bad_throws.pass / pass_attempts.pass
  "on_tgt_pct.pass", # on_tgt_throws.pass / pass_attempts.pass
  "pressure_pct.pass", # times_pressured.pass / pass_attempts.pass
  "drop_percent.rec", # drop.rec / tgt.rec
  "rec_br.rec", # rec.rec / brk_tkl.rec
  "cmp_percent.def" # cmp.def / tgt.def
)

sumVars <- pfrVars[pfrVars %ni% c(
  weightedAverageVars, recomputeVars,
  "merge_name", "loaded.pass", "loaded.rush", "loaded.rec", "loaded.def")]

nfl_advancedStatsPFR_seasonal <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, merge_name, season) %>% 
  summarise(
    across(all_of(weightedAverageVars), ~ weighted.mean(.x, w = g, na.rm = TRUE)),
    across(all_of(sumVars), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop") %>% 
  mutate(
    drop_pct.pass = drops.pass / pass_attempts.pass,
    bad_throw_pct.pass = bad_throws.pass / pass_attempts.pass,
    on_tgt_pct.pass = on_tgt_throws.pass / pass_attempts.pass,
    pressure_pct.pass = times_pressured.pass / pass_attempts.pass,
    drop_percent.rec = drop.rec / tgt.rec,
    rec_br.rec = rec.rec / brk_tkl.rec,
    cmp_percent.def = cmp.def / tgt.def
  )

# Merge with other player info
nfl_advancedStatsPFR_seasonalByTeam_1stTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, season) %>% 
  slice(1)

nfl_advancedStatsPFR_seasonalByTeam_1stTeam_mergeVars <- nfl_advancedStatsPFR_seasonalByTeam_1stTeam %>% 
  select(pfr_id, season, team, pfr_player_name, age, pos) #, g, gs

nfl_advancedStatsPFR_seasonal <- nfl_advancedStatsPFR_seasonal %>% 
  left_join(
    nfl_advancedStatsPFR_seasonalByTeam_1stTeam_mergeVars,
    by = c("pfr_id","season")
  ) %>% 
  select(
    pfr_id, season, pfr_player_name, pos, age, team, g, gs,
    contains(".pass"), contains(".rush"), contains(".rec"), contains(".def"),
    everything())
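
As a worked illustration of the three aggregation rules, consider a hypothetical player who played for two teams in one season. The numbers and team labels below are made up, and the sketch uses base R only; it is meant to show why a percentage is recomputed from summed counts rather than averaged across teams.

```r
# Hypothetical player who played for two teams in one season
toy_byTeam <- data.frame(
  team = c("AAA", "BBB"),
  g = c(6, 10),     # games played for each team
  tgt = c(40, 60),  # targets (a count, so it is summed)
  drop = c(2, 6),   # drops (a count, so it is summed)
  adot = c(10, 8)   # average depth of target (a rate, so it is averaged)
)

# Counts: sum across teams
tgt_total <- sum(toy_byTeam$tgt)
drop_total <- sum(toy_byTeam$drop)

# Rates: mean across teams, weighted by games played for each team
adot_weighted <- weighted.mean(toy_byTeam$adot, w = toy_byTeam$g)

# Percentages: recompute from the summed counts
drop_rate_recomputed <- drop_total / tgt_total

# For comparison: the unweighted mean of the per-team drop rates
drop_rate_naive <- mean(toy_byTeam$drop / toy_byTeam$tgt)

c(
  tgt_total = tgt_total,
  drop_total = drop_total,
  adot_weighted = adot_weighted,
  drop_rate_recomputed = drop_rate_recomputed,
  drop_rate_naive = drop_rate_naive
)
```

Here the recomputed drop rate (8 drops on 100 targets) differs from the unweighted mean of the two per-team rates because the player had more targets with the second team. Note that the recomputed value is a proportion; whether the original percentage variables are stored on a 0 to 100 scale is worth confirming in the Data Dictionary.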

Let’s check for duplicate player-season instances:

Code

nfl_advancedStatsPFR_seasonal %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_advancedStatsPFR_weekly object is in both game-player form and player-season-week form. That is, each row should be uniquely identified by the combination of game_id and pfr_player_id, and also by the combination of pfr_player_id, season, and week (with or without game_type). Let’s rearrange the data accordingly:

Code

nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  select(pfr_player_id, season, week, game_type, game_id, pfr_player_name, everything()) %>% 
  arrange(pfr_player_name, pfr_player_id, season, week)

Let’s check for duplicate game-player or player-season-week instances:

Code

nfl_advancedStatsPFR_weekly %>% 
  arrange(game_id, pfr_player_id) %>% 
  group_by(game_id, pfr_player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_advancedStatsPFR_weekly %>% 
  arrange(pfr_player_id, season, week) %>% 
  group_by(pfr_player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Add gsis_id to the data so they can be merged with other datasets:

Code

# Prepare data for merging
nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  rename(pfr_id = pfr_player_id)

# Identify duplicates
nfl_advancedStatsPFR_seasonal %>% 
  select(pfr_id, season, age) %>% 
  na.omit() %>% 
  unique() %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  arrange(pfr_id, season) %>% 
  head()

Code

# Merge seasonal data with the player IDs
nfl_advancedStatsPFR_seasonal <- left_join(
  nfl_advancedStatsPFR_seasonal,
  nfl_playerIDs %>% 
    filter(!is.na(pfr_id)) %>% 
    filter(gsis_id != "00-0039137") %>% # drop DL Byron Young, keep OLB Byron Young
    select(pfr_id, gsis_id) %>% 
    unique(),
  by = "pfr_id"
)

# Merge weekly data with the player IDs
nfl_advancedStatsPFR_weekly <- left_join(
  nfl_advancedStatsPFR_weekly,
  nfl_playerIDs %>% 
    filter(!is.na(pfr_id)) %>% 
    filter(gsis_id != "00-0039137") %>% # drop DL Byron Young, keep OLB Byron Young
    select(pfr_id, gsis_id) %>% 
    unique(),
  by = "pfr_id"
)

# Remove distinct players who were given the same `pfr_id` (to allow merging)
nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0035665" & nfl_advancedStatsPFR_seasonal$pos %in% c("LB","LILB","RILB"))] <- NA # drop LB David Young, keep DB David Young
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0035665" & nfl_advancedStatsPFR_weekly$team %in% c("TEN","MIA"))] <- NA # drop LB David Young, keep DB David Young

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0035292" & nfl_advancedStatsPFR_seasonal$pos %in% c("LB","LILB","RILB"))] <- NA # drop LB David Young, keep DB David Young
nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0035292" & nfl_advancedStatsPFR_weekly$team %in% c("TEN","MIA"))] <- NA # drop LB David Young, keep DB David Young

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0033894" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop S Marcus Williams
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0033894" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop S Marcus Williams

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0038407" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0038407" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0037106" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0037106" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0038549" & nfl_advancedStatsPFR_seasonal$pos == "WR")] <- NA # drop WR DJ Turner, keep CB DJ Turner
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0038549" & nfl_advancedStatsPFR_weekly$pos == "WR")] <- NA # drop WR DJ Turner, keep CB DJ Turner
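
The reason for resolving ambiguous IDs can be seen in a small sketch. When a lookup table contains more than one gsis_id for the same pfr_id, left_join() returns one row per match, so a player-season row is duplicated. The IDs and values below are made up for illustration:

```r
library("dplyr")

# Hypothetical seasonal stats: one row per player-season
toy_stats <- tibble(
  pfr_id = c("x01", "y02"),
  season = c(2024, 2024),
  yds = c(100, 50)
)

# Hypothetical ID lookup in which pfr_id "x01" maps to two gsis_id values
toy_ids <- tibble(
  pfr_id = c("x01", "x01", "y02"),
  gsis_id = c("00-001", "00-002", "00-003")
)

# Find the keys that occur more than once in the lookup
toy_dupKeys <- toy_ids %>% 
  count(pfr_id) %>% 
  filter(n > 1)

# Joining as-is duplicates the "x01" row
toy_joined_bad <- left_join(toy_stats, toy_ids, by = "pfr_id")

# Dropping the unwanted ID first keeps one row per player-season
toy_ids_unique <- toy_ids %>% 
  filter(gsis_id != "00-002")

toy_joined_ok <- left_join(toy_stats, toy_ids_unique, by = "pfr_id")

nrow(toy_joined_bad)
nrow(toy_joined_ok)
```

Depending on the installed version of dplyr, the first join may also emit a warning about multiple matches.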

Now, each row of the nfl_advancedStatsPFR_seasonal object should be uniquely identified by the combination of gsis_id (or pfr_id) and season. Each row of the nfl_advancedStatsPFR_weekly object should be uniquely identified by the combination of game_id and gsis_id (or pfr_id), and also by the combination of gsis_id (or pfr_id), season, and week (with or without game_type).

Let’s check again for duplicate player-season, game-player, and player-season-week instances:

Code

# Based on gsis_id
nfl_advancedStatsPFR_seasonal %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  group_by(gsis_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_advancedStatsPFR_weekly %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  arrange(game_id, gsis_id) %>% 
  group_by(game_id, gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_advancedStatsPFR_weekly %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  arrange(gsis_id, season, week) %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

# Based on pfr_id
nfl_advancedStatsPFR_seasonal %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_advancedStatsPFR_weekly %>% 
  arrange(game_id, pfr_id) %>% 
  group_by(game_id, pfr_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_advancedStatsPFR_weekly %>% 
  arrange(pfr_id, season, week) %>% 
  group_by(pfr_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_advancedStatsPFR_seasonal,
  file = "./data/nfl_advancedStatsPFR_seasonal.RData"
)

save(
  nfl_advancedStatsPFR_weekly,
  file = "./data/nfl_advancedStatsPFR_weekly.RData"
)

4.3.22 Player Contracts

Ho & Carl (2025c) provide a Data Dictionary for player contracts data at the following link: https://nflreadr.nflverse.com/articles/dictionary_contracts.html

Code

nfl_playerContracts_raw <- progressr::with_progress(
  nflreadr::load_contracts())

Code

save(
  nfl_playerContracts_raw,
  file = "./data/nfl_playerContracts_raw.RData"
)

The nfl_playerContracts object is in player-year-team-value form. That is, each row should be uniquely identified by the combination of otc_id, year_signed, team, and value. Let’s rearrange the data accordingly:

Code

nfl_playerContracts <- nfl_playerContracts_raw %>% 
  select(otc_id, year_signed, team, value, everything()) %>% 
  arrange(player, otc_id, year_signed, team, value)

Let’s check for duplicate player-year-team-value instances:

Code

nfl_playerContracts %>% 
  group_by(otc_id, year_signed, team, value) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_playerContracts,
  file = "./data/nfl_playerContracts.RData"
)

4.3.23 FTN Charting Data

Ho & Carl (2025j) provide a Data Dictionary for FTN Charting data at the following link: https://nflreadr.nflverse.com/articles/dictionary_ftn_charting.html

Code

nfl_ftnCharting_raw <- progressr::with_progress(
  nflreadr::load_ftn_charting(seasons = TRUE))

Code

save(
  nfl_ftnCharting_raw,
  file = "./data/nfl_ftnCharting_raw.RData"
)

The nfl_ftnCharting object is in game-play form. That is, each row should be uniquely identified by the combination of nflverse_game_id and nflverse_play_id. Let’s rearrange the data accordingly:

Code

nfl_ftnCharting <- nfl_ftnCharting_raw %>% 
  select(nflverse_game_id, nflverse_play_id, everything()) %>% 
  arrange(nflverse_game_id, nflverse_play_id)

Let’s check for duplicate game-play instances:

Code

nfl_ftnCharting %>% 
  group_by(nflverse_game_id, nflverse_play_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_ftnCharting,
  file = "./data/nfl_ftnCharting.RData"
)

4.3.24 FantasyPros Rankings

Ho & Carl (2025i) provide a Data Dictionary for FantasyPros ranking data at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_rankings.html

Code

#nfl_rankings_raw <- progressr::with_progress( # currently throws error
#  nflreadr::load_ff_rankings(type = "all"))

nfl_rankings_draft_raw <- progressr::with_progress(
  nflreadr::load_ff_rankings(type = "draft"))

nfl_rankings_weekly_raw <- progressr::with_progress(
  nflreadr::load_ff_rankings(type = "week"))

Code

save(
  nfl_rankings_draft_raw,
  file = "./data/nfl_rankings_draft_raw.RData"
)

save(
  nfl_rankings_weekly_raw,
  file = "./data/nfl_rankings_weekly_raw.RData"
)

The nfl_rankings_draft object is in player-page_type form. That is, each row should be uniquely identified by the combination of id and page_type. Let’s rearrange the data accordingly:

Code

nfl_rankings_draft <- nfl_rankings_draft_raw %>% 
  select(id, page_type, player, pos, team, everything()) %>% 
  arrange(player, id, pos, page_type)

Let’s check for duplicate player-page_type instances:

Code

nfl_rankings_draft %>% 
  group_by(id, page_type) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_rankings_weekly object is in player-page form. That is, each row should be uniquely identified by fantasypros_id and page. Let’s rearrange the data accordingly:

Code

nfl_rankings_weekly <- nfl_rankings_weekly_raw %>% 
  select(fantasypros_id, page, player_name, pos, team, everything()) %>% 
  arrange(player_name, fantasypros_id, page, pos)

Let’s check for duplicate player-page instances:

Code

nfl_rankings_weekly %>% 
  group_by(fantasypros_id, page) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_rankings_draft,
  file = "./data/nfl_rankings_draft.RData"
)

save(
  nfl_rankings_weekly,
  file = "./data/nfl_rankings_weekly.RData"
)

4.3.25 Expected Fantasy Points

Ho & Carl (2025g) provide a Data Dictionary for expected fantasy points data at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_opportunity.html

Code

nfl_expectedFantasyPoints_weekly_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "weekly",
    model_version = "latest"
  ))

nfl_expectedFantasyPoints_pass_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "pbp_pass",
    model_version = "latest"
  ))

nfl_expectedFantasyPoints_rush_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "pbp_rush",
    model_version = "latest"
  ))

Code

nfl_expectedFantasyPoints_weekly_raw <- nflreadr::load_ff_opportunity(
  seasons = TRUE,
  stat_type = "weekly",
  model_version = "latest"
  )

nfl_expectedFantasyPoints_pass_raw <- nflreadr::load_ff_opportunity(
  seasons = TRUE,
  stat_type = "pbp_pass",
  model_version = "latest"
  )

nfl_expectedFantasyPoints_rush_raw <- nflreadr::load_ff_opportunity(
  seasons = TRUE,
  stat_type = "pbp_rush",
  model_version = "latest"
  )

Code

save(
  nfl_expectedFantasyPoints_weekly_raw,
  file = "./data/nfl_expectedFantasyPoints_weekly_raw.RData"
)

save(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw,
  file = "./data/nfl_expectedFantasyPoints_pbp_raw.RData"
)

Code

nfl_expectedFantasyPoints_pbp_list <- list(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw)

nfl_expectedFantasyPoints_pbp <- full_join(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw,
  by = c("game_id","fixed_drive","play_id"),
  suffix = c(".pass", ".rush")
)

The nfl_expectedFantasyPoints_weekly object is in game-player form and in player-season-week form. That is, each row should be uniquely identified by the combination of game_id and player_id. Each row should also be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code

nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly_raw %>% 
  select(player_id, season, week, game_id, full_name, posteam, position, everything()) %>% 
  arrange(full_name, player_id, season, week)

Let’s check for duplicate game-player instances and player-season-week instances:

Code

nfl_expectedFantasyPoints_weekly %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_expectedFantasyPoints_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

4.3.25.1 Processing

Let’s do some data cleanup:

Code

# Drop players with missing values for player_id
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>% 
  filter(!is.na(player_id))
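
One reason to drop rows with a missing player_id before re-running the checks: group_by() treats NA as a group of its own, so several unidentified players in the same game are flagged as duplicates of one another. A minimal sketch on hypothetical toy data (the points column and all values are made up):

```r
library("dplyr")

# Hypothetical toy data: two rows in game "g1" have no player_id
toy_points <- tibble(
  game_id = c("g1", "g1", "g1"),
  player_id = c("p1", NA, NA),
  points = c(12.4, 3.1, 0.8) # hypothetical column
)

# Before dropping missing IDs: the two NA rows share a key and are flagged
flagged_before <- toy_points %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  ungroup()

# After dropping missing IDs: nothing is flagged
flagged_after <- toy_points %>% 
  filter(!is.na(player_id)) %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  ungroup()

nrow(flagged_before)
nrow(flagged_after)
```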

Let’s check again for duplicate game-player instances and player-season-week instances:

Code

nfl_expectedFantasyPoints_weekly %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_expectedFantasyPoints_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_expectedFantasyPoints_pbp object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, fixed_drive, and play_id. Let’s rearrange the data accordingly:

Code

nfl_expectedFantasyPoints_pbp <- nfl_expectedFantasyPoints_pbp %>% 
  select(game_id, fixed_drive, play_id, everything()) %>% 
  arrange(game_id, fixed_drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code

nfl_expectedFantasyPoints_pbp %>% 
  group_by(game_id, fixed_drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_expectedFantasyPoints_weekly,
  file = "./data/nfl_expectedFantasyPoints_weekly.RData"
)

save(
  nfl_expectedFantasyPoints_pbp,
  file = "./data/nfl_expectedFantasyPoints_pbp.RData"
)

4.3.26 Fantasy Football Projections

4.3.26.1 Download Players’ Projections

Note 4.3: Downloading players’ seasonal projections

Note: the following code takes a while to run.

Code

# Seasonal Projections
players_projections_seasonal_raw <- ffanalytics::scrape_data(
  season = NULL, # NULL grabs the current season
  week = 0) # 0 grabs seasonal projections

# Weekly Projections
players_projections_weekly_raw <- ffanalytics::scrape_data(
  season = NULL, # NULL grabs the current season
  week = NULL) # NULL grabs the current week

Code

save(
  players_projections_seasonal_raw,
  file = "./data/players_projections_seasonal_raw.RData"
)

save(
  players_projections_weekly_raw,
  file = "./data/players_projections_weekly_raw.RData"
)

The data files are saved in the project repository and can be loaded using the following commands:

Code

load(file = "./data/players_projections_seasonal_raw.RData")
load(file = "./data/players_projections_weekly_raw.RData")
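
For readers less familiar with .RData files, the following base R sketch shows the save-and-load round trip on a made-up object. The object name and the temporary file path are hypothetical and are used only so the example does not touch the ./data folder.

```r
# Hypothetical object and a temporary file path (not the book's ./data folder)
toy_projections <- data.frame(
  player = c("A", "B"),
  pts = c(250.5, 198.2)
)
toy_file <- tempfile(fileext = ".RData")

# Save the object, then remove it from the workspace
save(toy_projections, file = toy_file)
rm(toy_projections)

# load() restores objects under the names they were saved with
# and (invisibly) returns those names
loaded_names <- load(file = toy_file)

loaded_names
exists("toy_projections")
```

Because load() restores objects under their original names, loading a file can silently overwrite an object of the same name that is already in the workspace.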

The players_projections_seasonal_raw and players_projections_weekly_raw objects are in player-position-projection source form. That is, each row should be uniquely identified by the combination of id, pos, and data_src. Each row should also be uniquely identified by the combination of player, pos, and data_src.

Let’s check for duplicate player-position-projection source instances:

Code

players_projections_seasonal_raw %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()

Code

players_projections_seasonal_raw %>% 
  bind_rows() %>% 
  select(player, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(player, pos, data_src) %>% 
  filter(n() > 1, !is.na(player)) %>% 
  head()

Code

players_projections_weekly_raw %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()

Code

players_projections_weekly_raw %>% 
  bind_rows() %>% 
  select(player, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(player, pos, data_src) %>% 
  filter(n() > 1, !is.na(player)) %>% 
  head()
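
The bind_rows() step in the checks above stacks a list of data frames into one. The sketch below imitates that structure with a made-up list of two per-position tables; the element names, source labels, and values are hypothetical, and the real raw objects may contain many more columns.

```r
library("dplyr")

# Hypothetical list of per-position projection tables
toy_projections_raw <- list(
  QB = tibble(
    id = c("1", "1"),
    pos = "QB",
    data_src = c("SourceA", "SourceB"),
    pass_yds = c(4100, 3950)
  ),
  RB = tibble(
    id = c("2", "2"),
    pos = "RB",
    data_src = c("SourceA", "SourceA"), # same source twice: a duplicate
    rush_yds = c(1100, 1150)
  )
)

# Stack the tables; columns missing from a table are filled with NA
toy_stacked <- bind_rows(toy_projections_raw)

# Flag player-position-projection source combinations that occur more than once
toy_dups <- toy_stacked %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  ungroup()

toy_stacked
toy_dups
```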

4.3.26.2 Specify League Scoring Settings

First, create a scoring object from the package’s default scoring object:

Code

scoring_obj_default <- ffanalytics::scoring

View the default scoring settings:

Code

scoring_obj_default

$pass
$pass$pass_att
[1] 0

$pass$pass_comp
[1] 0

$pass$pass_inc
[1] 0

$pass$pass_yds
[1] 0.04

$pass$pass_tds
[1] 4

$pass$pass_int
[1] -3

$pass$pass_40_yds
[1] 0

$pass$pass_300_yds
[1] 0

$pass$pass_350_yds
[1] 0

$pass$pass_400_yds
[1] 0


$rush
$rush$all_pos
[1] TRUE

$rush$rush_yds
[1] 0.1

$rush$rush_att
[1] 0

$rush$rush_40_yds
[1] 0

$rush$rush_tds
[1] 6

$rush$rush_100_yds
[1] 0

$rush$rush_150_yds
[1] 0

$rush$rush_200_yds
[1] 0


$rec
$rec$all_pos
[1] TRUE

$rec$rec
[1] 0

$rec$rec_yds
[1] 0.1

$rec$rec_tds
[1] 6

$rec$rec_40_yds
[1] 0

$rec$rec_100_yds
[1] 0

$rec$rec_150_yds
[1] 0

$rec$rec_200_yds
[1] 0


$misc
$misc$all_pos
[1] TRUE

$misc$fumbles_lost
[1] -3

$misc$fumbles_total
[1] 0

$misc$sacks
[1] 0

$misc$two_pts
[1] 2


$kick
$kick$xp
[1] 1

$kick$fg_0019
[1] 3

$kick$fg_2029
[1] 3

$kick$fg_3039
[1] 3

$kick$fg_4049
[1] 4

$kick$fg_50
[1] 5

$kick$fg_miss
[1] 0


$ret
$ret$all_pos
[1] TRUE

$ret$return_tds
[1] 6

$ret$return_yds
[1] 0


$idp
$idp$all_pos
[1] TRUE

$idp$idp_solo
[1] 1

$idp$idp_asst
[1] 0.5

$idp$idp_sack
[1] 2

$idp$idp_int
[1] 3

$idp$idp_fum_force
[1] 3

$idp$idp_fum_rec
[1] 2

$idp$idp_pd
[1] 1

$idp$idp_td
[1] 6

$idp$idp_safety
[1] 2


$dst
$dst$dst_fum_rec
[1] 2

$dst$dst_int
[1] 2

$dst$dst_safety
[1] 2

$dst$dst_sacks
[1] 1

$dst$dst_td
[1] 6

$dst$dst_blk
[1] 1.5

$dst$dst_ret_yds
[1] 0

$dst$dst_pts_allowed
[1] 0


$pts_bracket
$pts_bracket[[1]]
$pts_bracket[[1]]$threshold
[1] 0

$pts_bracket[[1]]$points
[1] 10


$pts_bracket[[2]]
$pts_bracket[[2]]$threshold
[1] 6

$pts_bracket[[2]]$points
[1] 7


$pts_bracket[[3]]
$pts_bracket[[3]]$threshold
[1] 20

$pts_bracket[[3]]$points
[1] 4


$pts_bracket[[4]]
$pts_bracket[[4]]$threshold
[1] 34

$pts_bracket[[4]]$points
[1] 0


$pts_bracket[[5]]
$pts_bracket[[5]]$threshold
[1] 99

$pts_bracket[[5]]$points
[1] -4

Now, modify the scoring settings to match your league settings. Below, we use the scoring settings for fantasy leagues on NFL.com, which happen to be point-per-reception leagues (i.e., PPR leagues):

Code

scoring_obj <- scoring_obj_default

# Offense
scoring_obj$pass$pass_int <- -2
scoring_obj$rec$rec <- 1
scoring_obj$misc$fumbles_lost <- -2

# Kickers
scoring_obj$kick$fg_4049 <- 3

# Defense/Special Teams
scoring_obj$pts_bracket <- list(
  list(threshold = 0, points = 10),
  list(threshold = 6, points = 7),
  list(threshold = 13, points = 4),
  list(threshold = 20, points = 1),
  list(threshold = 27, points = 0),
  list(threshold = 34, points = -1),
  list(threshold = 99, points = -4)
)
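
To make the settings concrete, here is a minimal sketch of how such values translate a stat line into fantasy points. It uses a plain list as a stand-in for a few entries of scoring_obj (so it runs without ffanalytics) and a hypothetical quarterback stat line. It is a simple linear sum for illustration; the package’s own scoring may handle additional details.

```r
# Stand-in for a few entries of scoring_obj (values copied from the settings above)
toy_scoring <- list(
  pass = list(pass_yds = 0.04, pass_tds = 4, pass_int = -2),
  rush = list(rush_yds = 0.1, rush_tds = 6),
  misc = list(fumbles_lost = -2)
)

# Hypothetical quarterback stat line for one game
toy_statLine <- list(
  pass_yds = 300, pass_tds = 2, pass_int = 1,
  rush_yds = 20, rush_tds = 0, fumbles_lost = 1
)

# Multiply each statistic by its point value and add the results
toy_points <-
  toy_statLine$pass_yds * toy_scoring$pass$pass_yds +        # 300 yards at 0.04 each
  toy_statLine$pass_tds * toy_scoring$pass$pass_tds +        # 2 touchdowns at 4 each
  toy_statLine$pass_int * toy_scoring$pass$pass_int +        # 1 interception at -2
  toy_statLine$rush_yds * toy_scoring$rush$rush_yds +        # 20 yards at 0.1 each
  toy_statLine$rush_tds * toy_scoring$rush$rush_tds +        # 0 touchdowns at 6 each
  toy_statLine$fumbles_lost * toy_scoring$misc$fumbles_lost  # 1 lost fumble at -2

toy_points
```

With these settings the stat line is worth 18 points. Under the default settings, where an interception and a lost fumble each cost 3 points, the same stat line would be worth 16 points.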

View our scoring settings:

Code

scoring_obj

$pass
$pass$pass_att
[1] 0

$pass$pass_comp
[1] 0

$pass$pass_inc
[1] 0

$pass$pass_yds
[1] 0.04

$pass$pass_tds
[1] 4

$pass$pass_int
[1] -2

$pass$pass_40_yds
[1] 0

$pass$pass_300_yds
[1] 0

$pass$pass_350_yds
[1] 0

$pass$pass_400_yds
[1] 0


$rush
$rush$all_pos
[1] TRUE

$rush$rush_yds
[1] 0.1

$rush$rush_att
[1] 0

$rush$rush_40_yds
[1] 0

$rush$rush_tds
[1] 6

$rush$rush_100_yds
[1] 0

$rush$rush_150_yds
[1] 0

$rush$rush_200_yds
[1] 0


$rec
$rec$all_pos
[1] TRUE

$rec$rec
[1] 1

$rec$rec_yds
[1] 0.1

$rec$rec_tds
[1] 6

$rec$rec_40_yds
[1] 0

$rec$rec_100_yds
[1] 0

$rec$rec_150_yds
[1] 0

$rec$rec_200_yds
[1] 0


$misc
$misc$all_pos
[1] TRUE

$misc$fumbles_lost
[1] -2

$misc$fumbles_total
[1] 0

$misc$sacks
[1] 0

$misc$two_pts
[1] 2


$kick
$kick$xp
[1] 1

$kick$fg_0019
[1] 3

$kick$fg_2029
[1] 3

$kick$fg_3039
[1] 3

$kick$fg_4049
[1] 3

$kick$fg_50
[1] 5

$kick$fg_miss
[1] 0


$ret
$ret$all_pos
[1] TRUE

$ret$return_tds
[1] 6

$ret$return_yds
[1] 0


$idp
$idp$all_pos
[1] TRUE

$idp$idp_solo
[1] 1

$idp$idp_asst
[1] 0.5

$idp$idp_sack
[1] 2

$idp$idp_int
[1] 3

$idp$idp_fum_force
[1] 3

$idp$idp_fum_rec
[1] 2

$idp$idp_pd
[1] 1

$idp$idp_td
[1] 6

$idp$idp_safety
[1] 2


$dst
$dst$dst_fum_rec
[1] 2

$dst$dst_int
[1] 2

$dst$dst_safety
[1] 2

$dst$dst_sacks
[1] 1

$dst$dst_td
[1] 6

$dst$dst_blk
[1] 1.5

$dst$dst_ret_yds
[1] 0

$dst$dst_pts_allowed
[1] 0


$pts_bracket
$pts_bracket[[1]]
$pts_bracket[[1]]$threshold
[1] 0

$pts_bracket[[1]]$points
[1] 10


$pts_bracket[[2]]
$pts_bracket[[2]]$threshold
[1] 6

$pts_bracket[[2]]$points
[1] 7


$pts_bracket[[3]]
$pts_bracket[[3]]$threshold
[1] 13

$pts_bracket[[3]]$points
[1] 4


$pts_bracket[[4]]
$pts_bracket[[4]]$threshold
[1] 20

$pts_bracket[[4]]$points
[1] 1


$pts_bracket[[5]]
$pts_bracket[[5]]$threshold
[1] 27

$pts_bracket[[5]]$points
[1] 0


$pts_bracket[[6]]
$pts_bracket[[6]]$threshold
[1] 34

$pts_bracket[[6]]$points
[1] -1


$pts_bracket[[7]]
$pts_bracket[[7]]$threshold
[1] 99

$pts_bracket[[7]]$points
[1] -4
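
The pts_bracket entries can be read as a lookup table for defense/special teams scoring based on points allowed. The sketch below assumes that each threshold is an upper bound (a bracket applies when points allowed is less than or equal to its threshold, and the lowest such bracket is used). That reading is an assumption made here for illustration, and bracket_points() is a hypothetical helper, not a function from ffanalytics.

```r
# Brackets copied from the settings above
toy_brackets <- list(
  list(threshold = 0, points = 10),
  list(threshold = 6, points = 7),
  list(threshold = 13, points = 4),
  list(threshold = 20, points = 1),
  list(threshold = 27, points = 0),
  list(threshold = 34, points = -1),
  list(threshold = 99, points = -4)
)

# Hypothetical helper. Assumption: thresholds are upper bounds in ascending order,
# and the first bracket with points_allowed <= threshold applies
bracket_points <- function(points_allowed, brackets) {
  thresholds <- vapply(brackets, function(b) b$threshold, numeric(1))
  points <- vapply(brackets, function(b) b$points, numeric(1))
  points[which(points_allowed <= thresholds)[1]]
}

# Fantasy points for several hypothetical points-allowed totals
toy_allowed <- c(0, 3, 13, 14, 24, 30, 45)
toy_dstPoints <- sapply(toy_allowed, bracket_points, brackets = toy_brackets)

data.frame(points_allowed = toy_allowed, fantasy_points = toy_dstPoints)
```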

4.3.26.3 Calculate Projected Points

Calculate projected points by source:

Code

players_projectedPoints_seasonal <- ffanalytics:::impute_and_score_sources(
  data_result = players_projections_seasonal_raw,
  scoring_rules = scoring_obj)

players_projectedPoints_weekly <- ffanalytics:::impute_and_score_sources(
  data_result = players_projections_weekly_raw,
  scoring_rules = scoring_obj)

Calculate projected statistics and points, averaged across sources:

Code

# Seasonal Projections
players_projectedStatsAverage_seasonal <- ffanalytics::projections_table(
  players_projections_seasonal_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = TRUE)

players_projectedPointsAverage_seasonal <- ffanalytics::projections_table(
  players_projections_seasonal_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = FALSE)

# Weekly Projections
players_projectedStatsAverage_weekly <- ffanalytics::projections_table(
  players_projections_weekly_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = TRUE)

players_projectedPointsAverage_weekly <- ffanalytics::projections_table(
  players_projections_weekly_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = FALSE)

The players_projectedPoints_seasonal and players_projectedPoints_weekly objects hold the projections by source. Once their elements are combined with bind_rows(), each row should be uniquely identified by the combination of id, pos, and data_src. The players_projectedStatsAverage_seasonal, players_projectedPointsAverage_seasonal, players_projectedStatsAverage_weekly, and players_projectedPointsAverage_weekly objects are in player-average type-position form. That is, each row should be uniquely identified by the combination of id, avg_type, and pos (or position).

Let’s check for duplicate player-position-projection source instances:

Code

players_projectedPoints_seasonal %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()

Code

players_projectedStatsAverage_seasonal %>% 
  group_by(id, avg_type, position) %>% 
  filter(n() > 1) %>% 
  head()

Code

players_projectedPointsAverage_seasonal %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()

Code

players_projectedPoints_weekly %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()

Code

players_projectedStatsAverage_weekly %>% 
  group_by(id, avg_type, position) %>% 
  filter(n() > 1) %>% 
  head()

Code

players_projectedPointsAverage_weekly %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()
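The duplicate checks above all follow the same pattern: group by the columns that should uniquely identify a row, and keep any group with more than one row. As a minimal sketch of the same idea in base R (so that it runs without additional packages), the helper below returns every row whose key combination occurs more than once. The function name find_duplicates and the toy data are hypothetical and are not part of the ffanalytics package.

```r
# Hypothetical helper: return all rows whose key combination is not unique
find_duplicates <- function(data, keys) {
  key_values <- data[, keys, drop = FALSE]
  # Flag a row if its key appears earlier OR later in the data
  is_dup <- duplicated(key_values) | duplicated(key_values, fromLast = TRUE)
  data[is_dup, , drop = FALSE]
}

# Toy data with one duplicated id-avg_type-pos combination
projections_toy <- data.frame(
  id = c("1", "1", "2"),
  avg_type = c("average", "average", "average"),
  pos = c("QB", "QB", "RB"),
  points = c(300, 310, 250)
)

dups <- find_duplicates(projections_toy, keys = c("id", "avg_type", "pos"))
dups
```

An empty result, as in the checks above, indicates that the chosen columns uniquely identify each row.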

Let’s merge the two averaged projected statistics and points objects:

Code

players_projections_seasonal_average <- full_join(
  players_projectedPointsAverage_seasonal,
  players_projectedStatsAverage_seasonal,
  by = c("id","avg_type","pos" = "position")
)

players_projections_weekly_average <- full_join(
  players_projectedPointsAverage_weekly,
  players_projectedStatsAverage_weekly,
  by = c("id","avg_type","pos" = "position")
)
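In the join above, the position column has a different name in each object (pos versus position), so the by argument maps one name to the other. The following sketch, using toy data with hypothetical values, illustrates how full_join() treats such a mapping: rows from both objects are kept, and the key column takes its name from the first object.

```r
library("dplyr")

# Toy stand-ins for the averaged points and averaged statistics objects
points_toy <- tibble(
  id = c("1", "2"),
  avg_type = "average",
  pos = c("QB", "RB"),
  points = c(300, 250)
)

stats_toy <- tibble(
  id = c("1", "3"),
  avg_type = "average",
  position = c("QB", "WR"),
  pass_yds = c(4000, 1200)
)

# "pos" = "position" matches pos in the first object to position in the second
merged_toy <- full_join(
  points_toy,
  stats_toy,
  by = c("id", "avg_type", "pos" = "position")
)

merged_toy
```

Players who appear in only one of the two toy objects are retained, with missing values in the columns that come from the other object.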

Let’s again check for duplicate player-average type-position instances:

Code

players_projections_seasonal_average %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()

Code

players_projections_weekly_average %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()

4.3.26.4 Add Additional Player Information

Code

players_projections_seasonal_average <- players_projections_seasonal_average %>% 
  add_ecr() %>% 
  add_adp() %>% 
  add_aav() %>%
  add_uncertainty() %>% 
  add_player_info()

players_projections_weekly_average <- players_projections_weekly_average %>% 
  add_ecr() %>% 
  #add_uncertainty() %>% # currently throws an error
  add_player_info()

Code

save(
  players_projectedPoints_seasonal, players_projections_seasonal_average,
  file = "./data/players_projectedPoints_seasonal.RData"
)

save(
  players_projectedPoints_weekly, players_projections_weekly_average,
  file = "./data/players_projectedPoints_weekly.RData"
)

The data files are saved in the project repository and can be loaded using the following commands:

Code

load(file = "./data/players_projectedPoints_seasonal.RData")
load(file = "./data/players_projectedPoints_weekly.RData")
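As a brief illustration of how save() and load() work together, the sketch below writes two toy objects to a temporary file and restores them. The object names and values are hypothetical. The point is that a single .RData file can hold several objects, and load() restores each under the name it had when it was saved.

```r
# Toy objects standing in for the projections objects (hypothetical values)
projected_points_toy <- data.frame(id = c("1", "2"), points = c(300, 250))
projections_average_toy <- data.frame(id = c("1", "2"), points_avg = c(295, 255))

# Write both objects to one temporary .RData file
toy_file <- tempfile(fileext = ".RData")
save(projected_points_toy, projections_average_toy, file = toy_file)

# Remove the objects, then restore them from the file
rm(projected_points_toy, projections_average_toy)
restored_names <- load(file = toy_file)

# load() returns (invisibly) the names of the objects it restored
restored_names
```

Because load() overwrites any existing objects that share a name with the restored objects, it can help to load data files at the start of a session.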

4.4 Calculations

4.4.1 Historical Actual Player Statistics

In addition to week-by-week actual player statistics, we can also compute historical actual player statistics as a function of different timeframes, including season-by-season and career statistics.

4.4.1.1 Season-by-Season Statistics

First, we can compute the players’ season-by-season statistics using the nflfastR (Carl & Baldwin, 2024) package.

TODO: Save/update data file in repo with the data generated from this code, and follow the data file through the rest of the book, updating code as necessary.

Note 4.4: Downloading season-by-season statistics

Note: the following code takes a while to run.

Code

nfl_actualStats_seasonal_player_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "season",
  season_type = "REG")

nfl_actualStats_seasonal_team_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "season",
  stat_type = "team",
  season_type = "REG")

nfl_actualStats_seasonal_player_inclPost_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "season",
  season_type = "REG+POST")

nfl_actualStats_seasonal_team_inclPost_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "season",
  stat_type = "team",
  season_type = "REG+POST")

Code

save(
  nfl_actualStats_seasonal_player_raw, nfl_actualStats_seasonal_team_raw,
  nfl_actualStats_seasonal_player_inclPost_raw, nfl_actualStats_seasonal_team_inclPost_raw,
  file = "./data/nfl_actualStats_seasonal_raw.RData"
)

A Data Dictionary for the variables returned by the calculate_stats() function is available in the nfl_stats_variables object from the nflfastR package:

Code

nfl_stats_variables

The nfl_actualStats_seasonal_player_raw object is in player-season form. That is, each row should be uniquely identified by the combination of player_id and season. The nfl_actualStats_seasonal_team_raw object is in team-season form. That is, each row should be uniquely identified by the combination of team and season. Let’s rearrange the data accordingly:

Code

nfl_actualStats_seasonal_player <- nfl_actualStats_seasonal_player_raw %>% 
  rename(team = recent_team) %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualStats_seasonal_team <- nfl_actualStats_seasonal_team_raw %>% 
  select(season, team, everything()) %>% 
  arrange(season, team)

nfl_actualStats_seasonal_player_inclPost <- nfl_actualStats_seasonal_player_inclPost_raw %>% 
  rename(team = recent_team) %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualStats_seasonal_team_inclPost <- nfl_actualStats_seasonal_team_inclPost_raw %>% 
  select(season, team, everything()) %>% 
  arrange(season, team)

Let’s check for duplicate player-season and team-season instances:

Code

nfl_actualStats_seasonal_player %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_seasonal_team %>% 
  group_by(team, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_seasonal_player_inclPost %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_seasonal_team_inclPost %>% 
  group_by(team, season) %>% 
  filter(n() > 1) %>% 
  head()

Let’s remove infinite (implausible) values:

Code

nfl_actualStats_seasonal_player[sapply(nfl_actualStats_seasonal_player, is.infinite)] <- NA
nfl_actualStats_seasonal_team[sapply(nfl_actualStats_seasonal_team, is.infinite)] <- NA
nfl_actualStats_seasonal_player_inclPost[sapply(nfl_actualStats_seasonal_player_inclPost, is.infinite)] <- NA
nfl_actualStats_seasonal_team_inclPost[sapply(nfl_actualStats_seasonal_team_inclPost, is.infinite)] <- NA
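To show what the replacement above does, here is a minimal sketch with a toy data frame (hypothetical values). Infinite values typically arise when a ratio is computed with a denominator of zero. Applying is.infinite() to every column yields a logical matrix with the same dimensions as the data, and that matrix is then used to overwrite the flagged cells with NA.

```r
# Toy data: player B has yards but zero air yards
stats_toy <- data.frame(
  player_id = c("A", "B", "C"),
  receiving_yards = c(100, 10, 50),
  receiving_air_yards = c(80, 0, 25)
)

# Dividing by zero yields Inf for player B
stats_toy$racr <- stats_toy$receiving_yards / stats_toy$receiving_air_yards

# Logical matrix: TRUE wherever a cell is Inf or -Inf
infinite_cells <- sapply(stats_toy, is.infinite)

# Replace the flagged cells with NA; all other cells are unchanged
stats_toy[infinite_cells] <- NA

stats_toy
```

Note that is.infinite() does not flag NaN (for example, the result of 0 / 0), so such values would remain unless they are handled separately.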

4.4.1.2 Career Statistics

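Second, we can compute the players’ career statistics by aggregating the season-by-season statistics that we computed above using the calculate_stats() function from the nflfastR package (Carl & Baldwin, 2024). We sum count variables, take weighted averages of rate variables, concatenate list variables, and recompute ratio variables from their summed components.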

Code

# Compute Player Stats by Career
playerVars <- nfl_actualStats_seasonal_player %>% 
  select(games:fantasy_points_ppr) %>% 
  names()

weightedAverageVars_byGames <- c(
  "passing_epa","rushing_epa","receiving_epa",
  "target_share","air_yards_share"
)

weightedAverageVars_byAttempts <- c(
  "passing_cpoe"
)

weightedAverageVars_byTargets <- c(
  "wopr"
)

concatenateVars <- c(
  "fg_made_list","fg_missed_list","fg_blocked_list","gwfg_distance_list"
)

recomputeVars <- c(
  "pacr", # passing_yards / passing_air_yards
  "racr", # receiving_yards / receiving_air_yards
  "fg_pct", # fg_made / fg_att
  "pat_pct" # pat_made / pat_att
)

sumVars <- playerVars[playerVars %ni% c(
  weightedAverageVars_byGames, weightedAverageVars_byAttempts, weightedAverageVars_byTargets, concatenateVars, recomputeVars)]

nfl_actualStats_career_player <- nfl_actualStats_seasonal_player %>% 
  group_by(player_id) %>% 
  summarise(
    across(all_of(weightedAverageVars_byGames), ~ weighted.mean(.x, w = games, na.rm = TRUE)),
    across(all_of(weightedAverageVars_byAttempts), ~ weighted.mean(.x, w = attempts, na.rm = TRUE)),
    across(all_of(weightedAverageVars_byTargets), ~ weighted.mean(.x, w = targets, na.rm = TRUE)),
    across(all_of(concatenateVars), ~ paste(.x[!is.na(.x)], collapse = ";")),
    across(all_of(sumVars), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop") %>% 
  mutate(
    pacr = passing_yards / passing_air_yards,
    racr = receiving_yards / receiving_air_yards,
    fg_pct = fg_made / fg_att,
    pat_pct = pat_made / pat_att
  )

nfl_actualStats_career_player$fg_made_list[which(nfl_actualStats_career_player$fg_made_list == "")] <- NA
nfl_actualStats_career_player$fg_missed_list[which(nfl_actualStats_career_player$fg_missed_list == "")] <- NA
nfl_actualStats_career_player$fg_blocked_list[which(nfl_actualStats_career_player$fg_blocked_list == "")] <- NA
nfl_actualStats_career_player$gwfg_distance_list[which(nfl_actualStats_career_player$gwfg_distance_list == "")] <- NA

nfl_actualStats_career_player_inclPost <- nfl_actualStats_seasonal_player_inclPost %>% 
  group_by(player_id) %>% 
  summarise(
    across(all_of(weightedAverageVars_byGames), ~ weighted.mean(.x, w = games, na.rm = TRUE)),
    across(all_of(weightedAverageVars_byAttempts), ~ weighted.mean(.x, w = attempts, na.rm = TRUE)),
    across(all_of(weightedAverageVars_byTargets), ~ weighted.mean(.x, w = targets, na.rm = TRUE)),
    across(all_of(concatenateVars), ~ paste(.x[!is.na(.x)], collapse = ";")),
    across(all_of(sumVars), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop") %>% 
  mutate(
    pacr = passing_yards / passing_air_yards,
    racr = receiving_yards / receiving_air_yards,
    fg_pct = fg_made / fg_att,
    pat_pct = pat_made / pat_att
  )

nfl_actualStats_career_player_inclPost$fg_made_list[which(nfl_actualStats_career_player_inclPost$fg_made_list == "")] <- NA
nfl_actualStats_career_player_inclPost$fg_missed_list[which(nfl_actualStats_career_player_inclPost$fg_missed_list == "")] <- NA
nfl_actualStats_career_player_inclPost$fg_blocked_list[which(nfl_actualStats_career_player_inclPost$fg_blocked_list == "")] <- NA
nfl_actualStats_career_player_inclPost$gwfg_distance_list[which(nfl_actualStats_career_player_inclPost$gwfg_distance_list == "")] <- NA

nfl_actualStats_career_player[sapply(nfl_actualStats_career_player, is.infinite)] <- NA
nfl_actualStats_career_player_inclPost[sapply(nfl_actualStats_career_player_inclPost, is.infinite)] <- NA
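The code above treats variables differently depending on their type: counts are summed, rates are averaged with weights, and ratios are recomputed from summed components. The sketch below, using one hypothetical player with two seasons, illustrates why. A simple average of season-level ratios gives a short season the same influence as a full season, whereas recomputing the ratio from career totals does not.

```r
# Toy data: one player, a 16-game season and a 4-game season (hypothetical values)
seasons_toy <- data.frame(
  player_id = c("A", "A"),
  games = c(16, 4),
  receiving_yards = c(1200, 100),
  receiving_air_yards = c(1000, 200),
  target_share = c(0.25, 0.10)
)

# Season-level ratio: 1.2 in the first season, 0.5 in the second
seasons_toy$racr <- seasons_toy$receiving_yards / seasons_toy$receiving_air_yards

# Counts: sum across seasons
career_receiving_yards <- sum(seasons_toy$receiving_yards) # 1300

# Rates: average across seasons, weighted by games played
career_target_share <- weighted.mean(
  seasons_toy$target_share,
  w = seasons_toy$games) # (0.25 * 16 + 0.10 * 4) / 20 = 0.22

# Ratios: recompute from the summed components
career_racr <- sum(seasons_toy$receiving_yards) /
  sum(seasons_toy$receiving_air_yards) # 1300 / 1200, approximately 1.083

# For comparison: the unweighted mean of the season-level ratios
naive_racr <- mean(seasons_toy$racr) # (1.2 + 0.5) / 2 = 0.85
```

In this toy example, the recomputed career ratio (about 1.08) differs considerably from the unweighted mean of the season ratios (0.85).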

Let’s check for duplicate player instances:

Code

nfl_actualStats_career_player %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_career_player_inclPost %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_actualStats_career_player, nfl_actualStats_career_player_inclPost,
  file = "./data/nfl_actualStats_career.RData"
)

4.4.1.3 Week-by-Week Statistics

We already loaded players’ week-by-week statistics above. Nevertheless, we could also compute players’ (and teams’) weekly statistics from the play-by-play data using the following syntax:

TODO:

Decide whether to use calculate_stats() from nflfastR (Carl & Baldwin, 2024) or load_player_stats() from nflreadr (Ho & Carl, 2024).
Save/update data file in repo with the data generated from the relevant code, and follow the data file through the rest of the book, updating code as necessary.

Note 4.5: Downloading week-by-week statistics

Note: the following code takes a while to run.

Code

nfl_actualStats_weekly_player_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "week")

nfl_actualStats_weekly_team_raw <- nflfastR::calculate_stats(
  seasons = TRUE,
  summary_level = "week",
  stat_type = "team")

Code

save(
  nfl_actualStats_weekly_player_raw, nfl_actualStats_weekly_team_raw,
  file = "./data/nfl_actualStats_weekly_raw.RData"
)

Code

nfl_actualStats_weekly_player <- nfl_actualStats_weekly_player_raw %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualStats_weekly_team <- nfl_actualStats_weekly_team_raw %>% 
  select(season, week, team, everything()) %>% 
  arrange(season, week, team)

Let’s check for duplicate player-season-week instances:

Code

nfl_actualStats_weekly_player %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualStats_weekly_team %>% 
  group_by(team, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

save(
  nfl_actualStats_weekly_player, nfl_actualStats_weekly_team,
  file = "./data/nfl_actualStats_weekly.RData"
)

4.4.2 Historical Actual Fantasy Points

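To calculate players’ historical fantasy points, we use the league scoring settings (scoring_obj) that we specified in Section 4.3.26.2.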

4.4.2.1 Week-by-Week

Code

nfl_actualFantasyPoints_player_weekly_seasonalList <- list()
nfl_actualFantasyPoints_dst_weekly_seasonalList <- list()

Note 4.6: Calculating players’ week-by-week fantasy points

Note: the following code takes a while to run.

We set the seasons variable in Section 4.3.13.

Code

#nfl_actualFantasyPoints_weekly_raw <- ffanalytics:::actual_points_scoring(
#  season = 2023,
#  summary_level = c("week"),
#  stat_type = c("player", "dst", "team"),
#  season_type = c("REG", "POST", "REG+POST"),
#  scoring_rules = scoring_obj,
#  vor_baseline = NULL,
#  rename_colums = FALSE
#)

pb <- txtProgressBar(
  min = 0,
  max = length(seasons),
  style = 3)

for(i in seq_along(seasons)){
  # Compute actual statistics by season
  nfl_actualFantasyPoints_player_weekly_seasonalList[[i]] <- 
    ffanalytics:::actual_points_scoring(
      season = seasons[i],
      summary_level = c("week"),
      stat_type = c("player"),
      #season_type = c("REG"),
      scoring_rules = scoring_obj,
      vor_baseline = NULL,
      rename_colums = FALSE)
  
  nfl_actualFantasyPoints_dst_weekly_seasonalList[[i]] <- 
    ffanalytics:::actual_points_scoring(
      season = seasons[i],
      summary_level = c("week"),
      stat_type = c("dst"),
      #season_type = c("REG"),
      scoring_rules = scoring_obj,
      vor_baseline = NULL,
      rename_colums = FALSE)
  
  nfl_actualFantasyPoints_player_weekly_seasonalList[[i]]$season <- seasons[i]
  nfl_actualFantasyPoints_dst_weekly_seasonalList[[i]]$season <- seasons[i]
  
  print(
    paste("Completed computing actual fantasy points for season: ", seasons[i], sep = ""))
  
  # Update the progress bar
  setTxtProgressBar(pb, i)
}

# Close the progress bar
close(pb)

nfl_actualFantasyPoints_player_weekly_raw <- nfl_actualFantasyPoints_player_weekly_seasonalList %>% 
  dplyr::bind_rows()

nfl_actualFantasyPoints_dst_weekly_raw <- nfl_actualFantasyPoints_dst_weekly_seasonalList %>% 
  dplyr::bind_rows()
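The loop above follows a common pattern: compute a result for each season, store it in a list, tag it with the season, and then stack the list elements into one data frame. The sketch below reproduces that pattern in base R with a hypothetical stand-in function, score_one_season(), in place of the ffanalytics scoring function. The toy seasons and point values are likewise made up for illustration.

```r
# Hypothetical stand-in for the per-season scoring step
score_one_season <- function(season) {
  data.frame(
    player_id = c("A", "B"),
    raw_points = c(10, 20) + (season - 2021)
  )
}

seasons_toy <- 2021:2023

# Pre-allocate a list with one slot per season
results_list <- vector("list", length(seasons_toy))

pb <- txtProgressBar(min = 0, max = length(seasons_toy), style = 3)

# seq_along() yields no iterations if the vector of seasons is empty
for (i in seq_along(seasons_toy)) {
  results_list[[i]] <- score_one_season(seasons_toy[i])
  results_list[[i]]$season <- seasons_toy[i] # tag rows with their season
  setTxtProgressBar(pb, i)
}

close(pb)

# Stack the per-season data frames into one data frame
results_all <- do.call(rbind, results_list)
```

Storing each season in its own list element and binding once at the end avoids repeatedly growing a data frame inside the loop.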

Code

save(
  nfl_actualFantasyPoints_player_weekly_raw, nfl_actualFantasyPoints_dst_weekly_raw,
  file = "./data/nfl_actualFantasyPoints_weekly_raw.RData"
)

The nfl_actualFantasyPoints_player_weekly_raw object is in player-season-week form, and the nfl_actualFantasyPoints_dst_weekly_raw object is in team-season-week form. That is, each row should be uniquely identified by the combination of player_id, season, and week (or team, season, and week). Let’s rearrange the data accordingly:

Code

nfl_actualFantasyPoints_player_weekly <- nfl_actualFantasyPoints_player_weekly_raw %>% 
  rename(
    fantasyPoints = raw_points,
    team = recent_team) %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualFantasyPoints_dst_weekly <- nfl_actualFantasyPoints_dst_weekly_raw %>% 
  rename(
    fantasyPoints = raw_points,
    team = recent_team) %>% 
  select(season, week, team, everything()) %>% 
  arrange(season, week, team)

Let’s check for duplicate player-season-week instances:

Code

nfl_actualFantasyPoints_player_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualFantasyPoints_dst_weekly %>% 
  group_by(team, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Let’s remove infinite (implausible) values:

Code

nfl_actualFantasyPoints_player_weekly[sapply(nfl_actualFantasyPoints_player_weekly, is.infinite)] <- NA
nfl_actualFantasyPoints_dst_weekly[sapply(nfl_actualFantasyPoints_dst_weekly, is.infinite)] <- NA

Code

save(
  nfl_actualFantasyPoints_player_weekly, nfl_actualFantasyPoints_dst_weekly,
  file = "./data/nfl_actualFantasyPoints_weekly.RData"
)

4.4.2.2 Season-by-Season

Code

nfl_actualFantasyPoints_player_seasonal_seasonalList <- list()
nfl_actualFantasyPoints_dst_seasonal_seasonalList <- list()

Note 4.7: Calculating players’ season-by-season fantasy points

Note: the following code takes a while to run.

Code

#nfl_actualFantasyPoints_seasonal_raw <- ffanalytics:::actual_points_scoring(
#  season = 2023,
#  summary_level = c("season"),
#  stat_type = c("player", "dst", "team"),
#  season_type = c("REG"),
#  scoring_rules = scoring_obj,
#  vor_baseline = NULL,
#  rename_colums = TRUE
#)

pb <- txtProgressBar(
  min = 0,
  max = length(seasons),
  style = 3)

for(i in seq_along(seasons)){
  # Compute actual statistics by season
  nfl_actualFantasyPoints_player_seasonal_seasonalList[[i]] <- 
    ffanalytics:::actual_points_scoring(
      season = seasons[i],
      summary_level = c("season"),
      stat_type = c("player"),
      season_type = c("REG"),
      scoring_rules = scoring_obj,
      vor_baseline = NULL,
      rename_colums = FALSE)
  
  nfl_actualFantasyPoints_dst_seasonal_seasonalList[[i]] <- 
    ffanalytics:::actual_points_scoring(
      season = seasons[i],
      summary_level = c("season"),
      stat_type = c("dst"),
      season_type = c("REG"),
      scoring_rules = scoring_obj,
      vor_baseline = NULL,
      rename_colums = FALSE)
  
  nfl_actualFantasyPoints_player_seasonal_seasonalList[[i]]$season <- seasons[i]
  nfl_actualFantasyPoints_dst_seasonal_seasonalList[[i]]$season <- seasons[i]
  
  print(
    paste("Completed computing actual fantasy points for season: ", seasons[i], sep = ""))
  
  # Update the progress bar
  setTxtProgressBar(pb, i)
}

# Close the progress bar
close(pb)

nfl_actualFantasyPoints_player_seasonal_raw <- nfl_actualFantasyPoints_player_seasonal_seasonalList %>% 
  dplyr::bind_rows()

nfl_actualFantasyPoints_dst_seasonal_raw <- nfl_actualFantasyPoints_dst_seasonal_seasonalList %>% 
  dplyr::bind_rows()

Code

save(
  nfl_actualFantasyPoints_player_seasonal_raw, nfl_actualFantasyPoints_dst_seasonal_raw,
  file = "./data/nfl_actualFantasyPoints_seasonal_raw.RData"
)

The nfl_actualFantasyPoints_player_seasonal_raw object is in player-season form, and the nfl_actualFantasyPoints_dst_seasonal_raw object is in team-season form. That is, each row should be uniquely identified by the combination of player_id and season (or team and season). Let’s rearrange the data accordingly:

Code

nfl_actualFantasyPoints_player_seasonal <- nfl_actualFantasyPoints_player_seasonal_raw %>% 
  rename(
    fantasyPoints = raw_points,
    team = recent_team) %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualFantasyPoints_dst_seasonal <- nfl_actualFantasyPoints_dst_seasonal_raw %>% 
  rename(
    fantasyPoints = raw_points,
    team = recent_team) %>% 
  select(season, team, everything()) %>% 
  arrange(season, team)

Let’s check for duplicate player-season and team-season instances:

Code

nfl_actualFantasyPoints_player_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

nfl_actualFantasyPoints_dst_seasonal %>% 
  group_by(team, season) %>% 
  filter(n() > 1) %>% 
  head()

Let’s remove infinite (implausible) values:

Code

nfl_actualFantasyPoints_player_seasonal[sapply(nfl_actualFantasyPoints_player_seasonal, is.infinite)] <- NA
nfl_actualFantasyPoints_dst_seasonal[sapply(nfl_actualFantasyPoints_dst_seasonal, is.infinite)] <- NA

Code

save(
  nfl_actualFantasyPoints_player_seasonal, nfl_actualFantasyPoints_dst_seasonal,
  file = "./data/nfl_actualFantasyPoints_seasonal.RData"
)

4.4.2.3 Career

Code

nfl_actualFantasyPoints_player_career_raw <- nfl_actualFantasyPoints_player_seasonal %>% 
  group_by(player_id) %>% 
  summarise(fantasyPoints = sum(fantasyPoints, na.rm = TRUE))

nfl_actualFantasyPoints_player_career_raw <- full_join(
  nfl_actualFantasyPoints_player_career_raw,
  nfl_players %>% select(gsis_id, display_name, first_name, last_name, position_group, position),
  by = c("player_id" = "gsis_id")
)

Code

save(
  nfl_actualFantasyPoints_player_career_raw,
  file = "./data/nfl_actualFantasyPoints_career_raw.RData"
)

Let’s rearrange the data:

Code

nfl_actualFantasyPoints_player_career <- nfl_actualFantasyPoints_player_career_raw %>% 
  select(player_id, display_name, first_name, last_name, position_group, position, fantasyPoints)

Code

save(
  nfl_actualFantasyPoints_player_career,
  file = "./data/nfl_actualFantasyPoints_career.RData"
)

4.4.3 Player Age and Experience

4.4.3.1 Weekly

We calculate the player’s age based on the difference between dates using the lubridate package (Spinu et al., 2024):

Code

# Reshape from wide to long format
nfl_actualFantasyPoints_player_weekly_long <- nfl_actualFantasyPoints_player_weekly %>% 
  tidyr::pivot_longer(
    cols = c(team, opponent_team),
    names_to = "role",
    values_to = "team")

# Perform separate inner join operations for the home_team and away_team
nfl_actualFantasyPoints_player_weekly_home <- dplyr::inner_join(
  nfl_actualFantasyPoints_player_weekly_long,
  nfl_schedules,
  by = c("season","week","team" = "home_team")) %>% 
  mutate(home_away = "home_team")

nfl_actualFantasyPoints_player_weekly_away <- dplyr::inner_join(
  nfl_actualFantasyPoints_player_weekly_long,
  nfl_schedules,
  by = c("season","week","team" = "away_team")) %>% 
  mutate(home_away = "away_team")

# Combine the results of the join operations
nfl_actualFantasyPoints_player_weekly_schedules_long <- dplyr::bind_rows(
  nfl_actualFantasyPoints_player_weekly_home,
  nfl_actualFantasyPoints_player_weekly_away)

# Reshape from long to wide
player_game_gameday <- nfl_actualFantasyPoints_player_weekly_schedules_long %>%
  dplyr::distinct(player_id, season, week, game_id, home_away, team, gameday) %>% #, .keep_all = TRUE
  tidyr::pivot_wider(
    names_from = home_away,
    values_from = team)

# Merge player birthdate and the game date
player_game_birthdate_gameday <- dplyr::left_join(
  player_game_gameday,
  unique(nfl_players[,c("gsis_id","birth_date")]),
  by = c("player_id" = "gsis_id")
)

player_game_birthdate_gameday$birth_date <- lubridate::ymd(player_game_birthdate_gameday$birth_date)
player_game_birthdate_gameday$gameday <- lubridate::ymd(player_game_birthdate_gameday$gameday)

# Calculate player's age for a given week as the difference between their birthdate and the game date
player_game_birthdate_gameday$age <- lubridate::interval(
  start = player_game_birthdate_gameday$birth_date,
  end = player_game_birthdate_gameday$gameday
) %>% 
  lubridate::time_length(unit = "years")

# Merge with Pro Football Reference Data on Player Age by Season
player_game_birthdate_gameday <- player_game_birthdate_gameday %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

# Set age as first non-missing value from calculation above or from PFR
player_game_birthdate_gameday <- player_game_birthdate_gameday %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

# Calculate ageCentered20 (age centered at 20 years) and ageCentered20Quadratic (its square)
player_game_birthdate_gameday$ageCentered20 <- player_game_birthdate_gameday$age - 20
player_game_birthdate_gameday$ageCentered20Quadratic <- player_game_birthdate_gameday$ageCentered20 ^ 2

# Merge with player info
player_age <- dplyr::left_join(
  player_game_birthdate_gameday,
  nfl_players %>% select(-birth_date, -team_abbr, -team_seq),
  by = c("player_id" = "gsis_id"))

# Add game_id to weekly stats to facilitate merging
nfl_actualFantasyPoints_player_weekly <- nfl_actualFantasyPoints_player_weekly %>% 
  dplyr::left_join(
    player_age[,c("season","week","player_id","game_id")],
    by = c("season","week","player_id"))

# Merge with player weekly stats
player_stats_weekly <- dplyr::full_join(
  player_age %>% select(-position, -position_group),
  nfl_actualFantasyPoints_player_weekly,
  by = c("season","week","player_id","game_id"))

player_stats_weekly$total_years_of_experience <- as.integer(player_stats_weekly$years_of_experience)

player_stats_weekly$years_of_experience <- NULL

distinct_seasons <- player_stats_weekly %>%
  dplyr::select(player_id, season) %>%
  dplyr::distinct() %>% 
  dplyr::left_join(
    nfl_players[,c("gsis_id","years_of_experience")],
    by = c("player_id" = "gsis_id")
  ) %>% 
  dplyr::mutate(total_years_of_experience = as.integer(years_of_experience)) %>% 
  dplyr::select(-years_of_experience)

years_of_experience <- distinct_seasons %>% 
  dplyr::arrange(player_id, -season) %>% 
  dplyr::group_by(player_id) %>%
  dplyr::mutate(years_of_experience = first(total_years_of_experience) - (row_number() - 1)) %>%
  dplyr::ungroup()

years_of_experience$years_of_experience[which(years_of_experience$years_of_experience < 0)] <- 0

player_stats_weekly <- player_stats_weekly %>% 
  dplyr::left_join(
    years_of_experience[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )
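The years-of-experience calculation above works backward from each player’s total years of experience: the most recent season receives the total, each earlier season receives one year fewer, and negative values are set to zero. The following base R sketch reproduces that logic with toy data (hypothetical players and values) to make the steps explicit.

```r
# Toy data: distinct player-seasons with each player's total experience
distinct_seasons_toy <- data.frame(
  player_id = c("A", "A", "A", "B", "B"),
  season = c(2021, 2022, 2023, 2022, 2023),
  total_years_of_experience = c(2, 2, 2, 0, 0)
)

# Sort so that each player's most recent season comes first
distinct_seasons_toy <- distinct_seasons_toy[
  order(distinct_seasons_toy$player_id, -distinct_seasons_toy$season), ]

# Within player: 0 for the most recent season, 1 for the one before, and so on
distinct_seasons_toy$seasons_back <- ave(
  distinct_seasons_toy$season,
  distinct_seasons_toy$player_id,
  FUN = function(x) seq_along(x) - 1)

# Subtract from the total, and set negative values to zero
distinct_seasons_toy$years_of_experience <- pmax(
  distinct_seasons_toy$total_years_of_experience - distinct_seasons_toy$seasons_back,
  0)

distinct_seasons_toy
```

Because this approach counts rows rather than calendar years, it assumes that a player appears in the data for every season between their first and most recent season. A player who missed an entire season would be assigned too many years of experience for the seasons before the gap.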

The player_stats_weekly object is in player-season-week form. That is, each row should be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code

player_stats_weekly <- player_stats_weekly %>% 
  arrange(player_display_name, player_id, season, week)

Let’s check for duplicate player-season-week instances:

Code

player_stats_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Code

# Save data
save(
  player_stats_weekly,
  file = "./data/player_stats_weekly.RData"
)

4.4.3.2 Seasonal

Code

# Merge player info with seasonal stats
player_stats_seasonal <- dplyr::full_join(
  nfl_actualFantasyPoints_player_seasonal,
  nfl_players %>% select(-position, -position_group, -team_abbr, -team_seq),
  by = c("player_id" = "gsis_id")
)

# Calculate age
season_startdate <- nfl_schedules %>% 
  dplyr::group_by(season) %>% 
  dplyr::summarise(startdate = min(gameday, na.rm = TRUE))

player_stats_seasonal <- player_stats_seasonal %>% 
  dplyr::left_join(
    season_startdate,
    by = "season"
  )

player_stats_seasonal$age <- lubridate::interval(
  start = player_stats_seasonal$birth_date,
  end = player_stats_seasonal$startdate
) %>% 
  lubridate::time_length(unit = "years")

# Merge with Pro Football Reference Data on Player Age by Season
player_stats_seasonal <- player_stats_seasonal %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

# Set age as first non-missing value from calculation above or from PFR
player_stats_seasonal <- player_stats_seasonal %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

# Calculate ageCentered20 (age centered at 20 years) and ageCentered20Quadratic (its square)
player_stats_seasonal$ageCentered20 <- player_stats_seasonal$age - 20
player_stats_seasonal$ageCentered20Quadratic <- player_stats_seasonal$ageCentered20 ^ 2

# Years of experience
player_stats_seasonal$years_of_experience <- NULL

player_stats_seasonal <- player_stats_seasonal %>% 
  dplyr::left_join(
    years_of_experience[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )
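As a small worked example of the age calculation above, the sketch below computes one hypothetical player’s age at the start of a season with the lubridate package, and then derives the centered and quadratic age terms. The birth date and season start date are made up for illustration.

```r
library("lubridate")

# Hypothetical dates for illustration
birth_date <- ymd("1995-09-17")
season_startdate <- ymd("2023-09-07")

# Age in years as the length of the interval between the two dates
age <- time_length(
  interval(start = birth_date, end = season_startdate),
  unit = "years")

# Center age at 20 so that a value of 0 corresponds to a 20-year-old player
ageCentered20 <- age - 20

# The squared term allows a model to capture a curvilinear association with age
ageCentered20Quadratic <- ageCentered20 ^ 2

age
```

Here, the player is just under 28 years old at the start of the season, so ageCentered20 is just under 8.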

The player_stats_seasonal object is in player-season form. That is, each row should be uniquely identified by the combination of player_id and season. Let’s rearrange the data accordingly:

Code

player_stats_seasonal <- player_stats_seasonal %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

Let’s check for duplicate player-season instances:

Code

player_stats_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Code

# Save data
save(
  player_stats_seasonal,
  file = "./data/player_stats_seasonal.RData"
)

4.5 Session Info

Code

sessionInfo()

R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4           
 [4] purrr_1.0.4            readr_2.1.5            tidyr_1.3.1           
 [7] tibble_3.2.1           ggplot2_3.5.2          tidyverse_2.0.0       
[10] lubridate_1.9.4        progressr_0.15.1       nflplotR_1.4.0        
[13] nfl4th_1.0.4           nflseedR_2.0.0         nflfastR_5.1.0        
[16] nflreadr_1.4.1         petersenlab_1.1.5      ffanalytics_3.1.5.0000

loaded via a namespace (and not attached):
 [1] DBI_1.2.3          mnormt_2.1.1       gridExtra_2.3      httr2_1.1.2       
 [5] readxl_1.4.5       rlang_1.1.6        magrittr_2.0.3     furrr_0.3.1       
 [9] compiler_4.5.0     mgcv_1.9-1         vctrs_0.6.5        reshape2_1.4.4    
[13] quadprog_1.5-8     rvest_1.0.4        pkgconfig_2.0.3    fastmap_1.2.0     
[17] backports_1.5.0    pbivnorm_0.6.0     promises_1.3.2     rmarkdown_2.29    
[21] tzdb_0.5.0         ps_1.9.1           xfun_0.52          cachem_1.1.0      
[25] jsonlite_2.0.0     later_1.4.2        rrapply_1.2.7      psych_2.5.3       
[29] parallel_4.5.0     lavaan_0.6-19      cluster_2.1.8.1    R6_2.6.1          
[33] stringi_1.8.7      RColorBrewer_1.1-3 parallelly_1.44.0  rpart_4.1.24      
[37] cellranger_1.1.0   xgboost_1.7.11.1   Rcpp_1.0.14        knitr_1.50        
[41] base64enc_0.1-3    splines_4.5.0      timechange_0.3.0   Matrix_1.7-3      
[45] nnet_7.3-20        tidyselect_1.2.1   rstudioapi_0.17.1  yaml_2.3.10       
[49] codetools_0.2-20   websocket_1.4.4    curl_6.2.3         processx_3.8.6    
[53] listenv_0.9.1      lattice_0.22-6     plyr_1.8.9         withr_3.0.2       
[57] evaluate_1.0.3     foreign_0.8-90     future_1.49.0      xml2_1.3.8        
[61] pillar_1.10.2      checkmate_2.3.2    stats4_4.5.0       generics_0.1.4    
[65] chromote_0.5.1     hms_1.1.3          mix_1.0-13         scales_1.4.0      
[69] globals_0.18.0     xtable_1.8-4       glue_1.8.0         Hmisc_5.2-3       
[73] tools_4.5.0        data.table_1.17.2  gsubfn_0.7         mvtnorm_1.3-3     
[77] grid_4.5.0         mitools_2.4        colorspace_2.1-1   nlme_3.1-168      
[81] htmlTable_2.4.3    proto_1.0.0        Formula_1.2-5      cli_3.6.5         
[85] rappdirs_0.3.3     viridisLite_0.4.2  gt_1.0.0           gtable_0.3.6      
[89] fastrmodels_2.0.0  digest_0.6.37      htmlwidgets_1.6.4  farver_2.1.2      
[93] memoise_2.0.1      htmltools_0.5.8.1  lifecycle_1.0.4    httr_1.4.7
