I need your help!

I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.

Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook

Adding an annotation using hypothes.is: To add an annotation, select some text and then click the symbol on the pop-up menu. To see the annotations of others, click the symbol in the upper right-hand corner of the page.

4  Download and Process NFL Football Data

4.1 Load Packages

Code
library("ffanalytics") # to install: install.packages("remotes"); remotes::install_github("FantasyFootballAnalytics/ffanalytics")
library("petersenlab") # to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("nflreadr")
library("nflfastR")
library("nfl4th")
library("nflplotR")
library("progressr")
library("lubridate")
library("tidyverse")

4.2 Data Dictionaries of NFL Data

Data Dictionaries are metadata that describe the meaning of the variables in a dataset. You can find Data Dictionaries for the various National Football League (NFL) datasets at the following link: https://nflreadr.nflverse.com/articles/index.html.

4.3 Types of NFL Data

Below, we provide examples of how to download various types of NFL data, primarily using the nflreadr package (a few files are read directly from CSV files hosted online). For additional resources, Congelio (2023) provides a helpful introductory text for working with NFL data in R. We save each data file after downloading it so that we can use the data in subsequent chapters. If you have difficulty downloading the data files using the nflreadr package, the saved data files are also publicly available on the Open Science Framework: https://osf.io/z6pg4.

This chapter extensively uses merging to process the data for later use. See Section 3.22 for a reminder of how to perform merging, the types of merges, and what you can expect when you merge data objects with different formats. Guidance for how to merge the various NFL-related data files is provided at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md.

4.3.1 Players

Code
nfl_players_raw <- progressr::with_progress(
  nflreadr::load_players())
Code
save(
  nfl_players_raw,
  file = "./data/nfl_players_raw.RData"
)

The nfl_players object is in player form. That is, each row should be uniquely identified by gsis_id. Let’s rearrange the data accordingly:

Code
nfl_players <- nfl_players_raw %>% 
  select(gsis_id, everything()) %>% 
  arrange(display_name)

Let’s check for duplicate player instances:

Code
nfl_players %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()
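The duplicate check above keeps every row whose key appears more than once, so an empty result means the key is unique. As a minimal sketch on a small hypothetical data frame (the IDs and names are made up for illustration), here is the same idiom alongside dplyr's count(), which returns one row per duplicated key together with how many times it appears:

```r
library(dplyr)

# Hypothetical toy data: gsis_id "00-0000001" appears twice
players <- tibble::tibble(
  gsis_id = c("00-0000001", "00-0000001", "00-0000002"),
  display_name = c("Player A", "Player A", "Player B")
)

# All rows whose gsis_id occurs more than once
dupes <- players %>%
  group_by(gsis_id) %>%
  filter(n() > 1) %>%
  ungroup()

# One row per duplicated gsis_id, with the number of occurrences in column n
dupe_counts <- players %>%
  count(gsis_id) %>%
  filter(n > 1)

print(dupes)
print(dupe_counts)
```

The count() version is handy for large files, where listing every duplicated row would be unwieldy.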

4.3.1.1 Processing

Let’s do some data cleanup:

Code
# Convert missing values to NA
nfl_players[nfl_players == ""] <- NA

# Drop players with missing values for gsis_id
nfl_players <- nfl_players %>% 
  filter(!is.na(gsis_id))
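The cleanup above compares every column to the empty string. A sketch of an alternative that converts empty strings to NA only in character columns, shown on hypothetical toy data, uses dplyr's across() with na_if():

```r
library(dplyr)

# Hypothetical toy data with empty strings in character columns
players <- tibble::tibble(
  gsis_id = c("00-0000001", "", "00-0000003"),
  college = c("", "State U", "Tech U"),
  height = c(74, 70, 72)
)

players_clean <- players %>%
  # Replace "" with NA in character columns only; numeric columns are untouched
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Drop rows that lack a player ID
  filter(!is.na(gsis_id))

print(players_clean)
```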
Code
save(
  nfl_players,
  file = "./data/nfl_players.RData"
)

4.3.2 Teams

Code
nfl_teams_raw <- progressr::with_progress(
  nflreadr::load_teams(current = TRUE))
Code
save(
  nfl_teams_raw,
  file = "./data/nfl_teams_raw.RData"
)

The nfl_teams object is in team form. That is, each row should be uniquely identified by team_id. Let’s rearrange the data accordingly:

Code
nfl_teams <- nfl_teams_raw %>% 
  select(team_id, everything())

Let’s check for duplicate team instances:

Code
nfl_teams %>% 
  group_by(team_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_teams,
  file = "./data/nfl_teams.RData"
)

4.3.3 Fantasy Player IDs

A Data Dictionary for fantasy player ID data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_playerids.html

Code
nfl_playerIDs_raw <- progressr::with_progress(
  nflreadr::load_ff_playerids())
Code
save(
  nfl_playerIDs_raw,
  file = "./data/nfl_playerIDs_raw.RData"
)

The nfl_playerIDs object is in player form. That is, each row should be uniquely identified by mfl_id.

Code
nfl_playerIDs <- nfl_playerIDs_raw %>% 
  arrange(name, mfl_id)

Let’s check for duplicate player instances:

Code
nfl_playerIDs %>% 
  group_by(mfl_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_playerIDs %>% 
  filter(!is.na(gsis_id)) %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data processing to help with merging the dataset with other datasets:

Code
nfl_playerIDs$name[which(nfl_playerIDs$name == "Bennett,Michael")] <- "Michael Bennett"

nfl_playerIDs <- nfl_playerIDs %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(name, lowercase = TRUE)
  )
Code
save(
  nfl_playerIDs,
  file = "./data/nfl_playerIDs.RData"
)

4.3.4 Player Info

4.3.5 Rosters

A Data Dictionary for rosters is located at the following links:

Code
nfl_rosters_raw <- progressr::with_progress(
  nflreadr::load_rosters(seasons = TRUE))

nfl_rosters_weekly_raw <- progressr::with_progress(
  nflreadr::load_rosters_weekly(seasons = TRUE))

rosters <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/rosters.csv")
Code
save(
  nfl_rosters_raw,
  rosters,
  file = "./data/nfl_rosters_raw.RData"
)

save(
  nfl_rosters_weekly_raw,
  file = "./data/nfl_rosters_weekly_raw.RData"
)

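The nfl_rosters object is in player-season-team form. That is, each row should be uniquely identified by the combination of gsis_id, season, and team. Let’s merge in additional player-season-team variables (games, starts, years, av, etc.) from the nfldata rosters file and rearrange the data accordingly: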

Code
nfl_rosters <- nfl_rosters_raw %>% 
  left_join(
    rosters %>% select(playerid, season, team, side, category, games, starts, years, av),
    by = c("season","team","pfr_id" = "playerid"),
    na_matches = "never"
  ) %>% 
  select(gsis_id, season, team, week, everything()) %>% 
  arrange(full_name, gsis_id, season, team, week)
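The na_matches = "never" argument in the join above matters because many players have a missing pfr_id. By default (na_matches = "na"), dplyr's joins treat two NA keys as equal, so unrelated players with missing IDs could be matched to each other. A minimal sketch with hypothetical toy data (the IDs, teams, and values are made up) contrasts the two settings:

```r
library(dplyr)

# Hypothetical toy data: Player B has no pfr_id
rosters_nflverse <- tibble::tibble(
  season = c(2023, 2023),
  team = c("KC", "KC"),
  pfr_id = c("PlayAx00", NA),
  full_name = c("Player A", "Player B")
)

# Hypothetical toy data: one row also has a missing player ID
rosters_pfr <- tibble::tibble(
  playerid = c("PlayAx00", NA),
  season = c(2023, 2023),
  team = c("KC", "KC"),
  av = c(10, 3)
)

join_keys <- c("season", "team", "pfr_id" = "playerid")

# NA keys never match: Player B gets av = NA
joined_never <- left_join(rosters_nflverse, rosters_pfr,
  by = join_keys, na_matches = "never")

# NA keys match each other: Player B is wrongly assigned av = 3
joined_na <- left_join(rosters_nflverse, rosters_pfr,
  by = join_keys, na_matches = "na")

print(joined_never)
print(joined_na)
```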

Let’s check for duplicate player-season-team instances:

Code
nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()

4.3.5.1 Processing

Let’s do some data cleanup:

Code
# Drop players with missing values for gsis_id
nfl_rosters <- nfl_rosters %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_rosters <- nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  fill(everything(), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()
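To see what the fill-then-slice step does, consider a hypothetical player with two roster rows for the same season and team, each missing a different value. Filling in both directions ("downup") within the group copies the non-missing values into both rows, so keeping only the first row retains all of the available information. This sketch uses made-up data and selects all columns with everything():

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: player 00-0000001 has two partial rows for 2023 KC
rosters <- tibble::tibble(
  gsis_id = c("00-0000001", "00-0000001", "00-0000002"),
  season = c(2023, 2023, 2023),
  team = c("KC", "KC", "BUF"),
  position = c("WR", NA, "QB"),
  jersey_number = c(NA, 11, 17)
)

rosters_dedup <- rosters %>%
  group_by(gsis_id, season, team) %>%
  # Within each group, fill NAs downward, then upward
  fill(everything(), .direction = "downup") %>%
  # Keep one row per player-season-team
  slice_head(n = 1) %>%
  ungroup()

print(rosters_dedup)
```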

Let’s check again for duplicate player-season-team instances:

Code
nfl_rosters %>% 
  group_by(gsis_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_rosters_weekly object is in player-season-week form. That is, each row should be uniquely identified by the combination of gsis_id, season, and week. Let’s rearrange the data accordingly:

Code
nfl_rosters_weekly <- nfl_rosters_weekly_raw %>% 
  select(gsis_id, season, week, everything()) %>% 
  arrange(full_name, gsis_id, season, week)

Let’s check for duplicate player-season-week instances:

Code
nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data cleanup:

Code
# Drop players with missing values for gsis_id
nfl_rosters_weekly <- nfl_rosters_weekly %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_rosters_weekly <- nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  fill(everything(), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()

Let’s check again for duplicate player-season-week instances:

Code
nfl_rosters_weekly %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_rosters,
  file = "./data/nfl_rosters.RData"
)

save(
  nfl_rosters_weekly,
  file = "./data/nfl_rosters_weekly.RData"
)

4.3.6 Game Schedules

A Data Dictionary for game schedules data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_schedules.html

Code
nfl_schedules_raw <- progressr::with_progress(
  nflreadr::load_schedules(seasons = TRUE))
Code
save(
  nfl_schedules_raw,
  file = "./data/nfl_schedules_raw.RData"
)

The nfl_schedules object is in game form. That is, each row should be uniquely identified by game_id. Because each team plays at most one game per week, each row should also be uniquely identified by the combination of season, game type, week, and home team (or away team).

Code
nfl_schedules <- nfl_schedules_raw

Let’s check for duplicate game instances:

Code
nfl_schedules %>% 
  group_by(game_id) %>% 
  filter(n() > 1) %>% 
  head()
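The duplicate checks in this chapter can also be expressed as a reusable test of whether a set of columns forms a unique key. The helper below, is_unique_key(), is hypothetical (it is not part of any package used in this chapter); it compares the number of distinct key combinations to the number of rows. Here it is applied to a made-up three-game schedule:

```r
library(dplyr)

# Hypothetical helper: TRUE if the supplied columns uniquely identify every row
is_unique_key <- function(data, ...) {
  n_distinct_keys <- data %>%
    distinct(...) %>%
    nrow()
  n_distinct_keys == nrow(data)
}

# Hypothetical mini schedule (made-up teams and games for illustration)
schedules <- tibble::tibble(
  game_id = c("2023_01_AAA_BBB", "2023_01_CCC_DDD", "2023_02_BBB_CCC"),
  season = 2023,
  game_type = "REG",
  week = c(1, 1, 2),
  away_team = c("AAA", "CCC", "BBB"),
  home_team = c("BBB", "DDD", "CCC")
)

is_unique_key(schedules, game_id)                              # TRUE
is_unique_key(schedules, season, game_type, week, home_team)   # TRUE
```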
Code
save(
  nfl_schedules,
  file = "./data/nfl_schedules.RData"
)

4.3.7 Standings

A Data Dictionary for standings data is located at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#standings

Code
nfl_standings_raw <- read_csv("http://www.habitatring.com/standings.csv")
Code
save(
  nfl_standings_raw,
  file = "./data/nfl_standings_raw.RData"
)

The nfl_standings object is in season-team form. That is, each row should be uniquely identified by the combination of season and team.

Code
nfl_standings <- nfl_standings_raw

Let’s check for duplicate season-team instances:

Code
nfl_standings %>% 
  group_by(season, team) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_standings,
  file = "./data/nfl_standings.RData"
)

4.3.8 The Combine

A Data Dictionary for data from the combine is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_combine.html

Code
nfl_combine_raw <- progressr::with_progress(
  nflreadr::load_combine(seasons = TRUE))
Code
save(
  nfl_combine_raw,
  file = "./data/nfl_combine_raw.RData"
)

The nfl_combine object is in player form. That is, each row should be uniquely identified by the player’s id. However, there is no gsis_id variable to merge it easily with other datasets. Some of the players have other id variables, including pfr_id and cfb_id. Let’s rearrange the data accordingly:

Code
nfl_combine <- nfl_combine_raw %>% 
  select(pfr_id, cfb_id, everything()) %>% 
  arrange(season, player_name)

Let’s do some data processing to help with merging the dataset with other datasets:

Code
nfl_combine <- nfl_combine %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player_name, lowercase = TRUE)
  )

# First, merge on both pfr_id and cfb_id
merged_data1 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(pfr_id, cfbref_id, gsis_id),
  by = c("pfr_id", "cfb_id" = "cfbref_id"),
  na_matches = "never"
)

# Second, merge on pfr_id, cleaned player name, and position
merged_data2 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(pfr_id, merge_name, position, gsis_id),
  by = c("pfr_id","merge_name","pos" = "position"),
  na_matches = "never"
)

# Third, merge on cfb_id, cleaned player name, and position
merged_data3 <- left_join(
  nfl_combine,
  nfl_playerIDs %>% select(cfbref_id, merge_name, position, gsis_id),
  by = c("cfb_id" = "cfbref_id","merge_name","pos" = "position"),
  na_matches = "never"
)

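# Combine gsis_id across merges, taking the first non-missing value for each row
# (assumes each join above returned exactly one row per row of nfl_combine)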
nfl_combine$gsis_id <- coalesce(
  merged_data1$gsis_id,
  merged_data2$gsis_id,
  merged_data3$gsis_id
  )
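The multi-step merge above relies on positional alignment: coalesce() takes, for each row, the first non-missing gsis_id across the merged objects, which is only correct if every left_join() returned exactly one row per row of nfl_combine (i.e., no duplicate keys in the ID table). A minimal sketch on hypothetical toy data, with an explicit row-count check guarding that assumption:

```r
library(dplyr)

# Hypothetical toy combine data: players identified by different IDs
combine <- tibble::tibble(
  player_name = c("Player A", "Player B", "Player C"),
  pfr_id = c("PlayAx00", NA, NA),
  cfb_id = c(NA, "player-b-1", NA)
)

# Hypothetical toy ID crosswalk
ids <- tibble::tibble(
  pfr_id = c("PlayAx00", NA),
  cfbref_id = c(NA, "player-b-1"),
  gsis_id = c("00-0000001", "00-0000002")
)

# Pass 1: match on pfr_id; Pass 2: match on cfb_id
by_pfr <- left_join(combine, ids %>% select(pfr_id, gsis_id),
  by = "pfr_id", na_matches = "never")
by_cfb <- left_join(combine, ids %>% select(cfbref_id, gsis_id),
  by = c("cfb_id" = "cfbref_id"), na_matches = "never")

# Positional coalescing is only valid if no join added or dropped rows
stopifnot(nrow(by_pfr) == nrow(combine), nrow(by_cfb) == nrow(combine))

# For each row, take the first non-missing gsis_id across the passes
combine$gsis_id <- coalesce(by_pfr$gsis_id, by_cfb$gsis_id)
print(combine)
```

In dplyr 1.1.0 and later, left_join() also accepts relationship = "many-to-one", which raises an error if any row of the left table matches more than one row of the right table; that is an alternative way to enforce the same assumption.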

# Rearrange the data
nfl_combine <- nfl_combine %>% 
  select(gsis_id, pfr_id, cfb_id, player_name, everything()) %>% 
  arrange(season, player_name)

Let’s check for duplicate gsis_id, pfr_id, and cfb_id instances:

Code
nfl_combine %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1, !is.na(gsis_id)) %>% 
  arrange(gsis_id) %>% 
  head()
Code
nfl_combine %>% 
  group_by(pfr_id) %>% 
  filter(n() > 1, !is.na(pfr_id)) %>% 
  arrange(pfr_id) %>% 
  head()
Code
nfl_combine %>% 
  group_by(cfb_id) %>% 
  filter(n() > 1, !is.na(cfb_id)) %>% 
  arrange(cfb_id) %>% 
  head()

Let’s do some additional data processing:

Code
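# Remove the gsis_id match for Stanford Samuels Jr. (set it to missing)
nfl_combine$gsis_id[which(nfl_combine$cfb_id == "stanford-samuels-1")] <- NA

# Convert height from "feet-inches" format (e.g., "6-2") to total inches
nfl_combine <- nfl_combine %>% 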
  separate_wider_delim(
    ht,
    names = c("feet", "inches"),
    delim = "-") %>%
  mutate(
    feet = as.numeric(feet),
    inches = as.numeric(inches),
    ht = feet * 12 + inches
  ) %>%
  select(-feet, -inches)
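The height conversion above splits a "feet-inches" string at the hyphen and computes total inches as 12 * feet + inches. A minimal sketch on two hypothetical heights:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with heights stored as "feet-inches" strings
combine <- tibble::tibble(
  player_name = c("Player A", "Player B"),
  ht = c("6-2", "5-11")
)

combine_inches <- combine %>%
  # Split "6-2" into feet = "6" and inches = "2"
  separate_wider_delim(ht, delim = "-", names = c("feet", "inches")) %>%
  # Total height in inches
  mutate(ht = as.numeric(feet) * 12 + as.numeric(inches)) %>%
  select(-feet, -inches)

print(combine_inches)
```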

The apparent duplicates above appear to be different players at different positions. Let’s check for duplicates within the same season and position:

Code
nfl_combine %>% 
  group_by(season, gsis_id, pos) %>% 
  filter(n() > 1, !is.na(gsis_id)) %>% 
  arrange(gsis_id) %>% 
  head()
Code
nfl_combine %>% 
  group_by(season, pfr_id, pos) %>% 
  filter(n() > 1, !is.na(pfr_id)) %>% 
  arrange(pfr_id) %>% 
  head()
Code
nfl_combine %>% 
  group_by(season, cfb_id, pos) %>% 
  filter(n() > 1, !is.na(cfb_id)) %>% 
  arrange(cfb_id) %>% 
  head()
Code
save(
  nfl_combine,
  file = "./data/nfl_combine.RData"
)

4.3.9 Draft Picks

A Data Dictionary for draft picks data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_draft_picks.html

Code
nfl_draftPicks_raw <- progressr::with_progress(
  nflreadr::load_draft_picks(seasons = TRUE))
Code
save(
  nfl_draftPicks_raw,
  file = "./data/nfl_draftPicks_raw.RData"
)

The nfl_draftPicks object is in player form. That is, each row should be uniquely identified by gsis_id. Let’s rearrange the data accordingly:

Code
nfl_draftPicks <- nfl_draftPicks_raw %>% 
  select(gsis_id, everything()) %>% 
  arrange(pfr_player_name)

Let’s check for duplicate player instances:

Code
nfl_draftPicks %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()

4.3.9.1 Processing

Let’s do some data cleanup:

Code
# Convert missing values to NA
nfl_draftPicks[nfl_draftPicks == ""] <- NA

# Drop players with missing values for gsis_id
nfl_draftPicks <- nfl_draftPicks %>% 
  filter(!is.na(gsis_id))

Let’s check again for duplicate player instances:

Code
nfl_draftPicks %>% 
  group_by(gsis_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_draftPicks,
  file = "./data/nfl_draftPicks.RData"
)

4.3.10 Draft Values

A Data Dictionary for draft values data is located at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#draft_values

Code
nfl_draftValues_raw <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/draft_values.csv")
Code
save(
  nfl_draftValues_raw,
  file = "./data/nfl_draftValues_raw.RData"
)

The nfl_draftValues object is in pick form. That is, each row should be uniquely identified by pick. Let’s rearrange the data accordingly:

Code
nfl_draftValues <- nfl_draftValues_raw %>% 
  select(pick, everything()) %>% 
  arrange(pick)

Let’s check for duplicate pick instances:

Code
nfl_draftValues %>% 
  group_by(pick) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_draftValues,
  file = "./data/nfl_draftValues.RData"
)

4.3.11 Depth Charts

A Data Dictionary for data from weekly depth charts is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_depth_charts.html

Code
nfl_depthCharts_raw <- progressr::with_progress(
  nflreadr::load_depth_charts(seasons = TRUE))
Code
save(
  nfl_depthCharts_raw,
  file = "./data/nfl_depthCharts_raw.RData"
)

The nfl_depthCharts object is in player-season-week-position form. That is, each row should be uniquely identified by the combination of gsis_id, season, week, and depth_position. Let’s rearrange the data accordingly:

Code
nfl_depthCharts <- nfl_depthCharts_raw %>% 
  select(gsis_id, season, week, depth_position, everything()) %>% 
  arrange(full_name, gsis_id, season, week, depth_position)

Let’s check for duplicate player-season-week-position instances:

Code
nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  filter(n() > 1) %>% 
  head()

4.3.11.1 Processing

Let’s do some data cleanup:

Code
# Drop players with missing values for gsis_id
nfl_depthCharts <- nfl_depthCharts %>% 
  filter(!is.na(gsis_id))

# Fill in missing values for a player in their duplicate instances, and then keep only the first of the duplicate instances
nfl_depthCharts <- nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  fill(everything(), .direction = "downup") %>% 
  slice_head(n = 1) %>% 
  ungroup()

Let’s check again for duplicate player-season-week-position instances:

Code
nfl_depthCharts %>% 
  group_by(gsis_id, season, week, depth_position) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_depthCharts,
  file = "./data/nfl_depthCharts.RData"
)

4.3.12 Trades

A Data Dictionary for trades data is located at the following link: https://github.com/nflverse/nfldata/blob/master/DATASETS.md#trades

Code
nfl_trades_raw <- read_csv("https://raw.githubusercontent.com/leesharpe/nfldata/master/data/trades.csv")
Code
save(
  nfl_trades_raw,
  file = "./data/nfl_trades_raw.RData"
)

The nfl_trades object is in trade-player form. That is, each row should be uniquely identified by the combination of trade_id and pfr_id.

Code
nfl_trades <- nfl_trades_raw

Let’s check for duplicate trade-player instances:

Code
nfl_trades %>%
  filter(!is.na(pfr_id)) %>% 
  group_by(trade_id, pfr_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_trades,
  file = "./data/nfl_trades.RData"
)

4.3.13 Play-By-Play Data

To download play-by-play data from prior weeks and seasons, we can use the load_pbp() function of the nflreadr package. We add a progress bar using the with_progress() function from the progressr package because it takes a while to run. A Data Dictionary for the play-by-play data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_pbp.html

Note 4.1: Downloading play-by-play data

Note: the following code takes a while to run.

Code
nfl_pbp_raw <- progressr::with_progress(
  nflreadr::load_pbp(seasons = TRUE))
Code
save(
  nfl_pbp_raw,
  file = "./data/nfl_pbp_raw.RData"
)
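The with_progress() wrapper does not itself measure anything: functions such as nflreadr's loaders signal progress through the progressr package, and with_progress() turns on the rendering of those signals. A minimal sketch of that mechanism, using a hypothetical function slow_sum() that reports one step per element:

```r
library(progressr)

# Hypothetical function that signals one progress step per element processed
slow_sum <- function(xs) {
  p <- progressor(steps = length(xs))
  total <- 0
  for (x in xs) {
    total <- total + x
    p()  # signal that one more step has finished
  }
  total
}

# Progress is rendered only inside with_progress(); the function's value is returned
result <- with_progress(slow_sum(1:5))
print(result)
```

In recent versions of progressr, calling progressr::handlers(global = TRUE) enables progress reporting for the whole session instead of wrapping each call.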

The nfl_pbp object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, fixed_drive, and play_id. Let’s rearrange the data accordingly:

Code
nfl_pbp <- nfl_pbp_raw %>% 
  select(game_id, fixed_drive, play_id, everything()) %>% 
  arrange(game_id, fixed_drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code
nfl_pbp %>% 
  group_by(game_id, fixed_drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_pbp,
  file = "./data/nfl_pbp.RData"
)

4.3.14 4th Down Data

Note 4.2: Downloading 4th down data

Note: the following code takes a while to run.

Code
nfl_4thdown_raw <- nfl4th::load_4th_pbp(
  seasons = 2014:nflreadr::most_recent_season())
Code
save(
  nfl_4thdown_raw,
  file = "./data/nfl_4thdown_raw.RData"
)

The nfl_4thdown object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, drive, and play_id. Let’s rearrange the data accordingly:

Code
nfl_4thdown <- nfl_4thdown_raw %>% 
  select(game_id, drive, play_id, everything()) %>% 
  arrange(game_id, drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code
nfl_4thdown %>% 
  group_by(game_id, drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_4thdown,
  file = "./data/nfl_4thdown.RData"
)

4.3.15 Participation

A Data Dictionary for the participation data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_participation.html

Code
nfl_participation_raw <- progressr::with_progress(
  nflreadr::load_participation(
    seasons = TRUE,
    include_pbp = TRUE))
Code
save(
  nfl_participation_raw,
  file = "./data/nfl_participation_raw.RData"
)

The nfl_participation object is in game-drive-play form. That is, each row should be uniquely identified by the combination of nflverse_game_id, drive, and play_id. Let’s rearrange the data accordingly:

Code
nfl_participation <- nfl_participation_raw %>% 
  select(nflverse_game_id, drive, play_id, everything()) %>% 
  arrange(nflverse_game_id, drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code
nfl_participation %>% 
  group_by(nflverse_game_id, drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_participation,
  file = "./data/nfl_participation.RData"
)

4.3.16 Historical Weekly Actual Player Statistics

We can download historical week-by-week actual player statistics using the load_player_stats() function from the nflreadr package. A Data Dictionary for statistics for offensive players is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_player_stats.html. A Data Dictionary for statistics for defensive players is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_player_stats_def.html.

Code
nfl_actualStats_offense_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "offense"))

nfl_actualStats_defense_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "defense"))

nfl_actualStats_kicking_weekly_raw <- progressr::with_progress(
  nflreadr::load_player_stats(
    seasons = TRUE,
    stat_type = "kicking"))
Code
save(
  nfl_actualStats_offense_weekly_raw, nfl_actualStats_defense_weekly_raw, nfl_actualStats_kicking_weekly_raw,
  file = "./data/nfl_actualStats_weekly_raw.RData"
)

The nfl_actualStats_weekly objects are in player-season-week form. That is, each row should be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code
nfl_actualStats_offense_weekly <- nfl_actualStats_offense_weekly_raw %>% 
  rename(team = recent_team) %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualStats_defense_weekly <- nfl_actualStats_defense_weekly_raw %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

nfl_actualStats_kicking_weekly <- nfl_actualStats_kicking_weekly_raw %>% 
  select(player_id, season, week, everything()) %>% 
  arrange(player_display_name, player_id, season, week)

Let’s check for duplicate player-season-week instances:

Code
nfl_actualStats_offense_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_defense_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_kicking_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_actualStats_offense_weekly, nfl_actualStats_defense_weekly, nfl_actualStats_kicking_weekly,
  file = "./data/nfl_actualStats_weekly.RData"
)

4.3.17 Injuries

A Data Dictionary for injury data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_injuries.html

Code
nfl_injuries_raw <- progressr::with_progress(
  nflreadr::load_injuries(seasons = TRUE))
Code
save(
  nfl_injuries_raw,
  file = "./data/nfl_injuries_raw.RData"
)

The nfl_injuries object is in player-season-week form. That is, each row should be uniquely identified by the combination of gsis_id, season, and week. Let’s rearrange the data accordingly:

Code
nfl_injuries <- nfl_injuries_raw %>% 
  select(gsis_id, season, week, everything()) %>% 
  arrange(full_name, gsis_id, season, week)

Let’s check for duplicate player-season-week instances:

Code
nfl_injuries %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_injuries,
  file = "./data/nfl_injuries.RData"
)

4.3.18 Snap Counts

A Data Dictionary for snap counts data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_snap_counts.html

Code
nfl_snapCounts_raw <- progressr::with_progress(
  nflreadr::load_snap_counts(seasons = TRUE))
Code
save(
  nfl_snapCounts_raw,
  file = "./data/nfl_snapCounts_raw.RData"
)

The nfl_snapCounts object is in game-player form. That is, each row should be uniquely identified by the combination of game_id and pfr_player_id. Let’s rearrange the data accordingly:

Code
nfl_snapCounts <- nfl_snapCounts_raw %>% 
  select(game_id, pfr_player_id, everything()) %>% 
  arrange(game_id, pfr_player_id)

Let’s check for duplicate game-player instances:

Code
nfl_snapCounts %>% 
  group_by(game_id, pfr_player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_snapCounts,
  file = "./data/nfl_snapCounts.RData"
)

4.3.19 ESPN QBR

A Data Dictionary for ESPN QBR data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_espn_qbr.html

Code
nfl_espnQBR_seasonal_raw <- progressr::with_progress(
  nflreadr::load_espn_qbr(
    seasons = TRUE,
    summary_type = c("season")))

nfl_espnQBR_weekly_raw <- progressr::with_progress(
  nflreadr::load_espn_qbr(
    seasons = TRUE,
    summary_type = c("week")))
Code
save(
  nfl_espnQBR_seasonal_raw,
  file = "./data/nfl_espnQBR_seasonal_raw.RData"
)

save(
  nfl_espnQBR_weekly_raw,
  file = "./data/nfl_espnQBR_weekly_raw.RData"
)

The nfl_espnQBR_seasonal object is in player-season-season type form, where season type refers to regular season versus postseason. That is, each row should be uniquely identified by the combination of player_id, season, and season_type. Let’s rearrange the data accordingly:

Code
nfl_espnQBR_seasonal <- nfl_espnQBR_seasonal_raw %>% 
  select(player_id, season, season_type, everything()) %>% 
  arrange(name_display, player_id, season, season_type)

Let’s check for duplicate player-season-season type instances:

Code
nfl_espnQBR_seasonal %>% 
  group_by(player_id, season, season_type) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_espnQBR_weekly object is in both game-player form and player-season-season type-week form, where season type refers to regular season versus postseason. That is, each row should be uniquely identified by the combination of game_id and player_id or by the combination of player_id, season, season_type, and week_num. Let’s rearrange the data accordingly:

Code
nfl_espnQBR_weekly <- nfl_espnQBR_weekly_raw %>% 
  select(player_id, season, season_type, week_num, everything()) %>% 
  arrange(name_display, player_id, season, season_type, week_num)

Let’s check for duplicate game-player or player-season-season type-week instances:

Code
nfl_espnQBR_weekly %>% 
  arrange(game_id, player_id) %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_espnQBR_weekly %>% 
  arrange(player_id, season, season_type, week_num) %>% 
  group_by(player_id, season, season_type, week_num) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_espnQBR_seasonal,
  file = "./data/nfl_espnQBR_seasonal.RData"
)

save(
  nfl_espnQBR_weekly,
  file = "./data/nfl_espnQBR_weekly.RData"
)

4.3.20 NFL Next Gen Stats

A Data Dictionary for NFL Next Gen Stats data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_nextgen_stats.html

Code
nfl_nextGenStats_pass_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("passing")))

nfl_nextGenStats_rush_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("rushing")))

nfl_nextGenStats_rec_weekly_raw <- progressr::with_progress(
  nflreadr::load_nextgen_stats(
    seasons = TRUE,
    stat_type = c("receiving")))

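# Stack the three stat types; .id records which table each row came from,
# because the same player can appear in more than one table in a given week
nfl_nextGenStats_weekly_raw <- bind_rows(
  passing = nfl_nextGenStats_pass_weekly_raw,
  rushing = nfl_nextGenStats_rush_weekly_raw,
  receiving = nfl_nextGenStats_rec_weekly_raw,
  .id = "stat_type"
)
Code
save(
  nfl_nextGenStats_weekly_raw,
  file = "./data/nfl_nextGenStats_weekly_raw.RData"
)

The nfl_nextGenStats_weekly object is in player-season-season type-week-stat type form, where season type refers to regular season versus postseason and stat type refers to passing, rushing, or receiving. Because the passing, rushing, and receiving data are stacked, the same player can appear more than once in a given week (e.g., a running back with both rushing and receiving data). That is, each row should be uniquely identified by the combination of player_gsis_id, season, season_type, week, and stat_type. Let’s rearrange the data accordingly:

Code
nfl_nextGenStats_weekly <- nfl_nextGenStats_weekly_raw %>% 
  select(player_gsis_id, season, season_type, week, stat_type, everything()) %>% 
  arrange(player_display_name, player_gsis_id, season, season_type, week, stat_type)

Let’s check for duplicate player-season-season type-week-stat type instances:

Code
nfl_nextGenStats_weekly %>% 
  group_by(player_gsis_id, season, season_type, week, stat_type) %>% 
  filter(n() > 1) %>% 
  head()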
Code
save(
  nfl_nextGenStats_weekly,
  file = "./data/nfl_nextGenStats_weekly.RData"
)
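Stacking tables with bind_rows() can create repeated keys when the same player appears in more than one input table. Naming the inputs and supplying .id adds a column that records each row's source table. A minimal sketch with hypothetical toy data (one player with both passing and rushing rows in the same week):

```r
library(dplyr)

# Hypothetical toy tables: the same player-week appears in both
passing <- tibble::tibble(
  player_gsis_id = "00-0000001", season = 2023, season_type = "REG",
  week = 1, pass_yards = 250
)
rushing <- tibble::tibble(
  player_gsis_id = "00-0000001", season = 2023, season_type = "REG",
  week = 1, rush_yards = 40
)

# .id creates a stat_type column whose values are the argument names
stacked <- bind_rows(passing = passing, rushing = rushing, .id = "stat_type")

# Both rows share one player-season-season type-week key...
n_player_weeks <- nrow(distinct(stacked, player_gsis_id, season, season_type, week))
# ...but adding stat_type makes every row unique
n_with_type <- nrow(distinct(stacked, player_gsis_id, season, season_type, week, stat_type))

print(stacked)
```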

4.3.21 Advanced Stats from Pro Football Reference

A Data Dictionary for Pro Football Reference passing data is located at the following links:

Advanced stats from the 2024 season are available at the following link: https://www.pro-football-reference.com/years/2024/advanced.htm

Code
# Seasonal Data
nfl_advancedStatsPFR_pass_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("pass"),
    summary_level = c("season")))

nfl_advancedStatsPFR_rush_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rush"),
    summary_level = c("season")))

nfl_advancedStatsPFR_rec_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rec"),
    summary_level = c("season")))

nfl_advancedStatsPFR_def_seasonal_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("def"),
    summary_level = c("season")))

# Weekly Data
nfl_advancedStatsPFR_pass_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("pass"),
    summary_level = c("week")))

nfl_advancedStatsPFR_rush_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rush"),
    summary_level = c("week")))

nfl_advancedStatsPFR_rec_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("rec"),
    summary_level = c("week")))

nfl_advancedStatsPFR_def_weekly_raw <- progressr::with_progress(
  nflreadr::load_pfr_advstats(
    seasons = TRUE,
    stat_type = c("def"),
    summary_level = c("week")))
Code
save(
  nfl_advancedStatsPFR_pass_seasonal_raw,
  nfl_advancedStatsPFR_rush_seasonal_raw,
  nfl_advancedStatsPFR_rec_seasonal_raw,
  nfl_advancedStatsPFR_def_seasonal_raw,
  file = "./data/nfl_advancedStatsPFR_seasonal_raw.RData"
)

save(
  nfl_advancedStatsPFR_pass_weekly_raw,
  nfl_advancedStatsPFR_rush_weekly_raw,
  nfl_advancedStatsPFR_rec_weekly_raw,
  nfl_advancedStatsPFR_def_weekly_raw,
  file = "./data/nfl_advancedStatsPFR_weekly_raw.RData"
)

4.3.21.1 Processing
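The processing code in this section gives each table's non-key columns a suffix (.pass, .rush, .rec, .def) so that same-named statistics do not collide, and then merges the tables with purrr::reduce(), which applies full_join() pairwise across the list. A minimal sketch of that pattern on hypothetical toy tables (without the suffixes, full_join() would instead produce columns such as att.x and att.y):

```r
library(dplyr)
library(purrr)

keys <- c("pfr_id", "season", "team")

# Hypothetical toy tables that share a statistic name (att)
pass <- tibble::tibble(pfr_id = "A", season = 2023, team = "KC", att = 500)
rush <- tibble::tibble(pfr_id = c("A", "B"), season = 2023, team = "KC", att = c(60, 200))
rec <- tibble::tibble(pfr_id = "C", season = 2023, team = "KC", tgt = 90)

# Suffix every non-key column with its source
pass <- pass %>% rename_with(~ paste0(.x, ".pass"), -all_of(keys))
rush <- rush %>% rename_with(~ paste0(.x, ".rush"), -all_of(keys))
rec <- rec %>% rename_with(~ paste0(.x, ".rec"), -all_of(keys))

# Full-join the tables pairwise: (pass + rush) + rec
merged <- list(pass, rush, rec) %>%
  reduce(function(x, y) full_join(x, y, by = keys))

print(merged)
```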

Code
# Clean up player name for merging; name variables based on which data object they're from

## Seasonal Data
nfl_advancedStatsPFR_pass_seasonal <- nfl_advancedStatsPFR_pass_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".pass"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_rush_seasonal <- nfl_advancedStatsPFR_rush_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".rush"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_rec_seasonal <- nfl_advancedStatsPFR_rec_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  ) %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".rec"),
    -c(pfr_id, merge_name, season, team))

nfl_advancedStatsPFR_def_seasonal <- nfl_advancedStatsPFR_def_seasonal_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(player, lowercase = TRUE)
  )  %>% 
  rename(
    team = tm
  ) %>% 
  rename_with(
    ~ paste0(., ".def"),
    -c(pfr_id, merge_name, season, team))

## Weekly Data
nfl_advancedStatsPFR_pass_weekly <- nfl_advancedStatsPFR_pass_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".pass"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_rush_weekly <- nfl_advancedStatsPFR_rush_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".rush"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_rec_weekly <- nfl_advancedStatsPFR_rec_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".rec"),
    -c(game_id, pfr_player_id))

nfl_advancedStatsPFR_def_weekly <- nfl_advancedStatsPFR_def_weekly_raw %>% 
  mutate(
    merge_name = nflreadr::clean_player_names(pfr_player_name, lowercase = TRUE)
  ) %>% 
  rename_with(
    ~ paste0(., ".def"),
    -c(game_id, pfr_player_id))

## Merge across positions
nfl_advancedStatsPFR_seasonal_list <- list(
  nfl_advancedStatsPFR_pass_seasonal,
  nfl_advancedStatsPFR_rush_seasonal,
  nfl_advancedStatsPFR_rec_seasonal,
  nfl_advancedStatsPFR_def_seasonal)

nfl_advancedStatsPFR_weekly_list <- list(
  nfl_advancedStatsPFR_pass_weekly,
  nfl_advancedStatsPFR_rush_weekly,
  nfl_advancedStatsPFR_rec_weekly,
  nfl_advancedStatsPFR_def_weekly)

nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonal_list %>% 
  purrr::reduce(
    full_join,
    by = c("pfr_id","merge_name","season","team"))

nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly_list %>% 
  purrr::reduce(
    full_join,
    by = c("game_id","pfr_player_id")) #merge_name

#nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly_list %>% 
#  purrr::reduce(
#    full_join,
#    by = c("pfr_player_id","merge_name","season","week"))

nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  mutate(
    pfr_player_name = coalesce(
      player.pass,
      player.rush,
      player.rec,
      player.def
    ),
    age = coalesce(
      #age.pass,
      age.rush,
      age.rec,
      age.def
    ),
    pos = coalesce(
      #pos.pass,
      pos.rush,
      pos.rec,
      pos.def
    ),
    g = coalesce(
      #g.pass,
      g.rush,
      g.rec,
      g.def
    ),
    gs = coalesce(
      #gs.pass,
      gs.rush,
      gs.rec,
      gs.def
    )
  ) %>% 
  select(-c(
    starts_with("player."),
    starts_with("age."),
    starts_with("pos."),
    starts_with("g."),
    starts_with("gs.")))

nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  mutate(
    pfr_player_name = coalesce(
      pfr_player_name.pass,
      pfr_player_name.rush,
      pfr_player_name.rec,
      pfr_player_name.def
    ),
    season = coalesce(
      season.pass,
      season.rush,
      season.rec,
      season.def
    ),
    week = coalesce(
      week.pass,
      week.rush,
      week.rec,
      week.def
    ),
    team = coalesce(
      team.pass,
      team.rush,
      team.rec,
      team.def
    ),
    merge_name = coalesce(
      merge_name.pass,
      merge_name.rush,
      merge_name.rec,
      merge_name.def
    ),
    pfr_game_id = coalesce(
      pfr_game_id.pass,
      pfr_game_id.rush,
      pfr_game_id.rec,
      pfr_game_id.def
    ),
    game_type = coalesce(
      game_type.pass,
      game_type.rush,
      game_type.rec,
      game_type.def
    ),
    opponent = coalesce(
      opponent.pass,
      opponent.rush,
      opponent.rec,
      opponent.def
    )
  ) %>% 
  select(-c(
    starts_with("pfr_player_name."),
    starts_with("season."),
    starts_with("week."),
    starts_with("team."),
    starts_with("merge_name."),
    starts_with("pfr_game_id."),
    starts_with("game_type."),
    starts_with("opponent.")
    ))
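Here is a minimal base R sketch of where the suffixed columns come from and what coalesce() does with them. It uses two made-up position tables with hypothetical IDs and values, not real data. The join uses only the key columns, so any other column the tables share (here, team) is kept once per source, with a suffix. Coalescing then takes the first non-missing value across those copies, which is what dplyr::coalesce() does in the code above. The helper coalesce_base() is a hypothetical stand-in written for this example.

```r
# Hypothetical pass and rush tables keyed by game_id and player_id
pass <- data.frame(
  game_id = c("g1", "g1"),
  player_id = c("p1", "p2"),
  team = c("AAA", "AAA"),
  pass_yds = c(250, 0)
)
rush <- data.frame(
  game_id = c("g1", "g1"),
  player_id = c("p1", "p3"),
  team = c("AAA", "BBB"),
  rush_yds = c(20, 85)
)

# Full outer join on the keys only; the shared non-key column `team`
# is kept twice, once per source, with a suffix marking its origin
merged <- merge(
  pass, rush,
  by = c("game_id", "player_id"),
  all = TRUE,
  suffixes = c(".pass", ".rush")
)

# Hypothetical base R stand-in for dplyr::coalesce():
# element-wise, keep the first non-missing value
coalesce_base <- function(...) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), list(...))
}

# Collapse the suffixed copies into one column and drop the copies
merged$team <- coalesce_base(merged$team.pass, merged$team.rush)
merged <- merged[, setdiff(names(merged), c("team.pass", "team.rush"))]
merged
```

Player p3 appears only in the rush table, so his team comes from team.rush. His pass_yds stays NA because no passing row exists for him.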

The nfl_advancedStatsPFR_seasonalByTeam object is in player-season-team form. That is, each row should be uniquely identified by the combination of pfr_id, season, and team. Let’s rearrange the data accordingly:

Code
nfl_advancedStatsPFR_seasonalByTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  select(pfr_id, season, team, pfr_player_name, everything()) %>% 
  arrange(pfr_player_name, pfr_id, season, team)

Let’s check for duplicate player-season-team instances:

Code
nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, season, team) %>% 
  filter(n() > 1) %>% 
  head()
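Each of these duplicate checks asks the same question: do a given set of key columns uniquely identify the rows? A small base R helper can answer that with a single TRUE/FALSE, which is useful for stopping a script early if a merge introduced duplicates. The helper below is hypothetical and is not part of the petersenlab or nflreadr packages. The toy data are made up.

```r
# Hypothetical helper: TRUE if the key columns uniquely identify every row
is_unique_key <- function(data, keys) {
  stopifnot(all(keys %in% names(data)))
  anyDuplicated(data[, keys, drop = FALSE]) == 0
}

# Made-up player-season-team data: one player changed teams mid-season
toy <- data.frame(
  pfr_id = c("A01", "A01", "B02"),
  season = c(2023, 2023, 2023),
  team   = c("AAA", "BBB", "AAA")
)

is_unique_key(toy, c("pfr_id", "season", "team"))  # player-season-team key
is_unique_key(toy, c("pfr_id", "season"))          # player-season key
```

The first call returns TRUE and the second returns FALSE. That second result is the situation the next step resolves by aggregating each player's rows across teams.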

Next, we aggregate the variables within each pass/rush/rec/def object across teams. This puts the seasonal data in player-season form rather than player-season-team form, so a player who played for multiple teams in a season has one row for that season. Depending on the variable, aggregation is performed using a sum, a weighted mean (weighted by the number of games played for each team), or a recomputed percentage.

Code
pfrVars <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  select(pass_attempts.pass:m_tkl_percent.def, g, gs) %>% 
  names()

weightedAverageVars <- c(
  "pocket_time.pass",
  "ybc_att.rush","yac_att.rush",
  "ybc_r.rec","yac_r.rec","adot.rec","rat.rec",
  "yds_cmp.def","yds_tgt.def","dadot.def","m_tkl_percent.def","rat.def"
)

recomputeVars <- c(
  "drop_pct.pass", # drops.pass / pass_attempts.pass
  "bad_throw_pct.pass", # bad_throws.pass / pass_attempts.pass
  "on_tgt_pct.pass", # on_tgt_throws.pass / pass_attempts.pass
  "pressure_pct.pass", # times_pressured.pass / pass_attempts.pass
  "drop_percent.rec", # drop.rec / tgt.rec
  "rec_br.rec", # rec.rec / brk_tkl.rec
  "cmp_percent.def" # cmp.def / tgt.def
)

sumVars <- pfrVars[pfrVars %ni% c(
  weightedAverageVars, recomputeVars,
  "merge_name", "loaded.pass", "loaded.rush", "loaded.rec", "loaded.def")]

nfl_advancedStatsPFR_seasonal <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, merge_name, season) %>% 
  summarise(
    # Weighted means come first: later expressions in summarise() see the summed g
    across(all_of(weightedAverageVars), ~ weighted.mean(.x, w = g, na.rm = TRUE)),
    across(all_of(sumVars), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop") %>% 
  mutate(
    drop_pct.pass = drops.pass / pass_attempts.pass,
    bad_throw_pct.pass = bad_throws.pass / pass_attempts.pass,
    on_tgt_pct.pass = on_tgt_throws.pass / pass_attempts.pass,
    pressure_pct.pass = times_pressured.pass / pass_attempts.pass,
    drop_percent.rec = drop.rec / tgt.rec,
    rec_br.rec = rec.rec / brk_tkl.rec,
    cmp_percent.def = cmp.def / tgt.def
  )

# Merge with other player info
nfl_advancedStatsPFR_seasonalByTeam_1stTeam <- nfl_advancedStatsPFR_seasonalByTeam %>% 
  group_by(pfr_id, season) %>% 
  slice(1)

nfl_advancedStatsPFR_seasonalByTeam_1stTeam_mergeVars <- nfl_advancedStatsPFR_seasonalByTeam_1stTeam %>% 
  select(pfr_id, season, team, pfr_player_name, age, pos) #, g, gs

nfl_advancedStatsPFR_seasonal <- nfl_advancedStatsPFR_seasonal %>% 
  left_join(
    nfl_advancedStatsPFR_seasonalByTeam_1stTeam_mergeVars,
    by = c("pfr_id","season")
  ) %>% 
  select(
    pfr_id, season, pfr_player_name, pos, age, team, g, gs, 
    contains(".pass"), contains(".rush"), contains(".rec"), contains(".def"), 
    everything())
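A toy example shows why the percentage variables are recomputed from summed counts rather than averaged. The numbers are made up. They describe a receiver who played 2 games for one team and 14 for another.

```r
# Made-up seasonal rows for one receiver who played for two teams
rec <- data.frame(
  team = c("AAA", "BBB"),
  g    = c(2, 14),
  tgt  = c(10, 90),
  drop = c(2, 3)
)
rec$drop_percent <- rec$drop / rec$tgt

# 1. Unweighted mean of the per-team percentages
mean(rec$drop_percent)

# 2. Mean weighted by games played (the rule used above for per-play averages)
weighted.mean(rec$drop_percent, w = rec$g)

# 3. Recomputed from summed counts (the rule used above for percentages)
sum(rec$drop) / sum(rec$tgt)
```

The three approaches give about 0.117, 0.054, and 0.05. Only the recomputed version reproduces the season-long rate exactly (5 drops on 100 targets). The games-weighted mean is an approximation that is closer than the unweighted mean but still not exact.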

Let’s check for duplicate player-season instances:

Code
nfl_advancedStatsPFR_seasonal %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_advancedStatsPFR_weekly object is in both game-player form and player-season-week form. That is, each row should be uniquely identified by the combination of game_id and pfr_player_id, and also by the combination of pfr_player_id, season, and week. Let’s rearrange the data accordingly:

Code
nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  select(pfr_player_id, season, week, game_type, game_id, pfr_player_name, everything()) %>% 
  arrange(pfr_player_name, pfr_player_id, season, week)

Let’s check for duplicate game-player or player-season-week instances:

Code
nfl_advancedStatsPFR_weekly %>% 
  arrange(game_id, pfr_player_id) %>% 
  group_by(game_id, pfr_player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_weekly %>% 
  arrange(pfr_player_id, season, week) %>% 
  group_by(pfr_player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

Next, we merge in each player’s gsis_id so that these data can be merged with other datasets:

Code
# Prepare data for merging
nfl_advancedStatsPFR_weekly <- nfl_advancedStatsPFR_weekly %>% 
  rename(pfr_id = pfr_player_id)

# Identify pfr_id-season combinations with more than one age (i.e., distinct players who share a pfr_id)
nfl_advancedStatsPFR_seasonal %>% 
  select(pfr_id, season, age) %>% 
  na.omit() %>% 
  unique() %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  arrange(pfr_id, season) %>% 
  head()
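The duplicate check above matters because a left join copies a row once for every match in the lookup table. If one pfr_id maps to two gsis_id values, each of that player’s rows appears twice after merging. That is why the merge below filters out one of the two Byron Young IDs first. The sketch uses made-up IDs and base R's merge() in place of dplyr::left_join().

```r
# Made-up stats keyed by pfr_id
pfr_stats <- data.frame(pfr_id = c("pfr_A", "pfr_B"), sacks = c(5, 2))

# Made-up crosswalk in which pfr_A is linked to two different gsis_ids
ids <- data.frame(
  pfr_id  = c("pfr_A", "pfr_A", "pfr_B"),
  gsis_id = c("gsis_1", "gsis_2", "gsis_3")
)

# Left join: pfr_A's single row of stats now appears twice
joined <- merge(pfr_stats, ids, by = "pfr_id", all.x = TRUE)
nrow(joined)

# Drop the incorrect link, confirm the crosswalk is one-to-one, and re-merge
ids_clean <- ids[ids$gsis_id != "gsis_2", ]
stopifnot(anyDuplicated(ids_clean$pfr_id) == 0)
joined_clean <- merge(pfr_stats, ids_clean, by = "pfr_id", all.x = TRUE)
nrow(joined_clean)
```

The first merge returns 3 rows from 2 players. After the crosswalk is made one-to-one, the merge returns 2.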
Code
# Merge seasonal data with the player IDs
nfl_advancedStatsPFR_seasonal <- left_join(
  nfl_advancedStatsPFR_seasonal,
  nfl_playerIDs %>% 
    filter(!is.na(pfr_id)) %>% 
    filter(gsis_id != "00-0039137") %>% # drop DL Byron Young, keep OLB Byron Young
    select(pfr_id, gsis_id) %>% 
    unique(),
  by = "pfr_id"
)

# Merge weekly data with the player IDs
nfl_advancedStatsPFR_weekly <- left_join(
  nfl_advancedStatsPFR_weekly,
  nfl_playerIDs %>% 
    filter(!is.na(pfr_id)) %>% 
    filter(gsis_id != "00-0039137") %>% # drop DL Byron Young, keep OLB Byron Young
    select(pfr_id, gsis_id) %>% 
    unique(),
  by = "pfr_id"
)

# Set gsis_id to NA for rows of distinct players who were given the same `pfr_id` (to allow merging)
nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0035665" & nfl_advancedStatsPFR_seasonal$pos %in% c("LB","LILB","RILB"))] <- NA # drop LB David Young, keep DB David Young
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0035665" & nfl_advancedStatsPFR_weekly$team %in% c("TEN","MIA"))] <- NA # drop LB David Young, keep DB David Young

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0035292" & nfl_advancedStatsPFR_seasonal$pos %in% c("LB","LILB","RILB"))] <- NA # drop LB David Young, keep DB David Young
nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0035292" & nfl_advancedStatsPFR_weekly$team %in% c("TEN","MIA"))] <- NA # drop LB David Young, keep DB David Young

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0033894" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop S Marcus Williams
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0033894" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop S Marcus Williams

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0038407" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0038407" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0037106" & nfl_advancedStatsPFR_seasonal$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0037106" & nfl_advancedStatsPFR_weekly$pos == "DB")] <- NA # drop DB Jaylon Jones, keep CB Jaylon Jones

nfl_advancedStatsPFR_seasonal$gsis_id[which(nfl_advancedStatsPFR_seasonal$gsis_id == "00-0038549" & nfl_advancedStatsPFR_seasonal$pos == "WR")] <- NA # drop WR DJ Turner, keep CB DJ Turner
#nfl_advancedStatsPFR_weekly$gsis_id[which(nfl_advancedStatsPFR_weekly$gsis_id == "00-0038549" & nfl_advancedStatsPFR_weekly$pos == "WR")] <- NA # drop WR DJ Turner, keep CB DJ Turner
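The assignments above all follow one pattern. Each blanks out gsis_id when it matches a specific ID and the row's position shows that the row belongs to a different player. The sketch below shows an alternative, not the approach used in this book: store each (gsis_id, position) collision as a row of a small lookup table and apply them all in one pass. This keeps the list of known ID collisions in one place. The IDs, positions, and object names below are hypothetical.

```r
# Hypothetical rows in which a gsis_id was attached to the wrong player
player_rows <- data.frame(
  gsis_id = c("gsis_1", "gsis_1", "gsis_2"),
  pos     = c("DB", "LB", "WR")
)

# Hypothetical table of (gsis_id, pos) pairs whose gsis_id should be removed
exclusions <- data.frame(
  gsis_id = c("gsis_1", "gsis_2"),
  pos     = c("LB", "WR")
)

# Build a combined key per row and per exclusion, then blank out matches
key_rows <- paste(player_rows$gsis_id, player_rows$pos, sep = "|")
key_excl <- paste(exclusions$gsis_id, exclusions$pos, sep = "|")
player_rows$gsis_id[key_rows %in% key_excl] <- NA
player_rows
```

A rule that covers several positions, such as c("LB","LILB","RILB") above, would simply become one exclusion row per position.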

Now, each row of the nfl_advancedStatsPFR_seasonal object should be uniquely identified by the combination of gsis_id (or pfr_id) and season. Each row of the nfl_advancedStatsPFR_weekly object should be uniquely identified by the combination of game_id and gsis_id (or pfr_id), and also by the combination of gsis_id (or pfr_id), season, and week.

Let’s check again for duplicate player-season, game-player, and player-season-week instances:

Code
# Based on gsis_id
nfl_advancedStatsPFR_seasonal %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  group_by(gsis_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_weekly %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  arrange(game_id, gsis_id) %>% 
  group_by(game_id, gsis_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_weekly %>% 
  select(gsis_id, everything()) %>% 
  filter(!is.na(gsis_id)) %>% 
  arrange(gsis_id, season, week) %>% 
  group_by(gsis_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
# Based on pfr_id
nfl_advancedStatsPFR_seasonal %>% 
  group_by(pfr_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_weekly %>% 
  arrange(game_id, pfr_id) %>% 
  group_by(game_id, pfr_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_advancedStatsPFR_weekly %>% 
  arrange(pfr_id, season, week) %>% 
  group_by(pfr_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_advancedStatsPFR_seasonal,
  file = "./data/nfl_advancedStatsPFR_seasonal.RData"
)

save(
  nfl_advancedStatsPFR_weekly,
  file = "./data/nfl_advancedStatsPFR_weekly.RData"
)

4.3.22 Player Contracts

A Data Dictionary for player contracts data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_contracts.html

Code
nfl_playerContracts_raw <- progressr::with_progress(
  nflreadr::load_contracts())
Code
save(
  nfl_playerContracts_raw,
  file = "./data/nfl_playerContracts_raw.RData"
)

The nfl_playerContracts object is in player-year-team-value form. That is, each row should be uniquely identified by the combination of otc_id, year_signed, team, and value. Let’s rearrange the data accordingly:

Code
nfl_playerContracts <- nfl_playerContracts_raw %>% 
  select(otc_id, year_signed, team, everything()) %>% 
  arrange(player, otc_id, year_signed, team)

Let’s check for duplicate player-year-team-value instances:

Code
nfl_playerContracts %>% 
  group_by(otc_id, year_signed, team, value) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_playerContracts,
  file = "./data/nfl_playerContracts.RData"
)

4.3.23 FTN Charting Data

A Data Dictionary for FTN Charting data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_ftn_charting.html

Code
nfl_ftnCharting_raw <- progressr::with_progress(
  nflreadr::load_ftn_charting(seasons = TRUE))
Code
save(
  nfl_ftnCharting_raw,
  file = "./data/nfl_ftnCharting_raw.RData"
)

The nfl_ftnCharting object is in game-play form. That is, each row should be uniquely identified by the combination of nflverse_game_id and nflverse_play_id. Let’s rearrange the data accordingly:

Code
nfl_ftnCharting <- nfl_ftnCharting_raw %>% 
  select(nflverse_game_id, nflverse_play_id, everything()) %>% 
  arrange(nflverse_game_id, nflverse_play_id)

Let’s check for duplicate game-play instances:

Code
nfl_ftnCharting %>% 
  group_by(nflverse_game_id, nflverse_play_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_ftnCharting,
  file = "./data/nfl_ftnCharting.RData"
)

4.3.24 FantasyPros Rankings

A Data Dictionary for FantasyPros ranking data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_rankings.html

Code
#nfl_rankings_raw <- progressr::with_progress( # currently throws error
#  nflreadr::load_ff_rankings(type = "all"))

nfl_rankings_draft_raw <- progressr::with_progress(
  nflreadr::load_ff_rankings(type = "draft"))

nfl_rankings_weekly_raw <- progressr::with_progress(
  nflreadr::load_ff_rankings(type = "week"))
Code
save(
  nfl_rankings_draft_raw,
  file = "./data/nfl_rankings_draft_raw.RData"
)

save(
  nfl_rankings_weekly_raw,
  file = "./data/nfl_rankings_weekly_raw.RData"
)

The nfl_rankings_draft object is in player-page_type form. That is, each row should be uniquely identified by the combination of the player’s id and page_type. Let’s rearrange the data accordingly:

Code
nfl_rankings_draft <- nfl_rankings_draft_raw %>% 
  select(id, page_type, player, pos, team, everything()) %>% 
  arrange(player, id, pos, page_type)

Let’s check for duplicate player-page_type instances:

Code
nfl_rankings_draft %>% 
  group_by(id, page_type) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_rankings_weekly object is in player-page form. That is, each row should be uniquely identified by fantasypros_id and page. Let’s rearrange the data accordingly:

Code
nfl_rankings_weekly <- nfl_rankings_weekly_raw %>% 
  select(fantasypros_id, page, player_name, pos, team, everything()) %>% 
  arrange(player_name, fantasypros_id, page, pos)

Let’s check for duplicate player-page instances:

Code
nfl_rankings_weekly %>% 
  group_by(fantasypros_id, page) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_rankings_draft,
  file = "./data/nfl_rankings_draft.RData"
)

save(
  nfl_rankings_weekly,
  file = "./data/nfl_rankings_weekly.RData"
)

4.3.25 Expected Fantasy Points

A Data Dictionary for expected fantasy points data is located at the following link: https://nflreadr.nflverse.com/articles/dictionary_ff_opportunity.html

Code
nfl_expectedFantasyPoints_weekly_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "weekly",
    model_version = "latest"
  ))

nfl_expectedFantasyPoints_pass_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "pbp_pass",
    model_version = "latest"
  ))

nfl_expectedFantasyPoints_rush_raw <- progressr::with_progress(
  nflreadr::load_ff_opportunity(
    seasons = TRUE,
    stat_type = "pbp_rush",
    model_version = "latest"
  ))
Code
save(
  nfl_expectedFantasyPoints_weekly_raw,
  file = "./data/nfl_expectedFantasyPoints_weekly_raw.RData"
)

save(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw,
  file = "./data/nfl_expectedFantasyPoints_pbp_raw.RData"
)
Code
nfl_expectedFantasyPoints_pbp_list <- list(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw)

nfl_expectedFantasyPoints_pbp <- full_join(
  nfl_expectedFantasyPoints_pass_raw,
  nfl_expectedFantasyPoints_rush_raw,
  by = c("game_id","fixed_drive","play_id"),
  suffix = c(".pass", ".rush")
)

The nfl_expectedFantasyPoints_weekly object is in game-player form and in player-season-week form. That is, each row should be uniquely identified by the combination of game_id and player_id. Each row should also be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly_raw %>% 
  select(player_id, season, week, game_id, full_name, posteam, position, everything()) %>% 
  arrange(full_name, player_id, season, week)

Let’s check for duplicate game-player instances and player-season-week instances:

Code
nfl_expectedFantasyPoints_weekly %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_expectedFantasyPoints_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

4.3.25.1 Processing

Let’s do some data cleanup:

Code
# Drop players with missing values for player_id
nfl_expectedFantasyPoints_weekly <- nfl_expectedFantasyPoints_weekly %>% 
  filter(!is.na(player_id))

Let’s check again for duplicate game-player instances and player-season-week instances:

Code
nfl_expectedFantasyPoints_weekly %>% 
  group_by(game_id, player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_expectedFantasyPoints_weekly %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()

The nfl_expectedFantasyPoints_pbp object is in game-drive-play form. That is, each row should be uniquely identified by the combination of game_id, fixed_drive, and play_id. Let’s rearrange the data accordingly:

Code
nfl_expectedFantasyPoints_pbp <- nfl_expectedFantasyPoints_pbp %>% 
  select(game_id, fixed_drive, play_id, everything()) %>% 
  arrange(game_id, fixed_drive, play_id)

Let’s check for duplicate game-drive-play instances:

Code
nfl_expectedFantasyPoints_pbp %>% 
  group_by(game_id, fixed_drive, play_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
save(
  nfl_expectedFantasyPoints_weekly,
  file = "./data/nfl_expectedFantasyPoints_weekly.RData"
)

save(
  nfl_expectedFantasyPoints_pbp,
  file = "./data/nfl_expectedFantasyPoints_pbp.RData"
)

4.3.26 Fantasy Football Projections

4.3.26.1 Download Players’ Projections

Note 4.3: Downloading players’ seasonal and weekly projections

Note: the following code takes a while to run.

Code
# Seasonal Projections
players_projections_seasonal_raw <- ffanalytics::scrape_data(
  season = NULL, # NULL grabs the current season
  week = 0) # 0 grabs seasonal projections

# Weekly Projections
players_projections_weekly_raw <- ffanalytics::scrape_data(
  season = NULL, # NULL grabs the current season
  week = NULL) # NULL grabs the current week
Code
save(
  players_projections_seasonal_raw,
  file = "./data/players_projections_seasonal_raw.RData"
)

save(
  players_projections_weekly_raw,
  file = "./data/players_projections_weekly_raw.RData"
)

The data files are saved in the project repository and can be loaded using the following commands:

Code
load(file = "./data/players_projections_seasonal_raw.RData")
load(file = "./data/players_projections_weekly_raw.RData")

The players_projections_seasonal_raw and players_projections_weekly_raw objects are in player-position-projection source form. That is, each row should be uniquely identified by the combination of id, pos, and data_src. Each row should also be uniquely identified by the combination of player, pos, and data_src.

Let’s check for duplicate player-position-projection source instances:

Code
players_projections_seasonal_raw %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()
Code
players_projections_seasonal_raw %>% 
  bind_rows() %>% 
  select(player, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(player, pos, data_src) %>% 
  filter(n() > 1, !is.na(player)) %>% 
  head()
Code
players_projections_weekly_raw %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()
Code
players_projections_weekly_raw %>% 
  bind_rows() %>% 
  select(player, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(player, pos, data_src) %>% 
  filter(n() > 1, !is.na(player)) %>% 
  head()

4.3.26.2 Specify League Scoring Settings

First, create a scoring object using the default scoring object:

Code
scoring_obj_default <- ffanalytics::scoring

View the default scoring settings:

Code
scoring_obj_default
$pass
$pass$pass_att
[1] 0

$pass$pass_comp
[1] 0

$pass$pass_inc
[1] 0

$pass$pass_yds
[1] 0.04

$pass$pass_tds
[1] 4

$pass$pass_int
[1] -3

$pass$pass_40_yds
[1] 0

$pass$pass_300_yds
[1] 0

$pass$pass_350_yds
[1] 0

$pass$pass_400_yds
[1] 0


$rush
$rush$all_pos
[1] TRUE

$rush$rush_yds
[1] 0.1

$rush$rush_att
[1] 0

$rush$rush_40_yds
[1] 0

$rush$rush_tds
[1] 6

$rush$rush_100_yds
[1] 0

$rush$rush_150_yds
[1] 0

$rush$rush_200_yds
[1] 0


$rec
$rec$all_pos
[1] TRUE

$rec$rec
[1] 0

$rec$rec_yds
[1] 0.1

$rec$rec_tds
[1] 6

$rec$rec_40_yds
[1] 0

$rec$rec_100_yds
[1] 0

$rec$rec_150_yds
[1] 0

$rec$rec_200_yds
[1] 0


$misc
$misc$all_pos
[1] TRUE

$misc$fumbles_lost
[1] -3

$misc$fumbles_total
[1] 0

$misc$sacks
[1] 0

$misc$two_pts
[1] 2


$kick
$kick$xp
[1] 1

$kick$fg_0019
[1] 3

$kick$fg_2029
[1] 3

$kick$fg_3039
[1] 3

$kick$fg_4049
[1] 4

$kick$fg_50
[1] 5

$kick$fg_miss
[1] 0


$ret
$ret$all_pos
[1] TRUE

$ret$return_tds
[1] 6

$ret$return_yds
[1] 0


$idp
$idp$all_pos
[1] TRUE

$idp$idp_solo
[1] 1

$idp$idp_asst
[1] 0.5

$idp$idp_sack
[1] 2

$idp$idp_int
[1] 3

$idp$idp_fum_force
[1] 3

$idp$idp_fum_rec
[1] 2

$idp$idp_pd
[1] 1

$idp$idp_td
[1] 6

$idp$idp_safety
[1] 2


$dst
$dst$dst_fum_rec
[1] 2

$dst$dst_int
[1] 2

$dst$dst_safety
[1] 2

$dst$dst_sacks
[1] 1

$dst$dst_td
[1] 6

$dst$dst_blk
[1] 1.5

$dst$dst_ret_yds
[1] 0

$dst$dst_pts_allowed
[1] 0


$pts_bracket
$pts_bracket[[1]]
$pts_bracket[[1]]$threshold
[1] 0

$pts_bracket[[1]]$points
[1] 10


$pts_bracket[[2]]
$pts_bracket[[2]]$threshold
[1] 6

$pts_bracket[[2]]$points
[1] 7


$pts_bracket[[3]]
$pts_bracket[[3]]$threshold
[1] 20

$pts_bracket[[3]]$points
[1] 4


$pts_bracket[[4]]
$pts_bracket[[4]]$threshold
[1] 34

$pts_bracket[[4]]$points
[1] 0


$pts_bracket[[5]]
$pts_bracket[[5]]$threshold
[1] 99

$pts_bracket[[5]]$points
[1] -4

Now, modify the scoring settings to match our league settings (based on scoring settings for fantasy leagues on NFL.com):

Code
scoring_obj <- scoring_obj_default

# Offense
scoring_obj$pass$pass_int <- -2
scoring_obj$rec$rec <- 1
scoring_obj$misc$fumbles_lost <- -2

# Kickers
scoring_obj$kick$fg_4049 <- 3

# Defense/Special Teams
scoring_obj$pts_bracket <- list(
  list(threshold = 0, points = 10),
  list(threshold = 6, points = 7),
  list(threshold = 13, points = 4),
  list(threshold = 20, points = 1),
  list(threshold = 27, points = 0),
  list(threshold = 34, points = -1),
  list(threshold = 99, points = -4)
)

View our scoring settings:

Code
scoring_obj
$pass
$pass$pass_att
[1] 0

$pass$pass_comp
[1] 0

$pass$pass_inc
[1] 0

$pass$pass_yds
[1] 0.04

$pass$pass_tds
[1] 4

$pass$pass_int
[1] -2

$pass$pass_40_yds
[1] 0

$pass$pass_300_yds
[1] 0

$pass$pass_350_yds
[1] 0

$pass$pass_400_yds
[1] 0


$rush
$rush$all_pos
[1] TRUE

$rush$rush_yds
[1] 0.1

$rush$rush_att
[1] 0

$rush$rush_40_yds
[1] 0

$rush$rush_tds
[1] 6

$rush$rush_100_yds
[1] 0

$rush$rush_150_yds
[1] 0

$rush$rush_200_yds
[1] 0


$rec
$rec$all_pos
[1] TRUE

$rec$rec
[1] 1

$rec$rec_yds
[1] 0.1

$rec$rec_tds
[1] 6

$rec$rec_40_yds
[1] 0

$rec$rec_100_yds
[1] 0

$rec$rec_150_yds
[1] 0

$rec$rec_200_yds
[1] 0


$misc
$misc$all_pos
[1] TRUE

$misc$fumbles_lost
[1] -2

$misc$fumbles_total
[1] 0

$misc$sacks
[1] 0

$misc$two_pts
[1] 2


$kick
$kick$xp
[1] 1

$kick$fg_0019
[1] 3

$kick$fg_2029
[1] 3

$kick$fg_3039
[1] 3

$kick$fg_4049
[1] 3

$kick$fg_50
[1] 5

$kick$fg_miss
[1] 0


$ret
$ret$all_pos
[1] TRUE

$ret$return_tds
[1] 6

$ret$return_yds
[1] 0


$idp
$idp$all_pos
[1] TRUE

$idp$idp_solo
[1] 1

$idp$idp_asst
[1] 0.5

$idp$idp_sack
[1] 2

$idp$idp_int
[1] 3

$idp$idp_fum_force
[1] 3

$idp$idp_fum_rec
[1] 2

$idp$idp_pd
[1] 1

$idp$idp_td
[1] 6

$idp$idp_safety
[1] 2


$dst
$dst$dst_fum_rec
[1] 2

$dst$dst_int
[1] 2

$dst$dst_safety
[1] 2

$dst$dst_sacks
[1] 1

$dst$dst_td
[1] 6

$dst$dst_blk
[1] 1.5

$dst$dst_ret_yds
[1] 0

$dst$dst_pts_allowed
[1] 0


$pts_bracket
$pts_bracket[[1]]
$pts_bracket[[1]]$threshold
[1] 0

$pts_bracket[[1]]$points
[1] 10


$pts_bracket[[2]]
$pts_bracket[[2]]$threshold
[1] 6

$pts_bracket[[2]]$points
[1] 7


$pts_bracket[[3]]
$pts_bracket[[3]]$threshold
[1] 13

$pts_bracket[[3]]$points
[1] 4


$pts_bracket[[4]]
$pts_bracket[[4]]$threshold
[1] 20

$pts_bracket[[4]]$points
[1] 1


$pts_bracket[[5]]
$pts_bracket[[5]]$threshold
[1] 27

$pts_bracket[[5]]$points
[1] 0


$pts_bracket[[6]]
$pts_bracket[[6]]$threshold
[1] 34

$pts_bracket[[6]]$points
[1] -1


$pts_bracket[[7]]
$pts_bracket[[7]]$threshold
[1] 99

$pts_bracket[[7]]$points
[1] -4
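The pts_bracket entries encode NFL.com-style points-allowed tiers for defense/special teams. The sketch below assumes that each threshold is the inclusive upper bound of its tier: 0 points allowed earns 10, 1 to 6 earns 7, 7 to 13 earns 4, and so on, with 35 or more earning -4. Under that assumption, a hypothetical helper could look up a defense's points from a list shaped like scoring_obj$pts_bracket. It illustrates the bracket logic and is not ffanalytics' own scoring code.

```r
# Points-allowed brackets in the same list-of-lists shape as scoring_obj$pts_bracket
pts_bracket <- list(
  list(threshold = 0,  points = 10),
  list(threshold = 6,  points = 7),
  list(threshold = 13, points = 4),
  list(threshold = 20, points = 1),
  list(threshold = 27, points = 0),
  list(threshold = 34, points = -1),
  list(threshold = 99, points = -4)
)

# Hypothetical helper: points for the first bracket whose threshold is
# at or above the points allowed (thresholds treated as inclusive upper bounds)
dst_points_allowed <- function(pts_allowed, brackets) {
  thresholds <- vapply(brackets, function(b) b$threshold, numeric(1))
  points     <- vapply(brackets, function(b) b$points, numeric(1))
  idx <- vapply(pts_allowed, function(p) which(p <= thresholds)[1], integer(1))
  points[idx]
}

dst_points_allowed(c(0, 3, 10, 17, 24, 31, 45), pts_bracket)
```

This returns 10, 7, 4, 1, 0, -1, and -4. In this sketch, a score above the last threshold (99) returns NA.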

4.3.26.3 Calculate Projected Points

Calculate projected points by source:

Code
players_projectedPoints_seasonal <- ffanalytics:::impute_and_score_sources(
  data_result = players_projections_seasonal_raw,
  scoring_rules = scoring_obj)

players_projectedPoints_weekly <- ffanalytics:::impute_and_score_sources(
  data_result = players_projections_weekly_raw,
  scoring_rules = scoring_obj)
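At its core, scoring a projection is a weighted sum: each projected statistic is multiplied by its point value, and the products are added up. The sketch below applies a subset of the league settings above to a made-up stat line: 0.04 per passing yard, 4 per passing touchdown, -2 per interception, 0.1 per rushing or receiving yard, 6 per rushing or receiving touchdown, 1 per reception, and -2 per fumble lost. It ignores the imputation, yardage bonuses, and position-specific rules that impute_and_score_sources() handles. It illustrates the arithmetic only.

```r
# Point values copied from the modified scoring_obj above (subset only)
points_per_stat <- c(
  pass_yds = 0.04, pass_tds = 4, pass_int = -2,
  rush_yds = 0.1,  rush_tds = 6,
  rec = 1, rec_yds = 0.1, rec_tds = 6,
  fumbles_lost = -2
)

# Made-up single-game stat line for a quarterback
stat_line <- c(pass_yds = 275, pass_tds = 2, pass_int = 1,
               rush_yds = 30, rush_tds = 0, fumbles_lost = 0)

# Multiply each statistic by its point value (matched by name) and sum
score_stat_line <- function(stats, weights) {
  sum(stats * weights[names(stats)])
}

score_stat_line(stat_line, points_per_stat)
```

This prints 20 (11 from passing yards, plus 8 from passing touchdowns, minus 2 for the interception, plus 3 from rushing yards).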

Calculate projected statistics and points, averaged across sources:

Code
# Seasonal Projections
players_projectedStatsAverage_seasonal <- ffanalytics::projections_table(
  players_projections_seasonal_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = TRUE)

players_projectedPointsAverage_seasonal <- ffanalytics::projections_table(
  players_projections_seasonal_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = FALSE)

# Weekly Projections
players_projectedStatsAverage_weekly <- ffanalytics::projections_table(
  players_projections_weekly_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = TRUE)

players_projectedPointsAverage_weekly <- ffanalytics::projections_table(
  players_projections_weekly_raw,
  scoring_rules = scoring_obj,
  return_raw_stats = FALSE)
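projections_table() combines the source-level projections into the averages identified by avg_type. The sketch below gives a rough illustration of what averaging across sources means; it does not reproduce ffanalytics' exact estimators. It takes a made-up set of source projections for one player and computes a simple mean, a median (a simple outlier-resistant stand-in), and a weighted mean. The source names and weights are hypothetical.

```r
# Made-up season-long projected points for one player from four sources
proj <- data.frame(
  data_src = c("SourceA", "SourceB", "SourceC", "SourceD"),
  points   = c(210, 198, 205, 150)
)

# Hypothetical source weights (e.g., reflecting each source's past accuracy)
src_weights <- c(SourceA = 0.3, SourceB = 0.3, SourceC = 0.3, SourceD = 0.1)

avg_points <- c(
  mean     = mean(proj$points),
  median   = median(proj$points),
  weighted = weighted.mean(proj$points, w = src_weights[proj$data_src])
)
avg_points
```

The low projection from SourceD pulls the simple mean (190.75) well below the median (201.5). The weighted mean (198.9) falls in between because that source gets less weight. This sensitivity to a single outlying source is the usual motivation for reporting more than one type of average.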

The players_projectedPoints_seasonal and players_projectedPoints_weekly objects are in player-position-projection source form. That is, each row should be uniquely identified by the combination of id, pos, and data_src. The players_projectedStatsAverage_seasonal, players_projectedPointsAverage_seasonal, players_projectedStatsAverage_weekly, and players_projectedPointsAverage_weekly objects are in player-average type-position form. That is, each row should be uniquely identified by the combination of id, avg_type, and pos (or position).

Let’s check for duplicate player-position-projection source and player-average type-position instances:

Code
players_projectedPoints_seasonal %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()
Code
players_projectedStatsAverage_seasonal %>% 
  group_by(id, avg_type, position) %>% 
  filter(n() > 1) %>% 
  head()
Code
players_projectedPointsAverage_seasonal %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()
Code
players_projectedPoints_weekly %>% 
  bind_rows() %>% 
  select(id, pos, data_src, everything()) %>%
  arrange(player, id, pos, data_src) %>% 
  group_by(id, pos, data_src) %>% 
  filter(n() > 1, !is.na(id)) %>% 
  head()
Code
players_projectedStatsAverage_weekly %>% 
  group_by(id, avg_type, position) %>% 
  filter(n() > 1) %>% 
  head()
Code
players_projectedPointsAverage_weekly %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()

Let’s merge the two averaged projected statistics and points objects:

Code
players_projections_seasonal_average <- full_join(
  players_projectedPointsAverage_seasonal,
  players_projectedStatsAverage_seasonal,
  by = c("id","avg_type","pos" = "position")
)

players_projections_weekly_average <- full_join(
  players_projectedPointsAverage_weekly,
  players_projectedStatsAverage_weekly,
  by = c("id","avg_type","pos" = "position")
)

Let’s again check for duplicate player-average type-position instances:

Code
players_projections_seasonal_average %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()
Code
players_projections_weekly_average %>% 
  group_by(id, avg_type, pos) %>% 
  filter(n() > 1) %>% 
  head()

4.3.26.4 Add Additional Player Information

Code
players_projections_seasonal_average <- players_projections_seasonal_average %>% 
  add_ecr() %>% 
  add_adp() %>% 
  add_aav() %>%
  add_uncertainty() %>% 
  add_player_info()

players_projections_weekly_average <- players_projections_weekly_average %>% 
  add_ecr() %>% 
  #add_uncertainty() %>% # currently throws an error
  add_player_info()
Code
save(
  players_projectedPoints_seasonal, players_projections_seasonal_average,
  file = "./data/players_projectedPoints_seasonal.RData"
)

save(
  players_projectedPoints_weekly, players_projections_weekly_average,
  file = "./data/players_projectedPoints_weekly.RData"
)

The data files are saved in the project repository and can be loaded using the following commands:

Code
load(file = "./data/players_projectedPoints_seasonal.RData")
load(file = "./data/players_projectedPoints_weekly.RData")

4.4 Calculations

4.4.1 Historical Actual Player Statistics

In addition to week-by-week actual player statistics, we can also compute historical actual player statistics as a function of different timeframes, including season-by-season and career statistics.
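The idea of aggregating over different timeframes can be sketched with base R's aggregate(). The same made-up weekly rows are summed first to player-season form and then to player (career) form. Counting statistics such as yards and attempts can be summed this way. Rate statistics need to be recomputed from the summed counts rather than summed or averaged.

```r
# Made-up weekly rushing stats for two players across two seasons
weekly <- data.frame(
  player_id = c("P1", "P1", "P1", "P2", "P2"),
  season    = c(2022, 2022, 2023, 2023, 2023),
  week      = c(1, 2, 1, 1, 2),
  rush_yds  = c(80, 45, 100, 30, 60),
  rush_att  = c(20, 10, 25, 8, 12)
)

# Player-season form: sum counting stats within each player and season
seasonal <- aggregate(cbind(rush_yds, rush_att) ~ player_id + season,
                      data = weekly, FUN = sum)

# Player (career) form: sum the seasonal totals within each player
career <- aggregate(cbind(rush_yds, rush_att) ~ player_id,
                    data = seasonal, FUN = sum)

# Rate statistics are recomputed from the summed counts, not averaged
career$yds_per_att <- career$rush_yds / career$rush_att
career
```

P1 ends up with 225 career rushing yards on 55 attempts, and P2 ends up with 90 on 20.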

4.4.1.1 Career Statistics

First, we can compute the players’ career statistics using the calculate_player_stats(), calculate_player_stats_def(), and calculate_player_stats_kicking() functions from the nflfastR package for offensive players, defensive players, and kickers, respectively.

Note 4.4: Calculating players’ career statistics

Note: the following code takes a while to run.

Code
nfl_actualStats_offense_career_raw <- nflfastR::calculate_player_stats(
  nfl_pbp,
  weekly = FALSE)

nfl_actualStats_defense_career_raw <- nflfastR::calculate_player_stats_def(
  nfl_pbp,
  weekly = FALSE)

nfl_actualStats_kicking_career_raw <- nflfastR::calculate_player_stats_kicking(
  nfl_pbp,
  weekly = FALSE)
Code
save(
  nfl_actualStats_offense_career_raw, nfl_actualStats_defense_career_raw, nfl_actualStats_kicking_career_raw,
  file = "./data/nfl_actualStats_career_raw.RData"
)

The nfl_actualStats_career objects are in player form. That is, each row should be uniquely identified by player_id. Let’s rearrange the data accordingly:

Code
nfl_actualStats_offense_career <- nfl_actualStats_offense_career_raw %>% 
  arrange(player_display_name, player_id)

nfl_actualStats_defense_career <- nfl_actualStats_defense_career_raw %>% 
  arrange(player_display_name, player_id)

nfl_actualStats_kicking_career <- nfl_actualStats_kicking_career_raw %>% 
  arrange(player_display_name, player_id)

Let’s check for duplicate player instances:

Code
nfl_actualStats_offense_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_defense_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_kicking_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data cleanup:

Code
# Sum statistics across a player's stints with multiple teams
nfl_actualStats_defense_career <- nfl_actualStats_defense_career %>% 
  group_by(player_id) %>% 
  summarise(
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

nfl_actualStats_kicking_career <- nfl_actualStats_kicking_career %>% 
  group_by(player_id) %>% 
  summarise(
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

# Re-calculate percentage stats
nfl_actualStats_kicking_career <- nfl_actualStats_kicking_career %>% 
  mutate(
    fg_pct = fg_made / fg_att,
    pat_pct = pat_made / pat_att
  )

# Merge data back with player info
nfl_actualStats_defense_career_raw_playerInfo <- nfl_actualStats_defense_career_raw %>% 
  select(player_id, player_name, player_display_name, position, position_group, team, headshot_url) %>% 
  unique() %>% 
  group_by(player_id) %>% 
  slice_head(n = 1) %>% 
  ungroup()

nfl_actualStats_kicking_career_raw_playerInfo <- nfl_actualStats_kicking_career_raw %>% 
  select(player_id, player_name, player_display_name, position, position_group, team, headshot_url) %>% 
  unique() %>% 
  group_by(player_id) %>% 
  slice_head(n = 1) %>% 
  ungroup()

nfl_actualStats_defense_career <- nfl_actualStats_defense_career_raw_playerInfo %>% 
  right_join(
    nfl_actualStats_defense_career,
    by = "player_id"
  )

nfl_actualStats_kicking_career <- nfl_actualStats_kicking_career_raw_playerInfo %>% 
  right_join(
    nfl_actualStats_kicking_career,
    by = "player_id"
  )

# Rearrange data
nfl_actualStats_defense_career <- nfl_actualStats_defense_career %>% 
  arrange(player_display_name, player_id)

nfl_actualStats_kicking_career <- nfl_actualStats_kicking_career %>% 
  arrange(player_display_name, player_id)
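The cleanup above recomputes fg_pct and pat_pct after summing rather than summing the percentages themselves. Counts such as made and attempted field goals add up across a player's rows, but percentages do not. The toy example below uses a hypothetical kicker with two rows. It shows that summing or averaging the per-team percentages gives the wrong answer, while dividing the summed counts gives the correct one:

```r
# Toy kicking stats: one kicker with a row for each of two teams
kicks <- data.frame(
  player_id = c("K1", "K1"),
  team = c("AAA", "BBB"),
  fg_made = c(9, 1),
  fg_att = c(10, 5)
)
kicks$fg_pct <- kicks$fg_made / kicks$fg_att  # 0.9 and 0.2

# Summing the counts across rows is valid...
totals <- aggregate(cbind(fg_made, fg_att) ~ player_id, data = kicks, FUN = sum)

# ...but the percentage must be recomputed from the summed counts
totals$fg_pct <- totals$fg_made / totals$fg_att

sum(kicks$fg_pct)   # summing percentages gives a meaningless value above 1
mean(kicks$fg_pct)  # an unweighted mean ignores the number of attempts
totals$fg_pct       # 10 made out of 15 attempted
```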

Let’s check again for duplicate player instances:

Code
nfl_actualStats_offense_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_defense_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_kicking_career %>% 
  group_by(player_id) %>% 
  filter(n() > 1) %>% 
  head()

4.4.1.2 Season-by-Season Statistics

Second, we can compute the players’ season-by-season statistics by applying the same nflfastR functions to each season’s play-by-play data separately.

Code
seasons <- unique(nfl_pbp$season)

nfl_pbp_seasonalList <- list()
nfl_actualStats_offense_seasonalList <- list()
nfl_actualStats_defense_seasonalList <- list()
nfl_actualStats_kicking_seasonalList <- list()
Note 4.5: Calculating players’ season-by-season statistics

Note: the following code takes a while to run.

Code
pb <- txtProgressBar(
  min = 0,
  max = length(seasons),
  style = 3)

for(i in 1:length(seasons)){
  # Subset play-by-play data by season
  nfl_pbp_seasonalList[[i]] <- nfl_pbp %>% 
    dplyr::filter(season == seasons[i])
  
  # Compute actual statistics by season
  nfl_actualStats_offense_seasonalList[[i]] <- 
    nflfastR::calculate_player_stats(
      nfl_pbp_seasonalList[[i]],
      weekly = FALSE)
  
  nfl_actualStats_defense_seasonalList[[i]] <- 
    nflfastR::calculate_player_stats_def(
      nfl_pbp_seasonalList[[i]],
      weekly = FALSE)
  
  nfl_actualStats_kicking_seasonalList[[i]] <- 
    nflfastR::calculate_player_stats_kicking(
      nfl_pbp_seasonalList[[i]],
      weekly = FALSE)
  
  nfl_actualStats_offense_seasonalList[[i]]$season <- seasons[i]
  nfl_actualStats_defense_seasonalList[[i]]$season <- seasons[i]
  nfl_actualStats_kicking_seasonalList[[i]]$season <- seasons[i]
  
  print(
    paste0("Completed computing statistics for season: ", seasons[i]))
  
  # Update the progress bar
  setTxtProgressBar(pb, i)
}

# Close the progress bar
close(pb)

nfl_actualStats_offense_seasonal_raw <- nfl_actualStats_offense_seasonalList %>% 
  dplyr::bind_rows()
nfl_actualStats_defense_seasonal_raw <- nfl_actualStats_defense_seasonalList %>% 
  dplyr::bind_rows()
nfl_actualStats_kicking_seasonal_raw <- nfl_actualStats_kicking_seasonalList %>% 
  dplyr::bind_rows()
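The for loop above follows a split-apply-combine pattern: split the play-by-play data by season, compute statistics within each season, tag each result with its season, and bind the results together. Below is a compact base-R sketch of that pattern using lapply(). The toy play-by-play data and the function toy_player_stats() are hypothetical stand-ins for nfl_pbp and nflfastR::calculate_player_stats(), which is not called here:

```r
# Toy play-by-play data (hypothetical columns)
pbp <- data.frame(
  season = c(2022, 2022, 2023, 2023, 2023),
  player_id = c("P1", "P2", "P1", "P1", "P2"),
  yards = c(10, 5, 7, 3, 12)
)

# Hypothetical stand-in for a per-season statistics function: total yards per player
toy_player_stats <- function(pbp_season) {
  aggregate(yards ~ player_id, data = pbp_season, FUN = sum)
}

# Split by season, compute statistics within each season, and tag with the season
seasonal_list <- lapply(split(pbp, pbp$season), function(pbp_season) {
  stats <- toy_player_stats(pbp_season)
  stats$season <- pbp_season$season[1]
  stats
})

# Combine the per-season results into one data frame
seasonal <- do.call(rbind, seasonal_list)
rownames(seasonal) <- NULL
seasonal
```

Computing within each season keeps a player's plays from different seasons from being pooled, which is what distinguishes these results from the career statistics.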
Code
save(
  nfl_actualStats_offense_seasonal_raw, nfl_actualStats_defense_seasonal_raw, nfl_actualStats_kicking_seasonal_raw,
  file = "./data/nfl_actualStats_seasonal_raw.RData"
)

The nfl_actualStats_seasonal objects are in player-season form. That is, each row should be uniquely identified by the combination of player_id and season. Let’s rearrange the data accordingly:

Code
nfl_actualStats_offense_seasonal <- nfl_actualStats_offense_seasonal_raw %>% 
  rename(team = recent_team) %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualStats_defense_seasonal <- nfl_actualStats_defense_seasonal_raw %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualStats_kicking_seasonal <- nfl_actualStats_kicking_seasonal_raw %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

Let’s check for duplicate player instances:

Code
nfl_actualStats_offense_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_defense_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_kicking_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

Let’s do some data cleanup:

Code
# Sum statistics across a player's stints with multiple teams within a season
nfl_actualStats_defense_seasonal <- nfl_actualStats_defense_seasonal %>% 
  group_by(player_id, season) %>% 
  summarise(
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

nfl_actualStats_kicking_seasonal <- nfl_actualStats_kicking_seasonal %>% 
  group_by(player_id, season) %>% 
  summarise(
    across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

# Re-calculate percentage stats
nfl_actualStats_kicking_seasonal <- nfl_actualStats_kicking_seasonal %>% 
  mutate(
    fg_pct = fg_made / fg_att,
    pat_pct = pat_made / pat_att
  )

# Merge data back with player info
nfl_actualStats_defense_seasonal_raw_playerInfo <- nfl_actualStats_defense_seasonal_raw %>% 
  select(player_id, season, player_name, player_display_name, position, position_group, team, headshot_url) %>% 
  unique() %>% 
  group_by(player_id, season) %>% # keep one info row per player-season
  slice_head(n = 1) %>% 
  ungroup()

nfl_actualStats_kicking_seasonal_raw_playerInfo <- nfl_actualStats_kicking_seasonal_raw %>% 
  select(player_id, season, player_name, player_display_name, position, position_group, team, headshot_url) %>% 
  unique() %>% 
  group_by(player_id, season) %>% # keep one info row per player-season
  slice_head(n = 1) %>% 
  ungroup()

nfl_actualStats_defense_seasonal <- nfl_actualStats_defense_seasonal_raw_playerInfo %>% 
  right_join(
    nfl_actualStats_defense_seasonal,
    by = c("player_id","season")
  )

nfl_actualStats_kicking_seasonal <- nfl_actualStats_kicking_seasonal_raw_playerInfo %>% 
  right_join(
    nfl_actualStats_kicking_seasonal,
    by = c("player_id","season")
  )

# Rearrange data
nfl_actualStats_defense_seasonal <- nfl_actualStats_defense_seasonal %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

nfl_actualStats_kicking_seasonal <- nfl_actualStats_kicking_seasonal %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)
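When the summed seasonal statistics are merged back with player information by player_id and season, the information table needs one row per player-season. If it had only one row per player, every other season of that player would get missing values. The base-R sketch below uses toy data (a hypothetical defender) and merge(..., all.y = TRUE) as an analogue of dplyr::right_join() to show both cases:

```r
# Toy raw data: one row per player-season-team
raw <- data.frame(
  player_id = c("D1", "D1", "D1"),
  season = c(2022, 2022, 2023),
  team = c("AAA", "BBB", "BBB"),
  tackles = c(20, 15, 40)
)

# Summed stats: one row per player-season
summed <- aggregate(tackles ~ player_id + season, data = raw, FUN = sum)

# Player info: first row for each player-season
info <- raw[!duplicated(raw[c("player_id", "season")]), c("player_id", "season", "team")]

# Keep every row of the summed stats (like dplyr::right_join())
merged <- merge(info, summed, by = c("player_id", "season"), all.y = TRUE)
merged

# By contrast, one info row per player leaves the 2023 season without a team
info_by_player <- raw[!duplicated(raw$player_id), c("player_id", "season", "team")]
merged_by_player <- merge(info_by_player, summed, by = c("player_id", "season"), all.y = TRUE)
merged_by_player
```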

Let’s check again for duplicate player instances:

Code
nfl_actualStats_offense_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_defense_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
nfl_actualStats_kicking_seasonal %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()

4.4.1.3 Week-by-Week Statistics

We already loaded players’ week-by-week statistics above. Nevertheless, we can also compute players’ weekly statistics from the play-by-play data using the following syntax:

Code
nfl_actualStats_offense_weekly <- nflfastR::calculate_player_stats(
  nfl_pbp,
  weekly = TRUE)

nfl_actualStats_defense_weekly <- nflfastR::calculate_player_stats_def(
  nfl_pbp,
  weekly = TRUE)

nfl_actualStats_kicking_weekly <- nflfastR::calculate_player_stats_kicking(
  nfl_pbp,
  weekly = TRUE)

4.4.2 Historical Actual Fantasy Points

Specify scoring settings:

4.4.2.1 Weekly

4.4.2.2 Seasonal

4.4.2.3 Career

4.4.3 Player Age and Experience

4.4.3.1 Weekly

Code
# Reshape from wide to long format
nfl_actualStats_offense_weekly_long <- nfl_actualStats_offense_weekly %>% 
  tidyr::pivot_longer(
    cols = c(team, opponent_team),
    names_to = "role",
    values_to = "team")

# Perform separate inner join operations for the home_team and away_team
nfl_actualStats_offense_weekly_home <- dplyr::inner_join(
  nfl_actualStats_offense_weekly_long,
  nfl_schedules,
  by = c("season","week","team" = "home_team")) %>% 
  mutate(home_away = "home_team")

nfl_actualStats_offense_weekly_away <- dplyr::inner_join(
  nfl_actualStats_offense_weekly_long,
  nfl_schedules,
  by = c("season","week","team" = "away_team")) %>% 
  mutate(home_away = "away_team")

nfl_actualStats_defense_weekly_home <- dplyr::inner_join(
  nfl_actualStats_defense_weekly,
  nfl_schedules,
  by = c("season","week","team" = "home_team")) %>% 
  mutate(home_away = "home_team")

nfl_actualStats_defense_weekly_away <- dplyr::inner_join(
  nfl_actualStats_defense_weekly,
  nfl_schedules,
  by = c("season","week","team" = "away_team")) %>% 
  mutate(home_away = "away_team")

nfl_actualStats_kicking_weekly_home <- dplyr::inner_join(
  nfl_actualStats_kicking_weekly,
  nfl_schedules,
  by = c("season","week","team" = "home_team")) %>% 
  mutate(home_away = "home_team")

nfl_actualStats_kicking_weekly_away <- dplyr::inner_join(
  nfl_actualStats_kicking_weekly,
  nfl_schedules,
  by = c("season","week","team" = "away_team")) %>% 
  mutate(home_away = "away_team")

# Combine the results of the join operations
nfl_actualStats_offense_weekly_schedules_long <- dplyr::bind_rows(
  nfl_actualStats_offense_weekly_home,
  nfl_actualStats_offense_weekly_away)

nfl_actualStats_defense_weekly_schedules_long <- dplyr::bind_rows(
  nfl_actualStats_defense_weekly_home,
  nfl_actualStats_defense_weekly_away)

nfl_actualStats_kicking_weekly_schedules_long <- dplyr::bind_rows(
  nfl_actualStats_kicking_weekly_home,
  nfl_actualStats_kicking_weekly_away)

# Reshape from long to wide
player_game_gameday_offense <- nfl_actualStats_offense_weekly_schedules_long %>%
  dplyr::distinct(player_id, season, week, game_id, home_away, team, gameday) %>% #, .keep_all = TRUE
  tidyr::pivot_wider(
    names_from = home_away,
    values_from = team)

player_game_gameday_defense <- nfl_actualStats_defense_weekly_schedules_long %>%
  dplyr::distinct(player_id, season, week, game_id, home_away, team, gameday) %>% #, .keep_all = TRUE
  tidyr::pivot_wider(
    names_from = home_away,
    values_from = team)

player_game_gameday_kicking <- nfl_actualStats_kicking_weekly_schedules_long %>%
  dplyr::distinct(player_id, season, week, game_id, home_away, team, gameday) %>% #, .keep_all = TRUE
  tidyr::pivot_wider(
    names_from = home_away,
    values_from = team)

# Merge player birthdate and the game date
player_game_birthdate_gameday_offense <- dplyr::left_join(
  player_game_gameday_offense,
  unique(nfl_players[,c("gsis_id","birth_date")]),
  by = c("player_id" = "gsis_id")
)

player_game_birthdate_gameday_defense <- dplyr::left_join(
  player_game_gameday_defense,
  unique(nfl_players[,c("gsis_id","birth_date")]),
  by = c("player_id" = "gsis_id")
)

player_game_birthdate_gameday_kicking <- dplyr::left_join(
  player_game_gameday_kicking,
  unique(nfl_players[,c("gsis_id","birth_date")]),
  by = c("player_id" = "gsis_id")
)

player_game_birthdate_gameday_offense$birth_date <- lubridate::ymd(player_game_birthdate_gameday_offense$birth_date)
player_game_birthdate_gameday_offense$gameday <- lubridate::ymd(player_game_birthdate_gameday_offense$gameday)

player_game_birthdate_gameday_defense$birth_date <- lubridate::ymd(player_game_birthdate_gameday_defense$birth_date)
player_game_birthdate_gameday_defense$gameday <- lubridate::ymd(player_game_birthdate_gameday_defense$gameday)

player_game_birthdate_gameday_kicking$birth_date <- lubridate::ymd(player_game_birthdate_gameday_kicking$birth_date)
player_game_birthdate_gameday_kicking$gameday <- lubridate::ymd(player_game_birthdate_gameday_kicking$gameday)

# Calculate player's age for a given week as the difference between their birthdate and the game date
player_game_birthdate_gameday_offense$age <- lubridate::interval(
  start = player_game_birthdate_gameday_offense$birth_date,
  end = player_game_birthdate_gameday_offense$gameday
) %>% 
  lubridate::time_length(unit = "years")

player_game_birthdate_gameday_defense$age <- lubridate::interval(
  start = player_game_birthdate_gameday_defense$birth_date,
  end = player_game_birthdate_gameday_defense$gameday
) %>% 
  lubridate::time_length(unit = "years")

player_game_birthdate_gameday_kicking$age <- lubridate::interval(
  start = player_game_birthdate_gameday_kicking$birth_date,
  end = player_game_birthdate_gameday_kicking$gameday
) %>% 
  lubridate::time_length(unit = "years")

# Merge with Pro Football Reference Data on Player Age by Season
player_game_birthdate_gameday_offense <- player_game_birthdate_gameday_offense %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

player_game_birthdate_gameday_defense <- player_game_birthdate_gameday_defense %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

player_game_birthdate_gameday_kicking <- player_game_birthdate_gameday_kicking %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

# Set age as first non-missing value from calculation above or from PFR
player_game_birthdate_gameday_offense <- player_game_birthdate_gameday_offense %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

player_game_birthdate_gameday_defense <- player_game_birthdate_gameday_defense %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

player_game_birthdate_gameday_kicking <- player_game_birthdate_gameday_kicking %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

# Center age at 20 years (ageCentered20) and square it (ageCentered20Quadratic)
player_game_birthdate_gameday_offense$ageCentered20 <- player_game_birthdate_gameday_offense$age - 20
player_game_birthdate_gameday_offense$ageCentered20Quadratic <- player_game_birthdate_gameday_offense$ageCentered20 ^ 2

player_game_birthdate_gameday_defense$ageCentered20 <- player_game_birthdate_gameday_defense$age - 20
player_game_birthdate_gameday_defense$ageCentered20Quadratic <- player_game_birthdate_gameday_defense$ageCentered20 ^ 2

player_game_birthdate_gameday_kicking$ageCentered20 <- player_game_birthdate_gameday_kicking$age - 20
player_game_birthdate_gameday_kicking$ageCentered20Quadratic <- player_game_birthdate_gameday_kicking$ageCentered20 ^ 2

# Merge with player info
player_age_offense <- dplyr::left_join(
  player_game_birthdate_gameday_offense,
  nfl_players %>% select(-birth_date, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id"))

player_age_defense <- dplyr::left_join(
  player_game_birthdate_gameday_defense,
  nfl_players %>% select(-birth_date, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id"))

player_age_kicking <- dplyr::left_join(
  player_game_birthdate_gameday_kicking,
  nfl_players %>% select(-birth_date, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id"))

# Add game_id to weekly stats to facilitate merging
nfl_actualStats_game_offense_weekly <- nfl_actualStats_offense_weekly %>% 
  dplyr::left_join(
    player_age_offense[,c("season","week","player_id","game_id")],
    by = c("season","week","player_id"))

nfl_actualStats_game_defense_weekly <- nfl_actualStats_defense_weekly %>% 
  dplyr::left_join(
    player_age_defense[,c("season","week","player_id","game_id")],
    by = c("season","week","player_id"))

nfl_actualStats_game_kicking_weekly <- nfl_actualStats_kicking_weekly %>% 
  dplyr::left_join(
    player_age_kicking[,c("season","week","player_id","game_id")],
    by = c("season","week","player_id"))

# Merge with player weekly stats
player_stats_weekly_offense <- dplyr::full_join(
  player_age_offense %>% select(-position, -position_group),
  nfl_actualStats_game_offense_weekly,
  by = c("season","week","player_id","game_id"))

player_stats_weekly_defense <- dplyr::full_join(
  player_age_defense %>% select(-position, -position_group),
  nfl_actualStats_game_defense_weekly,
  by = c("season","week","player_id","game_id"))

player_stats_weekly_kicking <- dplyr::full_join(
  player_age_kicking %>% select(-position, -position_group),
  nfl_actualStats_game_kicking_weekly,
  by = c("season","week","player_id","game_id"))

player_stats_weekly_offense$total_years_of_experience <- as.integer(player_stats_weekly_offense$years_of_experience)
player_stats_weekly_defense$total_years_of_experience <- as.integer(player_stats_weekly_defense$years_of_experience)
player_stats_weekly_kicking$total_years_of_experience <- as.integer(player_stats_weekly_kicking$years_of_experience)

player_stats_weekly_offense$years_of_experience <- NULL
player_stats_weekly_defense$years_of_experience <- NULL
player_stats_weekly_kicking$years_of_experience <- NULL

distinct_seasons_offense <- player_stats_weekly_offense %>%
  dplyr::select(player_id, season) %>%
  dplyr::distinct() %>% 
  dplyr::left_join(
    nfl_players[,c("gsis_id","years_of_experience")],
    by = c("player_id" = "gsis_id")
  ) %>% 
  dplyr::mutate(total_years_of_experience = as.integer(years_of_experience)) %>% 
  dplyr::select(-years_of_experience)

distinct_seasons_defense <- player_stats_weekly_defense %>%
  dplyr::select(player_id, season) %>%
  dplyr::distinct() %>% 
  dplyr::left_join(
    nfl_players[,c("gsis_id","years_of_experience")],
    by = c("player_id" = "gsis_id")
  ) %>% 
  dplyr::mutate(total_years_of_experience = as.integer(years_of_experience)) %>% 
  dplyr::select(-years_of_experience)

distinct_seasons_kicking <- player_stats_weekly_kicking %>%
  dplyr::select(player_id, season) %>%
  dplyr::distinct() %>% 
  dplyr::left_join(
    nfl_players[,c("gsis_id","years_of_experience")],
    by = c("player_id" = "gsis_id")
  ) %>% 
  dplyr::mutate(total_years_of_experience = as.integer(years_of_experience)) %>% 
  dplyr::select(-years_of_experience)

years_of_experience_offense <- distinct_seasons_offense %>% 
  dplyr::arrange(player_id, -season) %>% 
  dplyr::group_by(player_id) %>%
  dplyr::mutate(years_of_experience = first(total_years_of_experience) - (row_number() - 1)) %>%
  dplyr::ungroup()

years_of_experience_defense <- distinct_seasons_defense %>% 
  dplyr::arrange(player_id, -season) %>% 
  dplyr::group_by(player_id) %>%
  dplyr::mutate(years_of_experience = first(total_years_of_experience) - (row_number() - 1)) %>%
  dplyr::ungroup()

years_of_experience_kicking <- distinct_seasons_kicking %>% 
  dplyr::arrange(player_id, -season) %>% 
  dplyr::group_by(player_id) %>%
  dplyr::mutate(years_of_experience = first(total_years_of_experience) - (row_number() - 1)) %>%
  dplyr::ungroup()

years_of_experience_offense$years_of_experience[which(years_of_experience_offense$years_of_experience < 0)] <- 0
years_of_experience_defense$years_of_experience[which(years_of_experience_defense$years_of_experience < 0)] <- 0
years_of_experience_kicking$years_of_experience[which(years_of_experience_kicking$years_of_experience < 0)] <- 0

player_stats_weekly_offense <- player_stats_weekly_offense %>% 
  dplyr::left_join(
    years_of_experience_offense[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )

player_stats_weekly_defense <- player_stats_weekly_defense %>% 
  dplyr::left_join(
    years_of_experience_defense[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )

player_stats_weekly_kicking <- player_stats_weekly_kicking %>% 
  dplyr::left_join(
    years_of_experience_kicking[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )
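The years-of-experience code above starts from each player's current total years of experience (years_of_experience in nfl_players). It then counts backward by one for each earlier season in which the player appears and floors the result at zero. Below is a minimal base-R sketch of that counting step. It uses toy data (the players and values are hypothetical) and ave() in place of group_by() and mutate():

```r
# Toy data: distinct player-seasons plus each player's current total experience
seasons_played <- data.frame(
  player_id = c("P1", "P1", "P1", "P2"),
  season = c(2021, 2022, 2023, 2023),
  total_years_of_experience = c(5L, 5L, 5L, 0L)
)

# Sort each player's seasons from most recent to oldest
seasons_played <- seasons_played[order(seasons_played$player_id, -seasons_played$season), ]

# Within each player, subtract 0, 1, 2, ... from the most recent total
seasons_played$years_of_experience <- ave(
  seasons_played$total_years_of_experience,
  seasons_played$player_id,
  FUN = function(x) x[1] - (seq_along(x) - 1L)
)

# Floor negative values at zero
seasons_played$years_of_experience <- pmax(seasons_played$years_of_experience, 0L)
seasons_played
```

This approach counts the seasons in which a player appears in the data, not calendar years. A player who missed a season therefore gets the same values as if their seasons were consecutive. Whether that is appropriate depends on how years_of_experience is defined in nfl_players.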

The player_stats_weekly objects are in player-season-week form. That is, each row should be uniquely identified by the combination of player_id, season, and week. Let’s rearrange the data accordingly:

Code
player_stats_weekly_offense <- player_stats_weekly_offense %>% 
  arrange(player_display_name, player_id, season, week)

player_stats_weekly_defense <- player_stats_weekly_defense %>% 
  arrange(player_display_name, player_id, season, week)

player_stats_weekly_kicking <- player_stats_weekly_kicking %>% 
  arrange(player_display_name, player_id, season, week)

Let’s check for duplicate player-season-week instances:

Code
player_stats_weekly_offense %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head()
Code
player_stats_weekly_defense %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head() #todo: fix duplicate instances
Code
player_stats_weekly_kicking %>% 
  group_by(player_id, season, week) %>% 
  filter(n() > 1) %>% 
  head() #todo: fix duplicate instances
Code
# Save data
save(
  player_stats_weekly_offense, player_stats_weekly_defense, player_stats_weekly_kicking,
  file = "./data/player_stats_weekly.RData"
)

4.4.3.2 Seasonal

Code
# Merge player info with seasonal stats
player_stats_seasonal_offense <- dplyr::full_join(
  nfl_actualStats_offense_seasonal,
  nfl_players %>% select(-position, -position_group, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id")
)

player_stats_seasonal_defense <- dplyr::full_join(
  nfl_actualStats_defense_seasonal,
  nfl_players %>% select(-position, -position_group, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id")
)

player_stats_seasonal_kicking <- dplyr::full_join(
  nfl_actualStats_kicking_seasonal,
  nfl_players %>% select(-position, -position_group, -team_abbr, - team_seq),
  by = c("player_id" = "gsis_id")
)

# Calculate age
season_startdate <- nfl_schedules %>% 
  dplyr::group_by(season) %>% 
  dplyr::summarise(startdate = min(gameday, na.rm = TRUE))

player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  dplyr::left_join(
    season_startdate,
    by = "season"
  )

player_stats_seasonal_defense <- player_stats_seasonal_defense %>% 
  dplyr::left_join(
    season_startdate,
    by = "season"
  )

player_stats_seasonal_kicking <- player_stats_seasonal_kicking %>% 
  dplyr::left_join(
    season_startdate,
    by = "season"
  )

player_stats_seasonal_offense$age <- lubridate::interval(
  start = player_stats_seasonal_offense$birth_date,
  end = player_stats_seasonal_offense$startdate
) %>% 
  lubridate::time_length(unit = "years")

player_stats_seasonal_defense$age <- lubridate::interval(
  start = player_stats_seasonal_defense$birth_date,
  end = player_stats_seasonal_defense$startdate
) %>% 
  lubridate::time_length(unit = "years")

player_stats_seasonal_kicking$age <- lubridate::interval(
  start = player_stats_seasonal_kicking$birth_date,
  end = player_stats_seasonal_kicking$startdate
) %>% 
  lubridate::time_length(unit = "years")

# Merge with Pro Football Reference Data on Player Age by Season
player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

player_stats_seasonal_defense <- player_stats_seasonal_defense %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

player_stats_seasonal_kicking <- player_stats_seasonal_kicking %>% 
  dplyr::left_join(
    nfl_advancedStatsPFR_seasonal %>% filter(!is.na(gsis_id), !is.na(season), !is.na(age)) %>% select(gsis_id, season, age) %>% unique(),
    by = c("player_id" = "gsis_id", "season")
  )

# Set age as first non-missing value from calculation above or from PFR
player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

player_stats_seasonal_defense <- player_stats_seasonal_defense %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

player_stats_seasonal_kicking <- player_stats_seasonal_kicking %>% 
  mutate(age = coalesce(age.x, age.y)) %>% 
  select(-age.x, -age.y)

# Center age at 20 years (ageCentered20) and square it (ageCentered20Quadratic)
player_stats_seasonal_offense$ageCentered20 <- player_stats_seasonal_offense$age - 20
player_stats_seasonal_offense$ageCentered20Quadratic <- player_stats_seasonal_offense$ageCentered20 ^ 2

player_stats_seasonal_defense$ageCentered20 <- player_stats_seasonal_defense$age - 20
player_stats_seasonal_defense$ageCentered20Quadratic <- player_stats_seasonal_defense$ageCentered20 ^ 2

player_stats_seasonal_kicking$ageCentered20 <- player_stats_seasonal_kicking$age - 20
player_stats_seasonal_kicking$ageCentered20Quadratic <- player_stats_seasonal_kicking$ageCentered20 ^ 2

# Years of experience
player_stats_seasonal_offense$years_of_experience <- NULL
player_stats_seasonal_defense$years_of_experience <- NULL
player_stats_seasonal_kicking$years_of_experience <- NULL

player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  dplyr::left_join(
    years_of_experience_offense[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )

player_stats_seasonal_defense <- player_stats_seasonal_defense %>% 
  dplyr::left_join(
    years_of_experience_defense[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )

player_stats_seasonal_kicking <- player_stats_seasonal_kicking %>% 
  dplyr::left_join(
    years_of_experience_kicking[,c("player_id","season","years_of_experience")],
    by = c("player_id","season")
  )
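For the seasonal data, age is computed at the start of each season (the earliest game date in the schedule). Where the birth date is missing, a value from another source is used as a fallback, and age is then centered at 20 and squared. The base-R sketch below walks through those steps on toy data, with two assumptions. First, it approximates age as days divided by 365.25, which can differ slightly from lubridate::time_length(). Second, the column age_pfr is a hypothetical stand-in for the Pro Football Reference ages. The ifelse() call plays the role of dplyr::coalesce():

```r
# Toy schedule with game dates stored as ISO "YYYY-MM-DD" strings
schedule <- data.frame(
  season = c(2023, 2023, 2023),
  gameday = c("2023-09-10", "2023-09-07", "2023-09-17")
)

# Season start = earliest game date (ISO date strings sort chronologically)
season_start <- sapply(split(schedule$gameday, schedule$season), min)

# Toy players; the third has no birth date, so the fallback age is needed
players <- data.frame(
  player_id = c("P1", "P2", "P3"),
  season = c(2023, 2023, 2023),
  birth_date = as.Date(c("1997-06-15", "2000-01-01", NA)),
  age_pfr = c(26, 23, 30)  # hypothetical fallback ages
)
players$startdate <- as.Date(unname(season_start[as.character(players$season)]))

# Approximate age in years at the start of the season
age_calc <- as.numeric(difftime(players$startdate, players$birth_date, units = "days")) / 365.25

# Prefer the calculated age; fall back to the other source when it is missing
players$age <- ifelse(is.na(age_calc), players$age_pfr, age_calc)

# Center age at 20 and add a quadratic term
players$ageCentered20 <- players$age - 20
players$ageCentered20Quadratic <- players$ageCentered20^2
players
```

Centering at 20 makes the intercept of an age model refer to a 20-year-old player, and the squared term allows performance to rise and then decline with age.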

The player_stats_seasonal objects are in player-season form. That is, each row should be uniquely identified by the combination of player_id and season. Let’s rearrange the data accordingly:

Code
player_stats_seasonal_offense <- player_stats_seasonal_offense %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

player_stats_seasonal_defense <- player_stats_seasonal_defense %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

player_stats_seasonal_kicking <- player_stats_seasonal_kicking %>% 
  select(player_id, season, everything()) %>% 
  arrange(player_display_name, player_id, season)

Let’s check for duplicate player-season instances:

Code
player_stats_seasonal_offense %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
player_stats_seasonal_defense %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
player_stats_seasonal_kicking %>% 
  group_by(player_id, season) %>% 
  filter(n() > 1) %>% 
  head()
Code
# Save data
save(
  player_stats_seasonal_offense, player_stats_seasonal_defense, player_stats_seasonal_kicking,
  file = "./data/player_stats_seasonal.RData"
)

4.5 Session Info

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4           
 [4] purrr_1.0.2            readr_2.1.5            tidyr_1.3.1           
 [7] tibble_3.2.1           ggplot2_3.5.1          tidyverse_2.0.0       
[10] lubridate_1.9.3        progressr_0.15.0       nflplotR_1.4.0        
[13] nfl4th_1.0.4           nflfastR_4.6.1         nflreadr_1.4.1        
[16] petersenlab_1.1.0      ffanalytics_3.1.2.0006

loaded via a namespace (and not attached):
 [1] DBI_1.2.3          mnormt_2.1.1       gridExtra_2.3      httr2_1.0.6       
 [5] readxl_1.4.3       rlang_1.1.4        magrittr_2.0.3     snakecase_0.11.1  
 [9] furrr_0.3.1        compiler_4.4.2     mgcv_1.9-1         vctrs_0.6.5       
[13] reshape2_1.4.4     quadprog_1.5-8     rvest_1.0.4        pkgconfig_2.0.3   
[17] fastmap_1.2.0      backports_1.5.0    pbivnorm_0.6.0     utf8_1.2.4        
[21] promises_1.3.0     rmarkdown_2.29     tzdb_0.4.0         ps_1.8.1          
[25] xfun_0.49          cachem_1.1.0       jsonlite_1.8.9     later_1.3.2       
[29] rrapply_1.2.7      psych_2.4.6.26     parallel_4.4.2     lavaan_0.6-19     
[33] cluster_2.1.6      R6_2.5.1           stringi_1.8.4      RColorBrewer_1.1-3
[37] parallelly_1.39.0  rpart_4.1.23       cellranger_1.1.0   xgboost_1.7.8.1   
[41] Rcpp_1.0.13-1      knitr_1.49         base64enc_0.1-3    splines_4.4.2     
[45] Matrix_1.7-1       nnet_7.3-19        timechange_0.3.0   tidyselect_1.2.1  
[49] rstudioapi_0.17.1  yaml_2.3.10        codetools_0.2-20   websocket_1.4.2   
[53] curl_6.0.1         processx_3.8.4     listenv_0.9.1      lattice_0.22-6    
[57] plyr_1.8.9         withr_3.0.2        evaluate_1.0.1     foreign_0.8-87    
[61] future_1.34.0      xml2_1.3.6         pillar_1.9.0       checkmate_2.3.2   
[65] stats4_4.4.2       generics_0.1.3     chromote_0.3.1     hms_1.1.3         
[69] mix_1.0-12         munsell_0.5.1      scales_1.3.0       globals_0.16.3    
[73] xtable_1.8-4       glue_1.8.0         janitor_2.2.0      Hmisc_5.2-0       
[77] tools_4.4.2        data.table_1.16.2  mvtnorm_1.3-2      grid_4.4.2        
[81] mitools_2.4        colorspace_2.1-1   nlme_3.1-166       htmlTable_2.4.3   
[85] Formula_1.2-5      cli_3.6.3          rappdirs_0.3.3     fansi_1.0.6       
[89] viridisLite_0.4.2  gt_0.11.1          gtable_0.3.6       fastrmodels_1.0.2 
[93] digest_0.6.37      htmlwidgets_1.6.4  memoise_2.0.1      htmltools_0.5.8.1 
[97] lifecycle_1.0.4    httr_1.4.7        

Feedback

Please consider providing feedback about this textbook, so that I can make it as helpful as possible. You can provide feedback at the following link: https://forms.gle/LsnVKwqmS1VuxWD18

Email Notification

The online version of this book will remain open access.