I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
You can leave a comment at the bottom of the page/chapter, or open an issue or submit a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
20 Modern Portfolio Theory
20.1 Getting Started
20.1.1 Load Packages
20.1.2 Load Data
20.2 Overview
20.3 Fantasy Football is Like Stock Picking
Selecting players for your fantasy team is like picking stocks. In both fantasy football and the stock market, your goal is to pick assets (i.e., players/stocks) that will perform best and that others undervalue. But what is the best way to do that? Below, we discuss approaches to picking players/stocks.
20.3.1 The Wisdom of the Crowd (or Market)
In picking players, there are various approaches one could take. You could do lots of research to pick players/stocks with strong fundamentals that you think will do particularly well next year. By picking these players/stocks, you are predicting that they will outperform their expectations. However, all of your information is likely already reflected in the current valuation of the player/stock, so your prediction is basically a gamble. This is evidenced by the fact that people do not reliably beat the crowd/market.
Even so-called experts do not beat the market reliably. There is little consistency in the performance of mutual fund managers over time. In the book, “The Drunkard’s Walk: How Randomness Rules Our Lives”, Mlodinow (2008) reported essentially no correlation between performance of the top mutual funds in a five-year period with their performance over the subsequent five years. That is, the best funds in one period were not necessarily the best funds in another period. This suggests that mutual fund managers differ in great part because of luck or chance rather than reliable skill. In any given year, some mutual funds will do better than other mutual funds. But this overperformance in a given year likely reflects more randomness than skill. That is likely why a cat beat professional investors in a stock market challenge [Goldstein (2013); archived at https://perma.cc/R3XU-K6J8]. That is, “few stock pickers, if any, have the skill needed to beat the market consistently, year after year” (Kahneman, 2011, p. 214). Although our sample size is much smaller with fantasy football projections, there also appears to be little consistency in fantasy football sites’ rank in accuracy over time [INSERT], suggesting that the projection sources are not reliably better than each other (or the crowd) over time.
The market reflects all of the knowledge of the crowd. One common misconception is that if you go with the market, you will receive “average” returns (by “average”, I mean that you will be in the 50th percentile among investors). This is not true: it has been shown that most mutual funds (about 80%) underperform the average returns of the stock market. So, by going with the market average, you will likely perform better than the “average” fund/investor. Consistent with this, crowd-averaged fantasy football projections tend to be more accurate than any individual’s projection: [INSERT]. This evidence is consistent with the notion of the wisdom of the crowd, described in Section 26.3. Moreover, even if the stock market is relatively accurate (“efficient”) in terms of valuing stocks based on all (publicly) available information (i.e., the efficient market hypothesis), your fantasy football league is likely not. Thus, it may be effective to use crowd-based projections to identify players who are undervalued by your league.
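One way to see why averaging across sources can help is a small simulation. The sketch below is illustrative only: the number of sources, the size of the errors, and the assumption that each source's errors are unbiased and independent of the other sources are all made up for the example, not estimated from real projections.

```r
# Illustrative simulation (all values are hypothetical assumptions)
set.seed(20)

n_players <- 200
n_sources <- 10

# "True" season fantasy points for each player
true_points <- runif(n_players, min = 50, max = 350)

# Each source's projection = truth + independent, unbiased error
projections <- sapply(
  seq_len(n_sources),
  function(i) true_points + rnorm(n_players, mean = 0, sd = 30))

# Crowd projection = average across sources for each player
crowd_projection <- rowMeans(projections)

# Root mean squared error (RMSE) of a set of projections
rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

rmse_sources <- apply(projections, 2, rmse, actual = true_points)
rmse_crowd <- rmse(crowd_projection, true_points)

round(rmse_sources, 1) # error of each individual source
round(rmse_crowd, 1)   # error of the crowd average
```

Under these assumptions, the independent errors partly cancel when averaged, so the crowd's error is smaller than that of any single source. If the sources shared the same biases (i.e., correlated errors), the benefit of averaging would be smaller.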
20.3.2 Diversification
Modern portfolio theory (mean-variance theory) is a framework for determining the optimal composition of an investment portfolio to maximize expected returns for a given level of risk. Here, risk refers to the variability (e.g., standard deviation or variance) of returns across time. Given two portfolios with the same expected returns over time, people will prefer the “safer” portfolio—that is, the portfolio with less variability/volatility across time. One of the powerful notions of modern portfolio theory is that, through diversification, one can achieve lower risk with the same expected returns. In investing, diversification involves owning multiple asset classes (e.g., domestic and international stocks and bonds), with the goal of having asset classes that are either uncorrelated or negatively correlated. That is, owning different types of assets is safer than owning only one type. If you have too much money in one asset and that asset tanks, you will lose your money. In other words, you do not want to put all of your eggs in one basket. By owning different asset classes, you can limit your downside risk without sacrificing much in terms of expected return.
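As a minimal sketch of how diversification lowers risk without lowering expected return, the example below combines two hypothetical assets with identical expected returns and identical risk in a 50/50 portfolio. The returns, standard deviations, and correlations are made-up values for illustration.

```r
# Hypothetical assets with the same expected return and the same risk
expected_return <- c(a = 0.08, b = 0.08)
sd_return <- c(a = 0.20, b = 0.20)
weights <- c(a = 0.5, b = 0.5)

# Standard deviation of a two-asset portfolio for a given correlation
portfolio_sd <- function(weights, sds, correlation) {
  covariance <- correlation * sds[1] * sds[2]
  variance <- weights[1]^2 * sds[1]^2 +
    weights[2]^2 * sds[2]^2 +
    2 * weights[1] * weights[2] * covariance
  unname(sqrt(variance))
}

# Expected return of the portfolio is the weighted mean of the assets
portfolio_return <- sum(weights * expected_return)

# Portfolio risk when the assets are perfectly, moderately, not,
# or negatively correlated
correlations <- c(1, 0.5, 0, -0.5)
portfolio_risk <- sapply(
  correlations,
  function(r) portfolio_sd(weights, sd_return, r))

data.frame(
  correlation = correlations,
  expected_return = portfolio_return,
  risk = round(portfolio_risk, 3))
```

The expected return stays at 0.08 in every row, but the risk falls from 0.20 (perfectly correlated assets, no diversification benefit) to 0.10 when the correlation is -0.5.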
This lesson can also apply to fantasy football. When assembling a team, you are essentially putting together a portfolio of assets (i.e., team of players). As with stocks, each player has an expected return (i.e., projection) and a degree of risk. In fantasy football, a player’s risk might be quantified in terms of the variability of projected scores for a player across projection sources (e.g., Projection Source A, Source B, Source C, etc.), or as historical game-to-game variability. Variability of projected scores for a player across projection sources could reflect the uncertainty of projections for a player. Variability of historical (actual) fantasy points across games could reflect many factors, including risks due to injuries, situational changes (e.g., being traded to a new team or changes in team composition such as due to the acquisition of new players on the team), game scripts, and the tendency for the player to be “boom-or-bust” (e.g., if they are highly dependent on scoring touchdowns or long receptions for fantasy points). All things equal, we want to minimize our risk for a given level of expected returns. That way, we have the best chance of winning any given week. For the same level of expected returns, higher risk teams might have a few amazing games, but their teams might fall flat in other weeks. That is, for a given (high) rate of return, you are best off in the long run (i.e., over the course of a season) with a lower risk team compared to a higher risk team [Hitchings (2012); archived at https://perma.cc/NE35-G6LR].
In terms of diversification, it can be helpful to diversify in multiple ways. First, it can be helpful not to rely on just one or two “stud” players. If they are on bye or have a down week, your team is more likely to suffer. Also, there are risks in picking multiple offensive players from the same team. If you draft your starting Quarterback and Wide Receiver from the same team (e.g., the Cowboys), you are exposing your fantasy team to considerable risk: if that team has a poor offensive outing, both of your players are likely to score poorly in the same week. You can limit your downside risk by diversifying (i.e., drafting players from different teams). That way, if the Cowboys’ offense does poorly in a given week, your fantasy team will not be as affected. Having multiple players on a juggernaut offense can be a boon, but it can be challenging to predict which offense will lead the league.
However, sometimes having two players on the same team might be beneficial because some positions may be uncorrelated or even negatively correlated, which can also reduce risk. For instance, the performance of the Tight End and Running Back on the same team tends to be slightly negatively correlated, so it might not be a bad idea to start the Tight End and Running Back from the same team. For a correlation matrix of all positions on the team, see: https://assets-global.website-files.com/5f1af76ed86d6771ad48324b/607a4434a565aa7763bd1312_AndyAsh-Sharpstack-RPpaper.pdf [Sherman & Goldner (2021); archived at https://perma.cc/JQ6G-KSRT].
Another important idea from modern portfolio theory is that, if you want to achieve higher returns, you may be able to do so by accepting additional risk, provided that it is the right combination of risk. In general, risk is positively correlated with return. That is, receiving higher returns generally requires taking on additional risk, at least as long as we stay along the efficient frontier, described next.
20.4 The Efficient Frontier of a Stock Portfolio
The ultimate goal in fantasy football is to draft players for your starting lineup that provide the most projected points (i.e., the highest returns) and the smallest downside risk. That is, your goal is to achieve the optimal portfolio at a given level of risk, depending on how much risk you are willing to tolerate. One of the key tools in modern portfolio theory for identifying the optimal portfolio (for a given risk level) is the efficient frontier. The efficient frontier is a visual depiction of the maximum expected returns for a given level of risk (where risk is the variability in returns over time). The efficient frontier is helpful for identifying the optimal portfolio—the optimal combination and weighting of assets—for a given risk level. Anything below the efficient frontier is considered inefficient (i.e., lower-than-maximum returns for a given level of risk).
In the example below, we use historical returns (since 2012) as the expected future returns. However, using historical returns as the expected future returns is risky because, as described in the common disclaimer, “Past performance does not guarantee future results.” If you select a relatively short period of historical returns, you may be selecting a period when the stock performed particularly well. When evaluating historical returns, it is preferable to evaluate long time horizons and to evaluate how the stock performed during periods of both boom (i.e., “bull markets”) and bust (i.e., “bear markets”, such as in a recession).
20.4.1 Download Historical Stock Prices
We download historical stock prices using the quantmod package (Ryan & Ulrich, 2024):
20.4.2 Calculate Stock Returns
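As a rough sketch of what this step involves, the example below computes daily returns from a vector of closing prices. The prices are made up, and the choice between simple and log returns is an assumption for illustration; both are shown.

```r
# Hypothetical closing prices for one stock on consecutive trading days
close_prices <- c(100, 102, 101, 105, 104)

# Simple daily return: (price today - price yesterday) / price yesterday
simple_returns <- diff(close_prices) / head(close_prices, -1)

# Log daily return: log(price today / price yesterday)
log_returns <- diff(log(close_prices))

round(simple_returns, 4)
round(log_returns, 4)

# Mean and standard deviation of the daily returns,
# which serve as the expected return and the risk
mean(simple_returns)
sd(simple_returns)
```

Note that a series of n prices yields n - 1 returns, and that log returns sum to the log of the total price change over the period.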
20.4.3 Create Portfolio
We use the fPortfolio package (Wuertz et al., 2023) to determine the optimal portfolio.
20.4.4 Determine the Efficient Frontier
Code
Title:
MV Portfolio Frontier
Estimator: covEstimator
Solver: solveRquadprog
Optimize: minRisk
Constraints: LongOnly
Portfolio Points: 5 of 1000
Portfolio Weights:
AAPL.Close MSFT.Close GOOGL.Close AMZN.Close META.Close V.Close DIS.Close
1 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
250 0.0978 0.1724 0.0893 0.0596 0.0116 0.3566 0.1146
500 0.0000 0.2002 0.0000 0.1282 0.1071 0.2930 0.0000
750 0.0000 0.0660 0.0000 0.1473 0.1989 0.0000 0.0000
1000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
NKE.Close TSLA.Close
1 0.0000 0.0000
250 0.0643 0.0337
500 0.0000 0.2715
750 0.0000 0.5878
1000 0.0000 1.0000
Covariance Risk Budgets:
AAPL.Close MSFT.Close GOOGL.Close AMZN.Close META.Close V.Close DIS.Close
1 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
250 0.0978 0.1781 0.0895 0.0641 0.0127 0.3556 0.0985
500 0.0000 0.1400 0.0000 0.1046 0.0954 0.1759 0.0000
750 0.0000 0.0224 0.0000 0.0663 0.1036 0.0000 0.0000
1000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
NKE.Close TSLA.Close
1 0.0000 0.0000
250 0.0559 0.0479
500 0.0000 0.4841
750 0.0000 0.8077
1000 0.0000 1.0000
Target Returns and Risks:
mean Cov CVaR VaR
1 0.0005 0.0167 0.0381 0.0237
250 0.0009 0.0131 0.0310 0.0202
500 0.0013 0.0171 0.0389 0.0268
750 0.0018 0.0255 0.0565 0.0380
1000 0.0022 0.0365 0.0787 0.0520
Description:
Wed Jul 2 00:38:17 2025 by user:
Code
# Extract the coordinates of individual assets
asset_means <- colMeans(returns)
asset_sd <- apply(returns, 2, sd)
# Add some padding to plot limits (so ticker symbols don't get cut off)
xlim <- range(asset_sd) * c(0.9, 1.1)
ylim <- range(asset_means) * c(0.9, 1.1)
xlim[1] <- 0
ylim[1] <- 0
# Set scientific notation penalty
options(scipen = 999)
plot(
efficientFrontier,
which = c(
1, # efficient frontier
3, # tangency portfolio
4), # risk/return of individual assets
control = list(
xlim = xlim,
ylim = ylim
))
# Add text labels for individual assets
points(
asset_sd,
asset_means,
col = "red",
pch = 19)
text(
asset_sd,
asset_means,
labels = symbols,
pos = 4,
cex = 0.8,
col = "black")
20.4.5 Identify the Optimal Weights
20.4.5.1 Tangency Portfolio
The tangency portfolio is the portfolio with the highest Sharpe ratio (i.e., the highest ratio of return to risk). In other words, it is the portfolio with the greatest risk-adjusted returns.
Code
# Find the tangency portfolio (portfolio with the highest Sharpe ratio)
tangencyPortfolio <- fPortfolio::tangencyPortfolio(
data = returns_ts,
spec = portfolioSpec)
# Extract optimal weights
tangencyPortfolio_optimalWeights <- fPortfolio::getWeights(tangencyPortfolio)
tangencyPortfolio_optimalWeights
AAPL.Close MSFT.Close GOOGL.Close AMZN.Close META.Close V.Close
0.0000000 0.2244047 0.0000000 0.1236976 0.0883652 0.3554384
DIS.Close NKE.Close TSLA.Close
0.0000000 0.0000000 0.2080940
Title:
MV Tangency Portfolio
Estimator: covEstimator
Solver: solveRquadprog
Optimize: minRisk
Constraints: LongOnly
Portfolio Weights:
AAPL.Close MSFT.Close GOOGL.Close AMZN.Close META.Close V.Close
0.0000 0.2244 0.0000 0.1237 0.0884 0.3554
DIS.Close NKE.Close TSLA.Close
0.0000 0.0000 0.2081
Covariance Risk Budgets:
AAPL.Close MSFT.Close GOOGL.Close AMZN.Close META.Close V.Close
0.0000 0.1795 0.0000 0.1120 0.0859 0.2525
DIS.Close NKE.Close TSLA.Close
0.0000 0.0000 0.3701
Target Returns and Risks:
mean Cov CVaR VaR
0.0012 0.0159 0.0363 0.0247
Description:
Wed Jul 2 00:38:17 2025 by user:
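To make the Sharpe ratio concrete, the sketch below computes the ratio of mean return to risk for the five frontier points printed in Section 20.4.4 (the mean and Cov columns of "Target Returns and Risks"). It assumes a risk-free rate of zero, which is an assumption for illustration; because the printed values are rounded, the ratios are approximate.

```r
# Mean return and risk (Cov column) of the five printed frontier points
frontier_points <- data.frame(
  point = c(1, 250, 500, 750, 1000),
  mean_return = c(0.0005, 0.0009, 0.0013, 0.0018, 0.0022),
  risk = c(0.0167, 0.0131, 0.0171, 0.0255, 0.0365))

# Assumed risk-free rate (hypothetical; set to zero for illustration)
risk_free_rate <- 0

# Sharpe ratio = (expected return - risk-free rate) / risk
frontier_points$sharpe_ratio <-
  (frontier_points$mean_return - risk_free_rate) / frontier_points$risk

# Frontier point with the highest Sharpe ratio among the five
best_point <- frontier_points$point[which.max(frontier_points$sharpe_ratio)]

frontier_points
best_point
```

Of these five points, point 500 has the highest ratio. Its mean (0.0013) and risk (0.0171) are close to those of the tangency portfolio (0.0012 and 0.0159), which maximizes the ratio over the whole frontier rather than over five sampled points.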
20.4.5.2 Portfolio with Max Return at a Given Risk Level
Code
# Define target risk levels
targetRisks <- seq(0, 0.3, by = 0.01)
# Initialize storage for optimal portfolios
optimalPortfolios <- list()
optimalWeights_list <- list()
# Find optimal weightings for each target risk level
for (risk in targetRisks) {
# Create a portfolio optimization specification with the target risk
portfolioSpec <- fPortfolio::portfolioSpec()
fPortfolio::setTargetRisk(portfolioSpec) <- risk
# Solve for the maximum return at this target risk
optimal_portfolio <- fPortfolio::maxreturnPortfolio(
returns_ts,
spec = portfolioSpec)
# Store the optimal portfolio
optimalPortfolios[[as.character(risk)]] <- optimal_portfolio
# Store the optimal portfolio weights with risk level
optimal_weights <- fPortfolio::getWeights(optimal_portfolio)
optimalWeights_list[[as.character(risk)]] <- c(RiskLevel = risk, optimal_weights)
}
optimalWeightsByRisk <- dplyr::bind_rows(optimalWeights_list)
optimalWeightsByRisk
20.5 The Efficient Frontier of a Fantasy Team
In fantasy football, the efficient frontier can be helpful for identifying the optimal players to draft for a given risk level (and potentially within the salary cap). It can also be helpful for identifying potential trades. In this way, modern portfolio theory and the efficient frontier can be helpful for arbitrage—buying and selling the same asset (in this case, player) to take advantage of different prices for the same asset. That is, you could buy low and, for players who outperform expectations, sell high—in the form of a trade.
20.5.1 Based on Variability Across Projection Sources
We can examine the efficient frontier of fantasy performance based on the mean and variability of players’ projected fantasy points across projection sources.
Code
all_proj_summary <- all_proj %>%
group_by(id) %>%
summarise(
mean = mean(projectedPoints, na.rm = TRUE),
sd = sd(projectedPoints, na.rm = TRUE),
var = var(projectedPoints, na.rm = TRUE)
)
all_proj_summary <- all_proj_summary %>%
left_join(
nfl_playerIDs[,c("mfl_id","name","merge_name","position","team")],
by = c("id" = "mfl_id")
) %>%
select(name, team, position, everything()) %>%
arrange(-mean)
As shown below, the correlation between mean and standard deviation of projected points is positive. That is, greater expected returns (i.e., mean of projected fantasy points) are associated with greater risk or volatility (i.e., standard deviation of projected fantasy points). Thus, to obtain a lineup that scores greater fantasy points, it may be necessary to take on greater risk. Moreover, you would not want to take on more risk for the same number of expected fantasy points; in addition, you would not want to score fewer fantasy points for the same amount of risk [Hitchings (2012); archived at https://perma.cc/NE35-G6LR].
Pearson's product-moment correlation
data: mean and sd
t = 10.273, df = 995, p-value < 0.00000000000000022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2524313 0.3647383
sample estimates:
cor
0.3096644
A scatterplot of the expected returns versus risk, based on projected fantasy points, is in Figure 20.2.
Code
plot_projectedPointsMeanVsSD <- ggplot2::ggplot(
data = all_proj_summary,
aes(
x = sd,
y = mean)) +
geom_point(
aes(
text = name, # add player name for mouse over tooltip
label = position)) + # add position for mouse over tooltip
#geom_smooth() +
coord_cartesian(
xlim = c(0,NA),
ylim = c(0,NA),
expand = FALSE) +
labs(
x = "Standard Deviation of Player's Projected Fantasy Points (Season) Across Sources",
y = "Mean of Player's Projected Fantasy Points (Season) Across Sources",
title = "Mean vs Standard Deviation of Players' Projected Fantasy Points"
) +
theme_classic()
plotly::ggplotly(plot_projectedPointsMeanVsSD)
Now let’s consider the expected returns versus risk for a combination of players. The formula for the variance (and standard deviation) of the sum of two variables is in Equation 20.1:
\[ \begin{aligned} \text{Var}(x + y) &= \text{Var}(x) + \text{Var}(y) + 2\text{Cov}(x, y)\\ s^2_{(x + y)} &= s^2_x + s^2_y + 2\text{Cov}(x, y)\\ \text{Std Dev}(x + y) &= \sqrt{\text{Var}(x) + \text{Var}(y) + 2\text{Cov}(x, y)}\\ s_{(x + y)} &= \sqrt{s^2_x + s^2_y + 2\text{Cov}(x, y)}\\ \end{aligned} \tag{20.1}\]
If we assume that players’ performance is uncorrelated with one another, the covariance between the players is zero, so the formula simplifies to Equation 20.2:
\[ \begin{aligned} \text{Var}(x + y) &= \text{Var}(x) + \text{Var}(y)\\ s^2_{(x + y)} &= s^2_x + s^2_y\\ \text{Std Dev}(x + y) &= \sqrt{\text{Var}(x) + \text{Var}(y)}\\ s_{(x + y)} &= \sqrt{s^2_x + s^2_y}\\ \end{aligned} \tag{20.2}\]
That is, if players’ performance is independent of each other, the variance of two players’ points is merely the sum of their variances; their standard deviation is then the square root of that. In reality, players’ performance is not truly independent—particularly for players on the same team or, for a given game, for players on opposing teams. However, for players not playing in the same game, it is a reasonable assumption to make and it greatly simplifies the math. If you wanted to, you could account for players’ covariance by using Equation 20.1. An example variance-covariance matrix of players’ performance is in Section 20.5.2.
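As a small worked example of Equations 20.1 and 20.2, the sketch below computes the standard deviation of two players' combined points under different correlations. The standard deviations and correlations are hypothetical values chosen for easy arithmetic.

```r
# Hypothetical standard deviations of two players' fantasy points
sd_x <- 30
sd_y <- 40

# Standard deviation of the sum for a given correlation between players
# (Equation 20.1, with Cov(x, y) = correlation * sd_x * sd_y)
sd_of_sum <- function(sd_x, sd_y, correlation = 0) {
  covariance <- correlation * sd_x * sd_y
  sqrt(sd_x^2 + sd_y^2 + 2 * covariance)
}

sd_of_sum(sd_x, sd_y)                      # uncorrelated (Equation 20.2)
sd_of_sum(sd_x, sd_y, correlation = 0.5)   # positively correlated
sd_of_sum(sd_x, sd_y, correlation = -0.5)  # negatively correlated

# Adding the standard deviations directly is correct only when the
# correlation is 1; otherwise it overstates the combined risk
sd_x + sd_y
```

With uncorrelated players, the combined standard deviation is 50 rather than 70. A positive correlation (e.g., teammates) raises it, and a negative correlation lowers it.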
Below is code for obtaining the expected returns and risk for various combinations of Quarterback, Running Back, and Wide Receiver.
Code
hiLo_permutations <- expand.grid(
rep(list(c("Hi", "Lo")), 3),
stringsAsFactors = FALSE
)
qbRbWr_projections <- data.frame(
condition = c("Hi", "Lo"),
qb_name = c("Josh Allen", "Matthew Stafford"),
rb_name = c("Saquon Barkley", "Joe Mixon"),
wr_name = c("Ja'Marr Chase", "Courtland Sutton")
)
qbRbWr_projections$qb_mean <- c(
all_proj_summary$mean[which(all_proj_summary$position == "QB" & all_proj_summary$name == qbRbWr_projections$qb_name[1])],
all_proj_summary$mean[which(all_proj_summary$position == "QB" & all_proj_summary$name == qbRbWr_projections$qb_name[2])])
qbRbWr_projections$qb_sd <- c(
all_proj_summary$sd[which(all_proj_summary$position == "QB" & all_proj_summary$name == qbRbWr_projections$qb_name[1])],
all_proj_summary$sd[which(all_proj_summary$position == "QB" & all_proj_summary$name == qbRbWr_projections$qb_name[2])])
qbRbWr_projections$rb_mean <- c(
all_proj_summary$mean[which(all_proj_summary$position == "RB" & all_proj_summary$name == qbRbWr_projections$rb_name[1])],
all_proj_summary$mean[which(all_proj_summary$position == "RB" & all_proj_summary$name == qbRbWr_projections$rb_name[2])])
qbRbWr_projections$rb_sd <- c(
all_proj_summary$sd[which(all_proj_summary$position == "RB" & all_proj_summary$name == qbRbWr_projections$rb_name[1])],
all_proj_summary$sd[which(all_proj_summary$position == "RB" & all_proj_summary$name == qbRbWr_projections$rb_name[2])])
qbRbWr_projections$wr_mean <- c(
all_proj_summary$mean[which(all_proj_summary$position == "WR" & all_proj_summary$name == qbRbWr_projections$wr_name[1])],
all_proj_summary$mean[which(all_proj_summary$position == "WR" & all_proj_summary$name == qbRbWr_projections$wr_name[2])])
qbRbWr_projections$wr_sd <- c(
all_proj_summary$sd[which(all_proj_summary$position == "WR" & all_proj_summary$name == qbRbWr_projections$wr_name[1])],
all_proj_summary$sd[which(all_proj_summary$position == "WR" & all_proj_summary$name == qbRbWr_projections$wr_name[2])])
qbRbWr_projections$qb_lastName <- sapply(strsplit(qbRbWr_projections$qb_name, " "), function(x) tail(x, 1))
qbRbWr_projections$rb_lastName <- sapply(strsplit(qbRbWr_projections$rb_name, " "), function(x) tail(x, 1))
qbRbWr_projections$wr_lastName <- sapply(strsplit(qbRbWr_projections$wr_name, " "), function(x) tail(x, 1))
team_projections <- data.frame(
team = apply(hiLo_permutations, 1, paste0, collapse = ""),
team_name = NA,
team_mean = NA,
team_sd = NA,
qb = hiLo_permutations$Var1,
rb = hiLo_permutations$Var2,
wr = hiLo_permutations$Var3,
qb_name = NA,
qb_lastName = NA,
qb_mean = NA,
qb_sd = NA,
rb_name = NA,
rb_lastName = NA,
rb_mean = NA,
rb_sd = NA,
wr_name = NA,
wr_lastName = NA,
wr_mean = NA,
wr_sd = NA
)
team_projections$qb_name[which(team_projections$qb == "Hi")] <- qbRbWr_projections$qb_name[which(qbRbWr_projections$condition == "Hi")]
team_projections$qb_name[which(team_projections$qb == "Lo")] <- qbRbWr_projections$qb_name[which(qbRbWr_projections$condition == "Lo")]
team_projections$qb_lastName[which(team_projections$qb == "Hi")] <- qbRbWr_projections$qb_lastName[which(qbRbWr_projections$condition == "Hi")]
team_projections$qb_lastName[which(team_projections$qb == "Lo")] <- qbRbWr_projections$qb_lastName[which(qbRbWr_projections$condition == "Lo")]
team_projections$qb_mean[which(team_projections$qb == "Hi")] <- qbRbWr_projections$qb_mean[which(qbRbWr_projections$condition == "Hi")]
team_projections$qb_mean[which(team_projections$qb == "Lo")] <- qbRbWr_projections$qb_mean[which(qbRbWr_projections$condition == "Lo")]
team_projections$qb_sd[which(team_projections$qb == "Hi")] <- qbRbWr_projections$qb_sd[which(qbRbWr_projections$condition == "Hi")]
team_projections$qb_sd[which(team_projections$qb == "Lo")] <- qbRbWr_projections$qb_sd[which(qbRbWr_projections$condition == "Lo")]
team_projections$rb_name[which(team_projections$rb == "Hi")] <- qbRbWr_projections$rb_name[which(qbRbWr_projections$condition == "Hi")]
team_projections$rb_name[which(team_projections$rb == "Lo")] <- qbRbWr_projections$rb_name[which(qbRbWr_projections$condition == "Lo")]
team_projections$rb_lastName[which(team_projections$rb == "Hi")] <- qbRbWr_projections$rb_lastName[which(qbRbWr_projections$condition == "Hi")]
team_projections$rb_lastName[which(team_projections$rb == "Lo")] <- qbRbWr_projections$rb_lastName[which(qbRbWr_projections$condition == "Lo")]
team_projections$rb_mean[which(team_projections$rb == "Hi")] <- qbRbWr_projections$rb_mean[which(qbRbWr_projections$condition == "Hi")]
team_projections$rb_mean[which(team_projections$rb == "Lo")] <- qbRbWr_projections$rb_mean[which(qbRbWr_projections$condition == "Lo")]
team_projections$rb_sd[which(team_projections$rb == "Hi")] <- qbRbWr_projections$rb_sd[which(qbRbWr_projections$condition == "Hi")]
team_projections$rb_sd[which(team_projections$rb == "Lo")] <- qbRbWr_projections$rb_sd[which(qbRbWr_projections$condition == "Lo")]
team_projections$wr_name[which(team_projections$wr == "Hi")] <- qbRbWr_projections$wr_name[which(qbRbWr_projections$condition == "Hi")]
team_projections$wr_name[which(team_projections$wr == "Lo")] <- qbRbWr_projections$wr_name[which(qbRbWr_projections$condition == "Lo")]
team_projections$wr_lastName[which(team_projections$wr == "Hi")] <- qbRbWr_projections$wr_lastName[which(qbRbWr_projections$condition == "Hi")]
team_projections$wr_lastName[which(team_projections$wr == "Lo")] <- qbRbWr_projections$wr_lastName[which(qbRbWr_projections$condition == "Lo")]
team_projections$wr_mean[which(team_projections$wr == "Hi")] <- qbRbWr_projections$wr_mean[which(qbRbWr_projections$condition == "Hi")]
team_projections$wr_mean[which(team_projections$wr == "Lo")] <- qbRbWr_projections$wr_mean[which(qbRbWr_projections$condition == "Lo")]
team_projections$wr_sd[which(team_projections$wr == "Hi")] <- qbRbWr_projections$wr_sd[which(qbRbWr_projections$condition == "Hi")]
team_projections$wr_sd[which(team_projections$wr == "Lo")] <- qbRbWr_projections$wr_sd[which(qbRbWr_projections$condition == "Lo")]
team_projections$team_name <- apply(
team_projections[, c("qb_lastName", "rb_lastName", "wr_lastName")],
1,
paste,
collapse = "/"
)
team_projections$team_mean <- rowSums(team_projections[,c("qb_mean","rb_mean","wr_mean")])
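# Assuming uncorrelated players, the team SD is the square root of the summed variances (Equation 20.2)
team_projections$team_sd <- sqrt(rowSums(team_projections[,c("qb_sd","rb_sd","wr_sd")]^2))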
In Figure 20.3, we depict the combined mean of projected fantasy points for a Quarterback/Running Back/Wide Receiver combination as a function of the standard deviation of their projected fantasy points across sources.
Code
ggplot2::ggplot(
data = team_projections,
aes(
x = team_sd,
y = team_mean,
label = team_name)) +
geom_point() +
ggrepel::geom_text_repel() +
labs(
x = "Standard Deviation of Players' Projected Fantasy Points (Season) Across Sources",
y = "Mean of Players' Projected Fantasy Points (Season) Across Sources",
title = "Mean vs Standard Deviation of Players' Projected Fantasy Points"
) +
theme_classic()
Now let’s generate the efficient frontier across all players.
Below, we use the mvFrontier() function from the NMOF package (Gilli et al., 2019; Schumann, 2011–2024, 2024) to compute the efficient frontier.
Code
const_cor <- function(rho, numAssets) {
C <- array(rho, dim = c(numAssets, numAssets))
diag(C) <- 1
C
}
varCovMatrix <- diag(all_proj_summary_noNA_removeZeroVar$sd) %*% const_cor(0, nrow(all_proj_summary_noNA_removeZeroVar)) %*% diag(all_proj_summary_noNA_removeZeroVar$sd) # create a variance-covariance matrix assuming assets/players are uncorrelated
efficientFrontierData <- NMOF::mvFrontier(
all_proj_summary_noNA_removeZeroVar$mean,
varCovMatrix,
wmin = 0,
wmax = 1,
n = 50)
When you evaluate all possible combinations of players, you can obtain the expected returns and standard deviation for each combination. Based on that, you can determine the maximum number of expected fantasy points (i.e., the best possible team) at each level of risk. The set of best expected returns across levels of risk is known as the efficient frontier. The efficient frontier of projected fantasy points across sources is in Figure 20.4.
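The idea of evaluating every combination can be sketched by brute force on a tiny example. The players and their numbers below are made up, and the sketch assumes that players' performance is uncorrelated (Equation 20.2). It finds a discrete frontier of whole-player lineups, which is a simplification of the continuous asset weights that a mean-variance optimizer works with.

```r
# Hypothetical players (names and numbers are made up for illustration)
players <- data.frame(
  name = c("QB1", "QB2", "RB1", "RB2", "RB3", "WR1", "WR2", "WR3"),
  position = c("QB", "QB", "RB", "RB", "RB", "WR", "WR", "WR"),
  mean = c(380, 300, 320, 250, 200, 300, 240, 180),
  sd = c(40, 15, 45, 20, 25, 35, 30, 10))

# Every combination of one QB, one RB, and one WR (row indices of players)
teams <- expand.grid(
  qb = which(players$position == "QB"),
  rb = which(players$position == "RB"),
  wr = which(players$position == "WR"))

# Team mean is the sum of the players' means
teams$team_mean <- players$mean[teams$qb] + players$mean[teams$rb] +
  players$mean[teams$wr]

# Team SD is the square root of the summed variances (uncorrelated players)
teams$team_sd <- sqrt(
  players$sd[teams$qb]^2 + players$sd[teams$rb]^2 + players$sd[teams$wr]^2)

# Sort from lowest to highest risk, then keep only the teams whose mean
# exceeds the mean of every lower-risk team (i.e., a new running maximum)
teams <- teams[order(teams$team_sd, -teams$team_mean), ]
on_frontier <- !duplicated(cummax(teams$team_mean))
frontier <- teams[on_frontier, ]

frontier
```

Every team left off the frontier is inefficient: some other team offers at least as many expected points with less risk.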
Code
20.5.2 Based on Historical Game-to-Game Variability
We can also examine the efficient frontier of fantasy performance based on the mean and variability of players’ actual week-to-week fantasy points [Hitchings (2012); archived at https://perma.cc/JQ6G-KSRT].
Code
player_stats_weekly_recent <- player_stats_weekly %>%
filter(season == max(season))
player_stats_weekly_recent_summary <- player_stats_weekly_recent %>%
group_by(player_id) %>%
summarise(
mean = mean(fantasyPoints, na.rm = TRUE),
sd = sd(fantasyPoints, na.rm = TRUE),
var = var(fantasyPoints, na.rm = TRUE)
)
player_stats_weekly_recent_summary <- player_stats_weekly_recent_summary %>%
left_join(
nfl_playerIDs[,c("gsis_id","name","merge_name","position","team")],
by = c("player_id" = "gsis_id")
) %>%
select(name, team, position, everything()) %>%
arrange(-mean)
Pearson's product-moment correlation
data: mean and sd
t = 237.14, df = 8953, p-value < 0.00000000000000022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9258926 0.9315841
sample estimates:
cor
0.9287931
Here is the code to generate the variance-covariance matrix of players’ performance (it is very large because there are many players):
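A minimal sketch of how such a matrix could be computed is below. It uses a small, made-up data set in long format; the column name week and all of the values are assumptions for illustration, and the approach (reshape to one column per player, then call cov()) is one of several ways to do this.

```r
# Hypothetical weekly data in long format (values are made up);
# the column name `week` is an assumption
weekly_points <- data.frame(
  player_id = rep(c("QB_A", "WR_A", "RB_B"), each = 6),
  week = rep(1:6, times = 3),
  fantasyPoints = c(
    22, 15, 30, 18, 25, 12,   # QB_A
    14,  8, 21, 10, 17,  6,   # WR_A (same team as QB_A)
    12, 20,  9, 16, 11, 18))  # RB_B (different team)

# Reshape to wide format: one row per week, one column per player
points_wide <- tapply(
  weekly_points$fantasyPoints,
  list(weekly_points$week, weekly_points$player_id),
  FUN = mean)

# Variance-covariance and correlation matrices of weekly points;
# pairwise deletion uses all weeks in which both players have data
player_cov <- cov(points_wide, use = "pairwise.complete.obs")
player_cor <- cor(points_wide, use = "pairwise.complete.obs")

round(player_cov, 1)
round(player_cor, 2)
```

The diagonal holds each player's variance, and the off-diagonal cells hold the covariances that enter Equation 20.1. With one row and column per player, the matrix grows with the square of the number of players.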
A scatterplot of the expected returns versus risk, based on weekly fantasy points, is in Figure 20.5.
Code
plot_fantasyPointsMeanVsSD <- ggplot2::ggplot(
data = player_stats_weekly_recent_summary,
aes(
x = sd,
y = mean)) +
geom_point(
aes(
text = name, # add player name for mouse over tooltip
label = position)) + # add position for mouse over tooltip
#geom_smooth() +
coord_cartesian(
xlim = c(0,NA),
ylim = c(0,NA),
expand = FALSE) +
labs(
x = "Standard Deviation of Player's Fantasy Points From Week-to-Week",
y = "Mean of Player's Fantasy Points Across Weeks",
title = "Mean vs Standard Deviation of Players' Fantasy Points"
) +
theme_classic()
plotly::ggplotly(plot_fantasyPointsMeanVsSD)
Now let’s consider the expected returns versus risk for a combination of players. Below is code for obtaining the expected returns and risk for various combinations of Quarterback, Running Back, and Wide Receiver.
Code
qbRbWr_projections2 <- data.frame(
condition = c("Hi", "Lo"),
qb_name = c("Josh Allen", "Patrick Mahomes"),
rb_name = c("Saquon Barkley", "David Montgomery"),
wr_name = c("Ja'Marr Chase", "Stefon Diggs")
)
qbRbWr_projections2$qb_mean <- c(
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "QB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$qb_name[1])],
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "QB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$qb_name[2])])
qbRbWr_projections2$qb_sd <- c(
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "QB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$qb_name[1])],
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "QB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$qb_name[2])])
qbRbWr_projections2$rb_mean <- c(
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "RB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$rb_name[1])],
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "RB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$rb_name[2])])
qbRbWr_projections2$rb_sd <- c(
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "RB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$rb_name[1])],
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "RB" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$rb_name[2])])
qbRbWr_projections2$wr_mean <- c(
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "WR" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$wr_name[1])],
player_stats_weekly_recent_summary$mean[which(player_stats_weekly_recent_summary$position == "WR" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$wr_name[2])])
qbRbWr_projections2$wr_sd <- c(
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "WR" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$wr_name[1])],
player_stats_weekly_recent_summary$sd[which(player_stats_weekly_recent_summary$position == "WR" & player_stats_weekly_recent_summary$name == qbRbWr_projections2$wr_name[2])])
qbRbWr_projections2$qb_lastName <- sapply(strsplit(qbRbWr_projections2$qb_name, " "), function(x) tail(x, 1))
qbRbWr_projections2$rb_lastName <- sapply(strsplit(qbRbWr_projections2$rb_name, " "), function(x) tail(x, 1))
qbRbWr_projections2$wr_lastName <- sapply(strsplit(qbRbWr_projections2$wr_name, " "), function(x) tail(x, 1))
team_projections2 <- data.frame(
team = apply(hiLo_permutations, 1, paste0, collapse = ""),
team_name = NA,
team_mean = NA,
team_sd = NA,
qb = hiLo_permutations$Var1,
rb = hiLo_permutations$Var2,
wr = hiLo_permutations$Var3,
qb_name = NA,
qb_lastName = NA,
qb_mean = NA,
qb_sd = NA,
rb_name = NA,
rb_lastName = NA,
rb_mean = NA,
rb_sd = NA,
wr_name = NA,
wr_lastName = NA,
wr_mean = NA,
wr_sd = NA
)
team_projections2$qb_name[which(team_projections2$qb == "Hi")] <- qbRbWr_projections2$qb_name[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$qb_name[which(team_projections2$qb == "Lo")] <- qbRbWr_projections2$qb_name[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$qb_lastName[which(team_projections2$qb == "Hi")] <- qbRbWr_projections2$qb_lastName[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$qb_lastName[which(team_projections2$qb == "Lo")] <- qbRbWr_projections2$qb_lastName[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$qb_mean[which(team_projections2$qb == "Hi")] <- qbRbWr_projections2$qb_mean[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$qb_mean[which(team_projections2$qb == "Lo")] <- qbRbWr_projections2$qb_mean[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$qb_sd[which(team_projections2$qb == "Hi")] <- qbRbWr_projections2$qb_sd[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$qb_sd[which(team_projections2$qb == "Lo")] <- qbRbWr_projections2$qb_sd[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$rb_name[which(team_projections2$rb == "Hi")] <- qbRbWr_projections2$rb_name[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$rb_name[which(team_projections2$rb == "Lo")] <- qbRbWr_projections2$rb_name[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$rb_lastName[which(team_projections2$rb == "Hi")] <- qbRbWr_projections2$rb_lastName[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$rb_lastName[which(team_projections2$rb == "Lo")] <- qbRbWr_projections2$rb_lastName[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$rb_mean[which(team_projections2$rb == "Hi")] <- qbRbWr_projections2$rb_mean[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$rb_mean[which(team_projections2$rb == "Lo")] <- qbRbWr_projections2$rb_mean[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$rb_sd[which(team_projections2$rb == "Hi")] <- qbRbWr_projections2$rb_sd[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$rb_sd[which(team_projections2$rb == "Lo")] <- qbRbWr_projections2$rb_sd[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$wr_name[which(team_projections2$wr == "Hi")] <- qbRbWr_projections2$wr_name[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$wr_name[which(team_projections2$wr == "Lo")] <- qbRbWr_projections2$wr_name[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$wr_lastName[which(team_projections2$wr == "Hi")] <- qbRbWr_projections2$wr_lastName[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$wr_lastName[which(team_projections2$wr == "Lo")] <- qbRbWr_projections2$wr_lastName[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$wr_mean[which(team_projections2$wr == "Hi")] <- qbRbWr_projections2$wr_mean[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$wr_mean[which(team_projections2$wr == "Lo")] <- qbRbWr_projections2$wr_mean[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$wr_sd[which(team_projections2$wr == "Hi")] <- qbRbWr_projections2$wr_sd[which(qbRbWr_projections2$condition == "Hi")]
team_projections2$wr_sd[which(team_projections2$wr == "Lo")] <- qbRbWr_projections2$wr_sd[which(qbRbWr_projections2$condition == "Lo")]
team_projections2$team_name <- apply(
team_projections2[, c("qb_lastName", "rb_lastName", "wr_lastName")],
1,
paste,
collapse = "/"
)
team_projections2$team_mean <- rowSums(team_projections2[,c("qb_mean","rb_mean","wr_mean")])
team_projections2$team_sd <- rowSums(team_projections2[,c("qb_sd","rb_sd","wr_sd")])
In Figure 20.6, we depict the combined mean of weekly fantasy points for each Quarterback/Running Back/Wide Receiver combination as a function of the combined standard deviation of the players’ fantasy points from week to week.
Code
ggplot2::ggplot(
data = team_projections2,
aes(
x = team_sd,
y = team_mean,
label = team_name)) +
geom_point() +
ggrepel::geom_text_repel() +
labs(
x = "Standard Deviation of Players' Fantasy Points From Week-to-Week",
y = "Mean of Players' Fantasy Points Across Weeks",
title = "Mean vs Standard Deviation of Players' Weekly Fantasy Points"
) +
theme_classic()
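The `team_sd` computed above is the sum of the three players’ standard deviations. That sum equals the standard deviation of the team total only if the players’ scores are perfectly positively correlated. If the players are instead assumed to be uncorrelated, variances add, so the standard deviation of the team total is the square root of the summed variances. A minimal sketch with made-up standard deviations (the data frame `toy` and its values are hypothetical, not taken from the data):

```r
# Hypothetical weekly standard deviations for two QB/RB/WR combinations
toy <- data.frame(
  qb_sd = c(8, 6),
  rb_sd = c(7, 5),
  wr_sd = c(9, 4)
)

sd_cols <- c("qb_sd", "rb_sd", "wr_sd")

# Sum of SDs: the SD of the team total if scores are perfectly correlated
toy$team_sd_sum <- rowSums(toy[, sd_cols])

# Root of summed variances: the SD of the team total if scores are uncorrelated
toy$team_sd_indep <- sqrt(rowSums(toy[, sd_cols]^2))

toy
```

The summed standard deviation is an upper bound on the team’s variability, so the ordering of teams along the x-axis may differ under the independence assumption.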
Now let’s generate the efficient frontier across all players.
Below, we use the mvFrontier() function from the NMOF package (Gilli et al., 2019; Schumann, 2011–2024, 2024) to compute the efficient frontier.
Code
# Create a variance-covariance matrix assuming assets/players are uncorrelated
varCovMatrix2 <- diag(player_stats_weekly_recent_summary_noNA_removeZeroVar$sd) %*%
  const_cor(0, nrow(player_stats_weekly_recent_summary_noNA_removeZeroVar)) %*%
  diag(player_stats_weekly_recent_summary_noNA_removeZeroVar$sd)
efficientFrontierData2 <- NMOF::mvFrontier(
player_stats_weekly_recent_summary_noNA_removeZeroVar$mean,
varCovMatrix2,
wmin = 0,
wmax = 1,
n = 50)
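To make the inputs to `mvFrontier()` concrete: with zero correlation, the correlation matrix is the identity matrix, so the variance-covariance matrix reduces to a diagonal matrix of variances. Each point on a frontier corresponds to a weight vector `w` whose expected value is `sum(w * mean)` and whose standard deviation is `sqrt(t(w) %*% V %*% w)`. The sketch below uses base R only and three hypothetical players; `diag(3)` stands in for the zero-correlation matrix, and `portfolio_stats()` is an illustrative helper, not a function from NMOF.

```r
# Hypothetical per-player weekly means and standard deviations
player_mean <- c(20, 15, 12)
player_sd   <- c(8, 5, 3)

# The zero-correlation matrix is the identity, so D %*% I %*% D puts the
# variances on the diagonal and zeros elsewhere
varCov <- diag(player_sd) %*% diag(3) %*% diag(player_sd)
stopifnot(isTRUE(all.equal(varCov, diag(player_sd^2))))

# Mean and SD of a weighted combination (weights are non-negative and sum to 1)
portfolio_stats <- function(w, m, V) {
  c(mean = sum(w * m), sd = sqrt(drop(t(w) %*% V %*% w)))
}

equal_weights <- rep(1 / 3, 3)
all_in_best   <- c(1, 0, 0)

portfolio_stats(equal_weights, player_mean, varCov)
portfolio_stats(all_in_best, player_mean, varCov)
```

In this toy example, spreading the weight evenly gives up some expected points but cuts the standard deviation by more than half, which is the kind of trade-off an efficient frontier traces.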
The efficient frontier of fantasy points across weeks is in Figure 20.7.
In these examples, we have computed the efficient frontier across all players. However, it would be more relevant to establish the efficient frontier for the possible combinations of players (e.g., one Quarterback, two Running Backs, two Wide Receivers, one Tight End, one Kicker, etc.). Nevertheless, the above examples illustrate how one might do this; one could extend the examples to consider whole teams rather than just three players at a time.
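One way to move toward whole lineups is to enumerate the allowed combinations, compute each combination’s mean and standard deviation, and keep the combinations that no other combination beats on both dimensions. The sketch below does this in base R for a small hypothetical pool (one Quarterback and two Running Backs per lineup). All player names and numbers are invented, and the calculation assumes players’ scores are uncorrelated.

```r
# Hypothetical player pool (names and numbers are invented for illustration)
qbs <- data.frame(name = c("QB1", "QB2"), mean = c(20, 17), sd = c(8, 4))
rbs <- data.frame(name = c("RB1", "RB2", "RB3"), mean = c(15, 13, 10), sd = c(7, 4, 2))

# All lineups with one QB and two distinct RBs
rb_pairs <- combn(nrow(rbs), 2) # each column holds the row indices of one RB pair
lineups <- expand.grid(qb = seq_len(nrow(qbs)), pair = seq_len(ncol(rb_pairs)))

lineups$team_name <- NA_character_
lineups$team_mean <- NA_real_
lineups$team_sd <- NA_real_

for (i in seq_len(nrow(lineups))) {
  rb_idx <- rb_pairs[, lineups$pair[i]]
  lineups$team_name[i] <- paste(c(qbs$name[lineups$qb[i]], rbs$name[rb_idx]), collapse = "/")
  lineups$team_mean[i] <- qbs$mean[lineups$qb[i]] + sum(rbs$mean[rb_idx])
  # Assuming uncorrelated players, variances (not SDs) add
  lineups$team_sd[i] <- sqrt(qbs$sd[lineups$qb[i]]^2 + sum(rbs$sd[rb_idx]^2))
}

# A lineup is efficient if no other lineup has a mean at least as high and an SD
# at least as low, with at least one of the two strictly better
is_efficient <- sapply(seq_len(nrow(lineups)), function(i) {
  dominates_i <- lineups$team_mean >= lineups$team_mean[i] &
    lineups$team_sd <= lineups$team_sd[i] &
    (lineups$team_mean > lineups$team_mean[i] | lineups$team_sd < lineups$team_sd[i])
  !any(dominates_i)
})

lineups[is_efficient, c("team_name", "team_mean", "team_sd")]
```

With full rosters, the number of combinations grows quickly, so exhaustive enumeration may need to give way to sampling or optimization.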
20.6 Conclusion
In summary, fantasy football is similar to stock picking. You are most likely to pick the best players if you go with the wisdom of the crowd (e.g., average projections across projection sources) and diversify. Most projections are public information, so you might wonder whether crowd projections give you any edge when everybody else has access to the same information. However, this is also the case with stocks, and people still consistently perform best over time when they go with the market. Nevertheless, crowd projections are not highly accurate. And fantasy football is a game, so feel free to have fun and deviate from the crowd! Just keep in mind that you may be just as likely (if not more likely) to be wrong when you deviate from the crowd.
20.7 Session Info
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1
[4] dplyr_1.1.4 purrr_1.0.4 readr_2.1.5
[7] tidyr_1.3.1 tibble_3.3.0 ggplot2_3.5.2
[10] tidyverse_2.0.0 NMOF_2.10-1 fPortfolio_4023.84
[13] fAssets_4023.85 fBasics_4041.97 timeSeries_4041.111
[16] timeDate_4041.110 quantmod_0.4.28 TTR_0.24.4
[19] xts_0.14.1 zoo_1.8-14
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.2
[4] bitops_1.0-9 lazyeval_0.2.2 fastmap_1.2.0
[7] RCurl_1.98-1.17 XML_3.99-0.18 digest_0.6.37
[10] timechange_0.3.0 lifecycle_1.0.4 mvnormtest_0.1-9-3
[13] Rsolnp_2.0.1 magrittr_2.0.3 kernlab_0.9-33
[16] compiler_4.5.1 rlang_1.1.6 tools_4.5.1
[19] igraph_2.1.4 yaml_2.3.10 data.table_1.17.6
[22] knitr_1.50 sn_2.1.1 labeling_0.4.3
[25] htmlwidgets_1.6.4 mnormt_2.1.1 curl_6.4.0
[28] RColorBrewer_1.1-3 withr_3.0.2 numDeriv_2016.8-1.1
[31] grid_4.5.1 stats4_4.5.1 future_1.58.0
[34] globals_0.18.0 scales_1.4.0 MASS_7.3-65
[37] cli_3.6.5 mvtnorm_1.3-3 rmarkdown_2.29
[40] Rglpk_0.6-5.1 generics_0.1.4 future.apply_1.20.0
[43] robustbase_0.99-4-1 httr_1.4.7 tzdb_0.5.0
[46] energy_1.7-12 parallel_4.5.1 vctrs_0.6.5
[49] boot_1.3-31 jsonlite_2.0.0 slam_0.1-55
[52] hms_1.1.3 ggrepel_0.9.6 listenv_0.9.1
[55] crosstalk_1.2.1 rneos_0.4-0 plotly_4.11.0
[58] spatial_7.3-18 glue_1.8.0 parallelly_1.45.0
[61] DEoptimR_1.1-3-1 codetools_0.2-20 ecodist_2.1.3
[64] stringi_1.8.7 gtable_0.3.6 fMultivar_4031.84
[67] quadprog_1.5-8 gsl_2.1-8 pillar_1.10.2
[70] htmltools_0.5.8.1 truncnorm_1.0-9 R6_2.6.1
[73] evaluate_1.0.4 lattice_0.22-7 Rcpp_1.0.14
[76] fCopulae_4022.85 xfun_0.52 pkgconfig_2.0.3