9  Basic Statistics

9.1 Getting Started

9.1.1 Load Packages

Code
library("petersenlab")
library("DescTools")
library("pwr")
library("pwrss")
library("WebPower")
library("grid")
library("tidyverse")

9.1.2 Load Data

Code
load(file = "./data/player_stats_seasonal.RData")

We created the player_stats_seasonal.RData object in Section 4.4.3.

9.2 Descriptive Statistics

Descriptive statistics are used to describe data. For instance, they may be used to describe the center, spread, or shape of the data. There are various indices of each.

9.2.1 Center

Indices to describe the center (central tendency) of a variable’s data include:

  • mean (aka “average”)
  • median
  • Hodges-Lehmann statistic (aka pseudomedian)
  • mode
  • weighted mean
  • weighted median

The mean of \(X\) (written as: \(\bar{X}\)) is calculated as in Equation 9.1:

\[ \bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + ... + X_n}{n} \tag{9.1}\]

Code
exampleValues <- c(0, 0, 10, 15, 20, 30, 1000)
exampleValues_mean <- petersenlab::apa(mean(exampleValues), 2) # format the mean to 2 decimal places

That is, to compute the mean, sum all of the values and divide by the number of values (\(n\)). One issue with the mean is that it is sensitive to extreme (outlying) values. For instance, the mean of the values of 0, 0, 10, 15, 20, 30, and 1000 is 153.57.

Code
exampleValues_median <- median(exampleValues)

The median is determined as the value at the 50th percentile (i.e., the value that is higher than 50% of the values and is lower than the other 50% of values). Compared to the mean, the median is less influenced by outliers. The median of the values of 0, 0, 10, 15, 20, 30, and 1000 is 15.

Code
exampleValues_pseudomedian <- DescTools::HodgesLehmann(exampleValues)

The Hodges-Lehmann statistic (aka pseudomedian) is computed as the median of all pairwise means, and it is also robust to outliers. The pseudomedian of the values of 0, 0, 10, 15, 20, 30, and 1000 is 15.

Code
exampleValues_mode <- petersenlab::Mode(exampleValues)

The mode is the most common/frequent value. The mode of the values of 0, 0, 10, 15, 20, 30, and 1000 is 0. The petersenlab package (Petersen, 2024a) contains the Mode() function for computing the mode of a set of data.

If you want to give some values more weight than others, you can calculate a weighted mean and a weighted median (or other quantile), assigning a weight to each value. The petersenlab package (Petersen, 2024a) contains various functions for computing the weighted median (i.e., a weighted quantile at the 0.5 quantile, which is equivalent to the 50th percentile) based on Akinshin (2023). Because some projections are outliers, we use a trimmed version of the weighted Harrell-Davis quantile estimator for greater robustness.

Below is R code to estimate each:

Code
mean(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
[1] 42.56248
Code
median(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
[1] 26
Code
DescTools::HodgesLehmann(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
[1] 33.85
Code
petersenlab::Mode(player_stats_seasonal$fantasyPoints)
[1] 1
Code
weighted.mean(
  player_stats_seasonal$fantasyPoints,
  w = sample( # randomly generate weights (could specify them manually)
    x = 1:3,
    size = length(player_stats_seasonal$fantasyPoints),
    replace = TRUE),
  na.rm = TRUE)
Code
petersenlab::wthdquantile(
  player_stats_seasonal$fantasyPoints,
  w = sample( # randomly generate weights (could specify them manually)
    x = 1:3,
    size = length(player_stats_seasonal$fantasyPoints),
    replace = TRUE),
  probs = 0.5)
[1] 26.02339

9.2.2 Spread

Indices to describe the spread (variability) of a variable’s data include:

  • standard deviation
  • variance
  • range
  • minimum and maximum
  • interquartile range (IQR)
  • median absolute deviation

The (sample) variance of \(X\) (written as: \(s^2\)) is calculated as in Equation 9.2:

\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} \tag{9.2}\]

where \(X_i\) is each data point, \(\bar{X}\) is the mean of \(X\), and \(n\) is the number of data points.

The (sample) standard deviation of \(X\) (written as: \(s\)) is calculated as in Equation 9.3:

\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} \tag{9.3}\]

The range of \(X\) is calculated as in Equation 9.4:

\[ \text{range} = \text{maximum} - \text{minimum} \tag{9.4}\]

The interquartile range (IQR) is calculated as in Equation 9.5:

\[ \text{IQR} = Q_3 - Q_1 \tag{9.5}\]

where \(Q_3\) is the score at the third quartile (i.e., 75th percentile), and \(Q_1\) is the score at the first quartile (i.e., 25th percentile).

The median absolute deviation (MAD) is the median of all deviations from the median, and is calculated as in Equation 9.6:

\[ \text{MAD} = \text{median}(|X_i - \tilde{X}|) \tag{9.6}\]

where \(\tilde{X}\) is the median of X. Compared to the standard deviation, the median absolute deviation is more robust to outliers.

Below is R code to estimate each:
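
Code
# A minimal sketch using the fantasyPoints variable loaded above; output omitted.
sd(player_stats_seasonal$fantasyPoints, na.rm = TRUE)    # standard deviation
var(player_stats_seasonal$fantasyPoints, na.rm = TRUE)   # variance
range(player_stats_seasonal$fantasyPoints, na.rm = TRUE) # minimum and maximum
diff(range(player_stats_seasonal$fantasyPoints, na.rm = TRUE)) # range (maximum minus minimum)
IQR(player_stats_seasonal$fantasyPoints, na.rm = TRUE)   # interquartile range
mad(player_stats_seasonal$fantasyPoints, na.rm = TRUE)   # median absolute deviation (scaled by 1.4826, by default)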

9.2.3 Shape

Indices to describe the shape of a variable’s data include:

  • skewness
  • kurtosis

Positive skewness (right-skewed) reflects a longer or heavier right-tailed distribution, whereas negative skewness (left-skewed) reflects a longer or heavier left-tailed distribution. Fantasy points tend to be positively skewed.

The kurtosis reflects the extent of extreme (outlying) values in a distribution relative to a normal distribution (or bell curve). A mesokurtic distribution (with a kurtosis value near zero) reflects a normal amount of tailedness. Positive kurtosis values reflect a leptokurtic distribution, which has heavier tails and a sharper peak than a normal distribution. Negative kurtosis values reflect a platykurtic distribution, which has lighter tails and a flatter peak than a normal distribution. Fantasy points tend to have a leptokurtic distribution.

Below is R code to estimate each:
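
Code
# A minimal sketch using the fantasyPoints variable loaded above; output omitted.
psych::skew(player_stats_seasonal$fantasyPoints, na.rm = TRUE)    # skewness
psych::kurtosi(player_stats_seasonal$fantasyPoints, na.rm = TRUE) # (excess) kurtosis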

9.2.4 Combination

To estimate multiple indices of center, spread, and shape of the data, you can use the following code:

Code
psych::describe(player_stats_seasonal["fantasyPoints"])
Code
player_stats_seasonal %>% 
  select(age, years_of_experience, fantasyPoints) %>% 
  summarise(across(
      everything(),
      .fns = list(
        n = ~ length(na.omit(.)),
        missingness = ~ mean(is.na(.)) * 100,
        M = ~ mean(., na.rm = TRUE),
        SD = ~ sd(., na.rm = TRUE),
        min = ~ min(., na.rm = TRUE),
        max = ~ max(., na.rm = TRUE),
        range = ~ max(., na.rm = TRUE) - min(., na.rm = TRUE),
        IQR = ~ IQR(., na.rm = TRUE),
        MAD = ~ mad(., na.rm = TRUE),
        median = ~ median(., na.rm = TRUE),
        pseudomedian = ~ DescTools::HodgesLehmann(., na.rm = TRUE),
        mode = ~ petersenlab::Mode(., multipleModes = "mean"),
        skewness = ~ psych::skew(., na.rm = TRUE),
        kurtosis = ~ psych::kurtosi(., na.rm = TRUE)),
      .names = "{.col}.{.fn}")) %>%
    pivot_longer(
      cols = everything(),
      names_to = c("variable","index"),
      names_sep = "\\.") %>% 
    pivot_wider(
      names_from = index,
      values_from = value)

9.3 Scores and Scales

There are many different types of scores and scales. This book focuses on raw scores and z-scores. For information on other scores and scales, including percentile ranks, T-scores, standard scores, scaled scores, and stanine scores, see here: https://isaactpetersen.github.io/Principles-Psychological-Assessment/scoresScales.html#scoreTransformation (Petersen, 2024b).

9.3.1 Raw Scores

Raw scores are the original data on the original metric. Thus, raw scores are considered unstandardized. For example, raw scores that represent the players’ age may range from 20 to 40. Raw scores depend on the construct and unit; thus raw scores may not be comparable across variables.

9.3.2 z Scores

z scores have a mean of zero and a standard deviation of one. z scores are frequently used to render scores more comparable across variables. Thus, z scores are considered a form of standardized score.

z scores are calculated using Equation 9.7:

\[ z = \frac{X - \bar{X}}{\sigma} \tag{9.7}\]

where \(X\) is the observed score, \(\bar{X}\) is the mean observed score, and \(\sigma\) is the standard deviation of the observed scores.

You can easily convert a variable to a z score using the scale() function:

Code
scale(variable)
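
For example, to add a z-scored version of fantasy points to the data loaded above (the new column name, fantasyPoints_z, is arbitrary):

Code
player_stats_seasonal$fantasyPoints_z <- as.numeric(
  scale(player_stats_seasonal$fantasyPoints)) # center at the mean; divide by the standard deviation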

Under a normal curve, approximately 68% of scores fall within one standard deviation of the mean, approximately 95% of scores fall within two standard deviations of the mean, and approximately 99.7% of scores fall within three standard deviations of the mean.

The area under a normal curve within one standard deviation of the mean is calculated below using the pnorm() function, which computes the cumulative distribution function for a normal distribution.

Code
stdDeviations <- 1

pnorm(stdDeviations) - pnorm(stdDeviations * -1)
[1] 0.6826895

The area under a normal curve within one standard deviation of the mean is depicted in Figure 9.1.

Code
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
     xlab = "z Score",
     ylab = "Normal Density")

x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
        c(0, y, 0),
        col = "blue")
Figure 9.1: Density of Standard Normal Distribution. The blue region represents the area within one standard deviation of the mean.

The area under a normal curve within two standard deviations of the mean is calculated below:

Code
stdDeviations <- 2

pnorm(stdDeviations) - pnorm(stdDeviations * -1)
[1] 0.9544997

The area under a normal curve within two standard deviations of the mean is depicted in Figure 9.2.

Code
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
     xlab = "z Score",
     ylab = "Normal Density")

x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
        c(0, y, 0),
        col = "blue")
Figure 9.2: Density of Standard Normal Distribution. The blue region represents the area within two standard deviations of the mean.

The area under a normal curve within three standard deviations of the mean is calculated below:

Code
stdDeviations <- 3

pnorm(stdDeviations) - pnorm(stdDeviations * -1)
[1] 0.9973002

The area under a normal curve within three standard deviations of the mean is depicted in Figure 9.3.

Code
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l",
     xlab = "z Score",
     ylab = "Normal Density")

x <- seq(stdDeviations * -1, stdDeviations, length = 100)
y <- dnorm(x, mean = 0, sd = 1)
polygon(c(stdDeviations * -1, x, stdDeviations),
        c(0, y, 0),
        col = "blue")
Figure 9.3: Density of Standard Normal Distribution. The blue region represents the area within three standard deviations of the mean.

If you want to determine the z score associated with a particular percentile in a normal distribution, you can use the qnorm() function. For instance, the z score associated with the 37th percentile is:

Code
qnorm(.37)
[1] -0.3318533

9.4 Inferential Statistics

Inferential statistics are used to draw inferences regarding whether there is (a) a difference in level on a variable across groups or (b) an association between variables. For instance, inferential statistics may be used to evaluate whether Quarterbacks tend to have longer careers compared to Running Backs. Or, they could be used to evaluate whether number of carries is associated with injury likelihood. To apply inferential statistics, we make use of the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)).

9.4.1 Null Hypothesis Significance Testing

To draw statistical inferences, the frequentist statistical paradigm leverages null hypothesis significance testing (NHST). Frequentist statistics is the most widely used statistical paradigm, but it is not the only one; other paradigms exist, including Bayesian statistics, which is based on Bayes’ theorem. This chapter focuses on the frequentist approach to hypothesis testing. We discuss Bayesian statistics in Chapter 16.

9.4.1.1 Null Hypothesis (\(H_0\))

When testing whether there are differences in level across groups on a variable of interest, the null hypothesis (\(H_0\)) is that there is no difference in level across groups. For instance, when testing whether Quarterbacks tend to have longer careers compared to Running Backs, the null hypothesis (\(H_0\)) is that Quarterbacks do not systematically differ from Running Backs in the length of their career.

When testing whether there is an association between variables, the null hypothesis (\(H_0\)) is that there is no association between the variables. For instance, when testing whether number of carries is associated with injury likelihood, the null hypothesis (\(H_0\)) is that there is no association between number of carries and injury likelihood.

9.4.1.2 Alternative Hypothesis (\(H_1\))

The alternative hypothesis (\(H_1\)) is the researcher’s hypothesis that they want to evaluate. An alternative hypothesis (\(H_1\)) might be directional (i.e., one-sided) or non-directional (i.e., two-sided).

Directional hypotheses specify a particular direction, such as which group will have larger scores or in which direction (positive or negative) two variables will be associated. Examples of directional hypotheses include:

  • Quarterbacks have longer careers compared to Running Backs
  • Number of carries is positively associated with injury likelihood

Non-directional hypotheses do not specify a particular direction. For instance, non-directional hypotheses may state that two groups differ but do not specify which group will have larger scores. Or, non-directional hypotheses may state that two variables are associated but do not state the sign (positive or negative) of the association. Examples of non-directional hypotheses include:

  • Quarterbacks differ in the length of their careers compared to Running Backs
  • Number of carries is associated with injury likelihood

9.4.1.3 Statistical Significance

In science, statistical significance is evaluated with the p-value. The p-value does not represent the probability that you observed the result by chance. Rather, the p-value represents a conditional probability: it examines the probability of one event given another event. In particular, the p-value evaluates the likelihood that you would detect a result at least as extreme as the one observed (in terms of the magnitude of the difference or of the association), given that the null hypothesis (\(H_0\)) is true.

This can be expressed in conditional probability notation, \(P(A | B)\), which is the probability (likelihood) of event A occurring given that event B occurred (or given condition B).

The conditional probability notation for a left-tailed directional test (i.e., Quarterbacks have shorter careers than Running Backs; or number of carries is negatively associated with injury likelihood) is in Equation 9.8.

\[ p\text{-value} = P(T \le t | H_0) \tag{9.8}\]

where \(T\) is the test statistic of interest (e.g., the distribution of \(t\), \(r\), or \(F\) values, depending on the test) and \(t\) is the observed value of the test statistic (e.g., the observed \(t\), \(r\), or \(F\) value, depending on the test).

The conditional probability notation for a right-tailed directional test (i.e., Quarterbacks have longer careers than Running Backs; or number of carries is positively associated with injury likelihood) is in Equation 9.9.

\[ p\text{-value} = P(T \ge t | H_0) \tag{9.9}\]

The conditional probability notation for a two-tailed non-directional test (i.e., Quarterbacks differ in the length of their careers compared to Running Backs; or number of carries is associated with injury likelihood) is in Equation 9.10.

\[ p\text{-value} = 2 \times \text{min}(P(T \le t | H_0), P(T \ge t | H_0)) \tag{9.10}\]

where \(\text{min}(a, b)\) is the smaller of \(a\) and \(b\).

If the distribution of the test statistic is symmetric around zero, the p-value for the two-tailed non-directional test simplifies to Equation 9.11.

\[ p\text{-value} = 2 \times P(T \ge |t| | H_0) \tag{9.11}\]

Nevertheless, to be conservative (i.e., to avoid false positive/Type I errors), many researchers use two-tailed p-values regardless of whether their hypothesis is directional or non-directional.
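
For instance, given an observed test statistic and its degrees of freedom, the p-values in Equations 9.8 to 9.11 can be computed from the cumulative distribution function of the test statistic. Below is a minimal sketch for a t statistic; the values of the test statistic and degrees of freedom are arbitrary placeholders.

Code
t_observed <- 2.1  # hypothetical observed t statistic
df_observed <- 100 # hypothetical degrees of freedom

pt(t_observed, df_observed)                              # left-tailed: P(T <= t | H0)
pt(t_observed, df_observed, lower.tail = FALSE)          # right-tailed: P(T >= t | H0)
2 * pt(abs(t_observed), df_observed, lower.tail = FALSE) # two-tailed, per Equation 9.11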

For a test of group differences, the p-value evaluates the likelihood that you would observe a difference as large or larger than the one you observed between the groups if there were no systematic difference between the groups in the population, as depicted in Figure 9.4. For instance, when evaluating whether Quarterbacks have longer careers than Running Backs, and you observed a mean difference of 0.03 years, the p-value evaluates the likelihood that you would observe a difference as large or larger than 0.03 years between the groups if, in truth among all Quarterbacks and Running Backs in the NFL, Quarterbacks do not differ from Running Backs in terms of the length of their career.

Code
set.seed(52242)

nObserved <- 1000
nPopulation <- 1000000

observedGroups <- data.frame(
  score = c(rnorm(nObserved, mean = 47, sd = 3), rnorm(nObserved, mean = 52, sd = 3)),
  group = as.factor(c(rep("Group 1", nObserved), rep("Group 2", nObserved)))
)

populationGroups <- data.frame(
  score = c(rnorm(nPopulation, mean = 50, sd = 3.03), rnorm(nPopulation, mean = 50, sd = 3)),
  group = as.factor(c(rep("Group 1", nPopulation), rep("Group 2", nPopulation)))
)

ggplot2::ggplot(
  data = observedGroups,
  mapping = aes(
    x = score,
    fill = group,
    color = group
  )
) +
  geom_density(alpha = 0.5) +
  scale_color_manual(values = c("red", "blue")) +
  scale_fill_manual(values = c("red","blue")) +
  geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 1")])) +
  geom_vline(xintercept = mean(observedGroups$score[which(observedGroups$group == "Group 2")])) +
  ggplot2::labs(
    x = "Score",
    y = "Frequency",
    title = "What is the probability my data would look like this..."
  ) +
  ggplot2::theme_classic(
    base_size = 16) +
  ggplot2::theme(
    legend.title = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    #plot.title.position = "plot"
    legend.position = "inside",
    legend.margin = margin(0, 0, 0, 0),
    legend.justification.top = "left",
    legend.justification.left = "top",
    legend.justification.bottom = "right",
    legend.justification.inside = c(1, 1),
    legend.location = "plot")

ggplot2::ggplot(
  data = populationGroups,
  mapping = aes(
    x = score,
    fill = group,
    color = group
  )
) +
  geom_density(alpha = 0.5) +
  scale_color_manual(values = c("red", "blue")) +
  scale_fill_manual(values = c("red","blue")) +
  geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 1")])) +
  geom_vline(xintercept = mean(populationGroups$score[which(populationGroups$group == "Group 2")])) +
  ggplot2::labs(
    x = "Score",
    y = "Frequency",
    title = "...if in the population, the groups were really this:"
  ) +
  ggplot2::theme_classic(
    base_size = 16) +
  ggplot2::theme(
    legend.title = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    #plot.title.position = "plot",
    legend.position = "inside",
    legend.margin = margin(0, 0, 0, 0),
    legend.justification.top = "left",
    legend.justification.left = "top",
    legend.justification.bottom = "right",
    legend.justification.inside = c(1, 1),
    legend.location = "plot")
(a) What is the probability my data would look like this…
(b) …if in the population, the groups were really this?
Figure 9.4: Interpretation of p-Values When Examining The Differences Between Groups. The vertical black lines reflect the group means.

For a test of whether two variables are associated, the p-value evaluates the likelihood that you would observe an association as strong or stronger than the one you observed if there were no actual association between the variables in the population, as depicted in Figure 9.5. For instance, when evaluating whether number of carries is positively associated with injury likelihood, and you observed a correlation coefficient of \(r = .25\) between number of carries and injury likelihood, the p-value evaluates the likelihood that you would observe a correlation as strong or stronger than \(r = .25\) between the variables if, in truth among all NFL Running Backs, number of carries is not associated with injury likelihood.

Code
set.seed(52242)

observedCorrelation <- 0.9

correlations <- data.frame(criterion = rnorm(2000))
correlations$sample <- NA
correlations$sample[1:100] <- complement(correlations$criterion[1:100], observedCorrelation)
correlations$population <- complement(correlations$criterion, 0)

ggplot2::ggplot(
  data = correlations,
  mapping = aes(
    x = sample,
    y = criterion
  )
) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_continuous(
    limits = c(-3.5,3)
  ) +
  annotate(
    x = 0,
    y = 4,
    label = paste("italic(r) != ", 0, sep = ""),
    parse = TRUE,
    geom = "text",
    size = 7) + 
  labs(
    x = "Predictor Variable",
    y = "Outcome Variable",
    title = "What is the probability my data would look like this..."
  ) +
  theme_classic(
    base_size = 16) +
  theme(
    legend.title = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank())

ggplot2::ggplot(
  data = correlations,
  mapping = aes(
    x = population,
    y = criterion
  )
) +
  geom_point() +
  geom_smooth(
    method = "lm",
    se = FALSE) +
  scale_x_continuous(
    limits = c(-2.5,2.5)
  ) +
  annotate(
    x = 0,
    y = 4,
    label = paste("italic(r) == '", "0.00", "'", sep = ""),
    parse = TRUE,
    geom = "text",
    size = 7) + 
  labs(
    x = "Predictor Variable",
    y = "Outcome Variable",
    title = "...if in the population, the association was really this:"
  ) +
  theme_classic(
    base_size = 16) +
  theme(
    legend.title = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank())
(a) What is the probability my data would look like this…
(b) …if in the population, the association was really this?
Figure 9.5: Interpretation of p-Values When Examining The Association Between Variables.

Using NHST, we consider an effect to be statistically significant if the p-value is less than some threshold, called the alpha level. In science, we typically want to be conservative because a false positive (i.e., Type I error) is considered more problematic than a false negative (i.e., Type II error). That is, we would rather say an effect does not exist when it really does than say an effect exists when it really does not. Thus, we typically set the alpha level to a low value, commonly .05. We would then consider an effect to be statistically significant if the p-value is less than .05; that is, there is only a small chance (5%, or 1 in 20 times) that we would observe an effect at least as extreme as the one observed if the null hypothesis were true. Accordingly, you might expect around 5% of tests where the null hypothesis is true to be statistically significant just by chance. We could lower the rate of Type II (false negative) errors (i.e., detect more effects) by setting the alpha level to a higher value (e.g., .10); however, raising the alpha level would raise the likelihood of Type I (false positive) errors.

If the p-value is less than .05, we reject the null hypothesis (\(H_0\)) that there was no difference or association. Thus, we conclude that there was a statistically significant (non-zero) difference or association. If the p-value is greater than .05, we fail to reject the null hypothesis; the difference/association was not statistically significant. Thus, we do not have confidence that there was a difference or association. However, we do not accept the null hypothesis; it could be that we did not observe an effect because we did not have adequate power to detect it (e.g., if the effect size was small, the data were noisy, or the sample size was small or unrepresentative).

There are four general possibilities of decision making outcomes when performing null-hypothesis significance testing:

  1. We (correctly) reject the null hypothesis when it is in fact false (\(1 - \beta\)). This is a true positive. For instance, we may correctly determine that Quarterbacks have longer careers than Running Backs.
  2. We (correctly) fail to reject the null hypothesis when it is in fact true (\(1 - \alpha\)). This is a true negative. For instance, we may correctly determine that Quarterbacks do not have longer careers than Running Backs.
  3. We (incorrectly) reject the null hypothesis when it is in fact true (\(\alpha\)). This is a false positive. When performing null hypothesis testing, a false positive is known as a Type I error. For instance, we may incorrectly determine that Quarterbacks have longer careers than Running Backs when, in fact, Quarterbacks and Running Backs do not differ in their career length.
  4. We (incorrectly) fail to reject the null hypothesis when it is in fact false (\(\beta\)). This is a false negative. When performing null hypothesis testing, a false negative is known as a Type II error. For instance, we may incorrectly determine that Quarterbacks and Running Backs do not differ in their career length when, in fact, Quarterbacks have longer careers than Running Backs.

A two-by-two confusion matrix for null-hypothesis significance testing is in Figure 9.6.

Figure 9.6: A Two-by-Two Confusion Matrix for Null-Hypothesis Significance Testing.

In statistics, power is the probability of detecting an effect, if, in fact, the effect exists. Otherwise said, power is the probability of rejecting the null hypothesis, if, in fact, the null hypothesis is false. Power is influenced by several variables:

  • the sample size (N): the larger the N, the greater the power
    • for group comparisons, the power depends on the sample size of each group
  • the effect size: the larger the effect, the greater the power
    • for group comparisons, larger effect sizes reflect:
      • larger between-group variance, and
      • smaller within-group variance (i.e., greater measurement precision/reliability)
  • the alpha level: the researcher specifies the alpha level (though it is typically set at .05); the higher the alpha level, the greater the power; however, the higher we set the alpha level, the higher the likelihood of Type I errors (false positives)
  • one- versus two-tailed tests: one-tailed tests have higher power than two-tailed tests
  • within-subject versus between-subject comparisons: within-subject designs tend to have greater power than between-subject designs

A plot of statistical power is in Figure 9.7.

Code
m1 <- 0  # mu H0
sd1 <- 1.5 # sigma H0
m2 <- 3.5 # mu HA
sd2 <- 1.5 # sigma HA
 
z_crit <- qnorm(1-(0.05/2), m1, sd1)
 
# set length of tails
min1 <- m1-sd1*4
max1 <- m1+sd1*4
min2 <- m2-sd2*4
max2 <- m2+sd2*4          
# create x sequence
x <- seq(min(min1,min2), max(max1, max2), .01)
# generate normal dist #1
y1 <- dnorm(x, m1, sd1)
# put in data frame
df1 <- data.frame("x" = x, "y" = y1)
# generate normal dist #2
y2 <- dnorm(x, m2, sd2)
# put in data frame
df2 <- data.frame("x" = x, "y" = y2)
 
# Alpha polygon
y.poly <- pmin(y1,y2)
poly1 <- data.frame(x=x, y=y.poly)
poly1 <- poly1[poly1$x >= z_crit, ] 
poly1<-rbind(poly1, c(z_crit, 0))  # add lower-left corner
 
# Beta polygon
poly2 <- df2
poly2 <- poly2[poly2$x <= z_crit,] 
poly2<-rbind(poly2, c(z_crit, 0))  # add lower-left corner
 
# power polygon; 1-beta
poly3 <- df2
poly3 <- poly3[poly3$x >= z_crit,] 
poly3 <-rbind(poly3, c(z_crit, 0))  # add lower-left corner
 
# combine polygons. 
poly1$id <- 3 # alpha, give it the highest number to make it the top layer
poly2$id <- 2 # beta
poly3$id <- 1 # power; 1 - beta
poly <- rbind(poly1, poly2, poly3)
poly$id <- factor(poly$id,  labels=c("power","beta","alpha"))

# plot with ggplot2
ggplot(poly, aes(x,y, fill=id, group=id)) +
  geom_polygon(show.legend=F, alpha=I(8/10)) +
  # add density curve for the H0 distribution
  geom_line(data=df1, aes(x,y, color="H0", group=NULL, fill=NULL), linewidth=1.5, show.legend=FALSE) + 
  # add density curve for the H1 distribution (these curves could be combined into one data frame)
  geom_line(data=df2, aes(color="HA", group=NULL, fill=NULL), linewidth=1.5, show.legend=FALSE) +
  # add vlines for z_crit
  geom_vline(xintercept = z_crit, linewidth=1, linetype="dashed") +
  # change colors 
  scale_color_manual("Group", 
                     values= c("HA" = "#981e0b","H0" = "black")) +
  scale_fill_manual("test", values= c("alpha" = "#0d6374","beta" = "#be805e","power"="#7cecee")) +
  # beta arrow
  annotate("segment", x=0.1, y=0.045, xend=1.3, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
  annotate("text", label="beta", x=0, y=0.05, parse=T, size=8) +
  # alpha arrow
  annotate("segment", x=4, y=0.043, xend=3.4, yend=0.01, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
  annotate("text", label="frac(alpha,2)", x=4.2, y=0.05, parse=T, size=8) +
  # power arrow
  annotate("segment", x=6, y=0.2, xend=4.5, yend=0.15, arrow = arrow(length = unit(0.3, "cm")), linewidth=1) +
  annotate("text", label=expression(paste(1-beta, "  (\"power\")")), x=6.1, y=0.21, parse=T, size=8) +
  # H_0 title
  annotate("text", label="H[0]", x=m1, y=0.28, parse=T, size=8) +
  # H_a title
  annotate("text", label="H[1]", x=m2, y=0.28, parse=T, size=8) +
  ggtitle("Statistical Power") +
  # remove some elements
  theme(
    panel.grid.minor = element_blank(),
    panel.grid.major = element_blank(),
    panel.background = element_blank(),
    plot.background = element_rect(fill="white"),
    panel.border = element_blank(),
    axis.line = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    plot.title = element_text(size=22))
Figure 9.7: Statistical Power (Adapted from Kristoffer Magnusson: https://rpsychologist.com/creating-a-typical-textbook-illustration-of-statistical-power-using-either-ggplot-or-base-graphics; archived at https://perma.cc/FG3J-85L6). The dashed line represents the critical value or threshold.

Interactive visualizations by Kristoffer Magnusson on p-values and null-hypothesis significance testing are available on his website: https://rpsychologist.com

Twelve misconceptions about p-values (Goodman, 2008) are in Table 9.1.

Table 9.1: Twelve Misconceptions About p-Values from Goodman (2008). Goodman also provides a discussion about why each statement is false.

  1. If \(p = .05\), the null hypothesis has only a 5% chance of being true.
  2. A nonsignificant difference (e.g., \(p > .05\)) means there is no difference between groups.
  3. A statistically significant finding is clinically important.
  4. Studies with \(p\)-values on opposite sides of .05 are conflicting.
  5. Studies with the same \(p\)-value provide the same evidence against the null hypothesis.
  6. \(p = .05\) means that we have observed data that would occur only 5% of the time under the null hypothesis.
  7. \(p = .05\) and \(p < .05\) mean the same thing.
  8. \(p\)-values are properly written as inequalities (e.g., “\(p \le .05\)” when \(p = .015\)).
  9. \(p = .05\) means that if you reject the null hypothesis, the probability of a Type I error is only 5%.
  10. With a \(p = .05\) threshold for significance, the chance of a Type I error will be 5%.
  11. You should use a one-sided \(p\)-value when you don’t care about a result in one direction, or a difference in that direction is impossible.
  12. A scientific conclusion or treatment policy should be based on whether or not the \(p\)-value is significant.

That is, the p-value is not:

  • the probability that the effect was due to chance
  • the probability that the null hypothesis is true
  • the size of the effect
  • the importance of the effect
  • whether the effect is true, real, or causal

Statistical significance involves the consistency of an effect/association/difference; it suggests that the association/difference is reliably non-zero. However, just because something is statistically significant does not mean that it is important. For instance, suppose we discover that players who consume a sports drink before a game tend to perform better than players who do not (\(p < .05\)), but that consuming sports drinks is associated with an average improvement of only 0.002 points per game. A small effect such as this might be detectable with a large sample size, and it would be considered reliable/consistent because it is statistically significant. However, the effect is so small that the resulting differences are not practically important. Thus, in addition to statistical significance, it is also important to consider practical significance.

9.4.2 Practical Significance

Practical significance deals with how large or important the effect/association/difference is. It is based on the magnitude of the effect, called the effect size. Effect size can be quantified in various ways including:

  • Cohen’s \(d\)
  • Standardized regression coefficient (beta; \(\beta\))
  • Correlation coefficient (\(r\))
  • Cohen’s \(\omega\) (omega)
  • Cohen’s \(f\)
  • Cohen’s \(f^2\)
  • Coefficient of determination (\(R^2\))
  • Eta squared (\(\eta^2\))
  • Partial eta squared (\(\eta_p^2\))

9.4.2.1 Cohen’s \(d\)

Cohen’s \(d\) is calculated as in Equation 9.12:

\[ \begin{aligned} d &= \frac{\text{mean difference}}{\text{pooled standard deviation}} \\ &= \frac{\bar{X_1} - \bar{X_2}}{s} \\ \end{aligned} \tag{9.12}\]

where:

\[ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \tag{9.13}\]

where \(n_1\) and \(n_2\) are the sample sizes of groups 1 and 2, respectively, and \(s_1\) and \(s_2\) are the standard deviations of groups 1 and 2, respectively.
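
A minimal sketch that implements Equations 9.12 and 9.13 directly, using two small hypothetical groups:

Code
group1 <- c(10, 12, 14, 16, 18) # hypothetical scores for group 1
group2 <- c(9, 11, 12, 13, 15)  # hypothetical scores for group 2

n1 <- length(group1)
n2 <- length(group2)

# pooled standard deviation (Equation 9.13); var() returns the sample variance
pooledSD <- sqrt(((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2))

cohensD <- (mean(group1) - mean(group2)) / pooledSD # Equation 9.12
cohensD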

9.4.2.2 Standardized Regression Coefficient (Beta; \(\beta\))

The standardized regression coefficient (beta; \(\beta\)) is used in multiple regression, and is calculated as in Equation 9.14:

\[ \beta_x = B_x \times \frac{s_x}{s_y} \tag{9.14}\]

where \(B_x\) is the unstandardized regression coefficient of the predictor variable \(x\) in predicting the outcome variable \(y\), \(s_x\) is the standard deviation of \(x\), and \(s_y\) is the standard deviation of \(y\).
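
A minimal sketch of Equation 9.14, using age to predict fantasy points in the data loaded above (for simplicity, the standard deviations are computed on all non-missing values, whereas lm() uses complete cases):

Code
fit <- lm(fantasyPoints ~ age, data = player_stats_seasonal)

B_age <- coef(fit)["age"] # unstandardized regression coefficient

beta_age <- B_age *
  sd(player_stats_seasonal$age, na.rm = TRUE) /
  sd(player_stats_seasonal$fantasyPoints, na.rm = TRUE)
beta_age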

9.4.2.3 Correlation Coefficient (\(r\))

The formula for the correlation coefficient is in Chapter 10.

9.4.2.4 Cohen’s \(\omega\)

Cohen’s \(\omega\) is used in chi-square tests, and is calculated as in Equation 9.15:

\[ \omega = \sqrt{\frac{\chi^2}{N} - \frac{df}{N}} \tag{9.15}\]

where \(\chi^2\) is the chi-square statistic from the test, \(N\) is the sample size, and \(df\) is the degrees of freedom.

9.4.2.5 Cohen’s \(f\)

Cohen’s \(f\) is commonly used in ANOVA, and is calculated as in Equation 9.16:

\[ \begin{aligned} f &= \sqrt{\frac{R^2}{1 - R^2}} \\ &= \sqrt{\frac{\eta^2}{1 - \eta^2}} \end{aligned} \tag{9.16}\]

9.4.2.6 Cohen’s \(f^2\)

Cohen’s \(f^2\) is commonly used in regression, and is calculated as in Equation 9.17:

\[ \begin{aligned} f^2 &= \frac{R^2}{1 - R^2} \\ &= \frac{\eta^2}{1 - \eta^2} \end{aligned} \tag{9.17}\]

To calculate the effect size of a particular predictor, you can calculate \(\Delta f^2\) as in Equation 9.18:

\[ \begin{aligned} \Delta f^2 &= \frac{R^2_{\text{model}} - R^2_{\text{reduced}}}{1 - R^2_{\text{model}}} \\ &= \frac{\eta^2_{\text{model}} - \eta^2_{\text{reduced}}}{1 - \eta^2_{\text{model}}} \end{aligned} \tag{9.18}\]

where \(R^2_{\text{model}}\) is the \(R^2\) of the model with the predictor variable of interest and \(R^2_{\text{reduced}}\) is the \(R^2\) of the model without the predictor variable of interest.
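
A minimal sketch of Equations 9.17 and 9.18, using nested regression models with age as the focal predictor (a cleaner comparison would fit both models to the same complete cases):

Code
modelFull <- lm(fantasyPoints ~ age + years_of_experience, data = player_stats_seasonal)
modelReduced <- lm(fantasyPoints ~ years_of_experience, data = player_stats_seasonal)

r2Full <- summary(modelFull)$r.squared
r2Reduced <- summary(modelReduced)$r.squared

f2 <- r2Full / (1 - r2Full)                    # Equation 9.17
deltaF2 <- (r2Full - r2Reduced) / (1 - r2Full) # Equation 9.18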

9.4.2.7 Coefficient of Determination (\(R^2\))

The coefficient of determination (\(R^2\)) reflects the proportion of variance in the outcome variable that is explained by the predictor variable(s). \(R^2\) is commonly used in regression, and is calculated as in Equation 9.19:

\[ \begin{aligned} R^2 &= 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} \\ &= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\ &= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\ &= \frac{f^2}{1 + f^2} \\ &= \eta^2 \\ &= \frac{\text{variance explained in }Y}{\text{total variance in }Y} \end{aligned} \tag{9.19}\]

where \(Y_i\) is the observed value of the outcome variable for the \(i\)th observation, \(\hat{Y}_i\) is the model-predicted value for the \(i\)th observation, and \(\bar{Y}\) is the mean of the observed values of the outcome variable. The total sum of squares is an index of the total variation in the outcome variable.

9.4.2.8 Eta Squared (\(\eta^2\)) and Partial Eta Squared (\(\eta_p^2\))

Like \(R^2\), eta squared (\(\eta^2\)) reflects the proportion of variance in the dependent variable that is explained by the independent variable(s). \(\eta^2\) is commonly used in ANOVA, and is calculated as in Equation 9.20:

\[ \begin{aligned} \eta^2 &= \frac{SS_{\text{effect}}}{SS_{\text{total}}} \\ &= 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}} \\ &= 1 - \frac{\text{sum of squared residuals}}{\text{total sum of squares}} \\ &= \frac{f^2}{1 + f^2} \\ &= R^2 \end{aligned} \tag{9.20}\]

where \(SS_{\text{effect}}\) is the sum of squares for the effect of interest and \(SS_{\text{total}}\) is the total sum of squares.

Partial eta squared (\(\eta_p^2\)) reflects the proportion of variance in the dependent variable that is explained by the independent variable while controlling for the other independent variables. \(\eta_p^2\) is commonly used in ANOVA, and is calculated as in Equation 9.21:

\[ \eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} \tag{9.21}\]

where \(SS_{\text{effect}}\) is the sum of squares for the effect of interest and \(SS_{\text{error}}\) is the sum of squares for the residual error term.

9.4.2.9 Effect Size Thresholds

Effect size thresholds (Cohen, 1988; McGrath & Meyer, 2006) for small, medium, and large effect sizes are in Table 9.2.

Table 9.2: Effect Size Thresholds for Small, Medium, and Large Effect Sizes.
Effect Size Index Small Medium Large
Cohen’s \(d\) \(\ge |.20|\) \(\ge |.50|\) \(\ge |.80|\)
Standardized regression coefficient (beta; \(\beta\)) \(\ge |.10|\) \(\ge |.24|\) \(\ge |.37|\)
Correlation coefficient (\(r\)) \(\ge |.10|\) \(\ge |.24|\) \(\ge |.37|\)
Cohen’s \(\omega\) \(\ge .10\) \(\ge .30\) \(\ge .50\)
Cohen’s \(f\) \(\ge .10\) \(\ge .25\) \(\ge .40\)
Cohen’s \(f^2\) \(\ge .01\) \(\ge .06\) \(\ge .16\)
Coefficient of determination (\(R^2\)) \(\ge .01\) \(\ge .06\) \(\ge .14\)
Eta squared (\(\eta^2\)) \(\ge .01\) \(\ge .06\) \(\ge .14\)
Partial eta squared (\(\eta_p^2\)) \(\ge .01\) \(\ge .06\) \(\ge .14\)

9.5 Statistical Decision Tree

A statistical decision tree is a flowchart or decision tree that depicts which statistical test to use given the purpose of analysis, the type of data, etc. An example statistical decision tree is depicted in Figure 9.8.

Figure 9.8: A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure. Adapted from: https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf. The original source is: Corston, R. & Colman, A. M. (2000). A crash course in SPSS for Windows. Wiley-Blackwell. Changes were made to the original, including the addition of several statistical tests. Note: “Interval” as a level of measurement includes data with an “interval” or higher level of measurement; thus, it also includes data with a “ratio” level of measurement.

This statistical decision tree can be generally summarized such that associations are examined with the correlation/regression family, and differences are examined with the t-test/ANOVA family, as depicted in Figure 9.9.

Figure 9.9: Summary of A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure.

However, many statistical tests can be re-formulated in a regression framework, as in Figure 9.10.

Figure 9.10: A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure, Re-Formulated in a Regression Framework. Adapted from: https://commons.wikimedia.org/wiki/File:InferentialStatisticalDecisionMakingTrees.pdf. The original source is: Corston, R. & Colman, A. M. (2000). A crash course in SPSS for Windows. Wiley-Blackwell. Changes were made to the original, including re-formulating the tests in a regression framework.

Both associations and differences can be examined with the regression family, which greatly simplifies our summary of the statistical decision tree, as depicted in Figure 9.11.

Figure 9.11: Summary of A Statistical Decision Tree For Choosing an Appropriate Statistical Procedure.

Thus, the regression framework can be used to examine most questions regarding associations between variables or differences between groups.

For an online, interactive statistical decision tree to help you decide which statistical analysis to use, see here: https://www.statsflowchart.co.uk

9.6 Statistical Tests

9.6.1 t-Test

There are several t-tests:

  • one-sample t-test
  • two-samples t-test
    • independent samples t-test
    • paired samples t-test

A one-sample t-test is used to evaluate whether a sample mean differs systematically from a particular value. The null hypothesis is that the sample mean does not differ systematically from the pre-specified value. The alternative hypothesis is that the sample mean differs systematically from the pre-specified value. For instance, let’s say you want to test out a new draft strategy. You could participate in a mock draft and draft players using the new strategy. Then, you could use a one-sample t-test to evaluate whether your new draft strategy yields players with more projected points than the average of players’ projected points for other teams.

Two-samples t-tests are used to test for differences between scores of two groups. If the two groups are independent, the independent-samples t-test is used. If the two groups involve paired samples, the paired-samples t-test is used. The null hypothesis is that the mean of group 1 does not differ systematically from the mean of group 2. The alternative hypothesis is that the mean of group 1 differs systematically from the mean of group 2. For instance, you could use an independent-samples t-test if you want to examine whether Quarterbacks tend to have longer careers than Running Backs. By contrast, you could use a paired-samples t-test if you want to examine whether Quarterbacks tend to score more points in the second year of their contract compared to their rookie year, because the same subjects were assessed twice (i.e., a within-subject design).
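
Minimal sketches of each t-test, using base R's t.test() function and small simulated (hypothetical) datasets:

Code
set.seed(52242)

careerLengthQB <- rnorm(50, mean = 6, sd = 3) # hypothetical career lengths (years): Quarterbacks
careerLengthRB <- rnorm(50, mean = 5, sd = 2) # hypothetical career lengths (years): Running Backs

t.test(careerLengthQB, mu = 5)         # one-sample t-test against a prespecified value
t.test(careerLengthQB, careerLengthRB) # independent-samples t-test (Welch's test, by default)

pointsYear1 <- rnorm(50, mean = 10, sd = 4)  # hypothetical points: rookie year
pointsYear2 <- pointsYear1 + rnorm(50, 1, 2) # hypothetical points: second year (same players)

t.test(pointsYear2, pointsYear1, paired = TRUE) # paired-samples t-test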

9.6.2 Analysis of Variance

Analysis of variance (ANOVA) allows examining whether groups differ systematically as a function of one or more factors. There are multiple variants of ANOVA:

  • one-way ANOVA
  • factorial ANOVA
  • repeated measures ANOVA (RM-ANOVA)
  • multivariate ANOVA (MANOVA)

Like two-samples t-tests, ANOVA allows examining whether groups differ as a function of an independent variable. However, unlike a t-test, ANOVA allows examining multiple independent variables and more than two groups. The null hypothesis is that the groups’ mean values do not differ systematically. The alternative hypothesis is that the groups’ mean values differ systematically.

A one-way ANOVA examines whether two or more groups differ as a function of an independent variable. For instance, you could use a one-way ANOVA to evaluate whether multiple positions differ in their length of career. Factorial ANOVA examines whether two or more groups differ as a function of multiple independent variables. For instance, you could use factorial ANOVA to evaluate whether one’s length of career depends on one’s position and weight. Repeated measures ANOVA examines whether scores differ across repeated measures (e.g., across time) for the same participants. For instance, you could use repeated-measures ANOVA to evaluate whether rookies score more points as the season progresses. Multivariate ANOVA examines whether multiple dependent variables differ as a function of one or more factor(s). For instance, you could use MANOVA to evaluate whether one’s contract length and pay differ as a function of one’s position.
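
A minimal sketch of a one-way ANOVA, using simulated (hypothetical) data on career length by position:

Code
set.seed(52242)

exampleData <- data.frame(
  position = rep(c("QB", "RB", "WR"), each = 30),
  careerLength = c(rnorm(30, 6, 3), rnorm(30, 5, 2), rnorm(30, 5.5, 2.5)) # hypothetical values
)

oneWayANOVA <- aov(careerLength ~ position, data = exampleData)
summary(oneWayANOVA) # F-test of whether the positions differ in mean career length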

9.6.3 Correlation

Correlation examines the association between a predictor variable and an outcome variable. The null hypothesis is that the two variables are not associated. The alternative hypothesis is that the two variables are associated.

The Pearson correlation coefficient (\(r\)) is calculated as in Equation 9.22:

\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} \tag{9.22}\]

where \(X\) is the predictor variable and \(Y\) is the outcome variable.
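
For instance, a minimal sketch estimating and testing the correlation between age and fantasy points in the data loaded above:

Code
cor.test(
  ~ age + fantasyPoints,
  data = player_stats_seasonal) # Pearson correlation; incomplete cases are dropped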

9.6.4 (Multiple) Regression

Regression, like correlation, examines the association between a predictor and outcome variable. However, unlike correlation, regression allows multiple predictor variables.

Regression with a single predictor takes the form in Equation 11.1. A regression line is depicted in Figure 11.4. Multiple regression (i.e., regression with multiple predictors) takes the form in Equation 11.2.

The null hypothesis is that the predictor variable(s) are not associated with the outcome variable. The alternative hypothesis is that the predictor variable(s) are associated with the outcome variable.
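
A minimal sketch of a multiple regression model predicting fantasy points from age and years of experience in the data loaded above:

Code
multipleRegressionModel <- lm(
  fantasyPoints ~ age + years_of_experience,
  data = player_stats_seasonal) # rows with missing values are dropped by default

summary(multipleRegressionModel) # coefficient tests and R^2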

9.6.5 Chi-Square Test

There are two primary types of chi-square tests:

  • chi-square goodness-of-fit test
  • chi-square test for association (aka test of independence)

The chi-square goodness-of-fit test evaluates whether a set of categorical data came from a specified distribution. The null hypothesis is that the data came from the specified distribution. The alternative hypothesis is that the data did not come from the specified distribution.

The chi-square test for association evaluates whether two categorical variables are associated. The null hypothesis is that the two variables are not associated. The alternative hypothesis is that the two variables are associated.
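
Minimal sketches of both chi-square tests, using hypothetical counts:

Code
# chi-square goodness-of-fit test: do the observed counts match the specified distribution?
observedCounts <- c(30, 20, 50)                  # hypothetical counts
chisq.test(observedCounts, p = c(1/3, 1/3, 1/3)) # null distribution: equal proportions

# chi-square test for association: are the two categorical variables associated?
contingencyTable <- matrix(c(20, 15, 10, 25), nrow = 2) # hypothetical 2x2 contingency table
chisq.test(contingencyTable)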

9.6.6 Formulating Statistical Tests in Terms of Partitioned Variance

Many statistical tests can be formulated in terms of partitioned variance.

For instance, the t statistic from the independent-samples t-test and the F statistic from ANOVA can be thought of as the ratio of between-group variance to within-group variance, as in Equation 9.23:

\[ t \text{ or } F = \frac{\text{between-group variance}}{\text{within-group variance}} \tag{9.23}\]

The correlation coefficient can be thought of as the ratio of shared variance (i.e., covariance) to total variance, as in Equation 9.24:

\[ r = \frac{\text{shared variance}}{\text{total variance}} \tag{9.24}\]

The coefficient of determination (\(R^2\)) is the proportion of variance in the outcome variable that is explained by the predictor variables. \(\eta^2\) is the proportion of variance in the dependent variable that is explained by the independent variables. The coefficient of determination and \(\eta^2\) can be expressed as the ratio of variance explained in the outcome or dependent variable to the total variance in the outcome or dependent variable, as in Equation 9.25:

\[ R^2 \text{ or } \eta^2 = \frac{\text{variance explained in the outcome variable}}{\text{total variance in the outcome variable}} \tag{9.25}\]

9.6.7 Critical Value

The critical value is the test value for a given test, above which the effect is considered to be statistically significant. The critical value for statistical significance for each test can be determined based on the degrees of freedom and alpha level. The degrees of freedom (df) refer to the number of values in the calculation of a test statistic that are free to vary.

Code
alpha <- .05
N <- 200
nGroup1 <- 150
nGroup2 <- 150
numGroups <- 4
numLevelsFactorA <- 3
numLevelsFactorB <- 4
numMeasurements <- 4
numPredictors <- 5
numCategories <- 6
numRows <- 5
numColumns <- 2

9.6.7.1 One-Sample t-Test

For a one-sample t-test, the degrees of freedom is in Equation 9.26:

\[ df = N - 1 \tag{9.26}\]

where \(N\) is sample size.

Code
df_oneSampleTtest <- N - 1

One-tailed test:

Code
qt(1 - alpha, df_oneSampleTtest)
[1] 1.652547

Two-tailed test:

Code
qt(1 - alpha/2, df_oneSampleTtest)
[1] 1.971957

9.6.7.2 Independent-Samples t-Test

For an independent-samples t-test, the degrees of freedom is in Equation 9.27:

\[ df = n_1 + n_2 - 2 \tag{9.27}\]

where \(n_1\) is the sample size of group 1 and \(n_2\) is the sample size of group 2.

Code
df_independentSamplesTtest <- nGroup1 + nGroup2 - 2

One-tailed test:

Code
qt(1 - alpha, df_independentSamplesTtest)
[1] 1.649983

Two-tailed test:

Code
qt(1 - alpha/2, df_independentSamplesTtest)
[1] 1.967957

9.6.7.3 Paired-Samples t-Test

For a paired-samples t-test, the degrees of freedom is in Equation 9.28:

\[ df = N - 1 \tag{9.28}\]

where \(N\) is sample size (i.e., the number of paired observations).

Code
df_pairedSamplesTtest <- N - 1

One-tailed test:

Code
qt(1 - alpha, df_pairedSamplesTtest)
[1] 1.652547

Two-tailed test:

Code
qt(1 - alpha/2, df_pairedSamplesTtest)
[1] 1.971957

9.6.7.4 One-Way ANOVA

For a one-way ANOVA, the degrees of freedom is in Equation 9.29:

\[ \begin{aligned} df_\text{between} &= g - 1 \\ df_\text{within} &= N - g \end{aligned} \tag{9.29}\]

where \(N\) is sample size and \(g\) is the number of groups.

Code
df_betweenOneWayANOVA <- numGroups - 1
df_withinOneWayANOVA <- N - numGroups

One-tailed test:

Code
qf(1 - alpha, df_betweenOneWayANOVA, df_withinOneWayANOVA)
[1] 2.650677

Two-tailed test:

Code
qf(1 - alpha/2, df_betweenOneWayANOVA, df_withinOneWayANOVA)
[1] 3.183378

9.6.7.5 Factorial ANOVA

For a factorial two-way ANOVA, the degrees of freedom is in Equation 9.30:

\[ \begin{aligned} df_\text{Factor A} &= a - 1 \\ df_\text{Factor B} &= b - 1 \\ df_\text{Interaction} &= (a - 1)(b - 1) \\ df_\text{error} &= ab(N - 1) \end{aligned} \tag{9.30}\]

where \(a\) is the number of levels for factor A, \(b\) is the number of levels for factor B, and \(N\) is the sample size per cell (i.e., per combination of factor levels).

Code
df_factorA <- numLevelsFactorA - 1
df_factorB <- numLevelsFactorB - 1
df_interaction <- df_factorA * df_factorB
df_error <- numLevelsFactorA * numLevelsFactorB * (N - 1)

Factor A (one-tailed test):

Code
qf(1 - alpha, df_factorA, df_error)
[1] 2.999494

Factor B (one-tailed test):

Code
qf(1 - alpha, df_factorB, df_error)
[1] 2.608629

Interaction (one-tailed test):

Code
qf(1 - alpha, df_interaction, df_error)
[1] 2.102376

Factor A (two-tailed test):

Code
qf(1 - alpha/2, df_factorA, df_error)
[1] 3.694584

Factor B (two-tailed test):

Code
qf(1 - alpha/2, df_factorB, df_error)
[1] 3.121587

Interaction (two-tailed test):

Code
qf(1 - alpha/2, df_interaction, df_error)
[1] 2.413504

9.6.7.6 Repeated Measures ANOVA

For a repeated measures ANOVA, the degrees of freedom is in Equation 9.31:

\[ \begin{aligned} df_1 &= T - 1 \\ df_2 &= (T - 1)(N - 1) \end{aligned} \tag{9.31}\]

where \(N\) is sample size and \(T\) is the number of measurements (i.e., the number of levels of the within-person factor: e.g., timepoints or conditions).

Code
df1_RMANOVA <- numMeasurements - 1
df2_RMANOVA <- (numMeasurements - 1) * (N - 1)

One-tailed test:

Code
qf(1 - alpha, df1_RMANOVA, df2_RMANOVA)
[1] 2.619828

Two-tailed test:

Code
qf(1 - alpha/2, df1_RMANOVA, df2_RMANOVA)
[1] 3.138017

9.6.7.7 Correlation

For a correlation, the degrees of freedom is in Equation 9.32:

\[ df = N - 2 \tag{9.32}\]

where \(N\) is sample size.

Code
df_correlation <- N - 2

One-tailed test:

Code
qt(1 - alpha, df_correlation)
[1] 1.652586

Two-tailed test:

Code
qt(1 - alpha/2, df_correlation)
[1] 1.972017

9.6.7.8 Multiple Regression

For multiple regression, the degrees of freedom is in Equation 9.33:

\[ \begin{aligned} df_1 &= p \\ df_2 &= N - p - 1 \end{aligned} \tag{9.33}\]

where \(N\) is sample size and \(p\) is the number of predictors.

Code
df1_regression <- numPredictors
df2_regression <- N - numPredictors - 1

One-tailed test:

Code
qf(1 - alpha, df1_regression, df2_regression)
[1] 2.260647

Two-tailed test:

Code
qf(1 - alpha/2, df1_regression, df2_regression)
[1] 2.63243

9.6.7.9 Chi-Square Goodness-of-Fit Test

For the chi-square goodness-of-fit test, the degrees of freedom is in Equation 9.34:

\[ df = c - 1 \tag{9.34}\]

where \(c\) is the number of categories.

Code
df_chisquareGOF <- numCategories - 1

One-tailed test:

Code
qchisq(1 - alpha, df_chisquareGOF)
[1] 11.0705

Two-tailed test:

Code
qchisq(1 - alpha/2, df_chisquareGOF)
[1] 12.8325

9.6.7.10 Chi-Square Test for Association

For the chi-square test for association, the degrees of freedom is in Equation 9.35:

\[ df = (r - 1) \times (c - 1) \tag{9.35}\]

where \(r\) is the number of rows in the contingency table and \(c\) is the number of columns in the contingency table.

Code
df_chisquareAssociation <- (numRows - 1) * (numColumns - 1)

One-tailed test:

Code
qchisq(1 - alpha, df_chisquareAssociation)
[1] 9.487729

Two-tailed test:

Code
qchisq(1 - alpha/2, df_chisquareAssociation)
[1] 11.14329

9.6.8 Statistical Power

As described above, statistical power is the probability of detecting an effect, if, in fact, the effect exists. For a given statistical test, four quantities are interrelated: statistical power, effect size, sample size, and alpha level. Knowing any three of them, you can calculate the fourth. Below is R code for calculating power for each of various statistical tests (i.e., a power analysis). For free point-and-click software for calculating statistical power, see G*Power: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html

Code
power <- .8
effectSize_d <- .5
effectSize_r <- .24
effectSize_beta <- .24
effectSize_f <- .25
effectSize_fSquared <- .06
effectSize_omega <- .3

When designing a study, it is important to consider power and the sample size needed to detect the hypothesized effect size. If your sample size is too small and you do not detect an effect (i.e., \(p > .05\)), you do not know whether your failure to detect the effect was because a) the effect does not exist, or b) the effect exists but you did not have enough power to detect it.
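
For intuition about how these quantities trade off, below is a brief sketch that reuses the effect size (effectSize_r) and alpha level defined above to compute the power to detect a correlation at several assumed sample sizes; holding effect size and alpha constant, power increases with sample size:

Code
# Power to detect r = .24 (alpha = .05) at assumed sample sizes of 25, 50, 100, and 200
sapply(
  c(25, 50, 100, 200),
  function(n) pwr::pwr.r.test(n = n, r = effectSize_r, sig.level = alpha)$power)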

9.6.8.1 One-Sample t-Test

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
pwr::pwr.t.test(
  n = N,
  d = effectSize_d,
  sig.level = alpha,
  type = "one.sample",
  alternative = "two.sided")

     One-sample t test power calculation 

              n = 200
              d = 0.5
      sig.level = 0.05
          power = 0.9999998
    alternative = two.sided

Solving for sample size needed (given effect size, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  d = effectSize_d,
  sig.level = alpha,
  type = "one.sample",
  alternative = "two.sided")

     One-sample t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  n = N,
  sig.level = alpha,
  type = "one.sample",
  alternative = "two.sided")

     One-sample t test power calculation 

              n = 200
              d = 0.1990655
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

9.6.8.2 Independent-Samples t-Test

9.6.8.2.1 Balanced Group Sizes

Solving for statistical power achieved (given effect size, sample size per group, and alpha level):

Code
pwr::pwr.t.test(
  n = N,
  d = effectSize_d,
  sig.level = alpha,
  type = "two.sample",
  alternative = "two.sided")

     Two-sample t test power calculation 

              n = 200
              d = 0.5
      sig.level = 0.05
          power = 0.9987689
    alternative = two.sided

NOTE: n is number in *each* group

Solving for sample size per group needed (given effect size, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  d = effectSize_d,
  sig.level = alpha,
  type = "two.sample",
  alternative = "two.sided")

     Two-sample t test power calculation 

              n = 63.76561
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Solving for the minimum detectable effect size (given sample size per group, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  n = N,
  sig.level = alpha,
  type = "two.sample",
  alternative = "two.sided")

     Two-sample t test power calculation 

              n = 200
              d = 0.2808267
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

9.6.8.2.2 Unbalanced Group Sizes

Solving for statistical power achieved (given effect size, each group's sample size, and alpha level):

Code
pwr::pwr.t2n.test(
  n1 = nGroup1,
  n2 = nGroup2,
  d = effectSize_d,
  sig.level = alpha,
  alternative = "two.sided")

     t test power calculation 

             n1 = 150
             n2 = 150
              d = 0.5
      sig.level = 0.05
          power = 0.9907677
    alternative = two.sided

Solving for the sample size needed in the second group (given the first group's sample size, effect size, power, and alpha level):

Code
pwr::pwr.t2n.test(
  power = power,
  n1 = nGroup1,
  d = effectSize_d,
  sig.level = alpha,
  alternative = "two.sided")

     t test power calculation 

             n1 = 150
             n2 = 40.22483
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Solving for the minimum detectable effect size (given each group's sample size, power, and alpha level):

Code
pwr::pwr.t2n.test(
  power = power,
  n1 = nGroup1,
  n2 = nGroup2,
  sig.level = alpha,
  alternative = "two.sided")

     t test power calculation 

             n1 = 150
             n2 = 150
              d = 0.3245459
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

9.6.8.3 Paired-Samples t-Test

Solving for statistical power achieved (given effect size, number of pairs, and alpha level):

Code
pwr::pwr.t.test(
  n = N,
  d = effectSize_d,
  sig.level = alpha,
  type = "paired",
  alternative = "two.sided")

     Paired t test power calculation 

              n = 200
              d = 0.5
      sig.level = 0.05
          power = 0.9999998
    alternative = two.sided

NOTE: n is number of *pairs*

Solving for the number of pairs needed (given effect size, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  d = effectSize_d,
  sig.level = alpha,
  type = "paired",
  alternative = "two.sided")

     Paired t test power calculation 

              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*

Solving for the minimum detectable effect size (given the number of pairs, power, and alpha level):

Code
pwr::pwr.t.test(
  power = power,
  n = N,
  sig.level = alpha,
  type = "paired",
  alternative = "two.sided")

     Paired t test power calculation 

              n = 200
              d = 0.1990655
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*

9.6.8.4 One-Way ANOVA

Solving for statistical power achieved (given effect size, sample size per group, and alpha level):

Code
pwr::pwr.anova.test(
  n = N,
  f = effectSize_f,
  sig.level = alpha,
  k = numGroups)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 200
              f = 0.25
      sig.level = 0.05
          power = 0.9999962

NOTE: n is number in each group

Solving for sample size per group needed (given effect size, power, and alpha level):

Code
pwr::pwr.anova.test(
  power = power,
  f = effectSize_f,
  sig.level = alpha,
  k = numGroups)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 44.59927
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

Solving for the minimum detectable effect size (given sample size per group, power, and alpha level):

Code
pwr::pwr.anova.test(
  power = power,
  n = N,
  sig.level = alpha,
  k = numGroups)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 200
              f = 0.117038
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

The power analysis code above assumes the groups are of equal size (i.e., a balanced design). If the design is unbalanced (i.e., there are different numbers of participants in each group), it may be necessary to conduct a power analysis via simulation. Below is an example of evaluating statistical power for detecting an effect in an unbalanced design via simulation:

Code
nSim <- 1000 # number of simulations

# Function to generate data and perform ANOVA
simulate_anova <- function(nGroup1, nGroup2, f, alpha) {
  # Means for each group
  mean1 <- 0
  mean2 <- f * sqrt((nGroup1 + nGroup2) / 2)
  
  # Generate data
  group1 <- rnorm(nGroup1, mean = mean1, sd = 1)
  group2 <- rnorm(nGroup2, mean = mean2, sd = 1)
  
  # Combine data
  data <- data.frame(
    value = c(group1, group2),
    group = factor(rep(c("Group1", "Group2"), c(nGroup1, nGroup2)))
  )
  
  # Perform ANOVA
  aov_result <- aov(value ~ group, data = data)
  p_value <- summary(aov_result)[[1]][["Pr(>F)"]][1]
  
  # Check if p-value is less than alpha
  return(p_value < alpha)
}

# Run simulations
set.seed(52242) # for reproducibility
powerSimulationOneWayAnova <- replicate(
  nSim,
  simulate_anova(
    nGroup1 = 10,
    nGroup2 = 25,
    f = effectSize_f,
    alpha = alpha))

# Estimate power
mean(powerSimulationOneWayAnova)
[1] 0.774

9.6.8.5 Factorial ANOVA

The power analysis code below assumes the groups are of equal size (i.e., a balanced design). If the design is unbalanced (i.e., there are different numbers of participants in each group), it may be necessary to conduct a power analysis via a simulation. See Section 9.6.8.4 for an example power analysis simulation for one-way ANOVA. In each set of code below, the three calls correspond to Factor A, Factor B, and the interaction, respectively, approximating each as a one-way ANOVA with k set to the number of levels of Factor A, the number of levels of Factor B, and their sum.

Solving for statistical power achieved (given effect size, sample size per group, and alpha level):

Code
pwr::pwr.anova.test(
  n = N,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorA)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 200
              f = 0.25
      sig.level = 0.05
          power = 0.9999238

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  n = N,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 200
              f = 0.25
      sig.level = 0.05
          power = 0.9999962

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  n = N,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorA + numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 7
              n = 200
              f = 0.25
      sig.level = 0.05
          power = 1

NOTE: n is number in each group

Solving for sample size per group needed (given effect size, power, and alpha level):

Code
pwr::pwr.anova.test(
  power = power,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorA)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 52.3966
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  power = power,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 44.59927
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  power = power,
  f = effectSize_f,
  sig.level = alpha,
  k = numLevelsFactorA + numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 7
              n = 32.05196
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

Solving for the minimum detectable effect size (given sample size per group, power, and alpha level):

Code
pwr::pwr.anova.test(
  power = power,
  n = N,
  sig.level = alpha,
  k = numLevelsFactorA)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 200
              f = 0.1270373
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  power = power,
  n = N,
  sig.level = alpha,
  k = numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 4
              n = 200
              f = 0.117038
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
Code
pwr::pwr.anova.test(
  power = power,
  n = N,
  sig.level = alpha,
  k = numLevelsFactorA + numLevelsFactorB)

     Balanced one-way analysis of variance power calculation 

              k = 7
              n = 200
              f = 0.09889082
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

9.6.8.6 Repeated Measures ANOVA

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
WebPower::wp.rmanova(
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 0)
Repeated-measures ANOVA analysis

      n    f ng nm nscor alpha     power
    200 0.25  4  4     1  0.05 0.8484718

NOTE: Power analysis for between-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 1)
Repeated-measures ANOVA analysis

      n    f ng nm nscor alpha     power
    200 0.25  4  4     1  0.05 0.8536292

NOTE: Power analysis for within-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 2)
Repeated-measures ANOVA analysis

      n    f ng nm nscor alpha     power
    200 0.25  4  4     1  0.05 0.6756298

NOTE: Power analysis for interaction-effect test
URL: http://psychstat.org/rmanova

Solving for the sample size needed (given effect size, power, and alpha level):

Code
WebPower::wp.rmanova(
  power = power,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 0)
Repeated-measures ANOVA analysis

           n    f ng nm nscor alpha power
    178.3971 0.25  4  4     1  0.05   0.8

NOTE: Power analysis for between-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  power = power,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 1)
Repeated-measures ANOVA analysis

           n    f ng nm nscor alpha power
    175.7692 0.25  4  4     1  0.05   0.8

NOTE: Power analysis for within-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  power = power,
  ng = numGroups,
  nm = numMeasurements,
  f = effectSize_f,
  alpha = alpha,
  type = 2)
Repeated-measures ANOVA analysis

           n    f ng nm nscor alpha power
    253.2369 0.25  4  4     1  0.05   0.8

NOTE: Power analysis for interaction-effect test
URL: http://psychstat.org/rmanova

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
WebPower::wp.rmanova(
  power = power,
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  alpha = alpha,
  type = 0)
Repeated-measures ANOVA analysis

      n         f ng nm nscor alpha power
    200 0.2358259  4  4     1  0.05   0.8

NOTE: Power analysis for between-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  power = power,
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  alpha = alpha,
  type = 1)
Repeated-measures ANOVA analysis

      n         f ng nm nscor alpha power
    200 0.2342726  4  4     1  0.05   0.8

NOTE: Power analysis for within-effect test
URL: http://psychstat.org/rmanova
Code
WebPower::wp.rmanova(
  power = power,
  n = N,
  ng = numGroups,
  nm = numMeasurements,
  alpha = alpha,
  type = 2)
Repeated-measures ANOVA analysis

      n         f ng nm nscor alpha power
    200 0.2817486  4  4     1  0.05   0.8

NOTE: Power analysis for interaction-effect test
URL: http://psychstat.org/rmanova

9.6.8.7 Correlation

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
pwr::pwr.r.test(
  n = N,
  r = effectSize_r,
  sig.level = alpha,
  alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation) 

              n = 200
              r = 0.24
      sig.level = 0.05
          power = 0.9310138
    alternative = two.sided

Solving for the sample size needed (given effect size, power, and alpha level):

Code
pwr::pwr.r.test(
  power = power,
  r = effectSize_r,
  sig.level = alpha,
  alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation) 

              n = 133.1299
              r = 0.24
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
pwr::pwr.r.test(
  power = power,
  n = N,
  sig.level = alpha,
  alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation) 

              n = 200
              r = 0.1965767
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

9.6.8.8 Multiple Regression

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
pwr::pwr.f2.test(
  f2 = effectSize_fSquared,
  sig.level = alpha,
  u = numPredictors,
  v = N - numPredictors - 1)

     Multiple regression power calculation 

              u = 5
              v = 194
             f2 = 0.06
      sig.level = 0.05
          power = 0.7548031
Code
pwrss::pwrss.t.reg(
  n = N,
  beta1 = effectSize_beta,
  k = numPredictors,
  alpha = alpha,
  alternative = "not equal")
 Linear Regression Coefficient (t Test) 
 H0: beta1 = beta0 
 HA: beta1 != beta0 
 ------------------------------ 
  Statistical power = 0.936 
  n = 200 
 ------------------------------ 
 Alternative = "not equal" 
 Degrees of freedom = 194 
 Non-centrality parameter = 3.496 
 Type I error rate = 0.05 
 Type II error rate = 0.064 

Solving for the sample size needed (given effect size, power, and alpha level). Note that \(v = N - p - 1\), where \(p\) is the number of predictors; thus, \(N = v + p + 1\):

Code
multipleRegressionSampleSizeModel <- pwr::pwr.f2.test(
  power = power,
  f2 = effectSize_fSquared,
  sig.level = alpha,
  u = numPredictors)

multipleRegressionSampleSizeModel

     Multiple regression power calculation 

              u = 5
              v = 213.3947
             f2 = 0.06
      sig.level = 0.05
          power = 0.8
Code
vNeeded <- multipleRegressionSampleSizeModel$v
sampleSizeNeeded <- vNeeded + numPredictors + 1
sampleSizeNeeded
[1] 219.3947
Code
pwrss::pwrss.t.reg(
  power = power,
  beta1 = effectSize_beta,
  k = numPredictors,
  alpha = alpha,
  alternative = "not equal")
 Linear Regression Coefficient (t Test) 
 H0: beta1 = beta0 
 HA: beta1 != beta0 
 ------------------------------ 
  Statistical power = 0.8 
  n = 131 
 ------------------------------ 
 Alternative = "not equal" 
 Degrees of freedom = 124.427 
 Non-centrality parameter = 2.823 
 Type I error rate = 0.05 
 Type II error rate = 0.2 

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
pwr::pwr.f2.test(
  power = power,
  sig.level = alpha,
  u = numPredictors,
  v = N - numPredictors - 1)

     Multiple regression power calculation 

              u = 5
              v = 194
             f2 = 0.06597765
      sig.level = 0.05
          power = 0.8

9.6.8.9 Chi-Square Goodness-of-Fit Test

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
pwr::pwr.chisq.test(
  N = N,
  w = effectSize_omega,
  df = numCategories - 1,
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.3
              N = 200
             df = 5
      sig.level = 0.05
          power = 0.9269225

NOTE: N is the number of observations

Solving for sample size needed (given effect size, power, and alpha level):

Code
pwr::pwr.chisq.test(
  power = power,
  w = effectSize_omega,
  df = numCategories - 1,
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.3
              N = 142.529
             df = 5
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
pwr::pwr.chisq.test(
  power = power,
  N = N,
  df = numCategories - 1,
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.2532543
              N = 200
             df = 5
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

9.6.8.10 Chi-Square Test for Association

Solving for statistical power achieved (given effect size, sample size, and alpha level):

Code
pwr::pwr.chisq.test(
  N = N,
  w = effectSize_omega,
  df = (numRows - 1)*(numColumns - 1),
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.3
              N = 200
             df = 4
      sig.level = 0.05
          power = 0.9431195

NOTE: N is the number of observations

Solving for sample size needed (given effect size, power, and alpha level):

Code
pwr::pwr.chisq.test(
  power = power,
  w = effectSize_omega,
  df = (numRows - 1)*(numColumns - 1),
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.3
              N = 132.6143
             df = 4
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

Solving for the minimum detectable effect size (given sample size, power, and alpha level):

Code
pwr::pwr.chisq.test(
  power = power,
  N = N,
  df = (numRows - 1)*(numColumns - 1),
  sig.level = alpha)

     Chi squared power calculation 

              w = 0.2442875
              N = 200
             df = 4
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

9.6.8.11 Multilevel Modeling

Power analysis for multilevel modeling approaches is more complicated than it is for other statistical analyses, such as correlation, multiple regression, t-tests, ANOVA, etc.

There are free web applications for calculating power in multilevel modeling.
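
Power for multilevel models is also often estimated via simulation, analogous to the unbalanced one-way ANOVA simulation above. Below is a minimal sketch, assuming a two-level random-intercept design; the number of clusters, cluster size, fixed effect, and cluster standard deviation are hypothetical values for illustration, and the fixed effect is tested with a normal approximation to its \(t\) statistic:

Code
library("lme4")

# Simulate a two-level data set, fit a random-intercept model, and
# test the fixed effect of x using a normal approximation to its t statistic
simulate_mlm <- function(nClusters, nPerCluster, fixedEffect, clusterSD, alpha) {
  clusterID <- rep(seq_len(nClusters), each = nPerCluster)
  x <- rnorm(nClusters * nPerCluster)
  clusterEffect <- rnorm(nClusters, sd = clusterSD)[clusterID]
  y <- fixedEffect * x + clusterEffect + rnorm(nClusters * nPerCluster)
  
  simData <- data.frame(y = y, x = x, clusterID = factor(clusterID))
  fit <- lme4::lmer(y ~ x + (1 | clusterID), data = simData)
  
  tValue <- coef(summary(fit))["x", "t value"]
  abs(tValue) > qnorm(1 - alpha/2)
}

# Estimate power as the proportion of significant replications
set.seed(52242) # for reproducibility
powerSimulationMLM <- replicate(
  500,
  simulate_mlm(
    nClusters = 30,
    nPerCluster = 10,
    fixedEffect = 0.15,
    clusterSD = 0.5,
    alpha = alpha))

mean(powerSimulationMLM)

Packages such as simr can automate this kind of simulation for mixed-effects models.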

9.6.8.12 Path Analysis, Factor Analysis, and Structural Equation Modeling

Power analysis for latent variable modeling approaches like structural equation modeling (SEM) is more complicated than it is for other statistical analyses, such as correlation, multiple regression, t-tests, ANOVA, etc.

I provide an example of power analysis in SEM using Monte Carlo simulation in R here: https://isaactpetersen.github.io/Principles-Psychological-Assessment/sem.html#monteCarloPowerAnalysis (Petersen, 2024b).
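
To give a flavor of that approach, below is a minimal sketch of Monte Carlo power analysis using the lavaan package; the population model (factor loadings of .7 and a structural path of .3), the number of replications, and the reuse of N and alpha from above are assumptions for illustration:

Code
library("lavaan")

# Population model used to generate data (loadings and structural path are assumed)
populationModel <- '
  latentX =~ 0.7*x1 + 0.7*x2 + 0.7*x3
  latentY =~ 0.7*y1 + 0.7*y2 + 0.7*y3
  latentY ~ 0.3*latentX
'

# Analysis model fit to each simulated data set
analysisModel <- '
  latentX =~ x1 + x2 + x3
  latentY =~ y1 + y2 + y3
  latentY ~ latentX
'

# Simulate data, fit the model, and test the structural path
simulate_sem <- function(n) {
  simData <- lavaan::simulateData(populationModel, sample.nobs = n)
  fit <- lavaan::sem(analysisModel, data = simData)
  estimates <- lavaan::parameterEstimates(fit)
  pathRow <- estimates$lhs == "latentY" & estimates$op == "~" & estimates$rhs == "latentX"
  estimates$pvalue[pathRow] < alpha
}

# Estimate power as the proportion of significant replications
set.seed(52242) # for reproducibility
mean(replicate(200, simulate_sem(n = N)))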

There are also free web applications for calculating power in SEM.

9.6.8.13 Mediation and Moderation

There are free tools for calculating power for tests of mediation and moderation.
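
As with the designs above, power for mediation can also be estimated via simulation. Below is a minimal sketch, assuming a simple mediation model (\(x \rightarrow m \rightarrow y\)) with hypothetical path values, that tests the indirect effect with a Sobel test; in practice, bootstrapped confidence intervals for the indirect effect are generally preferred:

Code
# Simulate data from a simple mediation model and test the
# indirect effect (a * b) with a Sobel test
simulate_mediation <- function(n, a, b, cp, alpha) {
  x <- rnorm(n)
  m <- a * x + rnorm(n) # a path: x -> m
  y <- b * m + cp * x + rnorm(n) # b path: m -> y; cp: direct effect of x on y
  
  pathA <- summary(lm(m ~ x))$coefficients["x", ]
  pathB <- summary(lm(y ~ m + x))$coefficients["m", ]
  
  sobelZ <- (pathA["Estimate"] * pathB["Estimate"]) /
    sqrt(pathA["Estimate"]^2 * pathB["Std. Error"]^2 +
           pathB["Estimate"]^2 * pathA["Std. Error"]^2)
  
  abs(sobelZ) > qnorm(1 - alpha/2)
}

# Estimate power as the proportion of significant replications
set.seed(52242) # for reproducibility
mean(replicate(1000, simulate_mediation(n = N, a = .3, b = .3, cp = .1, alpha = alpha)))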

9.7 Conclusion

Descriptive statistics are used to describe data, including the center, spread, and shape of the data. Inferential statistics are used to draw inferences regarding differences between groups or associations between variables. Null hypothesis significance testing is a framework for inferential statistics in which there is a null hypothesis and an alternative hypothesis. The null hypothesis is that there is no difference between groups or that there is no association between variables. Statistical significance is evaluated with a \(p\)-value, which represents the probability of obtaining a result at least as extreme as the result observed if the null hypothesis is true. Effects with \(p\)-values less than .05 are conventionally considered statistically significant. However, it is also important to consider practical significance and effect sizes. When designing a study, it is important to consider statistical power and the sample size needed to detect the hypothesized effect size. You can determine a study's power based on the effect size, sample size, and alpha level.

9.8 Session Info

Code
sessionInfo()
R version 4.4.2 (2024-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] lubridate_1.9.4   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4      
 [5] purrr_1.0.2       readr_2.1.5       tidyr_1.3.1       tibble_3.2.1     
 [9] ggplot2_3.5.1     tidyverse_2.0.0   WebPower_0.9.4    PearsonDS_1.3.1  
[13] lavaan_0.6-19     lme4_1.1-35.5     Matrix_1.7-1      MASS_7.3-61      
[17] pwrss_0.3.1       pwr_1.3-0         DescTools_0.99.58 petersenlab_1.1.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1   Exact_3.3          psych_2.4.12       viridisLite_0.4.2 
 [5] rootSolve_1.8.2.4  farver_2.1.2       fastmap_1.2.0      digest_0.6.37     
 [9] rpart_4.1.23       timechange_0.3.0   lifecycle_1.0.4    cluster_2.1.6     
[13] lmom_3.2           magrittr_2.0.3     compiler_4.4.2     rlang_1.1.4       
[17] Hmisc_5.2-1        tools_4.4.2        yaml_2.3.10        data.table_1.16.4 
[21] knitr_1.49         labeling_0.4.3     htmlwidgets_1.6.4  mnormt_2.1.1      
[25] plyr_1.8.9         RColorBrewer_1.1-3 expm_1.0-0         foreign_0.8-87    
[29] withr_3.0.2        nnet_7.3-19        stats4_4.4.2       xtable_1.8-4      
[33] e1071_1.7-16       colorspace_2.1-1   scales_1.3.0       cli_3.6.3         
[37] mvtnorm_1.3-2      rmarkdown_2.29     generics_0.1.3     rstudioapi_0.17.1 
[41] tzdb_0.4.0         httr_1.4.7         reshape2_1.4.4     readxl_1.4.3      
[45] gld_2.6.6          minqa_1.2.8        DBI_1.2.3          proxy_0.4-27      
[49] splines_4.4.2      cellranger_1.1.0   base64enc_0.1-3    mitools_2.4       
[53] vctrs_0.6.5        boot_1.3-31        jsonlite_1.8.9     hms_1.1.3         
[57] Formula_1.2-5      htmlTable_2.4.3    glue_1.8.0         nloptr_2.1.1      
[61] stringi_1.8.4      gtable_0.3.6       quadprog_1.5-8     munsell_0.5.1     
[65] pillar_1.10.0      htmltools_0.5.8.1  R6_2.5.1           mix_1.0-13        
[69] evaluate_1.0.1     pbivnorm_0.6.0     lattice_0.22-6     haven_2.5.4       
[73] backports_1.5.0    class_7.3-22       Rcpp_1.0.13-1      gridExtra_2.3     
[77] nlme_3.1-166       checkmate_2.3.2    mgcv_1.9-1         xfun_0.49         
[81] pkgconfig_2.0.3   
