Data Manipulation of Longitudinal Data
1 Preamble
1.1 Load Libraries
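The code on this page relies on dplyr (for select()) and tidyr (for pivot_wider(), pivot_longer(), and the starts_with() selection helper). The following is a minimal sketch of this step, assuming both packages are installed; the exact set of libraries loaded originally is not shown here, so this choice is an assumption.

```r
# Assumption: these two packages cover every non-base function used below.
library("dplyr") # select()
library("tidyr") # pivot_wider(), pivot_longer(), starts_with()
```

Loading the full tidyverse would attach both packages as well.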
1.2 Simulate Data
First, let’s generate some example data.
Code
set.seed(52242) # for a reproducible result
# Build every ID-by-timepoint combination (3 participants x 4 timepoints)
exampleData_long <- expand.grid(
  ID = 1001:1003,
  timepoint = 1:4
)
# Draw one random score per row
exampleData_long$score <- sample(
  x = 50:100,
  size = nrow(exampleData_long),
  replace = TRUE
)
The data are in long form: each participant (identified by ID) has multiple rows, and each row is uniquely identified by the combination of ID and timepoint. We would therefore say the data are in ID–timepoint form:
Code
exampleData_long
2 Transform Data From Long to Wide
To transform the data from long to wide, we can use the following syntax:
Code
# Spread score into one column per timepoint (timepoint_1, ..., timepoint_4)
exampleData_wide <- exampleData_long |>
  tidyr::pivot_wider(
    names_from = timepoint,
    values_from = score,
    names_prefix = "timepoint_"
  )
Now the data are in wide form: each participant (identified by ID) has one row, and each row is uniquely identified by ID. We would therefore say the data are in ID form:
Code
exampleData_wide
3 Transform Data From Wide to Long
To transform the data from wide to long, we can use the following syntax:
Code
# Gather the timepoint_* columns back into one row per ID-timepoint pair
exampleData_long2 <- exampleData_wide |>
  tidyr::pivot_longer(
    cols = starts_with("timepoint_"),
    names_to = "timepoint",
    names_prefix = "timepoint_", # strip the prefix; timepoint comes back as character
    values_to = "score"
  )
Now the data are back in long form: each participant (identified by ID) has multiple rows, and each row is uniquely identified by the combination of ID and timepoint. We would therefore say the data are in ID–timepoint form:
Code
exampleData_long2
4 Example of Exploding Number of Columns When Transforming From Long to Wide
Transforming from long to wide tends to work best when participants have the same values on the time variable (e.g., age). When participants all have different values on the time variable, the number of columns can explode when the data are transformed from long to wide form. For example, consider the following data:
Code
set.seed(52242) # for a reproducible result
# Create four rows per participant; instance only sets the row count, so drop it
exampleData2_long <- expand.grid(
  ID = 1001:1003,
  instance = 1:4
) |>
  dplyr::select(-instance)
# Give every row a distinct age (sampling without replacement)
exampleData2_long$age <- sample(
  x = 1:99,
  size = nrow(exampleData2_long),
  replace = FALSE
)
exampleData2_long$score <- sample(
  x = 50:100,
  size = nrow(exampleData2_long),
  replace = TRUE
)
Here are the data in long form:
Code
exampleData2_long
Now, let’s widen the data by age:
Code
exampleData2_wide <- exampleData2_long |>
  tidyr::pivot_wider(
    names_from = age,
    values_from = score,
    names_prefix = "age_"
  )
Here are the data in wide form:
Code
exampleData2_wide
Notice how we went from one column for age in long form to 12 age columns in wide form, one for each unique age in the data. Widening this way also produces many missing values: each age column contains only one observed value.
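The size of the explosion can be checked directly. The sketch below rebuilds data with the same shape as the example (3 participants, 4 observations each, every observation at a distinct age), widens it by age, and then counts the age columns and the share of missing cells. The names demo_long, demo_wide, n_age_cols, and prop_missing are hypothetical and used only for illustration.

```r
library("tidyr")

# Hypothetical stand-in for the example data: 12 rows, each with a distinct age
set.seed(52242)
demo_long <- data.frame(ID = rep(1001:1003, times = 4))
demo_long$age <- sample(x = 1:99, size = nrow(demo_long), replace = FALSE)
demo_long$score <- sample(x = 50:100, size = nrow(demo_long), replace = TRUE)

# Widen by age, as in the example above
demo_wide <- pivot_wider(
  demo_long,
  names_from = age,
  values_from = score,
  names_prefix = "age_"
)

# Identify the age columns created by the pivot
is_age_col <- startsWith(names(demo_wide), "age_")

# One age column is created per distinct age in the long data
n_age_cols <- sum(is_age_col)

# Proportion of cells in the age columns that are missing
prop_missing <- mean(is.na(as.matrix(demo_wide[, is_age_col])))

n_age_cols   # 12
prop_missing # 24 of 36 cells, about 0.67
```

Because each of the 12 observations has its own age, the wide data have 3 rows and 12 age columns, and only 12 of the 36 cells hold an observed score. The number of columns grows with the number of distinct time values rather than with the number of measurement occasions per participant.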
