Data Manipulation of Longitudinal Data

1 Preamble

1.1 Load Libraries
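
The library calls for this section are not shown here. As a minimal sketch, assuming the tidyr and dplyr packages are installed, the following would make the functions used in the rest of this page available (pivot_wider(), pivot_longer(), starts_with(), and select()):

```r
# Assumption: these two packages cover every non-base function used below.
library(tidyr) # pivot_wider(), pivot_longer(), starts_with()
library(dplyr) # select()
```

Loading the tidyverse meta-package instead would also attach both packages.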

1.2 Simulate Data

First, let’s generate some example data.

Code
set.seed(52242) # for a reproducible result

exampleData_long <- expand.grid(
  ID = 1001:1003,
  timepoint = 1:4
)

exampleData_long$score <- sample(
  x = 50:100,
  size = nrow(exampleData_long),
  replace = TRUE
)

The data are in long form: each participant (identified by ID) has multiple rows, one per timepoint. Each row is uniquely identified by the combination of ID and timepoint, so we would say the data are in ID-timepoint form:

Code
exampleData_long
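
To confirm that the combination of ID and timepoint really does identify each row, we can count duplicated ID-timepoint pairs. This is a small sketch that rebuilds the example data under the shorter, hypothetical name long so it runs on its own:

```r
set.seed(52242) # same seed as the example above

# Rebuild the long-form example data (hypothetical object name: long)
long <- expand.grid(ID = 1001:1003, timepoint = 1:4)
long$score <- sample(x = 50:100, size = nrow(long), replace = TRUE)

# Number of rows that repeat an earlier ID-timepoint combination
n_duplicates <- sum(duplicated(long[, c("ID", "timepoint")]))

# Number of rows contributed by each participant
rows_per_id <- table(long$ID)

n_duplicates
rows_per_id
```

A count of zero duplicates means the two columns together form a unique key for the long-form data.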

2 Transform Data From Long to Wide

To transform the data from long to wide, we can use the following syntax:

Code
exampleData_wide <- exampleData_long |>
  tidyr::pivot_wider(
    names_from  = timepoint,
    values_from = score,
    names_prefix = "timepoint_"
  )

Now the data are in wide form: each participant (identified by ID) has one row. Each row is uniquely identified by ID, so we would say the data are in ID form:

Code
exampleData_wide
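
As a quick check on the result, we can inspect the column names and confirm that there is one row per participant. The sketch below rebuilds the data under the hypothetical names long and wide so that it is self-contained:

```r
library(tidyr)

set.seed(52242) # same seed as the example above

# Rebuild the long-form data, then widen it as in the section above
long <- expand.grid(ID = 1001:1003, timepoint = 1:4)
long$score <- sample(x = 50:100, size = nrow(long), replace = TRUE)

wide <- long |>
  pivot_wider(
    names_from = timepoint,
    values_from = score,
    names_prefix = "timepoint_"
  )

names(wide)             # ID plus one column per timepoint
dim(wide)               # rows by columns
anyDuplicated(wide$ID)  # 0 when each ID appears only once
```

Because every participant was observed at the same four timepoints, the wide data have no missing cells.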

3 Transform Data From Wide to Long

To transform the data from wide back to long, we can use the following syntax:

Code
exampleData_long2 <- exampleData_wide |>
  tidyr::pivot_longer(
    cols = starts_with("timepoint_"),
    names_to = "timepoint",
    names_prefix = "timepoint_",
    values_to = "score"
  )

Now the data are back in long form: each participant (identified by ID) has multiple rows, one per timepoint. Each row is uniquely identified by the combination of ID and timepoint, so we would say the data are in ID-timepoint form:

Code
exampleData_long2
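
One thing to watch for in the round trip: pivot_longer() builds the timepoint column from column names, so it comes back as character rather than integer. A possible way to handle this, sketched below with the hypothetical object names long, wide, and roundtrip, is to convert it with the names_transform argument and then compare the result with the original data sorted by ID and timepoint:

```r
library(tidyr)

set.seed(52242) # same seed as the example above

long <- expand.grid(ID = 1001:1003, timepoint = 1:4)
long$score <- sample(x = 50:100, size = nrow(long), replace = TRUE)

wide <- long |>
  pivot_wider(
    names_from = timepoint,
    values_from = score,
    names_prefix = "timepoint_"
  )

roundtrip <- wide |>
  pivot_longer(
    cols = starts_with("timepoint_"),
    names_to = "timepoint",
    names_prefix = "timepoint_",
    names_transform = list(timepoint = as.integer), # character -> integer
    values_to = "score"
  )

# The round-tripped rows are ordered by ID, then timepoint,
# so sort the original the same way before comparing.
long_sorted <- long[order(long$ID, long$timepoint), ]

identical(roundtrip$timepoint, long_sorted$timepoint)
identical(roundtrip$score, long_sorted$score)
```

Row order differs from the original data (which varied ID fastest), which is why the comparison sorts first.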

4 Example of Exploding Number of Columns When Transforming From Long to Wide

Transforming from long to wide tends to work best when participants have the same values on the time variable (e.g., age). When participants all have different values on the time variable, the number of columns can explode when the data are transformed from long to wide form. For example, consider the following data:

Code
set.seed(52242) # for a reproducible result

exampleData2_long <- expand.grid(
  ID = 1001:1003,
  instance = 1:4
) |>
  dplyr::select(-instance) # drop the helper column; it only sets the number of rows per ID

exampleData2_long$age <- sample(
  x = 1:99,
  size = nrow(exampleData2_long),
  replace = FALSE
)

exampleData2_long$score <- sample(
  x = 50:100,
  size = nrow(exampleData2_long),
  replace = TRUE
)

Here are the data in long form:

Code
exampleData2_long

Now, let’s widen the data by age:

Code
exampleData2_wide <- exampleData2_long |>
  tidyr::pivot_wider(
    names_from  = age,
    values_from = score,
    names_prefix = "age_"
  )

Here are the data in wide form:

Code
exampleData2_wide

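Notice how the single score column in long form became 12 score columns in wide form, one for each distinct age (the age_ columns). Because no two observations share the same age, each age column contains only one observed value, and the other two participants are missing (NA) on it. With 3 participants and 12 age columns, 24 of the 36 score cells are missing.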
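
To quantify the problem, we can count the columns and missing values directly. The following is a rough sketch that rebuilds the example under the hypothetical names long2 and wide2:

```r
library(tidyr)
library(dplyr)

set.seed(52242) # same seed as the example above

# Four rows per participant, each observed at a unique age
long2 <- expand.grid(ID = 1001:1003, instance = 1:4) |>
  select(-instance)
long2$age <- sample(x = 1:99, size = nrow(long2), replace = FALSE)
long2$score <- sample(x = 50:100, size = nrow(long2), replace = TRUE)

wide2 <- long2 |>
  pivot_wider(
    names_from = age,
    values_from = score,
    names_prefix = "age_"
  )

ncol(wide2)                   # ID plus one column per distinct age
sum(is.na(wide2))             # total number of missing cells
colSums(!is.na(wide2[, -1]))  # observed values in each age column
```

The number of wide columns grows with the number of distinct values of the time variable, not with the number of observations per participant.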




Developmental Psychopathology Lab