# A tibble: 2 × 5
subject time age weight height
<chr> <dbl> <dbl> <dbl> <dbl>
1 John Smith 1 33 90 1.87
2 Mary Smith 1 NA NA 1.54
October 7, 2024
pivot_longer()
or pivot_wider()
functions from tidyr
to do thisQuery: Are these data in a tidy format?
# A tibble: 2 × 5
subject time age weight height
<chr> <dbl> <dbl> <dbl> <dbl>
1 John Smith 1 33 90 1.87
2 Mary Smith 1 NA NA 1.54
Tip
To get a list of data frames available in a package use the data()
function, e.g. data(package = "tidyr")
.
How about these data?
# A tibble: 1,064 × 20
country indicator `2000` `2001` `2002` `2003` `2004` `2005` `2006`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABW SP.URB.TOTL 4.16e4 4.20e+4 4.22e+4 4.23e+4 4.23e+4 4.24e+4 4.26e+4
2 ABW SP.URB.GROW 1.66e0 9.56e-1 4.01e-1 1.97e-1 9.46e-2 1.94e-1 3.67e-1
3 ABW SP.POP.TOTL 8.91e4 9.07e+4 9.18e+4 9.27e+4 9.35e+4 9.45e+4 9.56e+4
4 ABW SP.POP.GROW 2.54e0 1.77e+0 1.19e+0 9.97e-1 9.01e-1 1.00e+0 1.18e+0
5 AFE SP.URB.TOTL 1.16e8 1.20e+8 1.24e+8 1.29e+8 1.34e+8 1.39e+8 1.44e+8
6 AFE SP.URB.GROW 3.60e0 3.66e+0 3.72e+0 3.71e+0 3.74e+0 3.81e+0 3.81e+0
7 AFE SP.POP.TOTL 4.02e8 4.12e+8 4.23e+8 4.34e+8 4.45e+8 4.57e+8 4.70e+8
8 AFE SP.POP.GROW 2.58e0 2.59e+0 2.61e+0 2.62e+0 2.64e+0 2.67e+0 2.70e+0
9 AFG SP.URB.TOTL 4.31e6 4.36e+6 4.67e+6 5.06e+6 5.30e+6 5.54e+6 5.83e+6
10 AFG SP.URB.GROW 1.86e0 1.15e+0 6.86e+0 7.95e+0 4.59e+0 4.47e+0 5.03e+0
# ℹ 1,054 more rows
# ℹ 11 more variables: `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010` <dbl>,
# `2011` <dbl>, `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>,
# `2016` <dbl>, `2017` <dbl>
pivot_longer()
takes three arguments:
library(tidyr)
library(dplyr)
# Pivot using pivot_longer
long_pop_data <- world_bank_pop |>
pivot_longer(
cols = `2000`:`2017`, # The columns you want to pivot (years)
names_to = "year", # New column name for the years
values_to = "pop" # New column name for the values
)
# View the tidied data
long_pop_data
# A tibble: 19,152 × 4
country indicator year pop
<chr> <chr> <chr> <dbl>
1 ABW SP.URB.TOTL 2000 41625
2 ABW SP.URB.TOTL 2001 42025
3 ABW SP.URB.TOTL 2002 42194
4 ABW SP.URB.TOTL 2003 42277
5 ABW SP.URB.TOTL 2004 42317
6 ABW SP.URB.TOTL 2005 42399
7 ABW SP.URB.TOTL 2006 42555
8 ABW SP.URB.TOTL 2007 42729
9 ABW SP.URB.TOTL 2008 42906
10 ABW SP.URB.TOTL 2009 43079
# ℹ 19,142 more rows
This is better, usable even, but are we done if we want a tidy data frame?
# A tibble: 19,152 × 4
country indicator year pop
<chr> <chr> <chr> <dbl>
1 ABW SP.URB.TOTL 2000 41625
2 ABW SP.URB.TOTL 2001 42025
3 ABW SP.URB.TOTL 2002 42194
4 ABW SP.URB.TOTL 2003 42277
5 ABW SP.URB.TOTL 2004 42317
6 ABW SP.URB.TOTL 2005 42399
7 ABW SP.URB.TOTL 2006 42555
8 ABW SP.URB.TOTL 2007 42729
9 ABW SP.URB.TOTL 2008 42906
10 ABW SP.URB.TOTL 2009 43079
# ℹ 19,142 more rows
Issue is that the data are in long form (which is OK for some purposes), but we want to make it wider. Wider, but tidy…
# A tibble: 19,152 × 4
country indicator year pop
<chr> <chr> <chr> <dbl>
1 ABW SP.URB.TOTL 2000 41625
2 ABW SP.URB.TOTL 2001 42025
3 ABW SP.URB.TOTL 2002 42194
4 ABW SP.URB.TOTL 2003 42277
5 ABW SP.URB.TOTL 2004 42317
6 ABW SP.URB.TOTL 2005 42399
7 ABW SP.URB.TOTL 2006 42555
8 ABW SP.URB.TOTL 2007 42729
9 ABW SP.URB.TOTL 2008 42906
10 ABW SP.URB.TOTL 2009 43079
# ℹ 19,142 more rows
pivot_wider()
takes three main arguments:
NA
, 0
)pivotwider()
the WB Data# pivot wider
tidy_pop_data <- long_pop_data |>
pivot_wider(
names_from = indicator,
values_from = pop
)
# view the data
tidy_pop_data
# A tibble: 4,788 × 6
country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ABW 2000 41625 1.66 89101 2.54
2 ABW 2001 42025 0.956 90691 1.77
3 ABW 2002 42194 0.401 91781 1.19
4 ABW 2003 42277 0.197 92701 0.997
5 ABW 2004 42317 0.0946 93540 0.901
6 ABW 2005 42399 0.194 94483 1.00
7 ABW 2006 42555 0.367 95606 1.18
8 ABW 2007 42729 0.408 96787 1.23
9 ABW 2008 42906 0.413 97996 1.24
10 ABW 2009 43079 0.402 99212 1.23
# ℹ 4,778 more rows
pivotwider()
the WB Data# A tibble: 4,788 × 6
country year SP.URB.TOTL SP.URB.GROW SP.POP.TOTL SP.POP.GROW
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ABW 2000 41625 1.66 89101 2.54
2 ABW 2001 42025 0.956 90691 1.77
3 ABW 2002 42194 0.401 91781 1.19
4 ABW 2003 42277 0.197 92701 0.997
5 ABW 2004 42317 0.0946 93540 0.901
6 ABW 2005 42399 0.194 94483 1.00
7 ABW 2006 42555 0.367 95606 1.18
8 ABW 2007 42729 0.408 96787 1.23
9 ABW 2008 42906 0.413 97996 1.24
10 ABW 2009 43079 0.402 99212 1.23
# ℹ 4,778 more rows
/data
folderpivot_longer()
to get the data in long formpivot_wider()
to get the series in the columns10:00
dplyr
verbs mutate()
or mutate_at()
mutate()
is if you want to change on variableacross()
for multiple variablesjanitor
package!read_csv()
or read_excel()