Intro to tidyverse operations

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.4      readr     2.1.5
 forcats   1.0.0      stringr   1.5.1
 ggplot2   3.5.1      tibble    3.2.1
 lubridate 1.9.4      tidyr     1.3.1
 purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 purrr::%||%()   masks base::%||%()
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Piping

Write code in the order it's run!

Read left to right instead of inside -> out!

Avoid PEMDAS confusion!

  • Without pipes

    • cool_down(workout(warm_up(person)))

  • With pipes

    • person |> warm_up() |> workout() |> cool_down()

Usually written with line breaks to help with readability:

person |> 
  warm_up() |> 
  workout() |> 
  cool_down()
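
To check that the nested and piped forms really are the same computation, here is a minimal runnable sketch. The three functions are hypothetical stand-ins (each one just appends its name to a log of steps), and the native pipe |> needs R 4.1 or later.

```r
# Hypothetical stand-ins: each step appends its name to a log of steps
warm_up   <- function(steps) c(steps, "warm up")
workout   <- function(steps) c(steps, "workout")
cool_down <- function(steps) c(steps, "cool down")

person <- character(0)  # nothing done yet

# Nested calls: read from the inside out
nested <- cool_down(workout(warm_up(person)))

# Piped calls: read top to bottom, in the order the steps run
piped <- person |>
  warm_up() |>
  workout() |>
  cool_down()

identical(nested, piped)  # TRUE
piped                     # "warm up" "workout" "cool down"
```

The pipe simply passes the left-hand side as the first argument of the call on its right, so both versions produce the same result.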

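Use CTRL+SHIFT+M to type the pipe automagically in RStudio (CMD instead of CTRL on a Mac). By default the shortcut inserts the magrittr pipe %>%; in recent RStudio versions, tick "Use native pipe operator" under Tools > Global Options > Code to get |> instead. It doesn't feel like it at first, but once your fingers get used to the shortcut it is more convenient.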

How can I find the sum of the absolute values of just the first 6 elements of x? (Note: head() returns the first 6 elements by default.)

Solve this with and without piping and compare.

x <- c(1, -2, 3, -4, 5, -6, 7, -8, 9, -10)

Without pipes:

# head -> abs -> sum
sum(abs(head(x)))
21

With pipes:

# head -> abs -> sum
x |> 
  head(6) |> 
  abs() |> 
  sum()
21

Data manipulation toolkit

Functions to know for tidy data manipulation:

  • Grab certain columns with select()

  • Filter data.frames with filter()

  • Sort data.frames with arrange()

    • Often paired with desc() for sorting in descending order

  • Create/modify columns with mutate()

  • Create summaries with group_by() |> summarise()

    • dplyr's author is a New Zealander, hence the spelling; summarize() with a z works identically if you want to feel more American

    • the n() function is useful for counting rows within summarise()
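
As a rough sketch of how these verbs chain together, the example below runs each of them on a tiny made-up table. The pets data and its column names are assumptions for illustration only, not part of the lesson's data.

```r
library(dplyr)

# Made-up data for illustration only
pets <- tibble(
  name    = c("Rex", "Tom", "Ace", "Kit", "Bo"),
  species = c("dog", "cat", "dog", "cat", "dog"),
  kg      = c(30, 4, 22, 5, 8)
)

# filter + select + arrange(desc()): names and weights of dogs, heaviest first
dogs <- pets |>
  filter(species == "dog") |>
  select(name, kg) |>
  arrange(desc(kg))

# mutate + group_by + summarise: average weight in pounds and row count per species
by_species <- pets |>
  mutate(lb = kg * 2.2) |>
  group_by(species) |>
  summarise(avg_lb = mean(lb), n = n())

dogs        # Rex, Ace, Bo
by_species  # one row per species, with n = 2 for cat and n = 3 for dog
```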

Good to know exist, but less focus for us as we get started:

  • Join data.frames with:

    • inner_join()

    • left_join()

    • right_join()

    • full_join()

  • Stack data.frames with bind_rows()
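
For later reference, here is a small sketch of how the four joins and bind_rows() differ. The two tables and the key column id are made up for illustration.

```r
library(dplyr)

# Made-up tables sharing the key column `id`
people <- tibble(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores <- tibble(id = c(2, 3, 4), score = c(90, 75, 60))

inner <- inner_join(people, scores, by = "id")  # ids in both tables: 2, 3
left  <- left_join(people, scores, by = "id")   # every row of people: 1, 2, 3
right <- right_join(people, scores, by = "id")  # every row of scores: 2, 3, 4
full  <- full_join(people, scores, by = "id")   # ids in either table: 1, 2, 3, 4

# bind_rows() stacks tables; columns missing from one table are filled with NA
stacked <- bind_rows(people, scores)

c(inner = nrow(inner), left = nrow(left), right = nrow(right),
  full = nrow(full), stacked = nrow(stacked))
# inner 2, left 3, right 3, full 4, stacked 6
```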

Data practice

Get to know the starwars data.frame

# reset data to original
data("starwars", package = "dplyr")

# summary(starwars) # scares adam v much
summary(starwars$height)
names(starwars)
nrow(starwars)
ncol(starwars)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   66.0   167.0   180.0   174.6   191.0   264.0       6 
  1. 'name'
  2. 'height'
  3. 'mass'
  4. 'hair_color'
  5. 'skin_color'
  6. 'eye_color'
  7. 'birth_year'
  8. 'sex'
  9. 'gender'
  10. 'homeworld'
  11. 'species'
  12. 'films'
  13. 'vehicles'
  14. 'starships'
87
14

Currently height is listed in cm. Create a height_in column showing the height in inches.

starwars <- starwars |> 
  mutate(height_in = height / 2.54)

starwars |>
  select(name, height_in) |>
  arrange(height_in)
A tibble: 87 × 2
name                   height_in
<chr>                  <dbl>
Yoda                   25.98425
Ratts Tyerel           31.10236
Wicket Systri Warrick  34.64567
Dud Bolt               37.00787
R2-D2                  37.79528
R4-P17                 37.79528
R5-D4                  38.18898
Sebulba                44.09449
Gasgano                48.03150
Watto                  53.93701
Leia Organa            59.05512
Mon Mothma             59.05512
Cordé                  61.81102
Nien Nunb              62.99213
Shmi Skywalker         64.17323
Ben Quadinaros         64.17323
Beru Whitesun Lars     64.96063
Dormé                  64.96063
Barriss Offee          65.35433
C-3PO                  65.74803
Jocasta Nu             65.74803
Zam Wesell             66.14173
Wedge Antilles         66.92913
Palpatine              66.92913
Finis Valorum          66.92913
Luminara Unduli        66.92913
Eeth Koth              67.32283
Luke Skywalker         67.71654
Greedo                 68.11024
Jabba Desilijic Tiure  68.89764
⋮ (27 rows omitted from the display)
Raymus Antilles        74.01575
Bossk                  74.80315
Nute Gunray            75.19685
Bail Prestor Organa    75.19685
San Hill               75.19685
Qui-Gon Jinn           75.98425
Dooku                  75.98425
Wat Tambor             75.98425
Jar Jar Binks          77.16535
Kit Fisto              77.16535
Mas Amedda             77.16535
Ki-Adi-Mundi           77.95276
Dexter Jettster        77.95276
IG-88                  78.74016
Darth Vader            79.52756
Rugor Nass             81.10236
Tion Medon             81.10236
Taun We                83.85827
Grievous               85.03937
Roos Tarpals           88.18898
Chewbacca              89.76378
Lama Su                90.15748
Tarfful                92.12598
Yarael Poof            103.93701
Arvel Crynyd           NA
Finn                   NA
Rey                    NA
Poe Dameron            NA
BB8                    NA
Captain Phasma         NA

This is equivalent to:

starwars$height_in <- starwars$height / 2.54
starwars[ order(starwars$height_in) , c('name', 'height_in') ]
A tibble: 87 × 2
name                   height_in
<chr>                  <dbl>
Yoda                   25.98425
Ratts Tyerel           31.10236
Wicket Systri Warrick  34.64567
Dud Bolt               37.00787
R2-D2                  37.79528
R4-P17                 37.79528
R5-D4                  38.18898
Sebulba                44.09449
Gasgano                48.03150
Watto                  53.93701
Leia Organa            59.05512
Mon Mothma             59.05512
Cordé                  61.81102
Nien Nunb              62.99213
Shmi Skywalker         64.17323
Ben Quadinaros         64.17323
Beru Whitesun Lars     64.96063
Dormé                  64.96063
Barriss Offee          65.35433
C-3PO                  65.74803
Jocasta Nu             65.74803
Zam Wesell             66.14173
Wedge Antilles         66.92913
Palpatine              66.92913
Finis Valorum          66.92913
Luminara Unduli        66.92913
Eeth Koth              67.32283
Luke Skywalker         67.71654
Greedo                 68.11024
Jabba Desilijic Tiure  68.89764
⋮ (27 rows omitted from the display)
Raymus Antilles        74.01575
Bossk                  74.80315
Nute Gunray            75.19685
Bail Prestor Organa    75.19685
San Hill               75.19685
Qui-Gon Jinn           75.98425
Dooku                  75.98425
Wat Tambor             75.98425
Jar Jar Binks          77.16535
Kit Fisto              77.16535
Mas Amedda             77.16535
Ki-Adi-Mundi           77.95276
Dexter Jettster        77.95276
IG-88                  78.74016
Darth Vader            79.52756
Rugor Nass             81.10236
Tion Medon             81.10236
Taun We                83.85827
Grievous               85.03937
Roos Tarpals           88.18898
Chewbacca              89.76378
Lama Su                90.15748
Tarfful                92.12598
Yarael Poof            103.93701
Arvel Crynyd           NA
Finn                   NA
Rey                    NA
Poe Dameron            NA
BB8                    NA
Captain Phasma         NA
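
Both versions list the six characters with unknown height last. A small sketch on a made-up vector suggests why: order() places NA at the end by default (na.last = TRUE), and arrange() likewise sorts missing values to the end, even inside desc().

```r
library(dplyr)

h <- c(180, NA, 66, 264)  # made-up heights with one missing value

order(h)     # 3 1 4 2: the NA (position 2) comes last
h[order(h)]  # 66 180 264 NA

# arrange() also puts NA last, in both sort directions
asc  <- tibble(h = h) |> arrange(h) |> pull(h)        # 66 180 264 NA
dsc  <- tibble(h = h) |> arrange(desc(h)) |> pull(h)  # 264 180 66 NA
```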

Filter to all the characters under 4 ft tall.

starwars |> 
  filter(height_in < 4 * 12) |> 
  select(name, height_in) |> 
  arrange(height_in)
A tibble: 8 × 2
name                   height_in
<chr>                  <dbl>
Yoda                   25.98425
Ratts Tyerel           31.10236
Wicket Systri Warrick  34.64567
Dud Bolt               37.00787
R2-D2                  37.79528
R4-P17                 37.79528
R5-D4                  38.18898
Sebulba                44.09449

This is equivalent to:

starwars_sub <- starwars
starwars_sub <- subset(starwars_sub, height_in < 4*12)
starwars_sub <- starwars_sub[ order(starwars_sub$height_in) , ]
starwars_sub[ , c('name', 'height_in') ]
A tibble: 8 × 2
name                   height_in
<chr>                  <dbl>
Yoda                   25.98425
Ratts Tyerel           31.10236
Wicket Systri Warrick  34.64567
Dud Bolt               37.00787
R2-D2                  37.79528
R4-P17                 37.79528
R5-D4                  38.18898
Sebulba                44.09449
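
One caveat when translating a filter to base R, sketched here on made-up data: filter() and subset() both drop rows where the condition is NA, whereas plain bracket indexing with a logical vector returns an all-NA row for each NA in the condition.

```r
library(dplyr)

# Made-up data: one height is missing
df <- data.frame(name = c("a", "b", "c"), height_in = c(30, NA, 60))

n_filter  <- nrow(filter(df, height_in < 48))  # 1: the NA row is dropped
n_subset  <- nrow(subset(df, height_in < 48))  # 1: same behaviour
n_bracket <- nrow(df[df$height_in < 48, ])     # 2: includes an all-NA row

# which() discards the NA, so bracket indexing then matches filter()
n_which <- nrow(df[which(df$height_in < 48), ])  # 1
```

This is why the base R version above uses subset() rather than indexing with the raw logical condition.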

Show the 5 tallest characters.

starwars |>
  arrange(desc(height_in)) |> 
  select(name, height_in) |> 
  head(5)
A tibble: 5 × 2
name                   height_in
<chr>                  <dbl>
Yarael Poof            103.93701
Tarfful                92.12598
Lama Su                90.15748
Chewbacca              89.76378
Roos Tarpals           88.18898

This is equivalent to:

starwars[ order(starwars$height_in, decreasing=TRUE)[1:5] , c('name', 'height_in') ]
A tibble: 5 × 2
name                   height_in
<chr>                  <dbl>
Yarael Poof            103.93701
Tarfful                92.12598
Lama Su                90.15748
Chewbacca              89.76378
Roos Tarpals           88.18898

Show the 5 tallest humans.

starwars |> 
  filter(species == "Human") |> 
  arrange(desc(height_in)) |> 
  select(name, height_in) |> 
  head(5)
A tibble: 5 × 2
name                   height_in
<chr>                  <dbl>
Darth Vader            79.52756
Qui-Gon Jinn           75.98425
Dooku                  75.98425
Bail Prestor Organa    75.19685
Anakin Skywalker       74.01575

This is equivalent to:

starwars_sub <- starwars
starwars_sub <- subset(starwars_sub, species=='Human')
starwars_sub <- starwars_sub[ order(starwars_sub$height_in, decreasing=TRUE) , ]
starwars_sub[ 1:5 , c('name', 'height_in') ]
A tibble: 5 × 2
name                   height_in
<chr>                  <dbl>
Darth Vader            79.52756
Qui-Gon Jinn           75.98425
Dooku                  75.98425
Bail Prestor Organa    75.19685
Anakin Skywalker       74.01575

Show the average mass by species (mass is in kilograms).

starwars |> 
  group_by(species) |>
  summarise(
    avg_mass = mean(mass, na.rm = TRUE),
    n = n()
  )
A tibble: 38 × 3
species         avg_mass  n
<chr>           <dbl>     <int>
Aleena          15.00     1
Besalisk        102.00    1
Cerean          82.00     1
Chagrian        NaN       1
Clawdite        55.00     1
Droid           69.75     6
Dug             40.00     1
Ewok            20.00     1
Geonosian       80.00     1
Gungan          74.00     3
Human           81.31     35
Hutt            1358.00   1
Iktotchi        NaN       1
Kaleesh         159.00    1
Kaminoan        88.00     2
Kel Dor         80.00     1
Mirialan        53.10     2
Mon Calamari    83.00     1
Muun            NaN       1
Nautolan        87.00     1
Neimodian       90.00     1
Pau'an          80.00     1
Quermian        NaN       1
Rodian          74.00     1
Skakoan         48.00     1
Sullustan       68.00     1
Tholothian      50.00     1
Togruta         57.00     1
Toong           65.00     1
Toydarian       NaN       1
Trandoshan      113.00    1
Twi'lek         55.00     2
Vulptereen      45.00     1
Wookiee         124.00    2
Xexto           NaN       1
Yoda's species  17.00     1
Zabrak          80.00     2
NA              81.00     4
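
The NaN entries above come from species whose only recorded mass is missing: with na.rm = TRUE the missing values are removed first, which leaves an empty vector, and the mean of an empty vector is NaN. A minimal sketch with made-up values:

```r
all_missing <- c(NA_real_, NA_real_)  # made-up masses, all unknown

m_keep <- mean(all_missing)                # NA: missing values are kept
m_drop <- mean(all_missing, na.rm = TRUE)  # NaN: nothing left to average (0 / 0)
m_some <- mean(c(80, NA), na.rm = TRUE)    # 80: the NA is ignored
```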

Important

Please write the equivalent code with R loops, conditions, and/or functions.