Intro to tidyverse operations
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ purrr::%||%() masks base::%||%()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Piping
Write code in the order it's run!
Read left to right instead of inside -> out!
Avoid PEMDAS confusion!
Without pipes
cool_down(workout(warm_up(person)))
With pipes
person |> warm_up() |> workout() |> cool_down()
Usually written with line breaks to help with readability:
person |>
  warm_up() |>
  workout() |>
  cool_down()
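A minimal sketch of what the pipe does, using base R functions instead of the made-up warm_up()/workout() helpers: `x |> f()` is another way of writing `f(x)`, and any extra arguments stay inside the parentheses. The vector `v` is an arbitrary example, and the native pipe needs R 4.1 or later.

```r
v <- c(1, 4, 9)

# Nested: read inside -> out
nested <- sum(sqrt(v))

# Piped: read left to right
piped <- v |> sqrt() |> sum()

identical(nested, piped)  # TRUE: both are 1 + 2 + 3 = 6

# The left-hand side becomes the FIRST argument; other arguments follow as usual
identical(v |> head(2), head(v, 2))  # TRUE
```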
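Use CTRL+SHIFT+M to type the pipe automagically in RStudio (CMD+SHIFT+M on Mac). By default the shortcut inserts the older %>% pipe; to get |>, turn on "Use native pipe operator" under Tools > Global Options > Code. It doesn't feel like it at first, but once your fingers get used to the shortcut, it is more convenient.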
How can I find the sum of the absolute values of just the first 6 elements of x? (Note: head() returns the first 6 elements by default.)
Solve this with and without piping and compare.
x <- c(1, -2, 3, -4, 5, -6, 7, -8, 9, -10)
Without pipes:
# head -> abs -> sum
sum(abs(head(x)))
With pipes:
# head -> abs -> sum
x |>
  head(6) |>
  abs() |>
  sum()
Data manipulation toolkit
Functions to know for tidy data manipulation:
- Grab certain columns with select()
- Filter data.frames with filter()
- Sort data.frames with arrange()
  - Often paired with desc() for sorting in descending order
- Create/modify columns with mutate()
- Create summaries with group_by() |> summarise()
  - The package was created by a New Zealander, hence the s; summarize() with a z also works if you want to feel more American
  - The n() function is useful for counting rows within summarise()

Good to know these exist, but less focus for us as we get started:
- Join data.frames with inner_join(), left_join(), right_join(), full_join()
- Stack data.frames with bind_rows()
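Since the joins and bind_rows() don't get their own practice problems here, a rough sketch of how they differ may help. The two tables below (`characters`, `planets`) and the extra row are hypothetical toy data made up for illustration; only the dplyr functions themselves come from the list above.

```r
library(dplyr)

# Hypothetical toy tables, invented for this example
characters <- tibble(
  name      = c("Luke", "Leia", "Yoda"),
  homeworld = c("Tatooine", "Alderaan", "Dagobah")
)
planets <- tibble(
  homeworld = c("Tatooine", "Alderaan", "Naboo"),
  climate   = c("arid", "temperate", "temperate")
)

# Keep only rows whose homeworld appears in BOTH tables
inner <- inner_join(characters, planets, by = "homeworld")

# Keep every row of the left table; unmatched climate becomes NA
left <- left_join(characters, planets, by = "homeworld")

# Keep every row of the right table; unmatched name becomes NA
right <- right_join(characters, planets, by = "homeworld")

# Keep every row from either table
full <- full_join(characters, planets, by = "homeworld")

# Stack rows from two tables with the same columns
stacked <- bind_rows(characters, tibble(name = "Han", homeworld = "Corellia"))

c(inner = nrow(inner), left = nrow(left), right = nrow(right),
  full = nrow(full), stacked = nrow(stacked))
```

With these toy tables, Yoda has no matching planet and Naboo has no matching character, which is what makes the four joins return different numbers of rows.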
Data practice
Get to know the starwars data.frame
# reset data to original
data("starwars", package = "dplyr")
# summary(starwars) # scares adam v much
summary(starwars$height)
names(starwars)
nrow(starwars)
ncol(starwars)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
66.0 167.0 180.0 174.6 191.0 264.0 6
- 'name'
- 'height'
- 'mass'
- 'hair_color'
- 'skin_color'
- 'eye_color'
- 'birth_year'
- 'sex'
- 'gender'
- 'homeworld'
- 'species'
- 'films'
- 'vehicles'
- 'starships'
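One more way to get a first look, sketched here as an optional extra: dplyr's glimpse() prints one line per column with its type and first few values. The check at the end shows which columns are list columns (each cell holds a whole vector), which is part of why a full summary(starwars) is hard to read.

```r
library(dplyr)

# Load a fresh copy of the data
data("starwars", package = "dplyr")

# One line per column: name, type, first few values
glimpse(starwars)

# Rows and columns in one call
dim(starwars)

# Which columns are list columns?
list_cols <- names(starwars)[sapply(starwars, is.list)]
list_cols
```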
Currently height is listed in cm. Create a height_in column showing the height in inches.
starwars <- starwars |>
  mutate(height_in = height / 2.54)

starwars |>
  select(name, height_in) |>
  arrange(height_in)
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yoda | 25.98425 |
| Ratts Tyerel | 31.10236 |
| Wicket Systri Warrick | 34.64567 |
| Dud Bolt | 37.00787 |
| R2-D2 | 37.79528 |
| R4-P17 | 37.79528 |
| R5-D4 | 38.18898 |
| Sebulba | 44.09449 |
| Gasgano | 48.03150 |
| Watto | 53.93701 |
| Leia Organa | 59.05512 |
| Mon Mothma | 59.05512 |
| Cordé | 61.81102 |
| Nien Nunb | 62.99213 |
| Shmi Skywalker | 64.17323 |
| Ben Quadinaros | 64.17323 |
| Beru Whitesun Lars | 64.96063 |
| Dormé | 64.96063 |
| Barriss Offee | 65.35433 |
| C-3PO | 65.74803 |
| Jocasta Nu | 65.74803 |
| Zam Wesell | 66.14173 |
| Wedge Antilles | 66.92913 |
| Palpatine | 66.92913 |
| Finis Valorum | 66.92913 |
| Luminara Unduli | 66.92913 |
| Eeth Koth | 67.32283 |
| Luke Skywalker | 67.71654 |
| Greedo | 68.11024 |
| Jabba Desilijic Tiure | 68.89764 |
| ⋮ | ⋮ |
| Raymus Antilles | 74.01575 |
| Bossk | 74.80315 |
| Nute Gunray | 75.19685 |
| Bail Prestor Organa | 75.19685 |
| San Hill | 75.19685 |
| Qui-Gon Jinn | 75.98425 |
| Dooku | 75.98425 |
| Wat Tambor | 75.98425 |
| Jar Jar Binks | 77.16535 |
| Kit Fisto | 77.16535 |
| Mas Amedda | 77.16535 |
| Ki-Adi-Mundi | 77.95276 |
| Dexter Jettster | 77.95276 |
| IG-88 | 78.74016 |
| Darth Vader | 79.52756 |
| Rugor Nass | 81.10236 |
| Tion Medon | 81.10236 |
| Taun We | 83.85827 |
| Grievous | 85.03937 |
| Roos Tarpals | 88.18898 |
| Chewbacca | 89.76378 |
| Lama Su | 90.15748 |
| Tarfful | 92.12598 |
| Yarael Poof | 103.93701 |
| Arvel Crynyd | NA |
| Finn | NA |
| Rey | NA |
| Poe Dameron | NA |
| BB8 | NA |
| Captain Phasma | NA |
This is equivalent to:
starwars$height_in <- starwars$height / 2.54
starwars[ order(starwars$height_in) , c('name', 'height_in') ]
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yoda | 25.98425 |
| Ratts Tyerel | 31.10236 |
| Wicket Systri Warrick | 34.64567 |
| Dud Bolt | 37.00787 |
| R2-D2 | 37.79528 |
| R4-P17 | 37.79528 |
| R5-D4 | 38.18898 |
| Sebulba | 44.09449 |
| Gasgano | 48.03150 |
| Watto | 53.93701 |
| Leia Organa | 59.05512 |
| Mon Mothma | 59.05512 |
| Cordé | 61.81102 |
| Nien Nunb | 62.99213 |
| Shmi Skywalker | 64.17323 |
| Ben Quadinaros | 64.17323 |
| Beru Whitesun Lars | 64.96063 |
| Dormé | 64.96063 |
| Barriss Offee | 65.35433 |
| C-3PO | 65.74803 |
| Jocasta Nu | 65.74803 |
| Zam Wesell | 66.14173 |
| Wedge Antilles | 66.92913 |
| Palpatine | 66.92913 |
| Finis Valorum | 66.92913 |
| Luminara Unduli | 66.92913 |
| Eeth Koth | 67.32283 |
| Luke Skywalker | 67.71654 |
| Greedo | 68.11024 |
| Jabba Desilijic Tiure | 68.89764 |
| ⋮ | ⋮ |
| Raymus Antilles | 74.01575 |
| Bossk | 74.80315 |
| Nute Gunray | 75.19685 |
| Bail Prestor Organa | 75.19685 |
| San Hill | 75.19685 |
| Qui-Gon Jinn | 75.98425 |
| Dooku | 75.98425 |
| Wat Tambor | 75.98425 |
| Jar Jar Binks | 77.16535 |
| Kit Fisto | 77.16535 |
| Mas Amedda | 77.16535 |
| Ki-Adi-Mundi | 77.95276 |
| Dexter Jettster | 77.95276 |
| IG-88 | 78.74016 |
| Darth Vader | 79.52756 |
| Rugor Nass | 81.10236 |
| Tion Medon | 81.10236 |
| Taun We | 83.85827 |
| Grievous | 85.03937 |
| Roos Tarpals | 88.18898 |
| Chewbacca | 89.76378 |
| Lama Su | 90.15748 |
| Tarfful | 92.12598 |
| Yarael Poof | 103.93701 |
| Arvel Crynyd | NA |
| Finn | NA |
| Rey | NA |
| Poe Dameron | NA |
| BB8 | NA |
| Captain Phasma | NA |
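A small sketch of one more mutate() feature: a single call can create several columns, and later columns can use ones created earlier in the same call. The `toy` table and the `feet`/`inches` column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy heights in cm
toy <- tibble(name = c("tall", "short"), height = c(180, 66))

toy_ft <- toy |>
  mutate(
    height_in = height / 2.54,   # cm -> inches
    feet      = height_in %/% 12, # whole feet (integer division)
    inches    = height_in %% 12   # leftover inches (remainder)
  )

toy_ft
```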
Filter to all the characters under 4 ft tall.
starwars |>
  filter(height_in < 4 * 12) |>
  select(name, height_in) |>
  arrange(height_in)
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yoda | 25.98425 |
| Ratts Tyerel | 31.10236 |
| Wicket Systri Warrick | 34.64567 |
| Dud Bolt | 37.00787 |
| R2-D2 | 37.79528 |
| R4-P17 | 37.79528 |
| R5-D4 | 38.18898 |
| Sebulba | 44.09449 |
This is equivalent to:
starwars_sub <- starwars
starwars_sub <- subset(starwars_sub, height_in < 4*12)
starwars_sub <- starwars_sub[ order(starwars_sub$height_in) , ]
starwars_sub[ , c('name', 'height_in') ]
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yoda | 25.98425 |
| Ratts Tyerel | 31.10236 |
| Wicket Systri Warrick | 34.64567 |
| Dud Bolt | 37.00787 |
| R2-D2 | 37.79528 |
| R4-P17 | 37.79528 |
| R5-D4 | 38.18898 |
| Sebulba | 44.09449 |
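Notice that the characters with a missing height do not show up in either result. A minimal sketch of why, using a made-up three-row data.frame: filter() and subset() both drop rows where the condition is NA, while plain bracket indexing keeps them as all-NA rows.

```r
library(dplyr)

# Hypothetical toy data with one missing height
toy <- data.frame(name = c("a", "b", "c"), height_in = c(30, NA, 70))

# filter() keeps only rows where the condition is TRUE
kept_filter <- toy |> filter(height_in < 48)

# subset() behaves the same way
kept_subset <- subset(toy, height_in < 48)

# Bracket indexing turns the NA condition into a row of NAs
kept_bracket <- toy[toy$height_in < 48, ]

nrow(kept_filter)   # 1
nrow(kept_subset)   # 1
nrow(kept_bracket)  # 2
```

If you write the base R version with brackets instead of subset(), wrapping the condition in which() is one way to drop the NA rows.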
Show the top 5 tallest characters.
starwars |>
  arrange(desc(height_in)) |>
  select(name, height_in) |>
  head(5)
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yarael Poof | 103.93701 |
| Tarfful | 92.12598 |
| Lama Su | 90.15748 |
| Chewbacca | 89.76378 |
| Roos Tarpals | 88.18898 |
This is equivalent to:
starwars[ order(starwars$height_in, decreasing=TRUE)[1:5] , c('name', 'height_in') ]
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Yarael Poof | 103.93701 |
| Tarfful | 92.12598 |
| Lama Su | 90.15748 |
| Chewbacca | 89.76378 |
| Roos Tarpals | 88.18898 |
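Both versions work even though some heights are missing, because arrange() and order() put NA values last by default, including when sorting in descending order. A quick sketch on a made-up table; slice_max() is shown as an optional dplyr shortcut for "top n" and is not part of the toolkit list above.

```r
library(dplyr)

# Hypothetical toy data with one missing height
toy <- tibble(name = c("a", "b", "c", "d"), height_in = c(70, NA, 60, 80))

# Ascending: NA goes last
asc <- toy |> arrange(height_in)

# Descending: NA still goes last
dsc <- toy |> arrange(desc(height_in))

# Base R: order() also puts NA last by default (na.last = TRUE)
base_dsc <- toy[order(toy$height_in, decreasing = TRUE), ]

# Optional shortcut: the 2 tallest rows in one step
top2 <- toy |> slice_max(height_in, n = 2)

asc$name   # "c" "a" "d" "b"
dsc$name   # "d" "a" "c" "b"
top2$name  # "d" "a"
```

One difference to keep in mind: slice_max() keeps ties by default, so it can return more than n rows, while head(5) always cuts at 5.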
Show the top 5 tallest humans.
starwars |>
  filter(species == "Human") |>
  arrange(desc(height_in)) |>
  select(name, height_in) |>
  head(5)
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Darth Vader | 79.52756 |
| Qui-Gon Jinn | 75.98425 |
| Dooku | 75.98425 |
| Bail Prestor Organa | 75.19685 |
| Anakin Skywalker | 74.01575 |
This is equivalent to:
starwars_sub <- starwars
starwars_sub <- subset(starwars_sub, species=='Human')
starwars_sub <- starwars_sub[ order(starwars_sub$height_in, decreasing=TRUE) , ]
starwars_sub[ 1:5 , c('name', 'height_in') ]
| name | height_in |
|---|---|
| <chr> | <dbl> |
| Darth Vader | 79.52756 |
| Qui-Gon Jinn | 75.98425 |
| Dooku | 75.98425 |
| Bail Prestor Organa | 75.19685 |
| Anakin Skywalker | 74.01575 |
Show the average mass by species (mass is in kilograms).
starwars |>
  group_by(species) |>
  summarise(
    avg_mass = mean(mass, na.rm = TRUE),
    n = n()
  )
| species | avg_mass | n |
|---|---|---|
| <chr> | <dbl> | <int> |
| Aleena | 15.00 | 1 |
| Besalisk | 102.00 | 1 |
| Cerean | 82.00 | 1 |
| Chagrian | NaN | 1 |
| Clawdite | 55.00 | 1 |
| Droid | 69.75 | 6 |
| Dug | 40.00 | 1 |
| Ewok | 20.00 | 1 |
| Geonosian | 80.00 | 1 |
| Gungan | 74.00 | 3 |
| Human | 81.31 | 35 |
| Hutt | 1358.00 | 1 |
| Iktotchi | NaN | 1 |
| Kaleesh | 159.00 | 1 |
| Kaminoan | 88.00 | 2 |
| Kel Dor | 80.00 | 1 |
| Mirialan | 53.10 | 2 |
| Mon Calamari | 83.00 | 1 |
| Muun | NaN | 1 |
| Nautolan | 87.00 | 1 |
| Neimodian | 90.00 | 1 |
| Pau'an | 80.00 | 1 |
| Quermian | NaN | 1 |
| Rodian | 74.00 | 1 |
| Skakoan | 48.00 | 1 |
| Sullustan | 68.00 | 1 |
| Tholothian | 50.00 | 1 |
| Togruta | 57.00 | 1 |
| Toong | 65.00 | 1 |
| Toydarian | NaN | 1 |
| Trandoshan | 113.00 | 1 |
| Twi'lek | 55.00 | 2 |
| Vulptereen | 45.00 | 1 |
| Wookiee | 124.00 | 2 |
| Xexto | NaN | 1 |
| Yoda's species | 17.00 | 1 |
| Zabrak | 80.00 | 2 |
| NA | 81.00 | 4 |
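The NaN rows in this table come from species where every mass is missing: with na.rm = TRUE there is nothing left to average. A sketch of one way to make that visible, adding a count of non-missing masses next to n(); the column name `n_with_mass` is made up for this example.

```r
library(dplyr)

# Load a fresh copy of the data
data("starwars", package = "dplyr")

# Averaging nothing gives NaN
mean(c(NA_real_), na.rm = TRUE)

mass_by_species <- starwars |>
  group_by(species) |>
  summarise(
    avg_mass    = mean(mass, na.rm = TRUE),
    n           = n(),               # all rows in the group
    n_with_mass = sum(!is.na(mass))  # rows that actually have a mass
  )

# Species with no recorded mass at all
mass_by_species |>
  filter(n_with_mass == 0)
```

The last row of the table above, where species is NA, is a group too: group_by() collects the characters with a missing species into a group of their own.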
Important
Please write the equivalent code with R loops, conditions, and/or functions.
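As one possible starting point, here is a rough sketch for the average-mass summary only, written with a for loop and an if/else. It is not the only way to do it, and the variable names (`species_values`, `result`, `rows`) are made up for this example.

```r
# Load a fresh copy of the data
data("starwars", package = "dplyr")

# One result row per distinct species (NA counts as its own group)
species_values <- unique(starwars$species)
result <- data.frame(
  species  = species_values,
  avg_mass = NA_real_,
  n        = NA_integer_
)

for (i in seq_along(species_values)) {
  sp <- species_values[i]

  # Comparing with == gives NA for missing species, so handle NA separately
  if (is.na(sp)) {
    rows <- is.na(starwars$species)
  } else {
    rows <- !is.na(starwars$species) & starwars$species == sp
  }

  result$avg_mass[i] <- mean(starwars$mass[rows], na.rm = TRUE)
  result$n[i]        <- sum(rows)
}

# Sort by species so it lines up with the summarise() output (NA last)
result <- result[order(result$species), ]
head(result)
```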