R: Conditional Statements#

Often an analysis requires performing a computation one way if a certain condition is true and another way if it is false. For example, if a value is \(999\) (the code the inputter used for missing data), we want to replace it with \(NA\); otherwise we want to leave the value alone. To accomplish this, we use \(if\) conditional statements.

number <- 12.4
if(number >= 10) {
  print("This number is bigger than 10!")
} else {
  print("This number is smaller than 10")
}
[1] "This number is bigger than 10!"

The general form of an \(if\) statement is:

if (condition) {
  commands when condition true
} else {
  commands when condition false
}
  • The “condition” is a logical statement that uses operators or functions such as \(<, <=, >, >=, ==, !=, \%in\%\) or \(is.na\).

  • It is possible to nest conditional statements in the “commands when \(\ldots\)” stage to make very complex statements.

  • You do not need to fit all the commands in the curly brackets on the same line.

  • If you don’t want to run any code when the “condition” is false, you can omit the \(else\) statement.
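
As a minimal sketch of the last point, the missing-data recode from the introduction needs no \(else\) branch. The variable name `value` is made up for illustration.

```r
# A single value read from a data file; 999 is the inputter's code for missing data
value <- 999

# No else branch: when the condition is FALSE, value is simply left alone
if (value == 999) {
  value <- NA
}
value
```

Running the same lines with `value <- 42` would leave `value` unchanged, since the condition is false and there is nothing else to run.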

General examples#

Assign a value to \(x\) of 15 if \(gender\) is \(Male\) and 10 if \(Female\):

gender <- "Male"
if(gender=="Male") { x <- 15 } else { x <- 10 }
x
gender <- "Female"
if(gender=="Male") { x <- 15 } else { x <- 10 }
x
15
10

When R sees the \(if\) statement, it checks the logical condition in the parentheses. In the first example, \(gender=="Male"\) evaluates to \(TRUE\), so the code in the curly brackets immediately following the condition is run and the code after the \(else\) is skipped.

In the second example, \(gender=="Male"\) evaluates to \(FALSE\), so R skips the code in the curly brackets immediately following the condition and looks for the \(else\), running the code in the curly brackets after the \(else\) instead.

Check to see if all elements of a vector \(x\) are the same:

x <- c(8,3,2,2,2)
if(length(unique(x)) != 1 ) {print("Not all elements the same")}
x <- c(9,9,9,9,9)
if(length(unique(x)) != 1 ) {print("Not all elements the same")}
[1] "Not all elements the same"

When R sees the \(if\) statement, it checks the logical condition in the parentheses. In the first example, \(length(unique(x)) != 1\) evaluates to \(TRUE\), so the code in the curly brackets immediately following the condition is run.

In the second example, \(length(unique(x)) != 1\) evaluates to \(FALSE\), so R skips the code in the curly brackets immediately following the condition and looks for the \(else\). There is no \(else\), so no code is run (and nothing is printed to the screen).

Checking Multiple Conditions#

Sometimes we may want something like “if A and B are both true, then \(\ldots\)” or “if either A or B is true, then \(\ldots\)”. Conditions can be combined with “and” and “or” inside a conditional statement using \(\&\) and \(|\).

x <- seq(5,55,by=5)
between10and20 <- c()  #goal:  yes if value is between 10 and 20, no otherwise
for (i in 1:length(x)) {
  if(x[i] >= 10 & x[i] <= 20) 
    between10and20[i] <- "yes"
  else
    between10and20[i] <- "no"
}
between10and20 <- factor( between10and20 )
table(between10and20)
between10and20
 no yes 
  8   3 

Imagine a multiple choice question where both A and B are correct. If the person responded A or if they responded B, their score on that question is 1; otherwise, it is 0.

answers <- c("A","C","B","D","D","A","A","B","C","C","A")
score <- rep(0,length(answers))
for (i in 1:length(answers)) {
   if(answers[i]=="A" | answers[i]=="B") 
     score[i] <- 1
   else
     score[i] <- 0
}
score
  1. 1
  2. 0
  3. 1
  4. 0
  5. 0
  6. 1
  7. 1
  8. 1
  9. 0
  10. 0
  11. 1
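
When the list of acceptable values grows, chaining several \(|\) comparisons gets long. A possible alternative, sketched here with a made-up helper vector `correct`, is the \(\%in\%\) operator, which checks whether a value appears in a set.

```r
answers <- c("A","C","B","D","D","A","A","B","C","C","A")
correct <- c("A","B")                 # set of responses that earn credit
score <- rep(0, length(answers))      # everyone starts at 0

for (i in 1:length(answers)) {
  # TRUE when answers[i] matches any element of correct
  if (answers[i] %in% correct) {
    score[i] <- 1
  }
}
score
```

Because `score` starts as all zeros, no \(else\) branch is needed in this version.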

Checking Multiple Elements#

Note

Caution: \(if\) statements check only ONE element at a time.

You may have noticed the last few examples had the \(if\) statements nested inside \(for\) loops. Why can’t we do this for all elements in a vector at once?

x <- c(1,3,6,2,4,4,4,8,10,3)
x > 5  #this gives a vector of TRUE and FALSES
[1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

if( x > 5) { biggerthan5 <- "yes" } else { biggerthan5 <- "no" }  #NOPE
Warning message:
In if (x > 5) { :
  the condition has length > 1 and only the first element will be used
biggerthan5
[1] "no"

Note

The logical statement in parentheses after the \(if\) needs to evaluate to either a single \(TRUE\) or a single \(FALSE\).

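If the logical statement evaluates to a vector of \(TRUE\)s and \(FALSE\)s, only the first element is used when determining the outcome, as the warning above indicates. In R 4.2.0 and later, a condition of length greater than one is an error rather than a warning.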
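
If the question really is about the vector as a whole, one option is to collapse the logical vector to a single value first. The sketch below uses the base R functions `any()` and `all()`; the names `some_bigger` and `all_bigger` are made up for illustration.

```r
x <- c(1,3,6,2,4,4,4,8,10,3)

# any() returns a single TRUE if at least one element satisfies the condition
if (any(x > 5)) { some_bigger <- "yes" } else { some_bigger <- "no" }

# all() returns a single TRUE only if every element satisfies the condition
if (all(x > 5)) { all_bigger <- "yes" } else { all_bigger <- "no" }

some_bigger
all_bigger
```

Here `some_bigger` is "yes" (6, 8 and 10 exceed 5) and `all_bigger` is "no".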

Shortcut: \(ifelse\)#

Because recoding values is such a common task, there is a shortcut for applying an \(if\) statement to every element of a vector: the \(ifelse\) function.

x <- c(1,3,6,2,4,4,4,8,10,3); x>5
y <- ifelse(x>5,"yes","no"); y
  1. FALSE
  2. FALSE
  3. TRUE
  4. FALSE
  5. FALSE
  6. FALSE
  7. FALSE
  8. TRUE
  9. TRUE
  10. FALSE
  1. 'no'
  2. 'no'
  3. 'yes'
  4. 'no'
  5. 'no'
  6. 'no'
  7. 'no'
  8. 'yes'
  9. 'yes'
  10. 'no'

\(ifelse\) will go through each element of the vector, equivalent to a \(for\) loop that uses an \(if\) statement.

z <- c()
for (i in 1:length(x)) { 
  if( x[i] > 5 ) { z[i] <- "yes" }
  else { z[i] <- "no" }
}
z
  1. 'no'
  2. 'no'
  3. 'yes'
  4. 'no'
  5. 'no'
  6. 'no'
  7. 'no'
  8. 'yes'
  9. 'yes'
  10. 'no'
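
The same shortcut handles the missing-data recode mentioned at the start of this page. This is a small sketch with made-up values in a vector named `ages`.

```r
ages <- c(34, 999, 27, 51, 999)   # made-up data; 999 marks a missing value

# Where the test is TRUE use NA, otherwise keep the original element
ages_clean <- ifelse(ages == 999, NA, ages)
ages_clean
```

The second and fifth elements become \(NA\), and the remaining values are left alone.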

Example - lifetime value#

A company wants to classify high-value or normal-value customers, based on whether their monetary purchases have exceeded $5000.

library(regclass); data(CUSTVALUE)
Loading required package: bestglm
Loading required package: leaps
Loading required package: VGAM
Loading required package: stats4
Loading required package: splines
Loading required package: rpart
Loading required package: randomForest
randomForest 4.7-1.2
Type rfNews() to see new features/changes/bug fixes.
Important regclass change from 1.3:
All functions that had a . in the name now have an _
all.correlations -> all_correlations, cor.demo -> cor_demo, etc.
#empty vector that will hold the classification ("High" or "Normal")
outcome <- c()
for (i in 1:nrow(CUSTVALUE) ) {
  if (CUSTVALUE$LifetimeValue[i] > 5000) {outcome[i] <- "High"} 
  else {outcome[i] <- "Normal"}
}
CUSTVALUE$class1 <- outcome #classification by the for loops
CUSTVALUE$class2 <- ifelse(CUSTVALUE$LifetimeValue>5000, "High", "Normal")
head(CUSTVALUE[,c("LifetimeValue","class1","class2")])
A data.frame: 6 × 3
  LifetimeValue  class1  class2
          <dbl>   <chr>   <chr>
1       6134.30    High    High
2       3523.62  Normal  Normal
3       4080.62  Normal  Normal
4       -638.47  Normal  Normal
5       5446.32    High    High
6       3488.07  Normal  Normal

Nested \(if\) statements#

Grade Classification (nested \(if\)s)#

What if you are recoding values and there are more than two options (e.g., grade classification at the end of the semester)? It is possible to nest \(if\) statements to accomplish this. Be extra mindful of matching parentheses and curly brackets.

scores <- c(63,92,91,85,74,81)  #course scores
grades <- c()  #initialize vector of grades
for (i in 1:length(scores)) {
  if(scores[i]>=90) grades[i] <- "A"
  else {
    if(scores[i]>=80) grades[i] <- "B"
    else {
      if(scores[i]>=70) grades[i] <- "C"
      else grades[i] <- "F"
    }
  }
}
grades
  1. 'F'
  2. 'A'
  3. 'A'
  4. 'B'
  5. 'C'
  6. 'B'
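
Since \(ifelse\) works on whole vectors, the same classification can also be sketched without a loop by nesting \(ifelse\) calls. The name `grades2` is made up so the result can be compared with `grades`.

```r
scores <- c(63,92,91,85,74,81)  #course scores

# Each "no" argument hands the remaining scores to the next ifelse
grades2 <- ifelse(scores >= 90, "A",
           ifelse(scores >= 80, "B",
           ifelse(scores >= 70, "C", "F")))
grades2
```

This gives the same six grades as the loop, and the same care with matching parentheses applies.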

Sequence of \(if\)s as an alternative to nesting#

Depending on how deeply the \(if\) statements are nested, the number of curly brackets and the amount of indentation can make the code less readable. You can always write a sequence of \(if\) statements instead, but the logical statements they contain will be more involved.

for (i in 1:length(scores)) {
  if(scores[i]>=90) { grades[i] <- "A" }
  if(scores[i]< 90 & scores[i] >= 80) { grades[i] <- "B" }
  if(scores[i]< 80 & scores[i] >= 70) { grades[i] <- "C" }
  if(scores[i]< 70) { grades[i] <- "F" }  #strictly less than 70, so a score of exactly 70 stays a "C"
}

When avoiding nesting by writing a sequence of \(if\)s, be careful that it’s only possible for one of the logical statements in the sequence to be true!
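
One way to get mutually exclusive branches without spelling out both bounds in every condition is an \(else\) \(if\) chain, where each condition is checked only if all earlier ones were false. The sketch below adds a made-up boundary score of 70 to the original scores to show that it is classified exactly once.

```r
scores <- c(63,92,91,85,74,81,70)    # original scores plus a boundary value of 70
grades <- character(length(scores))  # pre-sized vector of empty strings

for (i in 1:length(scores)) {
  if (scores[i] >= 90) {
    grades[i] <- "A"
  } else if (scores[i] >= 80) {      # reached only when the score is below 90
    grades[i] <- "B"
  } else if (scores[i] >= 70) {      # reached only when the score is below 80
    grades[i] <- "C"
  } else {
    grades[i] <- "F"
  }
}
grades
```

The score of exactly 70 receives a "C" and never reaches the final \(else\).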

Conditional iteration skipping and breaking#

Conditional \(next\)#

Imagine you want to recommend products to customers based on the coupons they have redeemed. However, when a customer hasn’t redeemed any coupons, that recommendation is not possible. If the number of coupons redeemed equals 0, you’d like the code to skip ahead to the next customer without wasting time for the one in question.

Below, the code is structured so that if the customer hasn’t used a coupon, it executes \(next\), which skips the rest of the code in the loop for that iteration. If the number of coupons is larger than 0, then \(next\) isn’t triggered, so the \(make.recommendation\) function is run for that customer.

recommendation <- rep(NA,length(customerIDs)) 
for (i in 1:length(customerIDs)) {
  if(coupons[i] == 0) { next } 
  recommendation[i] <- make.recommendation(customerIDs[i])
}
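
The loop above depends on objects that are not defined on this page. To try it out, here is a self-contained sketch with made-up customer IDs and coupon counts; the `make.recommendation` defined here is only a hypothetical stand-in that pastes a label together, not a real recommender.

```r
customerIDs <- c("c101", "c102", "c103", "c104")   # made-up IDs
coupons     <- c(2, 0, 5, 0)                       # made-up coupon counts

# Hypothetical stand-in for the real recommendation function
make.recommendation <- function(id) {
  paste("offer for", id)
}

recommendation <- rep(NA, length(customerIDs))
for (i in 1:length(customerIDs)) {
  if (coupons[i] == 0) { next }   # skip customers with no redeemed coupons
  recommendation[i] <- make.recommendation(customerIDs[i])
}
recommendation
```

The second and fourth customers keep their initial \(NA\) because \(next\) skipped the assignment for them.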

Conditional \(break\)#

\(break\) is similar to \(next\), except it terminates the current loop instead of forcing it on to the next iteration.

results <- c()
for (i in 1:10) {
  results[i] <- 7*i+2
  if( i>=4 ) { break }
}
results
  1. 9
  2. 16
  3. 23
  4. 30

The code above uses a \(for\) loop to fill in the elements of \(results\) one by one. The first element is 7*1+2, the second element is 7*2+2, the third element is 7*3+2, etc. Even though the loop is set up to give \(results\) 10 elements, it runs into a \(break\) once \(i>=4\). The \(break\) forces the \(for\) loop to terminate, so \(results\) ends up having only four elements.
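
A common use of a conditional \(break\) is to stop searching once something has been found. The following sketch, with made-up data and a made-up variable `first_missing`, looks for the position of the first \(999\) and quits the loop as soon as it appears.

```r
x <- c(12, 7, 999, 4, 999, 8)   # made-up data; 999 marks a missing value
first_missing <- NA             # stays NA if no 999 is ever found

for (i in 1:length(x)) {
  if (x[i] == 999) {
    first_missing <- i          # record the position
    break                       # no need to look at the remaining elements
  }
}
first_missing
```

The loop stops at the third element, so the second \(999\) in position 5 is never examined.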