Overview#

What is Business Analytics?#

Definition by example questions:

  • How do we sell the right products at the right price to the right customers at the right time?

  • It has been said that half of all advertising dollars are wasted. Which half?

  • Is the business doing what it needs to be doing to best meet customer needs?

  • Where are the areas of major opportunities?

  • What processes are not optimized and how can they be improved?

  • What don’t we know about our customers and clients that we should?

Business analytics is the art and science of replacing “hunches” and “educated guesses” about business processes with insights and understanding based on data and statistical, machine learning, and optimization methods.

Note

Business analytics is data-driven decision-making.

Business analytics integrates activities including data quality and management, mathematical, statistical and machine learning methods for data modeling, and techniques for visualizing data.

Business analytics uses statistical inference and models to find meaningful relationships

  • regression

  • nonlinear relationships

  • causal effects typically cannot be found with data alone

What Business Analytics ISN’T. A senior vice president of analytics consulting: “Everyone in my team is doing analytics as they all work with numbers.”

Note

Simply “working with numbers” does not make one an analytics practitioner.

Unfortunately, a common misconception is that business analytics is just crunching numbers, data management, or making simple visualizations and summaries (business intelligence/business analyst).

When a job description for a business analytics practitioner mentions Excel as the only software, that company really isn’t doing analytics.

What is Data Mining?#

Many people think of data mining as “knowledge discovery.” At its broadest, it is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.

Note

Data mining is generally concerned with finding associations or patterns among dozens if not hundreds of fields in large relational databases and transforming them into an understandable structure for future use.

What does Data Mining involve?#

Data mining is multidisciplinary

  • machine learning (getting a computer to learn/act without explicitly being programmed), e.g., figure out what ads/stories to show people on their Facebook feed based on their interests, past clicks, etc.

  • artificial intelligence (e.g., neural networks)

  • statistics (for quantifying how much uncertainty still exists in models or predictions and helping to distinguish between random noise and significant findings)

  • database systems (for efficiently storing and processing data)

Data mining is much more than statistics and covers the entire process of data analysis, including data cleaning and preparation, visualization of the results, and how to produce predictions in real-time.

Idealized Knowledge Discovery Process#

The four pillars of data mining#

  • Classification - given a set of “training” data and the “answers” (actual classes of each individual), develop some model that is effective at predicting the class labels of as yet unseen individuals.

  • Numerical Prediction (Regression) - given a set of training data and the “answers” (actual \(y\)-values of each individual), develop some model that is effective at predicting the \(y\)-value of as yet unseen individuals.

  • Association rules (market basket analysis) - find commonly co-occurring traits. What grocery store items are often purchased together? What sets of movies are commonly enjoyed by the same individual? What web pages are useful given a search string?

  • Clustering/Segmentation - find groups of individuals that are “similar” in some regard and “cluster” them, even though we don’t know ahead of time whether clusters exist or what they should look like. For example, what market segments exist? The price-sensitive consumer, the die-hard fan, the early adopter, etc.
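As a rough illustration of three of these pillars, the sketch below uses R's built-in `mtcars` and `iris` datasets. The datasets, the predictor (`wt`), the 0.5 cutoff, and the choice of three clusters are all assumptions made purely for illustration. Association rules are left out because they usually rely on an add-on package.

```r
# Numerical prediction (regression): predict miles per gallon from weight.
reg_fit  <- lm(mpg ~ wt, data = mtcars)
new_car  <- data.frame(wt = 3)               # a hypothetical, as-yet-unseen car
pred_mpg <- predict(reg_fit, newdata = new_car)

# Classification: predict transmission type (am = 1 is manual) from weight
# using logistic regression.
clf_fit     <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(clf_fit, newdata = new_car, type = "response")
pred_class  <- ifelse(prob_manual > 0.5, "manual", "automatic")

# Clustering: look for 3 groups of flowers without using the species labels.
set.seed(1)                                   # kmeans starts from random centers
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)

pred_mpg            # predicted y-value for the new car
pred_class          # predicted class label for the new car
table(km$cluster)   # how many flowers landed in each cluster
```

In the first two cases the "answers" (`mpg`, `am`) are known for the training data. In the clustering case no answers are supplied at all.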

Where is data mining used today in business analytics?#

  • Merchants - determine sales trends, predict customer loyalty, identify customer groups so that targeted marketing campaigns can be designed

  • Market basket analysis - patterns such as customers who buy A and B also buy C, and more generally who buys what and product recommendations

  • Customer relationship management - concentrate efforts on prospects that have a high likelihood of responding to an offer

  • Uplift modeling - which people have the greatest increase in response if given an offer

  • Churn analysis - when should we be worried about a customer leaving for good?

  • Human resources - identifying characteristics of most successful employees

More:

  • use data warehouse to develop fraud detection algorithm

  • develop a model to predict whether a loan is bad or not

  • predict audience share for TV to allow executives to arrange show schedules to maximize market share and advertising revenues

  • determine when a flu outbreak is occurring by examining Facebook statuses or tweets

Example of a data mining project: Kroger#

Kroger keeps a lot of data on its customers, much gained through use of a loyalty card

  • Household transactions over many consecutive years

  • Demographic information (purchased via third party; age/income/marital status)

  • Direct marketing contact history (e.g., mailed promotions, newspaper inserts)

What are the business questions that could be answered? What sort of patterns or relationships might be interesting? How can the data be processed into information that can be used?

Kroger may have an idea of what relationships to look for (e.g., more mailed promotions = how much of an increase in visits?), but more could potentially be discovered

  • Which customers are spending more (or less) over time, and what do they have in common?

  • What categories of products are growing at a faster rate?

  • Which categories are losing engagement among customers who are spending less over time?

  • What demographic factors are associated with customer spending?

  • Is there evidence that marketing improves engagement?

  • Are there “segments” of customers that buy similar products?

  • Are there groups of items that tend to be purchased together?

The “meat” of the analysis would draw on many different fields

  • Correlation - customer spending vs. demographic factors

  • Regression - trend in spending

  • Statistics - quantifying increase in spending with engagement

  • Clustering - finding similar customers

  • Association rules - finding items that are purchased together

  • Machine learning - more complicated model beyond regression to make predictions

Example of data mining project - Alumni#

The UT Alumni office wanted to predict which people would become “major donors”.

Data:

  • Major, college, date of graduation

  • Club/student government/Greek activity

  • Past donation history

  • Information on spouse

  • “wealth rating” (purchased from a third party)

An analysis could explore many different facets

  • Classification: what traits/features/model allow us to correctly classify an alumnus as a major donor or not; what does a major donor look like?

  • Machine Learning: what model or algorithm learns how much an alumnus has donated based on his or her characteristics?

  • Regression/forecasting: how do donations change over time?

  • Clustering: what segments exist among the alumni?

  • Association Rules: what features are the most strongly associated with donating and just how strong of an indicator are they?

Jobs and Internships#

Job example (Junior Data Scientist - Data Analytics)#

  • Strong business analytical skills a must: ability to apply business logic to design and implement data mining techniques on large data sets.

  • Projects with evidence of Creative and Critical thinking a must.

  • Proficient in the use of Teradata SQL, MS SQL server (SSIS/SSAS experience preferred), Data Visualization (e.g., Tableau or other), MS Access, MS Excel, Visual Basic, and Sharepoint.

  • Working knowledge of “Big Data” concepts and Hadoop/Hive, Teradata Aster, and R tools preferred.

Programming#

What language should you concentrate in?#

Programming skills are extraordinarily important, and coding is required for many jobs. Which language is the best? Short answer: SQL, plus R and/or Python.

“What is your preferred tool?” Breakdown by job title:

  • R: Business Analyst, Data Analyst, Data Miner, Operations Researcher, Predictive Modeler, Statistician

  • Python: Computer Scientist, Data Scientist, Engineer, Machine Learning Engineer, Other, Programmer, Researcher, Scientist, Software Developer

What skills are desired#

Businesses love strong analytical and problem-solving skills

  • Know (or figure out) the “right” questions to ask of the data.

  • Be able to use math to approach and solve the problem or to figure out what needs to be measured.

  • Hard to recommend a class for this since this isn’t the focus of any one class but is addressed indirectly all along.

Math is important

  • Good news: vast majority of mathematical problems just use algebra and logic (no calculus).

  • Bad news: if you’re not comfortable working with equations you may be shut out of certain jobs.

What employers want#

Most employers want to hire professionals who

  • have the necessary technical skills (gained in this and future classes)

  • can work effectively in multifunctional teams (practice this in internships and groupwork)

  • possess strong communication skills (practice this here and in 479)

  • have hands-on experience with the kind of complex analytical tools used (gained in this and future classes)

  • understand their business model (gained through interactions with company)

  • are able to be productive from day one (which you will be by the time you graduate)

Skills for employment#

If you want to have a job right after graduation (i.e., you’re not going to grad school), it is very useful to be comfortable with

  • Excel (many small analytics projects are accomplished here)

  • SQL (querying databases), i.e., INMT 342 and beyond

  • a scripting language like R (we will delve into programming in this class) or Python

  • computer programming in general

“Standard” jobs at the undergrad level seem to emphasize Excel and maybe SQL. More advanced and interesting jobs require programming (in R or otherwise).

Skills for grad school#

If you want to go on to graduate school (recommended), the skills are a little different

  • Strong foundation in statistical methods and math.

  • Familiarity with programming software such as R, Python, SAS programming, C++, etc.

  • Familiarity with SQL and querying databases

  • Excel plays much less of a role.

If you want to get a master’s at UT, please talk with faculty and advisors when planning your courses.

Three Branches of Analytics#

Very broadly, business analytics can be split into three major tasks

  • Descriptive Analytics

  • Predictive Analytics

  • Prescriptive Analytics

Motivating case: online sales#

Imagine that a local retailer with a storefront wishes to expand sales by selling items on eBay, Amazon, etc. What are the questions that need to be answered and how should the store go about doing this?

Descriptive Analytics#

Descriptive analytics is the realm of “business intelligence” and classically what a “business analyst” is responsible for.

  • Describe what has happened in the past and what is happening now.

  • Visualize distribution of quantities (sales over time, customer spending amounts, etc.)

  • A/B testing - is there a statistically significant and large difference between the click-thru rates of iPhone and Galaxy users? Is there a difference in the average donation amounts of people who graduated from UT in 1980 or earlier vs. 1981 or later?

  • Clustering/Segmentation - how many different customer types exist and what are the key features of each? (new for data mining)
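The A/B testing question above (do two groups have different click-thru rates?) can be sketched in R with `prop.test()`. The counts below are made up for illustration and are not real data. The device labels are only names attached to the two groups.

```r
# Hypothetical counts (not real data): clicks and impressions per device type.
clicks      <- c(iphone = 120, galaxy = 150)
impressions <- c(iphone = 2000, galaxy = 2000)

# Observed click-thru rate for each group.
rates <- clicks / impressions
rates

# Two-sample test of equal proportions.
ab_test <- prop.test(x = clicks, n = impressions)
ab_test$p.value   # small values suggest the two rates really differ
diff(rates)       # size of the difference (galaxy minus iphone)
```

The p-value addresses "statistically significant", while the difference in rates addresses "large". Both matter for a business decision.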

Most businesses are “stuck” doing descriptive analytics. Having a handle on what the business processes “look like” is a good first step. Mostly basic statistics and some data mining techniques are employed.

For the store wishing to branch out onto online sales, descriptive analytics would entail

  • Collecting data on current and past sales of similar products on eBay, Amazon, etc., and estimating the costs of branching out.

  • Visualize distribution of sale prices for various products (price over time, price between sites)

  • A/B testing - do items sell for more on Amazon or on eBay? How much more? Is there a difference in time to sale between sites? Do Buy-It-Now sales go for more than auctions? Does offering free shipping entice buyers?

  • Study relationships - is there an association between starting price, auction duration, etc., and the final selling price? If so, what does it look like?

  • Clustering/Segmentation - do some product types act similarly? If so, we can group these products together to simplify the analysis (data mining).

Predictive Analytics#

Predictive analytics is what companies typically want to be able to do. Companies already know what has happened in the past; it’s the future that holds the key to success.

  • Develop a model (regression, partition, random forest, etc.) whose specialty is in generalizing to individuals that have yet to be seen.

  • Use time-series analysis to forecast demand.

  • Data mining is largely concerned with predictive analytics.

For the store wishing to branch out onto online sales, predictive analytics would entail

  • Developing a model that predicts the sale price based on starting price, shipping price, number of competitors, time of week, site of sale.

  • Developing a model that predicts what group of products a new product should “behave like”

  • Forecast trend in sales, forecast price of goods acquisition, etc.

Prescriptive Analytics#

Prescriptive analytics is the art of optimizing a business process. This usually entails choosing the values of decision variables to maximize (or minimize) some objective.

  • Take a model predicting profit based on sale price, discount applied, and time of year to determine the values of sale price and discount that maximize profit for that year.

  • Determine the weights of assignments to maximize your grade in the class.

  • Determine the optimal inventory control policy: how many items left in stock should trigger a re-order, and how much should be ordered?
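To make "choosing the value of a decision variable to maximize an objective" concrete, here is a minimal sketch using R's `optimize()`. The demand curve, the unit cost, and the price range are all invented for illustration. In practice the profit model would come from a fitted predictive model.

```r
unit_cost <- 10                          # hypothetical cost per item

# Hypothetical demand model: units sold fall as the price rises.
demand <- function(price) 100 - 2 * price

# Objective: profit as a function of the decision variable (price).
profit <- function(price) (price - unit_cost) * demand(price)

# Search prices between 10 and 50 for the one that maximizes profit.
best <- optimize(profit, interval = c(10, 50), maximum = TRUE)
best$maximum     # price that maximizes profit
best$objective   # profit at that price
```

For this made-up demand curve the best price can be worked out by hand to be 30 (giving a profit of 800), and `optimize()` should land very close to it.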

Assuming that your models provide a reasonable reflection of reality and that you trust them, prescriptive analytics tells you how to run your business better! Data mining can help develop the models, but isn’t directly involved in optimizing them.

For the store wishing to branch out onto online sales, prescriptive analytics would entail taking the models used to forecast price of acquiring goods, to forecast demand, and to predict the sale price of an item to determine:

  • Where to sell the product (i.e., to whom, since customers of both sites may differ).

  • At what price?

  • When to sell (when is demand highest, what day of the week should an auction end).

Examples of Business Analytics#

Moneyball#

What players should be drafted for an MLB team?

  • Stolen bases?

  • Runs batted in?

  • Batting average?

  • Conventional wisdom reaching back decades said to use these variables, but this is subjective and based on casual observations, not data (and so may be flawed).

Rigorous statistical analysis showed other quantities are more indicative of success.

  • Runs lead to wins, and getting on base leads to runs.

  • Better predictors of getting on base are “on-base percentage”, “slugging percentage”, etc.

Harrah’s Casino#

Harrah’s Cherokee Casino is a little over an hour away in North Carolina and is one of many hotels/casinos of the Harrah’s brand that use a common loyalty card. What information is collected and how is it used?

  • Basic demographic information (age, gender, location)

  • Amounts played on table games (craps, blackjack) and slots along with time devoted to each and the typical size of a bet.

  • Frequency of hotel stays and dining as well as how much money is being spent.

Can make interesting inferences

  • Income (predict from location, betting and spending habits)

  • Develop a customer “profile”: casual slot player, high-stakes smoking/drinking card player, elder playing away his retirement

Harrah’s uses this information to predict how valuable each customer is to them and can design custom promotions (free drink, free hotel room, free play) to keep the customers they want coming back.

More About Casinos#

Casinos Bet The Future On Customer Experience And Up The Ante With Analytics

Forbes; May 31, 2018. “At core, this is a data challenge. Gene Lee, the chief analytics officer at Caesars Entertainment, divides it into two categories: customer and product. On the one hand, casino properties must continually reinvent the product to appeal to a shifting demographic. On the customer side, they need to capitalize on intelligent marketing. This means offering the right incentives to the right customer at the right time via the right channel.”

Churn analysis#

Contract-based companies (AT&T, Verizon, Comcast, etc.) analyze why customers end contracts.

  • Trading one plan for another at the same company?

  • Defecting to another competitor?

  • Were pricing, coverage gaps, or device issues a factor?

  • What issues are important to customers and what can be done to increase satisfaction?

With regression and other techniques, we can develop a model that tells us “what matters” in influencing a customer’s decision to churn. The company can then intervene with the customers most likely to end their contracts, for example by offering special deals and discounts.

Transaction Analysis#

Banks such as Chase, First Tennessee, Bank of America, etc., have details on every transaction customers put on their credit cards.

Nowadays, banks and stores work out deals where the bank offers a promotional discount if a customer uses their card at a store, in exchange for payment from the store.

The questions become:

  • Can we predict the probability a customer would be interested in shopping at a store given their transaction history?

  • Can we predict the amount a customer will spend at a given store?

Demand for Business Analytics#

Note

Every successful business succeeds by understanding its customers’ needs better than anyone else.

To understand customers, it is imperative to identify who they are and to determine what they want. The tools you will learn in this class serve as a springboard for the techniques required to answer both of these questions.

R as a glorified calculator#

Basic Arithmetic#

Addition, subtraction, multiplication, and division use the symbols you expect. Simply type it out and press return/enter to see the result.

10+2
10-2
10*2
10/2
12
8
20
5

Reminder: order of operations#

In math (as in reading), expressions are evaluated from left to right. However, certain operations have priority over others.

Note

Expressions involving multiplication or division are evaluated (from left to right) before expressions involving addition/subtraction.

10+2-1+4  #evaluated left to right since just + and - here
10*2/1/1  #evaluated left to right since all * and / here
10*2-1/4  #mult/div first, giving 20-0.25, then add/sub
10+2*1/4  #mult/div first, giving 10+0.5, then add/sub
15
20
19.75
10.5

Parentheses#

You can override the normal order of operations by adding parentheses to evaluate expressions of interest first. Imagine we want to reproduce the following in R:

\[5\times (3+2) = 25 \hspace{1in} \frac{10}{5-3} = 5\]
5*3+2 #wrong
5*(3+2) #correct
10/5-3 #wrong
10/(5-3) #correct
17
25
-1
5

Implicit multiplication#

To save time writing equations, we often omit the multiplication sign when taking the product of two items and at least one is surrounded by parentheses.

\[5 \times (3+2) \Longrightarrow 5(3+2)\]

Although we can visually process this shorthand, \(R\) does not.

Note

It is imperative to write out the multiplication symbol explicitly in \(R\).

5*(3+2) #Correct, but 5(3+2) Gives Error
25
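To see what goes wrong without stopping your session, the sketch below evaluates the text `5(3+2)` inside `tryCatch()` and stores the error message instead. The reason for the failure is that R reads `5(3+2)` as an attempt to call a function named 5. The name `result` is just an illustrative choice.

```r
# Evaluate the text "5(3+2)" and capture the error instead of halting.
result <- tryCatch(
  eval(parse(text = "5(3+2)")),
  error = function(err) conditionMessage(err)
)
result        # an error message rather than 25

5 * (3 + 2)   # with the explicit * this works
```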

Exponents#

Exponents are produced using a “caret” (^), typed with shift-6. Imagine we need:

\[5^2 = 25 \hspace{1in} 2^{-3} = \frac{1}{2^3} = 1/8\]

5^2
2^(-3)
25
0.125

Note: parentheses are important when working with exponents. If we want

\[2^{5-\frac{4}{6-4}} = 2^{5-2} = 2^3 = 8\]
2^5-4/6-4 #epic wrong
2^5-4/(6-4) #still wrong
27.3333333333333
30
2^(5 - 4/(6-4)) #correct
8

Square roots and absolute values#

Square roots

  • can raise the quantity to the one-half power, i.e. \(\sqrt{9} = 9^{1/2} = 9^{.5} = 3\)

  • can invoke the command \(sqrt()\)

9^(1/2); 9^.5; sqrt(9) # the same result
3
3
3

Absolute values: invoke the command \(abs()\)

abs(10)
abs(-3)
10
3

Logarithms and “e”#

The natural logarithm ln is referred to as \(log()\). The base 10 logarithm \(\log_{10}\) is referred to as \(log10()\).

log(100) #this is the natural log of 100
log10(100) #log base 10
4.60517018598809
2

Taking “e to a power”, i.e., taking 2.718282 and raising it to a power, is performed using \(exp()\), not by typing e^().

exp(2) #e^2 doesn't work
7.38905609893065
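A few related commands may be handy. This is a small sketch, not something the rest of these notes depend on: `log()` accepts a `base` argument for other bases, `log2()` is a shortcut for base 2, and `exp(1)` returns the number e itself.

```r
log(8, base = 2)   # log base 2 of 8, which is 3
log2(8)            # shortcut for base 2, also 3
exp(1)             # the number e itself
log(exp(2))        # log() undoes exp(), giving back 2

# R has no built-in object called e, which is why e^2 fails
# unless you have defined e yourself.
exists("e")        # FALSE in a fresh session
```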

Scientific Notation#

Scientific notation is a convenient way of writing very large/small numbers as something “times 10 to the \(\ldots\)”. The exponent is the number of places the decimal point must move to sit just after the first non-zero digit (positive when the original number is large, negative when it is small).

  • 3984000 \(\rightarrow 3.984 \times 10^{6}\)

  • 0.0024 \(\rightarrow 2.4 \times 10^{-3}\)

  • 5.3 \(\rightarrow 5.3 \times 10^{0}\)

\(R\) abbreviates “times 10 to the \(\ldots\)” as \(e\) (which is why typing e^2 doesn’t give \(e^2\)).

84400000000; 5.2e4; 0.00000000000001472; 1.2e-5; 5e0
8.44e+10
52000
1.472e-14
1.2e-05
5
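Numbers can be typed in either notation, and `format()` can be used to request a particular style when displaying them. The sketch below reuses the numbers from the list above. The object names `big` and `small` are arbitrary.

```r
big   <- 3.984e6    # same number as 3984000
small <- 2.4e-3     # same number as 0.0024
big == 3984000      # TRUE

# format() returns text in the requested notation.
format(3984000, scientific = TRUE)    # "3.984e+06"
format(1.2e-5, scientific = FALSE)    # "0.000012"
```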

Note: spaces#

\(R\) is not picky about spaces in the expressions you evaluate. Usually they are placed strategically to make your code readable.

\[(5-\sqrt{3}) \times \left(2^{\frac{7}{2}} - \log_{10}|5-8|\right)\]
(5- sqrt(3) )*( 2^3.5-log10 (abs(5-8)))  #messy, but ok
(5-sqrt(3))* (2^3.5-log10(abs(5-8))) #clearer and prettier
35.4134165336055
35.4134165336055

Note: commenting your code#

It is good practice to put comments in your code to remind yourself what the code is trying to accomplish or to just write a note to yourself. The hashtag symbol # serves as the “comment” symbol. Once \(R\) sees a #, anything after it on that line is ignored and not parsed by \(R\).

#A comment on a line all by itself
5+2 #A comment after an expression 
#####The number of hashtags is irrelevant###
10/2 #A valid R expression after a # is still ignored  10*2
7
5

Semicolons allow you to put more than 1 command on a line!!!#

Sometimes you may want to put more than one expression on a line (e.g., to keep your code more compact). A semicolon separates one expression from the next.

5*2; sqrt(49); log10(9^2+19)
10
7
2

Saving Results (Left Arrow Convention)#

Why save results?#

Often you will want to save the result of some computation for later use or to streamline other computation. For example, imagine trying to evaluate

\[\frac{e^{-1 + 2\times 1.5}}{1 + e^{\sqrt{-5 + 2\times 3}}}\]

It’s possible to evaluate this expression in one go. However, consider defining \(x\) and \(y\) as follows

\[x = -1 + 2\times 1.5 \hspace{.5in} y = \sqrt{-5 + 2\times 3}\]

We can rewrite the expression as

\[\frac{e^x}{1+e^{y}}\]

which doesn’t look so daunting.

Save by naming#

To save a result, e.g., to define \(x\) as the result of \(-1 + 2 \times 1.5\), we come up with a name for the result and define it using a “left arrow” (actually a combination of a “less-than” sign and a dash). Naming and saving a result does not produce any output, but typing the name on a new line and pressing return displays what is “in” it.

x <- -1 + 2*1.5  #won't produce output
x  
y <- sqrt(-5 + 2*3) ; y
exp(x)/( 1+exp(y) ) #produces output since the results are not being saved
2
1
1.98722324982904

Left arrow convention#

Note

To save the result of an expression, give it a name with a left arrow.

x \(\leftarrow\) some expression

To see what’s saved, type out the name on a new line and press return.

x <- 3+5*log10(3)  #note: when this is run there is no output
x   #typing the name and return prints out how it's defined
5.38560627359831

Notes about saving#

  • Names are case-sensitive, e.g., \(y\) and \(Y\) can be given two different values

  • Some punctuation is allowed in names, e.g. \(x.new\), \(last.weeks\_sales\). Stick to using periods and underscores.

  • Definitions can be self-referential. For example, you can increase the (current) value \(x\) by 3 by doing \(x <- x+3\)

x <- 5
x <- 2*x-3
x
X <- 153
X
7
153

Printing to the screen#

In most activities and homeworks, I will ask you to “print the contents of an object to the screen”. This just means running the name of the object as a command. The example below shows how you would print the contents of \(x\) to the screen.

x <- c("Who","turned","off","the","lights","?")
x
  1. 'Who'
  2. 'turned'
  3. 'off'
  4. 'the'
  5. 'lights'
  6. '?'

When “knitting” a homework for submission, printing the contents of an object to the screen allows the contents of the object to be automatically included in the writeup, and it’s how we will grade your work.
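One detail worth knowing, shown here as a small sketch: typing a bare name prints only at the top level of your code. Inside a loop, a bare name produces no output, so the explicit `print()` function is needed there. The loop over the first two words is just an illustration.

```r
x <- c("Who", "turned", "off", "the", "lights", "?")

x          # typing the name prints the contents
print(x)   # print() does the same thing explicitly

# Inside a loop, a bare name prints nothing, so print() is needed.
for (word in x[1:2]) {
  word          # no output
  print(word)   # output
}
```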

How To Think Like R (How R Processes Commands)#

How does R work?#

As you saw in the last section, commands in \(R\) can start to get complex quite quickly. At this point, it becomes useful to discuss exactly how \(R\) figures out how to process a command. Such knowledge will help you write code that does exactly what you want to do in an efficient way.

By the end of our unit on \(R\) (or at the very least, by the end of the course), you should be able to:

  • Look at a line of code and be able to translate it into English (i.e., describe what the code is trying to do).

  • Be able to translate English into \(R\) (e.g., write code that finds the second most frequently occurring level of a categorical variable in a subset of the original data that contains rows that meet two certain conditions).

The strategy here will be to learn how to translate from \(R\) into English first before learning how to write sophisticated \(R\) statements.

From Left to Right#

Unless certain operations have priority (see below), expressions are evaluated from left to right just like how you read a sentence.

This property is particularly useful if you separate multiple commands to be run on the same line of code.

x <- 5; y <- 2; x + y; x-y
7
3

Order of Operations#

\(R\) respects the “order of operations” that you’re used to in math: multiplication and division occur before addition and subtraction. For example, \(3*4+6/2\) will evaluate to \(15\) (12+3).

Further, exponentiation occurs before any other arithmetic calculation, so 2^6/4 will evaluate to 16 since it will take 2, raise it to the 6th power (which equals 64), then divide by 4 (to get 16). Likewise, 3*2^4 will evaluate to 48 since first 2 is taken to the 4th power (which equals 16), then 3 will be multiplied by 16 to get 48.

3*4+6/2
2^6/4
3*2^4
15
16
48

Special symbols to override the default order of operations: parentheses can be used to override the normal order of operations. R always evaluates the expression inside parentheses first before applying the normal order of operations. The same inside-out rule applies to expressions inside square brackets (used to pick out elements of an object) and curly brackets.

3*(4+6)/2 #4+6 evaluated first because in parentheses, so this is 3*10/2
2^(1/2+6/4) #1/2+6/4 evaluated first, which becomes 0.5 + 1.5 = 2, so this is 2^2
(3*2)^4  #3*2 evaluated first, so this is 6^4
200+100/(5+15)#5+15 is evaluated first, so this is 200+100/20, which becomes 200+5
15
4
1296
205

Going deep: nested brackets#

When sets of brackets are nested inside another set of brackets, \(R\) will evaluate the expression by going “as deep” as it can inside the expression, then evaluating from the inside out. This is best illustrated with an example.

5+2*(sqrt(log10(92+abs(2-10))))^4  #What a nightmare.  
#Starting at the outermost set of () we go as deep as we can go, running into 
#a sqrt(), then inside that a log10(), then inside that an abs().  
# That's as deep as it goes. Thus, abs(2-10) is evaluated first, which is 8
5+2*(sqrt(log10(92+8)))^4  
#Now we work our way out, so we add 92, which is 100
5+2*(sqrt(log10(100)))^4  
13
13
13
#now take the log10, which is 2
5+2*(sqrt(2))^4  
#now we take the square root
sqrt(2) #which can't really be simplified
#Then we take it to the fourth power, which is 4
5+2*4
#Then we multiply by 2, then we add 5
13
1.4142135623731
13

Another example:

x <- c(3,8,4,7,2,3)
x[1+which(x==(min(x)+1))]  #What?
#The outermost [] has a which() function inside it, which has another set of
#() inside that, and a min() lurks inside!  min(x) is the innermost expression, 
# which evaluates to 2
x[1+which(x==(2+1))]
#Now we add 2+1, which is 3
x[1+which(x==3)]
#Now we evaluate which(x==3), which is the vector c(1,6) since the 1st and 6th positions are 3
x[1+c(1,6)]
  1. 8
  2. <NA>
  1. 8
  2. <NA>
  1. 8
  2. <NA>
  1. 8
  2. <NA>
#Now we evaluate 1+c(1,6), which is the vector c(2,7)
x[c(2,7)]
#So we are extracting the 2nd and 7th elements of x, which are 8 and NA 
# (since there's no 7th element)
  1. 8
  2. <NA>
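Reading a nested command is often easier if each layer is saved under its own name. The sketch below repeats the example one step at a time and then, as an optional extra, drops any position that does not exist in `x` so that no NA appears. The names `target`, `positions`, `next_pos`, and `valid_pos` are illustrative.

```r
x <- c(3, 8, 4, 7, 2, 3)

target    <- min(x) + 1            # 3
positions <- which(x == target)    # 1 and 6
next_pos  <- positions + 1         # 2 and 7

# Keep only positions that actually exist in x.
valid_pos <- next_pos[next_pos <= length(x)]
x[valid_pos]                       # just 8, no NA
```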

Another example:

x <- c(3,8,4,7,2,3)
mean(x[which(x<5)])
#The outermost () has a [] inside it, which itself has a which() command
#The which() is thus evaluated first, which equals c(1,3,5,6); the positions
#of the vector x that contain elements less than 5
mean(x[c(1,3,5,6)])
#Moving to the square brackets, we see we are taking out the 1st, 3rd, 5th, 6th
#elements of x so x[] is the vector c(3,4,2,3)
mean( c(3,4,2,3) )
#This is just the average of these 4 numbers, which equals 3
#In English, the original command read "take the average of the elements of x 
# that are less than 5"
3
3
3
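The same inside-out reading can be written as separate named steps. As a side note (an addition to the example, not a replacement for it), R also accepts the TRUE/FALSE result of `x < 5` directly inside the square brackets, so `which()` is optional here.

```r
x <- c(3, 8, 4, 7, 2, 3)

positions <- which(x < 5)   # positions 1, 3, 5, 6
small_x   <- x[positions]   # the values 3, 4, 2, 3
mean(small_x)               # 3

# A logical vector can be used directly inside the square brackets.
mean(x[x < 5])              # 3
```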

Reading commands summary:

  • Go as deep into each expression (i.e. nested parentheses or brackets) as possible and evaluate them from the inside out.

  • Once all expressions in parentheses have been evaluated, perform exponentiation (anything with the \(\^{}\) symbol)

  • Then evaluate all expressions that involve multiplication or division (from left to right).

  • Then evaluate all expressions that involve addition or subtraction.
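A few short cases illustrate these rules, including two that commonly surprise people: a leading minus sign is applied after exponentiation, and stacked exponents are evaluated from right to left. This is a small sketch to experiment with.

```r
-2^2      # -4: the exponent is applied before the minus sign
(-2)^2    #  4: parentheses force the negation first
2^3^2     # 512: stacked exponents go right to left, i.e. 2^(3^2)
(2^3)^2   # 64
8/4/2     # 1: division goes left to right, i.e. (8/4)/2
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.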

Left hand side vs. Right hand side#

Nearly every command you will write in \(R\) will involve the “left-arrow symbol”, so it’s extremely important to understand how R reads and processes it.

Note

The left-hand side of the command will be what appears to the left of the \(\leftarrow\) symbol and the right-hand side of the command will be what appears to the right of the \(\leftarrow\).

When \(R\) processes a command that has a left-arrow, the expression on the right hand side of the command is evaluated first, then the assignment with the left-arrow takes place.

This allows \(R\) to make sense of commands like \(x \leftarrow x + 1\) where the left and right hand sides of the command involve the same variable. The right-hand side is evaluated first (take the quantity stored in \(x\) and add one to it), then the assignment takes place (store the value after evaluating \(x+1\) into the variable \(x\), thus overwriting its previous value).

Example of many operations

x <- 0.4
x <- x/(x+(x+1)^2) + 2*(1-x)/(1-x^2)  #How is this read?
x
#Evaluate right hand side first, then assign that expression into x
#Follow the standard rules: dive as deep into nested brackets as possible, 
#evaluate expressions in parentheses, respect order of operations
#(the lines below trace the evaluation by hand using the old value x = 0.4;
# they are commented out because they are not meant to be run)
# x <- x/(x+(0.4+1)^2) + 2*(1-0.4)/(1-0.4^2)
# x <- x/(x+1.4^2) + 2*0.6/(1-0.16)
# x <- x/(x+1.96) + 1.2/0.84
# x <- 0.4/(0.4+1.96) + 1.428571
# x <- 0.4/2.36 + 1.428571
# x <- 0.1694915 + 1.428571
# x <- 1.598063
1.59806295399516
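One way to check a hand trace like this is to compute each piece of the right-hand side under its own name before overwriting `x`. The names `first_term` and `second_term` are illustrative.

```r
x <- 0.4

first_term  <- x / (x + (x + 1)^2)        # 0.4 / 2.36
second_term <- 2 * (1 - x) / (1 - x^2)    # 1.2 / 0.84

# Only now is x overwritten; the right-hand side used the old value 0.4.
x <- first_term + second_term
x
```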

Fun fact: multiple assignment#

The syntax for R is quite flexible and you can start to have fun with it after a while. For example, it’s possible to assign many different objects the same value by chaining several left arrows on the same line.

x <- y <- z <- 4
x
y
z
4
4
4
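The chained version works because the right-hand side is always evaluated first, so the chain is processed starting from its right end. The sketch below spells that out. The names `a_val`, `b_val`, `c_val`, and `w` are illustrative.

```r
x <- y <- z <- 4          # read as x <- (y <- (z <- 4))

# The same idea written one step at a time, starting from the right.
c_val <- 4
b_val <- c_val
a_val <- b_val

# Wrapping an assignment in parentheses prints the value that was assigned.
(w <- 10)
```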

Fun fact: you can use a right arrow#

Although we will always use left arrows in this class, you can do assignment with right arrows instead.

3+5 -> x
x
8