QCBS R Workshop 5

Introduction to Programming with R
by

CSBQ QCBS

on 25 November 2016


Transcript of QCBS R Workshop 5

Loops are good for:
doing something for every element of an object
doing something until the processed data runs out
doing something for every file in a folder
doing something that can fail, until it succeeds
iterating a calculation until it converges
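As a sketch of the last item, iterating a calculation until it converges, here is a while loop that approximates a square root. The names newton_sqrt, tol and max_iter are made up for this illustration and are not part of the workshop script.

```r
# Babylonian (Newton) iteration for the square root of x.
# newton_sqrt, tol and max_iter are illustrative names only.
newton_sqrt <- function(x, tol = 1e-10, max_iter = 100) {
  estimate <- x / 2
  iter <- 0
  # keep refining until the estimate is close enough,
  # or until we run out of allowed iterations
  while (abs(estimate^2 - x) > tol && iter < max_iter) {
    estimate <- (estimate + x / estimate) / 2
    iter <- iter + 1
  }
  return(estimate)
}

newton_sqrt(2)
sqrt(2)  # built-in function, for comparison
```

The max_iter guard is there so that the loop always ends, even if the calculation never converges.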

Quebec Centre for Biodiversity Science

R Workshop Series

Workshop 5: Programming in R
Website: http://qcbs.ca/wiki/r_workshop5

Reuse and share your code
Achieve greater consistency
Avoid copy/paste errors
Avoid reinventing the wheel
Redo your analysis quickly and easily
Use R to do repetitive tasks for you
Understand what R is doing to your data
Do analyses that nobody has prepackaged
It’s fun! (no, really!)
Why program in R?

Install R (or use it on the computers here):
http://cran.r-project.org/

Install an R environment, such as R Studio:
http://rstudio.org/

Download the slides and the .R script:
http://bit.ly/yRjShO

Twiddle your thumbs impatiently

Pre-workshop:

Iteration

if
statement
Beware of R’s expression parsing!
Another example of a for loop.

Outline

Control Flow
Writing functions in R
Speeding up your code
Useful R packages for biologists
1. Control Flow
if(condition) {
expression
}
Use curly brackets { } so that R knows to expect more input.

Try:
if ((2+2) == 4) print("Arithmetic works.")
else print("Houston, we have a problem.")
This doesn't work because R evaluates the first line and doesn't know that you are going to use an else statement.

Instead use:

if ((2+2) == 4) {
print("Arithmetic works.")
} else {
print("Houston, we have a problem.")
}
When using brackets, R waits to evaluate the command until the brackets have been closed.
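One way to see this parsing behaviour without typing at the console is to ask R whether a piece of text is valid code. This is only a sketch: parses() is a hypothetical helper written for this illustration, built on the base functions parse() and tryCatch().

```r
# parses() is a hypothetical helper: TRUE if `code` is valid R, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# else starts a new line at the top level: R has already finished the if
no_brackets <- 'if ((2+2) == 4) print("Arithmetic works.")\nelse print("Houston, we have a problem.")'

# with curly brackets, R keeps reading until the brackets are closed
with_brackets <- 'if ((2+2) == 4) {\n print("Arithmetic works.")\n} else {\n print("Houston, we have a problem.")\n}'

parses(no_brackets)    # FALSE
parses(with_brackets)  # TRUE
```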
Syntax
The example above would cause R to evaluate the expression 5 times. In the first iteration, R would replace every instance of i with 1. In the second iteration i would be replaced with 2, and so on.


for (m in 4:10) {
print(m*2)
}
Try
Writing Functions
- Perform a task repeatedly, but configurably
- Make your code more readable
- Make your code easier to modify and maintain
- Share code between different analyses
- Share code with other people
- Modify R’s built-in functionality

Why write functions?
What is a function?
Syntax
function_name <- function(argument1, argument2, ...) {
...

expression
# What we want the function to do
...

return(value) # Optional.
}
Arguments
function_name <- function(
argument1, argument2, ...
) {

expression
# What we want the function to do
return(value) # Optional.
}
The entry values of the function, the information required for the function to work.

They are variables available only in the function.

A function can have any number of arguments, including none.
Arguments
print_number <- function(number) {
print(number)
}

> print_number(2)
> print_number(231)
Arguments
operations <- function(number1, number2, number3) {
result <- (number1 + number2) * number3
print(result)
}

> operations(1, 2, 3)
> operations(17, 23, 2)
With more than one argument :
Example
The expression part can contain virtually anything: statements, loops, conditional statements, even other functions.
Challenge 5
Using what you learned previously on flow control, create a function print_animal that takes an animal as argument and gives the following results:
> Scruffy <- "dog"
> Paws <- "cat"

> print_animal(Scruffy)
[1] "woof"
> print_animal(Paws)
[1] "meow"
Challenge 5
Solution
print_animal <- function(animal) {
if (animal == "dog") {
print("woof")
} else if (animal == "cat") {
print("meow")
}
}
Default values
operations <- function(number1, number2,
number3=3
) {
result <- (number1 + number2) * number3
print(result)
}

> operations(1, 2, 3) # becomes equivalent to
> operations(1, 2)
> operations(1, 2, 2) # number3 can still be changed
To avoid writing all arguments all the time when calling the function and still be flexible
The ... argument
plot.CO2 <- function(CO2, ...) {
# We use ... to pass on arguments to plot()
plot(x=CO2$conc, y=CO2$uptake, type="n", ...)


for (i in 1:length(CO2[,1])){
if (CO2$Type[i] == "Quebec") {
# same for points
points(CO2$conc[i], CO2$uptake[i], col="red", type="p", ...)

} else if (CO2$Type[i] == "Mississippi") {
# same for points()
points(CO2$conc[i], CO2$uptake[i], col="blue", type="p", ...)
}
}
}

> plot.CO2(CO2, cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake")
> plot.CO2(CO2, cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake", pch=20)
- To pass on arguments to another function used inside your function
The ... argument
sum2 <- function(...){
args <- list(...)
result <- 0
for (i in args) {
result <- result + i
}
return (result)
}

> sum2(2, 3)
> sum2(2, 4, 5, 7688, 1)
- To allow the user to input an indefinite number of arguments
Return value
returntest <- function(a, b) {

return (a) # The function exits here

a <- a + b # Never executed
return (a) # Never executed
}

> returntest(2, 3) # Prints the return value of your function
> c <- returntest(2, 3) # assign it to another variable to save it
> c
Allows you to save the result of your function and use it later.
Only one return value can be provided by a function.
The function will exit once it hits the return() keyword.
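Since only one value can be returned, a common workaround when several results are needed is to pack them into a single list. A minimal sketch, where summary_stats is a made-up name and not part of the workshop script:

```r
# summary_stats is an illustrative name only.
# The function still returns ONE object: a list holding three results.
summary_stats <- function(x) {
  result <- list(mean = mean(x), sd = sd(x), n = length(x))
  return(result)
}

stats <- summary_stats(c(2, 4, 6))
stats$mean  # access each result by name
stats$sd
stats$n
```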
Challenge 6
Using what you learned so far on functions and flow control, create a function bigsum that takes two arguments a and b and:

returns 0 if the sum of a and b is strictly less than 50
returns the sum of a and b otherwise
Challenge 6
Solution
bigsum <- function(a, b) {
result <- a + b
if (result < 50) {
return(0)
} else {
return (result)
}
}
Accessibility of variables
Always keep in mind where your variables are and if they are accessible.
Variables defined inside a function are not accessible outside
Variables defined outside a function are accessible inside. But it is NEVER a good idea!
rm(list=ls()) # remove everything to avoid any confusion

var1 <- 3 # var1 is defined outside our function
vartest <- function() {
a <- 4 # a is defined inside
print(a) # print a
print(var1) # print var1
}
a # print a. Error, a can be seen only inside the function
vartest() # calling vartest() will print a and var1
rm(var1) # remove var1
vartest() # calling the function again doesn't work anymore
Accessibility of variables
Instead, use arguments. Inside a function, argument names take precedence over other variables with the same name.
var1 <- 3 # var1 is defined outside our function
vartest <- function(var1) {
print(var1) # print var1
}
vartest(8) # Inside our function var1 is now our argument
var1 # var1 still has the same value
Be careful when creating a variable inside a conditional statement.
a <- 3
if (a > 5) {
b <- 2
}
a + b # Error, b was never created because a is not greater than 5
Accessibility of variables
It is usually a good practice to define variables outside the conditions and then modify their value to avoid any problem
a <- 3
b <- 0
if (a > 5) {
b <- 2
}
a + b
Good programming practices
To make your life easier!!!

It helps achieve greater readability and makes sharing and reusing your code a lot less painful.

Easy-to-read code reduces the time you'll spend understanding it, so it's never time lost.

Why?
Proper indentation and spacing are the first step toward easy-to-read code. Here are some suggestions:

Use spaces before and after your operators
Use the same assignment operator consistently. `<-` is often preferred, `=` is ok, but don't switch all the time between the two
Use brackets when using flow control statements
Inside brackets, indent by at least two spaces.
Put closing brackets on a separate line, except when preceding an else statement.
Define each variable on its own line


Keep a clean code
a<-4;b=3
if(a<b){
if(a==0)print("a zero") } else {
if(b==0){print("b zero")} else print(b)}
What not to do
Keep a clean code
a <- 4
b <- 3
if(a < b){
if(a == 0) {
print("a zero")
}
} else {
if(b == 0){
print("b zero")
} else {
print(b)
}
}
Takes more space but easier to read and understand
Use functions
for (i in 1:length(CO2[,1])) {
if(CO2$Type[i] == "Mississippi") {
CO2$conc[i] <- CO2$conc[i] - 20
}
}
for (i in 1:length(CO2[,1])) {
if(CO2$Type[i] == "Quebec") {
CO2$conc[i] <- CO2$conc[i] + 50
}
}
Functions help reduce the number of errors made by copying and pasting similar chunks of code, and reduce the time needed if we want to change them.

Let's modify the example from exercise 3 and suppose that all CO2 uptake values from Mississippi were overestimated by 20 and those from Quebec underestimated by 50.
We could write this:
Use functions
recalibrate <- function(CO2, type, bias) {
for (i in 1:nrow(CO2)) {
if(CO2$Type[i] == type) {
CO2$conc[i] <- CO2$conc[i] + bias
}
}
# return the new dataset
return (CO2)
}

# don't forget to save the results!
newCO2 <- recalibrate(CO2, "Mississippi", -20)
newCO2 <- recalibrate(newCO2, "Quebec", +50)
Or this :
Use functions
recalibrate <- function(CO2, type, bias) {
for (i in 1:nrow(CO2)) {
if(CO2$Type[i] == type) {
CO2$uptake[i] <- CO2$uptake[i] + bias
}
}
# return the new dataset
return (CO2)
}
Whoops, we made a mistake:
Yay, fewer changes to make. And it looks waaay cooler!!
Use meaningful names
rc <- function(c, t, b) {
for (i in 1:nrow(c)) {
if(c$Type[i] == t) {
c$uptake[i] <- c$uptake[i] + b
}
}
return (c)
}
Same function, stupid names, way harder to understand at first sight :
Comments
## Recalibrates the CO2 dataset by modifying the CO2 uptake
## by a fixed amount depending on the region of sampling
# Arguments
# CO2: the CO2 dataset
# type: the type that needs to be recalibrated. Values: "Mississippi" or "Quebec"
# bias: the amount to add to the uptake. Use negative values for overestimations
recalibrate <- function(CO2, type, bias) {
for (i in 1:nrow(CO2)) {
if(CO2$Type[i] == type) {
CO2$uptake[i] <- CO2$uptake[i] + bias
}
}
# we have to return our new dataset because the original is not modified
return (CO2)
}
To help the others and yourself!!
Speeding up your code
Because if we want to optimize, we will need to know how much time it takes!

Profiling
system.time({
a <- 0
for (i in 1:1000) {
a <- a + i
}
})
system.time(replicate(1000, {
a <- 0
for (i in 1:1000) {
a <- a + i
}
}))
Repeating our code might be necessary for time to be measurable

To have a more detailed output of the time spent in each function, you can use the function Rprof()

Profiling
Rprof("profile.txt") # Saves results in file profile.txt
a <- 0
for (i in 1:1000000) {
a <- a + i
}
Rprof(NULL) # Ends the profiling

summaryRprof("profile.txt") # Display the result of profiling

To compare the efficiency of several functions precisely, you can use the package microbenchmark

Profiling
install.packages("microbenchmark")
library(microbenchmark)

f1 <- function() {
a <- 0
for (i in 1:1000) {
a <- a + i
}
}

# The argument times sets the number of iterations
microbenchmark(f1(), times=1000)

When we want to speed up our code, the first thing we should do is look at it and ask ourselves the following questions:

Is my code ok?
Is everything useful?
Do I repeat some tasks needlessly?
Are there other ways to do that?

To program efficiently, we have to think efficiently first and remove everything that can be removed.
This usually makes the code simpler to read as well.
First step : thinking
Let's create a function that:
Takes a number a
Adds a to every number from 1 to 100
If a is less than 5, adds 2*a instead
Sums all the elements of the modified sequence.
First step : thinking
Here's a way to do it:
f2 <- function(a) {
# initialize our result
result <- 0
# iterate on the sequence from 1 to 100
for (i in 1:100) {
if (a < 5) {
# a is < 5: add the sequence element plus 2 * a to result
result <- result + i + (2 * a)
} else {
# a is >= 5: add the sequence element plus a to result
result <- result + i + a
}
}
return(result)
}
f2(4)
Our previous example works well. However, a is constant so it's useless to check if it is less than 5 in each iteration.
Here's another more efficient way:
First step : thinking
f3 <- function(a) {
# initialize our result
result <- 0

# Check if a < 5 and double it if true
if (a < 5) {
a <- 2 * a
}
# We don't even need an else here since a remains the same otherwise

# iterate on the sequence from 1 to 100
for (i in 1:100) {
result <- result + i + a
}
return(result)
}

f3(4)

microbenchmark(f2(4),
f3(4), times=1000)
Just by thinking a little bit, our code became faster and easier to understand.
Now we can do even better with the power of R
First step : thinking
f4 <- function(a) {
result <- 0
if (a < 5) {
a <- a * 2
}
result <- sum(1:100 + a)
return(result)
}


f4(4)
microbenchmark(f3(4), f4(4), times=10000)

Wow our code just got ten times faster and also smaller...
What just happened?
R is an interpreted language whose interpreter is largely written in C.

R code is slower than compiled code because every expression has to be evaluated by the interpreter before the underlying C functions are called.

Some R functions are direct links to C functions and are therefore much faster and optimized.

R is usually optimized for vectorization, i.e. operations on whole vectors.

So it is usually much faster to perform operations directly on vectors instead of looping over them.
Vectorization
v1 <- 1:5
v2 <- 2:6
v3 <- 1:3

v1 + 2 # Addition on a vector : adds 2 to all elements
v1 + v2 # Adds each element of v2 to the matching element of v1
v1 + v3 # v3 is recycled since it is shorter than v1 (with a warning: 5 is not a multiple of 3)

sum(v1) # Adds all elements of v1 together
sum(v1, v2) # Sums all elements of v1 and v2
mean(v1) # Average of elements in v1
mean(c(v1, v2)) # Average of elements of v1 and v2.
Here are some examples of operations on vectors
Subsetting / logical indexing
Extracts data way faster than loops.
Is done with the [ ] operator by providing a set of indexes or conditions returning a set of indexes.
v1 <- 1:10
v1[7] # Extracts the 7th value
v1[v1 > 5] # Extracts values > 5 only
v1[which(v1 > 5)] # same as before
In data frames, the $ operator gives direct access to columns.
Remember that columns of a data frame are always vectors
data(CO2)
CO2$Type # Prints columns Type
CO2[, "Type"] # Same as above
CO2[CO2$Type == "Quebec", ] # Extracts all rows of the CO2 dataset where the Type is "Quebec"
Challenge 7
Create a new function recalibrate2() that rewrites the function recalibrate() seen earlier using subsetting and vectorization techniques.
The new function should not be longer than 3 lines.
Reminder:
Challenge 7
Solution
recalibrate2 <- function(CO2, type, bias) {

# First get a logical index of the rows with the right type
# Thinking tip : since we use the index twice below, instead of computing it
# twice, let's do it only once and save the result!
idx <- CO2$Type == type

# Modify only the data concerned using indexes.
CO2$uptake[idx] <- CO2$uptake[idx] + bias
return (CO2)
}

# Check the results are the same
all.equal(recalibrate(CO2, "Quebec", 20), recalibrate2(CO2, "Quebec", 20))

# Check that this is indeed way faster
microbenchmark(recalibrate(CO2, "Quebec", 20),
recalibrate2(CO2, "Quebec", 20))
recalibrate <- function(CO2, type, bias) {
for (i in 1:nrow(CO2)) {
if(CO2$Type[i] == type) {
CO2$uptake[i] <- CO2$uptake[i] + bias
}
}
return (CO2)
}
Growing objects
Sometimes loops can't be avoided. In these cases, pay extra attention to objects that grow with each iteration.
Take these two functions
growing <- function(n) {
# declare our result
result <- NULL
for (i in 1:n) {
# create our result by growing our object
result <- c(result, i)
}
return(result)
}

growing2 <- function(n) {
# declare our result : here we create a vector of length n with 0 in it
result <- numeric(n)
for (i in 1:n) {
# now we just modify our value instead of recreating the vector
result[i] <- i
}
return(result)
}
Growing objects
system.time({
growing(10000)
})
system.time({
growing2(10000)
})


system.time({
growing(50000)
})
system.time({
growing2(50000)
})
Let's compare their speeds
Performance drops so much because each call to c() creates a new vector and copies every existing element into it.
So as your object grows, the time needed for each copy increases.
This problem is solved by preallocating your result object and filling it.
Growing objects
The same problem appears with data frames and functions like cbind() or rbind().
However it is a bit more complex.
growingdf <- function(n, row) {
# preallocate our dataframe
df <- data.frame(numeric(n), character(n), stringsAsFactors=FALSE)
for (i in 1:n) {
# replace the ith row with row
df[i,] <- row
}
return(df)
}

growingdf2 <- function(n, row) {
# this is the way to allocate a list with n elements
df <- vector("list", n)
for (i in 1:n) {
# put row in the ith element
df[[i]] <- row
}
return(do.call(rbind, df))
}

# store our row in a list since we have different types
row <- list(1, "Hello World")
microbenchmark(growingdf(5000, row),
growingdf2(5000, row),
times=10)
The apply family
Helps avoid the growing object problem.
Not always the best solution performance-wise (they sometimes hide a for loop)
Makes it easy to apply a function to the rows or columns of a data frame
df <- data.frame(1:100, 101:200)

# Sum on rows
apply(df, 1, sum)

# Mean on columns
apply(df, 2, mean)

# we can also supply additional arguments to the function
apply(df, 2, mean, na.rm=TRUE)

# we can also define a function directly. The first argument is always what
# we iterate on. Here each row is treated as a vector of numbers as we can
# see with the str() function
apply(df, 1, function(x){str(x)})

# We can also add other arguments
apply(df, 1, function(x, y){x[2] - x[1] + y}, y=5)
The apply family
When looking for speed, the most interesting apply functions are probably lapply(), sapply() and vapply(), since their main loop runs in C code. But they are more complex to use.
a <- list(1:100, 101:200)

# apply mean to each element of the list
lapply(a, mean) # we get a list as a result
unlist(lapply(a, mean)) # use unlist to get a vector instead
sapply(a,mean) #Same result

vapply(a, mean, 0) # the result of mean is a single number, we tell vapply our result will be a number
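To connect this with the growing object problem, here is a sketch of the same small task written twice: once as a preallocated loop (the growing2() pattern) and once with vapply(), which allocates the result itself. The function names squares_loop and squares_vapply are made up for this illustration.

```r
# Loop version with preallocation (same pattern as growing2())
squares_loop <- function(n) {
  result <- numeric(n)
  # seq_len(n) is used instead of 1:n so that n = 0 gives an empty loop
  for (i in seq_len(n)) {
    result[i] <- i^2
  }
  return(result)
}

# vapply version: the result vector is allocated by vapply() itself.
# numeric(1) declares that each call returns exactly one number.
squares_vapply <- function(n) {
  vapply(seq_len(n), function(i) i^2, numeric(1))
}

all.equal(squares_loop(10), squares_vapply(10))
```

Which one is faster depends on the task, so it is worth checking with microbenchmark() before rewriting a loop.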
But remember...
Before spending time speeding up your code, first ask yourselves :

Is it really worth it??

Because, sometimes, spending 1 hour optimizing your code to effectively save 15 seconds on computing time is just not really that good a deal...
if and if/else test a single condition
Use "ifelse" function to:
test a vector of conditions
apply a function only under certain conditions

What if you want to test more than one thing?
a <- 1:10
ifelse(a > 5, "yes", "no")

a <- (-4):5
sqrt(ifelse(a >= 0, a, NA))
== equal to
!= not equal to
!x not x
< less than
<= less than or equal to
> greater than
>= greater than or equal to
x&y x AND y
x|y x OR y
isTRUE(x) test if x is TRUE
Remember the logical operators
Exercise 1
Paws <- "cat"
Scruffy <- "dog"
Sassy <- "cat"
animals <- c(Paws, Scruffy, Sassy)
1. Use an if statement to print “meow” if Paws is a “cat”.

2. Use an if/else statement to print “woof” if you supply an object that is a “dog” and “meow” if it is not. Try it out with Paws and Scruffy.

3. Use the ifelse function to display “woof” for animals that are dogs and “meow” for animals that are cats.
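One possible solution to Exercise 1, given only as a sketch (other answers work too). The variable x in step 2 is a placeholder introduced here so that the same code can be tried with either animal.

```r
Paws <- "cat"
Scruffy <- "dog"
Sassy <- "cat"
animals <- c(Paws, Scruffy, Sassy)

# 1. if statement
if (Paws == "cat") {
  print("meow")
}

# 2. if/else statement; set x to Paws or Scruffy to try both
x <- Scruffy
if (x == "dog") {
  print("woof")
} else {
  print("meow")
}

# 3. ifelse() tests the whole vector at once
sounds <- ifelse(animals == "dog", "woof", "meow")
print(sounds)
```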
The letter 'i' can be replaced with any variable name and the sequence can be almost anything, even a list of vectors.
for (a in c("Hello", "R", "Programmers")) {
print(a)
}

for (z in 1:30) {
a <- rnorm(n = 1, mean = 5, sd = 2)
print(a)
}

elements <- list(1:3, 4:10)
for (element in elements) {
print(element)
}
Loops are often used to loop over a dataset. We will use loops to perform functions on the CO2 dataset which is built in to R.
data(CO2) # This loads the built in dataset
for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset
print(CO2$conc[i]) #print the CO2 concentration
}

for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset
if(CO2$Type[i] == "Quebec") { # if the type is "Quebec"
print(CO2$conc[i]) # print the CO2 concentration
}
}

# Tip 1 : to get the number of rows of a data frame, we can also use the function nrow
for (i in 1:nrow(CO2)) { # for each row in the CO2 dataset
print(CO2$conc[i]) #print the CO2 concentration
}
# Tip 2 : If we want to perform operations on only the elements of one column, we can directly
# iterate over it.
for (i in CO2$conc) { # for every element of the concentration column of the CO2 dataset
print(i) # print the current element
}
The expression part of the loop can be almost anything and is usually a compound statement containing many commands.
for (i in 4:5) { # for i in 4 to 5
print(colnames(CO2)[i])
print(mean(CO2[,i]))
}
Note that this could be done more quickly using apply(), but that wouldn't teach you about loops. We will talk about it later.


Exercise 2
You have realized that your tool for measuring uptake was not calibrated properly at Quebec sites and all measurements are 2 units higher than they should be. Use a loop to correct these measurements for all Quebec sites.
Make sure you reload the data so that we are working with the raw data for the rest of the exercise:
data(CO2)
Modifying iterations
Normally, loops iterate over and over until they finish.


To change this behavior, you can use:
break

breaks out of the loop's execution entirely
next
stops executing the current iteration and jumps to the next iteration.


count <- 0

for (i in 1:length(CO2[,1])) {
if (CO2$Treatment[i] == "nonchilled") next
#Skip to next iteration if treatment is nonchilled
count <- count + 1
print(CO2$conc[i])
}
print(count)

# The count and print command were performed 42 times.
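The loop above uses next. As a companion sketch for break, this loop looks for the first row whose uptake exceeds a threshold and then stops. The threshold of 40 and the name first_high are arbitrary choices made for this illustration.

```r
data(CO2)

first_high <- NA  # will hold the row number, if one is found
for (i in 1:nrow(CO2)) {
  if (CO2$uptake[i] > 40) {
    first_high <- i
    break  # leave the loop; the remaining rows are never visited
  }
}
print(first_high)
```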
count <- 0
i <- 0
repeat {
i <- i + 1
if (CO2$Treatment[i] == "nonchilled") next
# next tells R to skip the rest of this iteration
count <- count + 1
print(CO2$conc[i])
if (i == length(CO2[,1])) break # stop looping
}

print(count)
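A word of caution about the repeat version: next skips everything after it, including the break test on the last line. It works here because the last row of CO2 is a "chilled" row, but with other data the loop could run past the end. A more defensive sketch, assuming we only want the count, puts the exit test first:

```r
data(CO2)

count <- 0
i <- 0
repeat {
  i <- i + 1
  if (i > nrow(CO2)) break                     # exit test comes first
  if (CO2$Treatment[i] == "nonchilled") next   # skip the rest of this iteration
  count <- count + 1
}
print(count)
```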

Example
Print the CO2 concentrations for "chilled" treatments and keep count of how many replications there were.
This could be equivalently written using a repeat loop:
Example (Continued)
This could be equivalently written using a while loop:
i <- 0
count <- 0
while (i < length(CO2[,1]))
{
i <- i + 1
if (CO2$Treatment[i] == "nonchilled") next # skip the rest of this iteration
count <- count + 1
print(CO2$conc[i])
}
print(count)
Exercise 3
Make sure you reload the data so that we are working with the raw data for the rest of the exercise:
data(CO2)
You have realized that your tool for measuring concentration didn't work properly. At Mississippi sites, concentrations less than 300 were measured correctly but concentrations >= 300 were overestimated by 20 units. Use a loop to correct these measurements for all Mississippi sites.
Using flow control to make a complex plot
Dataset
concentration
uptake
type (Quebec or Mississippi)
treatment (chilled or nonchilled)





How do we plot the points differently to show types and treatments?


plot(x=CO2$conc, y=CO2$uptake, type="n", cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake") # Type "n" tells R to not actually plot the points.

for (i in 1:length(CO2[,1])) {
if (CO2$Type[i] == "Quebec" & CO2$Treatment[i] == "nonchilled") {
points(CO2$conc[i], CO2$uptake[i], col="red",type="p")
}
if (CO2$Type[i] == "Quebec" & CO2$Treatment[i] == "chilled") {
points(CO2$conc[i], CO2$uptake[i], col="blue")
}
if (CO2$Type[i] == "Mississippi" & CO2$Treatment[i] == "nonchilled") {
points(CO2$conc[i], CO2$uptake[i], col="orange")
}
if (CO2$Type[i] == "Mississippi" & CO2$Treatment[i] == "chilled") {
points(CO2$conc[i], CO2$uptake[i], col="green")
}
}
Using control flow to make a complex plot
head(CO2) # Look at the dataset
unique(CO2$Type)
unique(CO2$Treatment)
Generate a plot showing concentration versus uptake where each plant is shown using a different colour point.

Bonus points for doing it with nested loops!
Exercise 4

while loops and repeat loops operate similarly to for loops

Once you understand how for loops work, you should be able to use any type of loop.

You will see some examples of while loops and repeat loops in the next section.
while loops and repeat loops
Other packages of interest
knitr
Write R code in Markdown or in Latex
Compile or 'knit' code to html, PDF or Word.
Shiny: similar concept, but for interactive web documents
Note that there are other tools to create a complex plot (such as ggplot which was covered in workshop 3)
nested loops
In some cases, you may want to use nested loops to accomplish a task. When using nested loops, it is important to use different variables as counters for each of your loops (here we used i and n).
for (i in 1:5) {
for (n in 1:5) {
print (i*n)
}
}
Program flow control can be simply defined as the order in which a program is executed

Control Flow

Flow charts can be used to plan programs, and represent structure.
Coded Solutions

It decreases the complexity and time of the task at hand.
This logical structure also means that the code has increased clarity.
It also means that many programmers can work on one program. This means increased productivity.
Why is it advantageous to have structured programs?
Start of a process and has only one output.
This is the operation carried out.
These blocks must have an input and output.
Boolean choice: it has one input and two outputs.
Opposite of the ‘Start’ symbol.
An example of a real life awkward situation
Representing structure
The two basic building blocks of code are the following:
Selection
The program's execution is determined by statements
if
if else
Iteration
Repetition, where the statement will loop until a criterion is met
for
while
repeat
if(condition) {
expression 1
} else {
expression 2
}
if ... else
statement
Decision making

Decision making is an important part of programming
if (test_expression1) {
statement1
} else if (test_expression2) {
statement2
} else if (test_expression3) {
statement3
} else
statement4

nested
if ... else
statement
Not convinced that your life is a program?
Let us take a look at our graduate lives!
Every time some operation(s) has to be repeated, a loop may come in handy.
for (i in 1:5) {
expression
}
for
statement
for (val in sequence) {
statement
}
x <- c(2,5,3,9,6)
count <- 0

for (val in x) {
if (val %% 2 == 0) {
count <- count + 1
}
}

print(count)
[1] 2
for
statement
while
statement
while (test_expression) {
statement
}
i <- 1

while (i < 6) {
print(i)
i = i+1
}
while
statement
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
break
statement
for (val in x) {
if (condition){
break
}
statement
}
for (val in x) {
if (condition){
next
}
statement
}
next
statement
repeat
statement
repeat {
statement
}
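A repeat loop has no condition of its own, so it needs a break somewhere inside. Here is a sketch of "doing something that can fail, until it succeeds". flaky_task() is a hypothetical stand-in for a real operation such as a download, and max_attempts is an assumed safety limit so that the loop cannot run forever.

```r
set.seed(1)  # only to make the sketch reproducible

# flaky_task() is hypothetical: it fails at random about 70% of the time
flaky_task <- function() {
  if (runif(1) < 0.7) stop("task failed")
  return("done")
}

attempts <- 0
max_attempts <- 50
result <- NULL
repeat {
  attempts <- attempts + 1
  # tryCatch() turns an error into NULL instead of stopping the script
  result <- tryCatch(flaky_task(), error = function(e) NULL)
  if (!is.null(result)) break            # success: leave the loop
  if (attempts >= max_attempts) break    # too many failures: give up
}
print(attempts)
print(result)
```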
for (i in 1:length(CO2[,1])) {
if(CO2$Type[i] == "Mississippi") {
if(CO2$conc[i] < 300) next
CO2$conc[i] <- CO2$conc[i] - 20
}
}
# Note : We could also have written it this way, which is more concise and clearer
for (i in 1:nrow(CO2)) {
if(CO2$Type[i] == "Mississippi" && CO2$conc[i] >= 300) {
CO2$conc[i] <- CO2$conc[i] - 20
}
}
for (i in 1:length(CO2[,1])) {
if(CO2$Type[i] == "Quebec") {
CO2$uptake[i] <- CO2$uptake[i] - 2
}
}
tapply(CO2$uptake,CO2$Type,mean)
plot(x=CO2$conc, y=CO2$uptake, type="n", cex.lab=1.4,xlab="CO2 concentration", ylab="CO2 uptake")
# Type "n" tells R to not actually plot the points.

plants <- unique(CO2$Plant)

for (i in 1:length(CO2[,1])){
for (p in 1:length(plants)) {
if (CO2$Plant[i] == plants[p]) {
points(CO2$conc[i], CO2$uptake[i], col=p, type="p")
}
}
}
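The nested loops above are fine for practising flow control. For comparison, here is a vectorized sketch that should give the same colours without any loop, using the base function match(). The name plant_col is made up for this illustration.

```r
data(CO2)
plants <- unique(CO2$Plant)

# match() gives, for each row, the position of its plant in `plants`,
# i.e. the same colour number p that the nested loops find
plant_col <- match(CO2$Plant, plants)

plot(x = CO2$conc, y = CO2$uptake, col = plant_col,
     cex.lab = 1.4, xlab = "CO2 concentration", ylab = "CO2 uptake")
```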
library(RgoogleMaps)

myhome=getGeoCode('Stewart Biology Building, Montreal');

mymap<-GetMap(center=myhome, zoom=14)

PlotOnStaticMap(mymap,lat=myhome['lat'],lon=myhome['lon'],
cex=5,pch=10,lwd=3,col=c('red'));
RgoogleMaps
Thank you for coming!