
for (i in 1:length(CO2[,1])) {

if(CO2$Type[i] == "Quebec") {

CO2$uptake[i] <- CO2$uptake[i] - 2

}

}

tapply(CO2$uptake,CO2$Type,mean)

plot(x=CO2$conc, y=CO2$uptake, type="n", cex.lab=1.4,xlab="CO2 concentration", ylab="CO2 uptake")

# Type "n" tells R to not actually plot the points.

plants <- unique(CO2$Plant)

for (i in 1:length(CO2[,1])){

for (p in 1:length(plants)) {

if (CO2$Plant[i] == plants[p]) {

points(CO2$conc[i], CO2$uptake[i], col=p, type="p")

}

}

}

f2 <- function(a) {

# initialize our result

result <- 0

# iterate on the sequence from 1 to 100

for (i in 1:100) {

if (a < 5) {

# a is < 5: add the sequence element plus 2 * a to result

result <- result + i + (2 * a)

} else {

# a is >= 5: add the sequence element plus a to result

result <- result + i + a

}

}

return(result)

}

f2(4)

f3 <- function(a) {

# initialize our result

result <- 0

# Check if a < 5 and double it if true (done once, before the loop)

if (a < 5) {

a <- 2 * a

}

# We don't even need an else here since a remains the same otherwise

# iterate on the sequence from 1 to 100

for (i in 1:100) {

result <- result + i + a

}

return(result)

}

f3(4)

microbenchmark(f2(4),

f3(4), times=1000)

Outline

Why program in R?

Pre-workshop:

  • Control Flow
  • Writing functions in R
  • Speeding up your code
  • Useful R packages for biologists
  • Reuse and share your code
  • Achieve greater consistency
  • Avoid copy/paste errors
  • Avoid reinventing the wheel
  • Redo your analysis quickly and easily
  • Use R to do repetitive tasks for you
  • Understand what R is doing to your data
  • Do analyses that nobody has prepackaged
  • It’s fun! (no, really!)

Install R (or use it on the computers here):

http://cran.r-project.org/

Install an R environment, such as R Studio:

http://rstudio.org/

Download the slides and the .R script:

http://bit.ly/yRjShO

Twiddle your thumbs impatiently

An example of a real life awkward situation

Representing structure

Control Flow

Coded Solutions

The two basic building blocks of code are the following:

Flow charts can be used to plan programs, and represent structure.

Program flow control can be simply defined as the order in which a program is executed

Why is it advantageous to have structured programs?

  • It decreases the complexity and time of the task at hand.
  • This logical structure also means that the code has increased clarity.
  • It also means that many programmers can work on one program. This means increased productivity.

Start of a process and has only one output.

Not convinced that your life is a program?

Let us take a look at our graduate lives!

1. Control Flow

Exercise 1

Beware of R’s expression parsing!

Remember the logical operators

Decision making

Use curly brackets { } so that R knows to expect more input.

Try:

Decision making is an important part of programming

Paws <- "cat"

Scruffy <- "dog"

Sassy <- "cat"

animals <- c(Paws, Scruffy, Sassy)

if ((2+2) == 4) print("Arithmetic works.")

else print("Houston, we have a problem.")

This doesn't work because R evaluates the first line as a complete statement and doesn't know that you are going to use an else statement.

Instead use:

== equal to

!= not equal to

!x not x

< less than

<= less than or equal to

> greater than

>= greater than or equal to

x&y x AND y

x|y x OR y

isTRUE(x) test if x is TRUE

if ((2+2) == 4) {

print("Arithmetic works.")

} else {

print("Houston, we have a problem.")

}

1. Use an if statement to print “meow” if Paws is a “cat”.

2. Use an if/else statement to print “woof” if you supply an object that is a “dog” and “meow” if it is not. Try it out with Paws and Scruffy.

3. Use the ifelse function to display “woof” for animals that are dogs and “meow” for animals that are cats.
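One possible way to approach the three tasks is sketched below. The variable `pet` and the vector `sounds` are names introduced here for illustration only; they are not part of the workshop script.

```r
Paws <- "cat"
Scruffy <- "dog"
Sassy <- "cat"
animals <- c(Paws, Scruffy, Sassy)

# 1. if statement: print "meow" only when Paws is a cat
if (Paws == "cat") {
  print("meow")
}

# 2. if/else statement: set `pet` (hypothetical name) to Paws or Scruffy
pet <- Scruffy
if (pet == "dog") {
  print("woof")
} else {
  print("meow")
}

# 3. ifelse() tests every element of the vector at once
sounds <- ifelse(animals == "dog", "woof", "meow")
print(sounds)
```

Note that if and if/else look at a single value, while ifelse() returns one answer per element of `animals`.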

When using brackets, R waits to evaluate the command until the brackets have been closed.

nested if ... else statement

nested loops

Exercise 2

# Tip 1 : to get the number of rows of a data frame, we can also use the function nrow
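As a small aside on Tip 1, the sketch below checks that nrow() agrees with length(CO2[,1]) and shows why seq_len() can be a safer way to build the loop sequence when a data frame might be empty. The object `empty` is made up for this illustration.

```r
data(CO2)  # built-in dataset

# nrow() and the length of the first column give the same count
n_rows <- nrow(CO2)
print(n_rows == length(CO2[, 1]))

# Hypothetical edge case: a data frame with zero rows
empty <- CO2[0, ]
bad_seq <- 1:nrow(empty)          # 1:0 counts down, giving c(1, 0)
good_seq <- seq_len(nrow(empty))  # integer(0), so a loop would not run
print(length(bad_seq))
print(length(good_seq))
```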

Iteration

Loops are often used to loop over a dataset. We will use loops to perform functions on the CO2 dataset which is built in to R.

The expression part of the loop can be almost anything and is usually a compound statement containing many commands.

Every time some operation(s) has to be repeated, a loop may come in handy.

for (i in 1:nrow(CO2)) { # for each row in the CO2 dataset

print(CO2$conc[i]) #print the CO2 concentration

}

In some cases, you may want to use nested loops to accomplish a task. When using nested loops, it is important to use different variables as counters for each of your loops (here we used i and n).

You have realized that your tool for measuring uptake was not calibrated properly at Quebec sites and all measurements are 2 units higher than they should be. Use a loop to correct these measurements for all Quebec sites.

for (i in 4:5) { # for i in 4 to 5

print(colnames(CO2)[i])

print(mean(CO2[,i]))

}

Loops are good for:

  • doing something for every element of an object
  • doing something until the processed data runs out
  • doing something for every file in a folder
  • doing something that can fail, until it succeeds
  • iterating a calculation until it converges

data(CO2) # This loads the built in dataset

for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset

print(CO2$conc[i]) #print the CO2 concentration

}

for (i in 1:length(CO2[,1])) { # for each row in the CO2 dataset

if(CO2$Type[i] == "Quebec") { # if the type is "Quebec"

print(CO2$conc[i]) # print the CO2 concentration

}

}

# Tip 2 : If we want to perform operations on only the elements of one column, we can directly

# iterate over it.

Make sure you reload the data so that we are working with the raw data for the rest of the exercise:

Note that this could be done more quickly using apply(), but that wouldn't teach you about loops. We will talk about it later.

while statement

while loops and repeat loops

for statement

while (test_expression) {

statement

}

while statement

i <- 1

while (i < 6) {

print(i)

i = i+1

}

[1] 1

[1] 2

[1] 3

[1] 4

[1] 5

for (i in 1:5) {

for (n in 1:5) {

print (i*n)

}

}

  • while loops and repeat loops operate similarly to for loops

  • Once you understand how for loops work, you should be able to use any type of loop.

  • You will see some examples of while loops and repeat loops in the next section.
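To make the similarity concrete, here is the same small task (adding up the numbers 1 to 5) written with each of the three loop types. The variable names are chosen for this sketch only.

```r
# for loop: the sequence controls the iterations
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop: we manage the counter and the stopping test ourselves
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat loop: runs forever unless we break out explicitly
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) break
}

print(c(total_for, total_while, total_repeat))
```

The for loop is the shortest here because the number of iterations is known in advance; while and repeat are more natural when the stopping point is only discovered along the way.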

for (val in sequence) {

statement

}

for (i in CO2$conc) { # for every element of the concentration column of the CO2 dataset

print(i) # print the ith element

}

repeat statement

repeat {

statement

}

for statement

x <- c(2,5,3,9,6)

count <- 0

for (val in x) {

if(val %% 2 == 0)

count = count+1 }

print(count)

[1] 2

Another example of a for loop.

Syntax

for (i in 1:5) {

expression

}

The example above would cause R to evaluate the expression 5 times. In the first iteration, R would replace every instance of i with 1. In the second iteration i would be replaced with 2, and so on.

Try

for (m in 4:10) {

print(m*2)

}

The letter 'i' can be replaced with any variable name and the sequence can be almost anything, even a list of vectors.

for (a in c("Hello", "R", "Programmers")) {

print(a)

}

for (z in 1:30) {

a <- rnorm(n = 1, mean = 5, sd = 2)

print(a)

}

elements <- list(1:3, 4:10)

for (element in elements) {

print(element)

}

data(CO2)

Quebec Centre for Biodiversity Science

R Workshop Series

Example (Continued)

Exercise 3

Example

Modifying iterations

This could be equivalently written using a while loop:

Print the CO2 concentrations for "chilled" treatments and keep count of how many replications there were.

This could be equivalently written using a repeat loop:

You have realized that your tool for measuring concentration didn't work properly. At Mississippi sites, concentrations less than 300 were measured correctly but concentrations >= 300 were overestimated by 20 units. Use a loop to correct these measurements for all Mississippi sites.

break statement

for (val in x) {

if (condition){

break

}

statement

}

count <- 0

i <- 0

repeat {

i <- i + 1

if (CO2$Treatment[i] == "nonchilled") next

# next tells R to skip the rest of this iteration

count <- count + 1

print(CO2$conc[i])

if (i == length(CO2[,1])) break # stop looping

}

print(count)

Normally, loops iterate over and over until they finish.

To change this behavior, you can use:

  • break: exits the loop entirely
  • next: stops executing the current iteration and jumps to the next iteration
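A minimal sketch combining break and next in one loop, using a small made-up vector `x` rather than the CO2 data:

```r
x <- c(2, 5, 3, 9, 6)
total <- 0

for (val in x) {
  if (val %% 2 == 0) next  # even value: skip the rest of this iteration
  if (val > 8) break       # first odd value above 8: leave the loop for good
  total <- total + val     # only reached for 5 and 3
}

print(total)
```

The loop stops at 9, so the final element 6 is never examined.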

count <- 0

for (i in 1:length(CO2[,1])) {

if (CO2$Treatment[i] == "nonchilled") next

#Skip to next iteration if treatment is nonchilled

count <- count + 1

print(CO2$conc[i])

}

print(count)

# The count and print command were performed 42 times.

i <- 0

count <- 0

while (i < length(CO2[,1]))

{

i <- i + 1

if (CO2$Treatment[i] == "nonchilled") next # skip this iteration

count <- count + 1

print(CO2$conc[i])

}

print(count)

next statement

for (val in x) {

if (condition){

next

}

statement

}

Make sure you reload the data so that we are working with the raw data for the rest of the exercise:

data(CO2)

Exercise 4

Using control flow to make a complex plot

Using flow control to make a complex plot

Generate a plot showing concentration versus uptake, where each plant is shown using a different colour point.

Bonus points for doing it with nested loops!

plot(x=CO2$conc, y=CO2$uptake, type="n", cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake") # Type "n" tells R to not actually plot the points.

for (i in 1:length(CO2[,1])) {

if (CO2$Type[i] == "Quebec" & CO2$Treatment[i] == "nonchilled") {

points(CO2$conc[i], CO2$uptake[i], col="red",type="p")

}

if (CO2$Type[i] == "Quebec" & CO2$Treatment[i] == "chilled") {

points(CO2$conc[i], CO2$uptake[i], col="blue")

}

if (CO2$Type[i] == "Mississippi" & CO2$Treatment[i] == "nonchilled") {

points(CO2$conc[i], CO2$uptake[i], col="orange")

}

if (CO2$Type[i] == "Mississippi" & CO2$Treatment[i] == "chilled") {

points(CO2$conc[i], CO2$uptake[i], col="green")

}

}

Dataset

  • concentration
  • uptake
  • type (Quebec or Mississippi)
  • treatment (chilled or nonchilled)

How do we plot the points differently to show types and treatments?

head(CO2) # Look at the dataset

unique(CO2$Type)

unique(CO2$Treatment)

Note that there are other tools to create a complex plot (such as ggplot which was covered in workshop 3)

Workshop 5: Programming in R

Writing Functions

Arguments

Syntax

Challenge 5

What is a function?

Why write functions?

Using what you learned previously on flow control, create a function print_animal that takes an animal as argument and gives the following results :

Example

With more than one argument :

function_name <- function(argument1, argument2, ...) {

expression # What we want the function to do

return(value) # Optional.

}

function_name <- function(argument1, argument2, ...) {

...

expression # What we want the function to do

...

return(value) # Optional.

}

> Scruffy <- "dog"

> Paws <- "cat"

> print_animal(Scruffy)

print_number <- function(number) {

print(number)

}

> print_number(2)

> print_number(231)

operations <- function(number1, number2, number3) {

result <- (number1 + number2) * number3

print(result)

}

> operations(1, 2, 3)

> operations(17, 23, 2)

- Perform a task repeatedly, but configurably

- Make your code more readable

- Make your code easier to modify and maintain

- Share code between different analyses

- Share code with other people

- Modify R’s built-in functionality

> [1] "woof"

The entry values of the function, the information required for the function to work.

They are variables available only in the function.

A function can have any number of arguments, including none.

print_animal(Paws)

The expression part can contain virtually anything : statements, loops, conditional statements, even other functions.

> [1] "meow"

Return value

Challenge 6

The ... argument

Default values

The ... argument

Challenge 6

Solution

Challenge 5

Solution

- To pass on arguments to another function used inside your function

To avoid writing all arguments all the time when calling the function and still be flexible

- To allow the user to input an indefinite number of arguments

Allows you to save the result of our function and be able to use it later.

Only one return value can be provided by a function.

The function exits as soon as it reaches a return() call.
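A function returns a single object, but that object can be a list (or a vector) bundling several values. A minimal sketch, using a made-up function `sum_and_product`:

```r
sum_and_product <- function(a, b) {
  # bundle both results into one named list, which is the single return value
  return(list(sum = a + b, product = a * b))
}

res <- sum_and_product(2, 3)
print(res$sum)
print(res$product)
```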

Using what you learned so far on functions and flow control, create a function bigsum that takes two arguments a and b and :

  • returns 0 if the sum of a and b is strictly less than 50
  • returns the sum of a and b otherwise

print_animal <- function(animal) {

if (animal == "dog") {

print("woof")

} else if (animal == "cat") {

print("meow")

}

}

bigsum <- function(a, b) {

result <- a + b

if (result < 50) {

return(0)

} else {

return (result)

}

}

operations <- function(number1, number2, number3=3) {

result <- (number1 + number2) * number3

print(result)

}

> operations(1, 2, 3) # becomes equivalent to

> operations(1, 2)

> operations(1, 2, 2) # number3 can still be changed

sum2 <- function(...){

args <- list(...)

result <- 0

for (i in args) {

result <- result + i

}

return (result)

}

> sum2(2, 3)

> sum2(2, 4, 5, 7688, 1)

plot.CO2 <- function(CO2, ...) {

# We use ... to pass on arguments to plot()

plot(x=CO2$conc, y=CO2$uptake, type="n", ...)

for (i in 1:length(CO2[,1])){

if (CO2$Type[i] == "Quebec") {

# same for points

points(CO2$conc[i], CO2$uptake[i], col="red", type="p", ...)

} else if (CO2$Type[i] == "Mississippi") {

# same for points()

points(CO2$conc[i], CO2$uptake[i], col="blue", type="p", ...)

}

}

}

> plot.CO2(CO2, cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake")

> plot.CO2(CO2, cex.lab=1.4, xlab="CO2 concentration", ylab="CO2 uptake", pch=20)

returntest <- function(a, b) {

return (a) # The function exits here

a <- a + b # Not interpreted

return (a) # Not interpreted

}

> returntest(2, 3) # Prints the return value of your function

> c <- returntest(2, 3) # assign it to another variable to save it

> c

Website: http://qcbs.ca/wiki/r_workshop5

Accessibility of variables

Be careful when creating a variable inside a conditional statement.

Instead, use arguments. Inside a function, argument names take precedence over variables of the same name defined outside.

Always keep in mind where your variables are and if they are accessible.

  • Variables defined inside a function are not accessible outside
  • Variables defined outside a function are accessible inside. But it is NEVER a good idea!

a <- 3

if (a > 5) {

b <- 2

}

a + b # Error, b is not created because a < 5

var1 <- 3 # var1 is defined outside our function

vartest <- function(var1) {

print(var1) # print var1

}

vartest(8) # Inside our function var1 is now our argument

var1 # var1 still has the same value

It is usually a good practice to define variables outside the conditions and then modify their value to avoid any problem

rm(list=ls()) # remove everything to avoid any confusion

var1 <- 3 # var1 is defined outside our function

vartest <- function() {

a <- 4 # a is defined inside

print(a) # print a

print(var1) # print var1

}

a # print a. Error, a can be seen only inside the function

vartest() # calling vartest() will print a and var1

rm(var1) # remove var1

vartest() # calling the function again doesn't work anymore

a <- 3

b <- 0

if (a > 5) {

b <- 2

}

a + b

Good programming practices

Use functions

Keep a clean code

Why?

Takes more space but easier to read and understand

Woops made a mistake :

Or this :

What not to do

Helps reduce the number of errors made by copying/pasting similar chunks of code and reduces the time needed if we want to change them.

Let's modify the example from exercise 3 and suppose that all CO2 uptake values from Mississippi were overestimated by 20 and those from Quebec were underestimated by 50.

We could write this :

recalibrate <- function(CO2, type, bias) {

for (i in 1:nrow(CO2)) {

if(CO2$Type[i] == type) {

CO2$uptake[i] <- CO2$uptake[i] + bias

}

}

# return the new dataset

return (CO2)

}

a<-4;b=3

if(a<b){

if(a==0)print("a zero") } else {

if(b==0){print("b zero")} else print(b)}

To make your life easier!!!

It helps achieve greater readability and makes sharing and reusing your code a lot less painful.

Easy-to-read code reduces the time you spend understanding it, so tidying it up is never time lost.

recalibrate <- function(CO2, type, bias) {

for (i in 1:nrow(CO2)) {

if(CO2$Type[i] == type) {

CO2$conc[i] <- CO2$conc[i] + bias

}

}

# return the new dataset

return (CO2)

}

# don't forget to save the results!

newCO2 <- recalibrate(CO2, "Mississippi", -20)

newCO2 <- recalibrate(newCO2, "Quebec", +50)

a <- 4

b <- 3

if(a < b){

if(a == 0) {

print("a zero")

}

} else {

if(b == 0){

print("b zero")

} else {

print(b)

}

}

Proper indentation and spacing are the first step towards easy-to-read code. Here are some suggestions:

  • Use spaces before and after your operators
  • Consistently use the same assignment operator. `<-` is often preferred; `=` is OK, but don't keep switching between the two
  • Use brackets when using flow control statements
  • Inside brackets, indent by at least two spaces.
  • Put closing brackets on a separate line, except when preceding an else statement.
  • Define each variable on its own line

for (i in 1:length(CO2[,1])) {

if(CO2$Type[i] == "Mississippi") {

CO2$conc[i] <- CO2$conc[i] - 20

}

}

for (i in 1:length(CO2[,1])) {

if(CO2$Type[i] == "Quebec") {

CO2$conc[i] <- CO2$conc[i] + 50

}

}

Yay, fewer changes to make. And it looks waaay cooler!!

Comments

Use meaningful names

To help the others and yourself!!

Same function, stupid names, way harder to understand at first sight :

## recalibrates the CO2 dataset by modifying the CO2 uptake concentration

## by a fixed amount depending on the region of sampling

# Arguments

# CO2: the CO2 dataset

# type: the type that needs to be recalibrated. Values: "Mississippi" or "Quebec"

# bias: the amount to add to the concentration uptake. Use negative values for overestimations

recalibrate <- function(CO2, type, bias) {

for (i in 1:nrow(CO2)) {

if(CO2$Type[i] == type) {

CO2$uptake[i] <- CO2$uptake[i] + bias

}

}

# we have to return our new dataset because the original is not modified

return (CO2)

}

rc <- function(c, t, b) {

for (i in 1:nrow(c)) {

if(c$Type[i] == t) {

c$uptake[i] <- c$uptake[i] + b

}

}

return (c)

}

Thank you for coming!

Speeding up your code

First step : thinking

Profiling

First step : thinking

Just by thinking a little bit, our code became faster and easier to understand.

Now we can do even better with the power of R

Our previous example works well. However, a is constant so it's useless to check if it is less than 5 in each iteration.

Here's another more efficient way:

Because if we want to optimize, we will need to know how much time it takes!

Let's create a function that:

  • Takes a number a
  • Adds a to every number from 1 to 100
  • If a is less than 5, then we will add 2*a instead
  • Sums all the elements of the modified sequence.

To compare the speed of several functions precisely, you can use the package microbenchmark

To have a more detailed output of the time spent in each function, you can use the function Rprof()

system.time({

a <- 0

for (i in 1:1000) {

a <- a + i

}

})

Here's a way to do it:

f4 <- function(a) {

result <- 0

if (a < 5) {

a <- a * 2

}

result <- sum(1:100 + a)

return(result)

}

f4(4)

microbenchmark(f3(4), f4(4), times=10000)
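Before comparing speeds, it can be reassuring to confirm that the different versions return the same value. The sketch below restates the loop, hoisted and vectorized variants in compact form under hypothetical names (`f_loop`, `f_hoisted`, `f_vectorized`) so that it runs on its own, and prints their results side by side.

```r
# Condition checked inside the loop on every iteration
f_loop <- function(a) {
  result <- 0
  for (i in 1:100) {
    if (a < 5) {
      result <- result + i + (2 * a)
    } else {
      result <- result + i + a
    }
  }
  return(result)
}

# Condition checked once, before the loop
f_hoisted <- function(a) {
  if (a < 5) a <- 2 * a
  result <- 0
  for (i in 1:100) result <- result + i + a
  return(result)
}

# No explicit loop: vectorized addition followed by sum()
f_vectorized <- function(a) {
  if (a < 5) a <- 2 * a
  return(sum(1:100 + a))
}

# Try one value below 5 and one above
for (a in c(4, 7)) {
  print(c(f_loop(a), f_hoisted(a), f_vectorized(a)))
}
```

Checking one input on each side of the threshold exercises both branches of the condition.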

When we want to speed up our code, the first thing we should do is look at it and ask ourselves the following questions:

Is my code ok?

Is everything useful?

Do I repeat some tasks needlessly?

Are there other ways to do that?

To program efficiently, we have to think efficiently first and remove everything that can be removed.

This usually also produces code that is simpler to read.

install.packages("microbenchmark")

library(microbenchmark)

f1 <- function() {

a <- 0

for (i in 1:1000) {

a <- a + i

}

}

# The argument times sets the number of iterations

microbenchmark(f1(), times=1000)

Repeating our code might be necessary for time to be measurable

Rprof("profile.txt") # Saves results in file profile.txt

a <- 0

for (i in 1:1000000) {

a <- a + i

}

Rprof(NULL) # Ends the profiling

summaryRprof("profile.txt") # Display the result of profiling

Wow our code just got ten times faster and also smaller...

What just happened?

system.time(replicate(1000, {

a <- 0

for (i in 1:1000) {

a <- a + i

}

}))

Challenge 7

Vectorization

Growing objects

Subsetting / logical indexing

Challenge 7

Solution

Let's compare their speeds

Here are some examples of operations on vectors

Extracts data way faster than loops.

Is done with the [ ] operator by providing a set of indexes or conditions returning a set of indexes.

Create a new function recalibrate2() that rewrites the function recalibrate() seen earlier using subsetting and vectorization techniques.

The new function should not be longer than 3 lines.

Reminder:

Sometimes loops can't be avoided. In these cases, pay extra attention to objects that grow with each iteration.

Take these two functions

system.time({

growing(10000)

})

system.time({

growing2(10000)

})

system.time({

growing(50000)

})

system.time({

growing2(50000)

})

v1 <- 1:10

v1[7] # Extracts the 7th value

v1[v1 > 5] # Extracts values > 5 only

v1[which(v1 > 5)] # same as before

v1 <- 1:5

v2 <- 2:6

v3 <- 1:3

v1 + 2 # Addition on a vector : adds 2 to all elements

v1 + v2 # Adds each element of v2 to the matching element of v1

v1 + v3 # v3 is recycled since it is shorter than v1 (R warns because 5 is not a multiple of 3)

sum(v1) # Adds all elements of v1 together

sum(v1, v2) # Sums all elements of v1 and v2

mean(v1) # Average of elements in v1

mean(c(v1, v2)) # Average of elements of v1 and v2.

recalibrate2 <- function(CO2, type, bias) {

# First build a logical index of the rows with the requested type

# Thinking tip : since we use the index twice below, instead of computing it

# twice, let's do it only once and save the result!

idx <- CO2$Type == type

# Modify only the data concerned using indexes.

CO2$uptake[idx] <- CO2$uptake[idx] + bias

return (CO2)

}

# Check the results are the same

all.equal(recalibrate(CO2, "Quebec", 20), recalibrate2(CO2, "Quebec", 20))

# Check that this is indeed way faster

microbenchmark(recalibrate(CO2, "Quebec", 20),

recalibrate2(CO2, "Quebec", 20))

R is an interpreted language whose interpreter is written largely in C.

R code is slower than C because each R expression has to be interpreted before the underlying C code runs.

Some R functions call C code directly and are therefore much faster.

R is optimized for vectorization, i.e. operations on whole vectors.

So it is usually much faster to operate directly on vectors than to loop over their elements.
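A rough illustration of that difference, using only base R: the same sum of squares is computed with an explicit loop and with a single vectorized expression. Timings depend on the machine, so no numbers are quoted here; the vector `x` and both function names are invented for this sketch.

```r
x <- as.numeric(1:1e6)

# Explicit loop: one R-level operation per element
sum_squares_loop <- function(v) {
  total <- 0
  for (value in v) {
    total <- total + value^2
  }
  total
}

# Vectorized: squaring and summing operate on the whole vector at once
sum_squares_vec <- function(v) {
  sum(v^2)
}

print(system.time(loop_result <- sum_squares_loop(x)))
print(system.time(vec_result <- sum_squares_vec(x)))

# Both approaches should agree up to floating-point tolerance
print(all.equal(loop_result, vec_result))
```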

recalibrate <- function(CO2, type, bias) {

for (i in 1:nrow(CO2)) {

if(CO2$Type[i] == type) {

CO2$uptake[i] <- CO2$uptake[i] + bias

}

}

return (CO2)

}

In data frames, the $ operator gives direct access to columns.

Remember that columns of a data frame are always vectors

growing <- function(n) {

# declare our result

result <- NULL

for (i in 1:n) {

# create our result by growing our object

result <- c(result, i)

}

return(result)

}

growing2 <- function(n) {

# declare our result : here we create a vector of length n with 0 in it

result <- numeric(n)

for (i in 1:n) {

# now we just modify our value instead of recreating the vector

result[i] <- i

}

return(result)

}

Performance drops so much because each call to c(result, i) creates a new vector and copies every existing element into it.

So as your object grows, the time needed to copy it on each iteration increases.

This problem is resolved by preallocating your result object and then filling it in.

data(CO2)

CO2$Type # Prints the Type column

CO2[, "Type"] # Same as above

CO2[CO2$Type == "Quebec", ] # Extracts all rows of the CO2 dataset where the Type is "Quebec"

But remember...

The apply family

Growing objects

When looking for speed, the most interesting apply functions are probably lapply(), sapply() and vapply(), since their looping is done in internal C code. But they are more complex to use.

The same problem appears with data frames and functions like cbind() or rbind().

However it is a bit more complex.

To prevent the growing object problem.

Not always the best solution performance-wise (they sometimes hide a for loop)

They make it easy to apply a function to the rows or columns of a data frame

Before spending time speeding up your code, first ask yourselves :

Is it really worth it??

Because, sometimes, spending 1 hour optimizing your code to effectively save 15 seconds on computing time is just not really that good a deal...

a <- list(1:100, 101:200)

# apply mean to each element of the list

lapply(a, mean) # we get a list as a result

unlist(lapply(a, mean)) # use unlist to get a vector instead

sapply(a,mean) #Same result

vapply(a, mean, 0) # the result of mean is a single number, we tell vapply our result will be a number

growingdf <- function(n, row) {

# preallocate our dataframe

df <- data.frame(numeric(n), character(n), stringsAsFactors=FALSE)

for (i in 1:n) {

# replace the ith row with row

df[i,] <- row

}

return(df)

}

growingdf2 <- function(n, row) {

# this is the way to allocate a list with n elements

df <- vector("list", n)

for (i in 1:n) {

# put row in the ith element

df[[i]] <- row

}

return(do.call(rbind, df))

}

# store our row in a list since we have different types

row <- list(1, "Hello World")

microbenchmark(growingdf(5000, row),

growingdf2(5000, row),

times=10)

df <- data.frame(1:100, 101:200)

# Sum on rows

apply(df, 1, sum)

# Mean on columns

apply(df, 2, mean)

# we can also supply additional arguments to the function

apply(df, 2, mean, na.rm=TRUE)

# we can also define a function directly. The first argument is always what

# we iterate on. Here each row is treated as a vector of numbers as we can

# see with the str() function

apply(df, 1, function(x){str(x)})

# We can also add other arguments

apply(df, 1, function(x, y){x[2] - x[1] + y}, y=5)

Other packages of interest

knitr

  • Write R code in Markdown or in LaTeX
  • Compile or 'knit' code to HTML, PDF or Word.
  • Shiny: similar concept, but for interactive web documents

RgoogleMaps

library(RgoogleMaps)

myhome=getGeoCode('Stewart Biology Building, Montreal');

mymap<-GetMap(center=myhome, zoom=14)

PlotOnStaticMap(mymap,lat=myhome['lat'],lon=myhome['lon'],

cex=5,pch=10,lwd=3,col=c('red'));

This is the operation carried out.

These blocks must have an input and output.

Boolean choice: it has one input and two outputs.

Opposite of the ‘Start’ symbol.

Selection

Program’s execution determined by statements

if

if else

Iteration

Repetition, where the statement will loop until a criterion is met

for

while

repeat

if statement

if(condition) {

expression

}

if ... else statement

if(condition) {

expression 1

} else {

expression 2

}

What if you want to test more than one thing?

  • if and if/else test a single condition
  • Use "ifelse" function to:
  • test a vector of conditions
  • apply a function only under certain conditions

a <- 1:10

ifelse(a > 5, "yes", "no")

a <- (-4):5

sqrt(ifelse(a >= 0, a, NA))

if (test_expression1) {

statement1

} else if (test_expression2) {

statement2

} else if (test_expression3) {

statement3

} else {

statement4

}
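As a concrete instance of this pattern, the sketch below sorts a single concentration value into three classes. The thresholds and labels are arbitrary choices made for illustration.

```r
conc <- 350

# Conditions are tested in order; the first TRUE branch runs and the rest are skipped
if (conc < 200) {
  level <- "low"
} else if (conc < 500) {
  level <- "medium"
} else {
  level <- "high"
}

print(level)
```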

for (i in 1:length(CO2[,1])) {

if(CO2$Type[i] == "Mississippi") {

if(CO2$conc[i] < 300) next

CO2$conc[i] <- CO2$conc[i] - 20

}

}

# Note : We could also have written it this way, which is more concise and clearer

for (i in 1:nrow(CO2)) {

if(CO2$Type[i] == "Mississippi" && CO2$conc[i] >= 300) {

CO2$conc[i] <- CO2$conc[i] - 20

}

}
