**Quebec Centre for Biodiversity Science**

R Workshop Series

**Workshop 1: Introduction to R**

**Website:**

http://qcbs.ca/wiki/r/workshop1

What is R?

R is an open source programming language designed for statistical analysis, data mining, and data visualization.

It's open source

Improved by the public, for the public!

Free

Types of data structures in R

Vectors: one of the most common objects in R

Data frames: used to store data tables

Matrices, arrays and lists

Indexing objects

Sometimes, we only want to look at or extract part of our data.

This is done with brackets: [ ]

We indicate the position of values we want to see between brackets. This is called indexing!

Indexing Vectors

> num.vector[3]

[1] 5

> num.vector[-3]

[1] 1 2 3 6 -2 4

> num.vector[num.vector > 5]

[1] 6

> col.vector[col.vector == "blue"]

[1] "blue"
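The three indexing styles above can be combined. Here is a short sketch, assuming num.vector and col.vector hold the same values used throughout this workshop (they are created in the Vectors section):

```r
# Vectors used in the indexing examples
num.vector <- c(1, 2, 5, 3, 6, -2, 4)
col.vector <- c("blue", "red", "green")

# Several positions at once
num.vector[c(2, 4)]               # [1] 2 3

# Drop several positions with negative indices
num.vector[-c(1, 2)]              # [1] 5 3 6 -2 4

# Keep only the values that satisfy a condition
num.vector[num.vector < 0]        # [1] -2

# Conditions also work on character vectors
col.vector[col.vector != "blue"]  # "red" "green"
```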

Objects

One of the most useful concepts in R!

object name <- assigned value

You can store values as named objects using the assignment operator: "<-"

Objects

Object names can only include:

letters: a-z A-Z

numbers: 0-9

periods: .

underscores: _

Object names should always begin with a letter!

Challenge 5

Create an object with a value of 1 + 1.718282 (Euler's number) and name it euler.value

It is also possible to assign values with the "=" sign, but this can cause confusion because "=" is also used to name arguments inside function calls.

Avoid it!

Objects

> mean.x <- (2+6)/2

> mean.x

[1] 4

Try having short & explicit names for your variables. Naming a variable "var" isn't very informative!

When typing the object's name, R returns its value.

Adding spaces before and after the "<-" is recommended because it adds clarity.
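Here is a small sketch of these naming rules in action. The objects Mean.x and values are hypothetical names chosen only for illustration:

```r
# Assign with "<-", with spaces around the operator for readability
mean.x <- (2 + 6) / 2
mean.x            # [1] 4

# R is case-sensitive: Mean.x is a different object from mean.x
Mean.x <- 10
mean.x == Mean.x  # [1] FALSE

# Inside a function call, "=" names an argument (here, the x argument of mean())
values <- c(2, 6)
mean(x = values)  # [1] 4
```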

Indexing Data Frames

Challenge 9

Explore the difference between these 2 lines of code:

> col.vector[col.vector == "blue"]

> col.vector == "blue"

Also, the names Data1 and data1 are not the same. R is case-sensitive!

The value on the right is assigned to the name on the left with the assignment operator "<-".

> col.vector[c(1,3)]

[1] "blue" "green"

> col.vector[c(1,4)]

[1] "blue" NA

We specify two dimensions: row & column number

data.frame.name[row,column]

Some Examples

> my.first.df[1,]

Extracts the first row

> my.first.df[,3]

Extracts the third column

> my.first.df[2,4]

Extracts the second element of the fourth column

> my.first.df[c(2:4),]

Extracts rows 2 to 4

> my.first.df$Site_ID

Extracts the variable "Site_ID" from the data frame with the $ sign

> my.first.df[c("Site_ID","soil.pH")]

Extracts the "Site_ID" and "soil.pH" variables from the data frame
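To try these examples on your own, the sketch below rebuilds the workshop's example data frame in a single data.frame() call (named arguments give the column names) and then applies each indexing form:

```r
# Rebuild the example data frame (same values as in this workshop)
my.first.df <- data.frame(
  Site_ID   = c("A1.01", "A1.02", "B1.01", "B1.02"),
  soil.pH   = c(5.6, 7.3, 4.1, 6.0),
  num.sp    = c(17, 23, 15, 7),
  Treatment = c("Fert", "Fert", "No.Fert", "No.Fert")
)

my.first.df[1, ]                      # first row, all columns
my.first.df[, 3]                      # third column: 17 23 15 7
my.first.df[2, 4]                     # row 2, column 4: "Fert"
my.first.df[2:4, ]                    # rows 2 to 4
my.first.df$soil.pH                   # one column by name: 5.6 7.3 4.1 6.0
my.first.df[c("Site_ID", "soil.pH")]  # two columns by name
```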

**Intro & R as a calculator**

**Objects and indexing**

**Functions**

**Getting Help & Additional resources**

Some Useful R Books

A list of vectors of the same length

Columns = variables

Rows = observations, cases, sites, replicates...

Different columns can store different modes

***

The first four examples are also valid for indexing matrices.

| Site_ID | soil pH | # of sp. | Treatment |
|---|---|---|---|
| A1.01 | 5.6 | 17 | Fert |
| A1.02 | 7.3 | 23 | Fert |
| B1.01 | 4.1 | 15 | No.Fert |
| B1.02 | 6.0 | 7 | No.Fert |

Data Frames

We can use vectors for calculations.

Vectors

> x <- 1:5

> y <- 6

> x+y

[1] 7 8 9 10 11

> x*x

[1] 1 4 9 16 25

Operations are executed on each item.
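Here are two more cases of element-by-element operations: a vector combined with a single value, and two vectors of the same length combined position by position.

```r
x <- 1:5

# A vector and a single value: the value is applied to every element
x / 2                      # [1] 0.5 1.0 1.5 2.0 2.5

# Two vectors of the same length: elements are paired by position
x + c(10, 20, 30, 40, 50)  # [1] 11 22 33 44 55
```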

Let's say we want to have this data frame in R:

Data Frames

We start by creating vectors:

> Site_ID <- c("A1.01", "A1.02", "B1.01", "B1.02")

> soil.pH <- c(5.6, 7.3, 4.1, 6.0)

> Treatment <- c("Fert", "Fert", "No.Fert", "No.Fert")

> num.sp <- c(17, 23, 15, 7)

We then combine them to create a data frame:

> my.first.df <- data.frame(Site_ID, soil.pH, num.sp, Treatment)

data.frame() is a function. We will come back to functions later.
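A quick way to check that the vectors were combined as intended is sketched below. It uses names() and str() from the common functions list, plus dim() (an assumption here: dim() is not in that list, but it is a standard base R function):

```r
Site_ID   <- c("A1.01", "A1.02", "B1.01", "B1.02")
soil.pH   <- c(5.6, 7.3, 4.1, 6.0)
Treatment <- c("Fert", "Fert", "No.Fert", "No.Fert")
num.sp    <- c(17, 23, 15, 7)
my.first.df <- data.frame(Site_ID, soil.pH, num.sp, Treatment)

dim(my.first.df)    # [1] 4 4  (rows, then columns)
names(my.first.df)  # column names, taken from the vector names
str(my.first.df)    # mode and first values of each column
```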

Challenge 10

a) Extract the num.sp column from my.first.df and multiply its values by the first four values of num.vector.

b) After that, write a statement that checks if the values you obtained are greater than 25.

Why use R?

What people have traditionally done to analyze their data:

R allows you to do everything with one program!

Why use R?

More and more scientists use it every year!

Increasing capacities

Why use R?

It's compatible

R works on most existing operating systems

Challenges

Throughout these workshops, you will be presented with a series of challenges, indicated by these Rubik's cubes.

During challenges, collaborate with your neighbours!

Challenge 1

Open R-Studio

The RStudio console

How to read the console

> input

[1] This is the output

Using the console as a calculator

> 1+1

[1] 2

Challenge 3

> 2+16*24-56/(2+1)-457

Challenge 3

Solution

[1] -89.66667

What does this bracket in the output mean?

The number in brackets is the position (index) of the first value shown on that line of output. It helps you locate "where" you are in the output.

[1] 1 2 3 4 5

[6] 6 7 8 9 10

Addition and subtraction:

> 10-1

[1] 9

Multiplication and division:

> 2*2

[1] 4

> 8/2

[1] 4

Exponents:

> 2^3

[1] 8

Use RStudio to calculate the following skill-testing question:

2 + 16 x 24 - 56 / (2 + 1) - 457

Hints:

Think about the order of operations (PEMDAS)

Question:

2 + 16 x 24 - 56 / (2 + 1) - 457

Solution:

*Note that R follows the order of operations
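To see the order of operations at work, the sketch below evaluates the Challenge 3 expression, then the same expression with every step in explicit parentheses, and finally a version without the parentheses around 2 + 1:

```r
# R applies parentheses first, then exponents, then * and /, then + and -
2 + 16 * 24 - 56 / (2 + 1) - 457         # [1] -89.66667

# Writing every step in explicit parentheses gives the same answer
((2 + (16 * 24)) - (56 / (2 + 1))) - 457  # [1] -89.66667

# Without the parentheses around 2 + 1, the division happens first
2 + 16 * 24 - 56 / 2 + 1 - 457            # [1] -98
```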

Functions

A function is a tool used to simplify your life

It allows you to quickly execute operations on objects without having to write every mathematical step

A function needs entry values called arguments (or parameters). It then performs hidden operations on these arguments and gives a return value.

Functions

To use a function (a function call), the command must be structured properly, following the "grammar rules" of the R language (syntax).

> sum(1, 2)

Function name: sum

Parentheses: enclose the arguments

Argument 1: 1

Comma: separates the arguments

Argument 2: 2
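Here is a short sketch of the same syntax with more arguments, and with the return value of one function passed directly as an argument of another:

```r
sum(1, 2)         # [1] 3

# sum() accepts any number of arguments, separated by commas
sum(1, 2, 3, 4)   # [1] 10

# The return value of sum() becomes the argument of sqrt()
sqrt(sum(9, 16))  # [1] 5
```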

**Course outline**

Some Useful R Websites

http://stats.stackexchange.com

Challenge 2

Use RStudio to calculate the following skill-testing question:

2 + 16 x 24 - 56

> 2+16*24-56

Challenge 2

Solution

[1] 330

Question:

2 + 16 x 24 - 56

Solution:

Hints:

the * symbol is used to multiply

Challenge 4

What is the area of a circle with a radius of 5 cm?

> 3.1416*5^2

Challenge 4

Solution

[1] 78.54

Question:

What is the area of a circle with a radius of 5 cm?

Solution:

*Note that there is no need for parentheses: exponents are calculated before multiplication

Challenge 5

Solution

Question:

Create an object with a value of 1 + 1.718282 (Euler's number) and name it euler.value

Solution:

> euler.value <- 1 + 1.718282

> euler.value

[1] 2.718282

R Command line tip

Use the "up" and "down" arrow keys to recall previous commands

Give it a try!

You have to press "Enter" for the output to appear

R Command line tip

Use the Tab key to auto-complete object and function names

This helps avoid spelling errors and speeds up command entry

R Command line tip

Enter:

> eu

Press "Tab"

Press "Enter" to select the correct auto-complete suggestion

Let's try it!

Challenge 9

Solution

> col.vector == "blue"

[1] TRUE FALSE FALSE

In this line of code, you test a logical statement. For each entry in the "col.vector" vector, R checks whether the entry is equal to "blue" or not.

> col.vector[col.vector == "blue"]

[1] "blue"

In this line of code, you ask R to extract all values within the "col.vector" vector that are exactly equal to "blue".

Challenge 8

a) Extract the 4th value of the "num.vector" vector

b) Extract the 1st and 3rd values of the "num.vector" vector

c) Extract all values from the "num.vector" vector except for the 2nd and 4th values

Challenge 8

Solution

a)

> num.vector[4]

[1] 3

b)

> num.vector[c(1,3)]

[1] 1 5

c)

> num.vector[c(-2,-4)]

[1] 1 5 6 -2 4

Challenge 10

Solution

a)

> my.first.df$num.sp * num.vector[c(1:4)]

[1] 17 46 75 21

b)

> (my.first.df$num.sp * num.vector[c(1:4)]) > 25

[1] FALSE TRUE TRUE FALSE

https://www.zoology.ubc.ca/~schluter/R/

http://www.statmethods.net/

http://www.rseek.org/

Getting help with functions

Challenge 6

Solution

Question:

Create a second object (you decide the value) with a name that starts with a number. What happens?

Solution:

Creating an object with a name that starts with a number will return the following error:

unexpected symbol in "your object name"

http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

http://www.cookbook-r.com/

Challenge 6

Create a second object (you decide the value) with a name that starts with a number. What happens?

Arguments

Arguments are the values and instructions the function needs to run.

Objects can be passed into functions:

Challenge 12

plot(x, y) is a function that draws a graph of y as a function of x. It requires two arguments named x and y. What are the differences between the following lines?

Arguments

Arguments each have a name that can be provided during a function call.

If the name is not present, the order of the arguments does matter.

If the name is present, the order of the arguments does not matter.

> log(8, base=2)

Here, base is the argument name.
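Here is a sketch of both rules using log(), whose two arguments are named x and base:

```r
log(8, base = 2)      # [1] 3

# With names, the order of the arguments does not matter
log(base = 2, x = 8)  # [1] 3

# Without names, R matches by position: this is the log of 2 in base 8
log(2, 8)             # [1] 0.3333333
```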

Challenge 11

a) - Create a vector a that contains all numbers from 1 to 5

- Create an object b with a value of 2

- Add a and b together using the basic "+" operator and save the result in an object called result_add

- Add a and b together using the sum() function and save the result in an object called result_sum

- Compare result_add and result_sum. Are they different?

b) Add 5 to result_sum using the sum() function.

> a <- 3

> b <- 4

> sum(a, b)

[1] 7

This is the return value of the function.

Challenge 11

Solution

a)

> a <- 1:5

> b <- 2

> result_add <- a + b

> result_sum <- sum(a, b)

> result_add

[1] 3 4 5 6 7

> result_sum

[1] 17

b)

> sum(result_sum, 5)

[1] 22

> a <- 1:100

> b <- a^2

> plot(a, b)

> plot(b, a)

> plot(x=a, y=b)

> plot(y=b, x=a)

Challenge 12

Solution

> plot(a, b)

> plot(b, a)

The shape of the plot has changed: without argument names, the order of the arguments is important.

> plot(x=a, y=b)

Same as plot(a, b).

> plot(y=b, x=a)

Same as plot(a, b). The argument names are provided, so the order is not important.

Some common functions

sqrt

log

exp

max

min

sum

mean

sd

var

summary

plot

par

paste

format

head

length

str

names

typeof

class

attributes

library

ls

rm

setwd

getwd

file.choose

c

seq

rep

tapply

lapply

aggregate

merge

cbind

rbind

unique

help (or ?)

help.search (or ??)

help.start

Packages

Packages are groupings of functions and/or datasets that share a similar theme.

Examples: statistics, spatial analysis, plotting...

Everyone can develop packages and make them available to others.

They are usually available through the Comprehensive R Archive Network (CRAN):

http://cran.r-project.org/web/packages/

At the time of this workshop, more than 5877 packages were publicly available.

result_sum: the function sum() adds all values of a and b. It is the same as doing 1 + 2 + 3 + 4 + 5 + 2. The result is a single number.

result_add: the "+" operation adds 2 to each element of a. The result is a vector.

Packages

To install packages on your computer, use the function install.packages():

> install.packages("ggplot2")

Installing a package is not enough to use it. You need to load it with the library() function before using it, in each new R session.

> qplot(1:10, 1:10)

Error: could not find function "qplot"

> library("ggplot2")

> qplot(1:10, 1:10)
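A common pattern combines both steps. The sketch below is a hypothetical helper, load_or_install(), which is not part of base R. It installs a package only if it is missing, then loads it. It relies on the base functions requireNamespace(), install.packages() and library() with character.only = TRUE:

```r
# Hypothetical helper (not part of base R):
# install a package only if it is missing, then load it for this session
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}
```

Usage would then be load_or_install("ggplot2"). Because the package name is passed as a character string, character.only = TRUE is needed so that library() does not treat the argument name pkg itself as the package name.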

WOW!!! R is SO great! So many functions to do what I want!!!

But... how do I find them?

To find a function that does something specific in your installed packages, you can use ?? followed by a search term.

Let's say we want to create a sequence of odd numbers between 0 and 10. We will search the help pages of our installed packages for the word "sequence".

> ??sequence

The results list each matching function as package_name::function_name, followed by a short function description.

Getting help with functions

OK! So let's use the seq() function!!

But how does it work? What arguments does it need?

To find information about a function, use ? followed by the function name.

> ?seq

Function name

Package name

Short description

How to call the function

List of arguments

If an argument is written as name = value, value is its default, which is used when the argument is missing. The argument is therefore optional.

Other functions described in this help page

Description of all the arguments and what they are used for

Detailed description of how the functions work and their characteristics

Description of the return value

Other related functions that can be useful
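With the arguments documented in ?seq, we can now build the odd numbers between 0 and 10 that we were looking for. Here is a short sketch using from, to and by, and an alternative that uses length.out instead of an end point:

```r
# Odd numbers between 0 and 10: start at 1, stop at 9, step by 2
seq(from = 1, to = 9, by = 2)          # [1] 1 3 5 7 9

# Alternative: give the number of values (length.out) instead of the end point
seq(from = 1, by = 2, length.out = 5)  # [1] 1 3 5 7 9
```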

Challenge 13

a) Create a sequence of even numbers from 0 to 10 using the seq function

b) - Create an unsorted vector of your favourite numbers.

- Find out how to sort it using ?sort.

- Sort your vector in reverse order.

Challenge 13

Solution

a)

> seq(from=0, to=10, by=2)

[1] 0 2 4 6 8 10

We could also have written:

> seq(0, 10, 2)

[1] 0 2 4 6 8 10

b)

> numbers <- c(4, 55, 6, 22, 3)

> sort(numbers, decreasing=TRUE)

[1] 55 22 6 4 3

Other ways to get help

Usually, your best source of information will be your favorite search engine (Google, Bing, Yahoo, etc.)

Here are some tips on how to use them efficiently:

Search in English

Use the keyword "R" at the beginning of your search

Define precisely what you are looking for

Learn to read discussion forums. Chances are that other people already had your problem and asked about it.

Don't hesitate to search again with different keywords!

Challenge 14

Find the appropriate functions to perform the following operations

a) Square root

b) Calculate the mean of numbers

c) Combine two data frames by columns

d) List available objects

Challenge 14

Solution

a) sqrt

b) mean

c) cbind

d) ls
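Here is a quick sketch of the four solution functions in use. The data frames df.a and df.b are hypothetical, small examples:

```r
sqrt(16)          # [1] 4
mean(c(2, 4, 9))  # [1] 5

# cbind() places two data frames side by side (they need the same number of rows)
df.a <- data.frame(site = c("A", "B"))
df.b <- data.frame(pH = c(5.6, 7.3))
cbind(df.a, df.b) # one data frame with the columns site and pH

ls()              # names of the objects currently in your workspace
```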

Vectors

Numeric vector

> num.vector <- c(1, 2, 5, 3, 6, -2, 4)

Character vector

> col.vector <- c("blue", "red", "green")

Logical vector

> logic.vector <- c(TRUE, TRUE, FALSE)

| Mode | Description |
|---|---|
| Numeric | Only numbers |
| Character | Text or a mix of text and other modes |
| Logical | True/False entries |

A vector is an entity consisting of a sequence of related values

All values of a vector must have the same mode (such vectors are called atomic vectors)

A single value is simply a vector of length 1

Creating vectors usually requires the c() function

c stands for combine or concatenate

The format is: vector.name <- c(value1, value2, ...)
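The sketch below shows what happens when modes are mixed in c(). R converts every value to one common mode, here character. It also shows that a single value is a vector of length 1. The object name mixed is hypothetical:

```r
# Mixing modes in c() converts every value to a single common mode
mixed <- c(1, "blue", TRUE)
mixed          # "1" "blue" "TRUE": every value is now text
mode(mixed)    # [1] "character"

# A single value is a vector of length 1
length(5)      # [1] 1
is.vector(5)   # [1] TRUE
```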

Challenge 7

Create a vector containing the first five odd numbers (starting from 1) and name it odd.n.

Challenge 7

Solution

> odd.n <- c(1,3,5,7,9)

> odd.n

[1] 1 3 5 7 9

A quick note on logical statements

| Operator | Description |
|---|---|
| < | less than |
| <= | less than or equal to |
| > | greater than |
| >= | greater than or equal to |
| == | exactly equal to |
| != | not equal to |
| x \| y | x OR y |
| x & y | x AND y |

Examples of logical statements

> x2 <- c(1:5)

> y2 <- c(1,2,-7,4,5)

> x2 >= 3

[1] FALSE FALSE TRUE TRUE TRUE

> x2 == y2

[1] TRUE TRUE FALSE TRUE TRUE

> 3 != 4

[1] TRUE

> x2 > 2 & x2 < 5

[1] FALSE FALSE TRUE TRUE FALSE
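A logical statement placed inside [ ] keeps only the values for which it is TRUE. This sketch applies that to x2, using both & and |:

```r
x2 <- c(1:5)

# Keep the values greater than 2 AND less than 5
x2[x2 > 2 & x2 < 5]  # [1] 3 4

# | keeps values that satisfy at least one of the two conditions
x2 < 2 | x2 > 4      # [1] TRUE FALSE FALSE FALSE TRUE
x2[x2 < 2 | x2 > 4]  # [1] 1 5
```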

R allows testing of logical statements, i.e. testing whether a statement is true or false.

You need to use logical operators for that.

> my.first.df

In x2 > 2 & x2 < 5, you have to repeat x2 on both sides of &!

> logic.vector <- c(T, T, F)

Same thing as c(TRUE, TRUE, FALSE)! T and F are short for TRUE and FALSE.

Challenge 4

Solution

Question:

What is the area of a circle with a radius of 5 cm?

Alternative solution:

> pi*5^2

[1] 78.53982

*Note that R has many built-in constants you can use, such as "pi"

Note for Windows users

If the error "unable to write on disk" appears when you try to open RStudio, don't worry, we have a solution:

Right-click on your RStudio icon and choose "Run as administrator" to open the program.

**We want your feedback:**

https://docs.google.com/spreadsheet/ccc?key=0AhCQzc0AsZ0OdHZoWE1PUi1kNmttZV96VEViY0sxVEE#gid=0

**Thank you for attending!**