QCBS R Workshop 1

by CSBQ QCBS on 6 December 2014

Transcript of QCBS R Workshop 1

Quebec Centre for Biodiversity Science

R Workshop Series

Workshop 1: Introduction to R
Website:
http://qcbs.ca/wiki/r/workshop1

What is R?
R is an open source programming language designed for statistical analysis, data mining, and data visualization.
It's open source
Improved by the public, for the public!
Free
Types of data structures in R
Vectors: one of the most common objects in R
Data frames: used to store data tables
Matrices, arrays and lists
Indexing objects
Sometimes, we only want to look at or extract part of our data.

This is done with brackets: [ ]

We indicate the position of values we want to see between brackets. This is called indexing!
Indexing Vectors
> num.vector[3]
[1] 5
> num.vector[-3]
[1] 1 2 3 6 -2 4
> num.vector[num.vector > 5]
[1] 6
> col.vector[col.vector == "blue"]
[1] "blue"
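The indexing calls above can be reproduced as a short script. This is a minimal sketch that assumes num.vector and col.vector hold the values used in the Vectors part of this workshop; the names of the result objects (third, without.third, and so on) are made up for illustration.

```r
# Vectors as defined in the Vectors part of this workshop
num.vector <- c(1, 2, 5, 3, 6, -2, 4)
col.vector <- c("blue", "red", "green")

# Illustrative object names (not from the slides)
third <- num.vector[3]                         # positive index: keep the 3rd value
without.third <- num.vector[-3]                # negative index: drop the 3rd value
above.five <- num.vector[num.vector > 5]       # logical index: keep values greater than 5
only.blue <- col.vector[col.vector == "blue"]  # logical index on a character vector

print(third)
print(without.third)
print(above.five)
print(only.blue)
```

Storing each result in its own object makes it easy to check what every kind of index returns.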
Objects
One of the most useful concepts in R!
object name <- assigned value
You can store values as named objects using the assignment operator: "<-"
Objects
Object names can only include:
letters (a-z A-Z)
numbers (0-9)
periods (.)
underscores (_)
Object names should always begin with a letter!
Challenge 5
Create an object with a value of 1 + 1.718282 (the sum is Euler's number) and name it euler.value
It is also possible to use the "=" sign, but this can cause problems as this sign is also used for other purposes.
Avoid it!
Objects
> mean.x <- (2+6)/2
> mean.x
[1] 4
Try having short & explicit names for your variables. Naming a variable "var" isn't very informative!
When typing the object's name, R returns its value.
Adding spaces before and after the "<-" is recommended because it adds clarity.
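As a small illustration of these points, the sketch below stores a value, reuses it, and shows that names differing only in case are separate objects. The names double.mean and Mean.x are hypothetical and only serve the example.

```r
# Store a value under a name, then ask R for it
mean.x <- (2 + 6) / 2
print(mean.x)

# A stored object can be reused in further calculations
double.mean <- mean.x * 2
print(double.mean)

# R is case-sensitive, so Mean.x is a different object from mean.x
Mean.x <- 10
print(mean.x == Mean.x)
```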
Indexing Data Frames
Challenge 9
Explore the difference between these 2 lines of code:
> col.vector[col.vector == "blue"]
> col.vector == "blue"
Also, the names Data1 and data1 are not the same. R is case-sensitive!
The value on the right is assigned to the name on the left with the assignment operator "<-"
> col.vector[c(1,3)]
[1] "blue" "green"
> col.vector[c(1,4)]
[1] "blue" NA
To index a data frame, we specify two dimensions: row & column number
data.frame.name[row,column]
Some Examples
> my.first.df[1,]
Extracts the first row
> my.first.df[,3]
Extracts the third column
> my.first.df[2,4]
Extracts the second element of the fourth column
> my.first.df[c(2:4),]
Extracts rows 2 to 4
> my.first.df$Site_ID
Extracts the variable "Site_ID" from the data frame with the $ sign
> my.first.df[c("Site_ID","soil.pH")]
Extracts the "Site_ID" and "soil.pH" variables from the data frame
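The data frame indexing examples can be tried on the workshop's example table. The sketch below assumes my.first.df contains the four columns built in the Data Frames part of this workshop; the names of the result objects are illustrative only.

```r
# Example data frame with the values used in this workshop
my.first.df <- data.frame(
  Site_ID = c("A1.01", "A1.02", "B1.01", "B1.02"),
  soil.pH = c(5.6, 7.3, 4.1, 6.0),
  num.sp = c(17, 23, 15, 7),
  Treatment = c("Fert", "Fert", "No.Fert", "No.Fert")
)

first.row <- my.first.df[1, ]        # all columns of row 1
third.col <- my.first.df[, 3]        # all rows of column 3 (num.sp)
one.cell <- my.first.df[2, 4]        # row 2 of column 4 (Treatment)
rows.2.to.4 <- my.first.df[2:4, ]    # rows 2 to 4
site.ids <- my.first.df$Site_ID      # one variable, selected by name with $
two.vars <- my.first.df[c("Site_ID", "soil.pH")]  # two variables, selected by name

print(first.row)
print(third.col)
print(one.cell)
print(two.vars)
```

Note that selecting several columns by name returns a smaller data frame, while selecting a single column with $ returns a vector.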
Intro & R as a calculator
Objects and indexing
Functions
Getting Help & Additional resources
Some Useful R Books
A list of vectors of the same length
Columns = variables
Rows = observations, cases, sites, replicates...
Different modes can be stored in different columns
*** The first four data frame indexing examples are also valid for indexing matrices.
Site_ID   soil pH   # of sp.   Treatment
A1.01     5.6       17         Fert
A1.02     7.3       23         Fert
B1.01     4.1       15         No.Fert
B1.02     6.0       7          No.Fert
Data Frames
We can use vectors for calculations.
Vectors
> x <- 1:5
> y <- 6
> x+y
[1] 7 8 9 10 11
> x*x
[1] 1 4 9 16 25
Operations are executed on each item.
Let's say we want to have this data frame in R:
Data Frames
We start by creating vectors:
> Site_ID<-c("A1.01","A1.02","B1.01","B1.02")
> soil.pH<-c(5.6,7.3,4.1,6.0)
> Treatment<-c("Fert","Fert","No.Fert","No.Fert")
> num.sp<-c(17,23,15,7)
We then combine them to create a data frame:
> my.first.df<-data.frame(Site_ID,soil.pH,num.sp,Treatment)
data.frame() is a function. We will come back to functions later.
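Once the data frame exists, it is worth checking that it looks as expected. The sketch below repeats the steps above and then inspects the result with names(), dim() and str(), three of the common functions listed in this workshop; using them at this point is a suggestion, not a step from the slides.

```r
# Build the vectors, then combine them into a data frame
Site_ID <- c("A1.01", "A1.02", "B1.01", "B1.02")
soil.pH <- c(5.6, 7.3, 4.1, 6.0)
num.sp <- c(17, 23, 15, 7)
Treatment <- c("Fert", "Fert", "No.Fert", "No.Fert")
my.first.df <- data.frame(Site_ID, soil.pH, num.sp, Treatment)

# Inspect the result
print(names(my.first.df))   # column names are taken from the vector names
print(dim(my.first.df))     # number of rows, then number of columns
str(my.first.df)            # mode and first values of each column
```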
Challenge 10
a) Extract the num.sp column from my.first.df and multiply its values by the first four values of num.vector.
b) After that, write a statement that checks if the values you obtained are greater than 25.
Why use R?
What people have traditionally done to analyze their data:
R allows you to do everything with one program!
More and more scientists use it every year!
Increasing capacities
It's compatible
R works on most existing operating systems
Challenges
Throughout these workshops you will be presented with a series of challenges, which are indicated by Rubik's cubes

During challenges, collaborate with your neighbours!
Challenge 1
Open R-Studio
The R Studio console
How to read the console
> input
[1] This is the output
Using the console as a calculator
> 1+1
[1] 2
Challenge 3
Solution
> 2+16*24-56/(2+1)-457
[1] -89.66667
What does the bracketed number in the output mean?
The number in brackets is the position of the first value on that line. These brackets help you locate "where" you are in the output
[1] 1 2 3 4 5
[6] 6 7 8 9 10
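To see these bracketed labels on your own machine, you can print a vector that is too long for one line. This is only a sketch: the vector values and the console width of 30 characters are arbitrary choices for illustration, and the exact line breaks depend on your console width.

```r
# Narrow the console so the output wraps over several lines
old.width <- options(width = 30)

values <- seq(10, 200, by = 10)   # illustrative vector of 20 values
print(values)

# The label in brackets at the start of each printed line is the position
# of the first value on that line; position 1 holds the first value
print(values[1])

options(old.width)                # restore the previous console width
```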
Addition and subtraction:
> 10-1
[1] 9
Multiplication and division:
> 2*2
[1] 4
> 8/2
[1] 4
Exponents:
> 2^3
[1] 8
Challenge 3
Use R Studio to calculate the following skill testing question:
2 + 16 x 24 - 56 / (2+1) - 457
Hint: think about the order of operations (PEMDAS)
*Note that R follows the order of operations
Functions
A function is a tool used to simplify your life

It allows you to quickly execute operations on objects without having to write every mathematical step

A function needs input values called arguments (or parameters). It then performs hidden operations on these arguments and gives a return value.
Functions
To use a function (call), the command must be structured properly, following the "grammar rules" of the R language (syntax)
> sum(1, 2)
Function name: sum
Parentheses: ( )
Argument 1: 1
Argument 2: 2
Comma: separates the arguments
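The same call structure applies to other functions. The sketch below calls sum() as above, plus sqrt() and max() from the list of common functions; the object names and the argument values are chosen only for illustration.

```r
# function.name(argument1, argument2, ...)
total <- sum(1, 2)          # two arguments separated by a comma
root <- sqrt(16)            # a single argument
largest <- max(4, 9, 2)     # three arguments

print(total)
print(root)
print(largest)
```

In each case the return value of the function is stored in an object with the assignment operator.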
Course outline
Some Useful R Websites
http://stats.stackexchange.com
Challenge 2
Use R Studio to calculate the following skill testing question:
2 + 16 x 24 - 56
Hint: the * symbol is used to multiply
Challenge 2
Solution
> 2+16*24-56
[1] 330
Challenge 4
What is the area of a circle with a radius of 5 cm?
Challenge 4
Solution
> 3.1416*5^2
[1] 78.54
*Note that there is no need to use parentheses, because the exponent is evaluated before the multiplication
Challenge 5
Solution
> euler.value <- 1 + 1.718282
> euler.value
[1] 2.718282
R Command line tip
Use the "up" and "down" arrow keys to reproduce previous commands
Give it a try!
You have to push "enter" for the output to appear
R Command line tip
Use the Tab key to auto-complete commands and object names

This helps avoid spelling errors and speeds up command entry
R Command line tip
Let's try it!
Enter:
> eu
Push "Tab", then push "Enter" to select the correct auto-completion
Challenge 9
Solution
> col.vector[col.vector == "blue"]
In this line of code, you ask R to extract all values within the "col.vector" vector that are exactly equal to "blue".
> col.vector == "blue"
In this line of code, you test a logical statement. For each entry in the "col.vector" vector, R checks whether the entry is equal to "blue" or not, and returns TRUE or FALSE for each entry.
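The difference between the two lines is easier to see when the logical test is stored first. In this sketch, is.blue and blue.values are hypothetical names used only for the example.

```r
col.vector <- c("blue", "red", "green")

# Step 1: the logical statement gives one TRUE/FALSE per entry
is.blue <- col.vector == "blue"
print(is.blue)

# Step 2: using that logical vector as an index keeps only the TRUE entries
blue.values <- col.vector[is.blue]
print(blue.values)
```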
Challenge 8
a) Extract the 4th value of the "num.vector" vector

b) Extract the 1st and 3rd values of the "num.vector" vector

c) Extract all values from the "num.vector" vector except for the 2nd and 4th values
Challenge 8
Solution
a)
> num.vector[4]
[1] 3
b)
> num.vector[c(1,3)]
[1] 1 5
c)
> num.vector[c(-2,-4)]
[1] 1 5 6 -2 4
Challenge 10
Solution
a)
> my.first.df$num.sp * num.vector[c(1:4)]
[1] 17 46 75 21
b)
> (my.first.df$num.sp * num.vector[c(1:4)]) > 25
[1] FALSE TRUE TRUE FALSE
https://www.zoology.ubc.ca/~schluter/R/
http://www.statmethods.net/
http://www.rseek.org/
Getting help with functions
Challenge 6
Solution
Creating an object with a name that starts with a number will return the following error:
unexpected symbol in "your object name"
http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf
http://www.cookbook-r.com/
Challenge 6
Create a second object (you decide the name) with a name that starts with a number. What happens?
Arguments
Arguments are the values and the instructions the function needs to run.

Objects can be passed into functions:
Challenge 12
plot(x, y) is a function that draws a graph of y as a function of x. It requires two arguments named x and y. What are the differences between the following lines?
Arguments
Arguments each have a name that can be provided during a function call.

If the name is not present, the order of the arguments does matter.
If the name is present, the order of the arguments does not matter.
> log(8, base=2)
Argument name: base
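The effect of naming arguments can be checked with log() itself. This sketch relies on the first argument of log() being called x and the second base; the object names are illustrative.

```r
by.name <- log(8, base = 2)             # x = 8 by position, base = 2 by name
by.position <- log(8, 2)                # same call, both arguments by position
named.swapped <- log(base = 2, x = 8)   # names given, so the order is free
unnamed.swapped <- log(2, 8)            # no names: now x = 2 and base = 8

print(by.name)
print(by.position)
print(named.swapped)
print(unnamed.swapped)
```

The first three calls compute the logarithm of 8 in base 2, while the last one computes the logarithm of 2 in base 8, which is a different number.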
Challenge 11
a) - Create a vector a that contains all numbers from 1 to 5
- Create an object b with a value of 2
- Add a and b together using the basic "+" operator and save the result in an object called result_add
- Add a and b together using the sum() function and save the result in an object called result_sum
- Compare result_add and result_sum. Are they different?

b) Add 5 to result_sum using the sum() function.
> a <- 3
> b <- 4
> sum(a, b)
[1] 7
This is the return value of the function
Challenge 11
Solution
a)
> a <- 1:5
> b <- 2
> result_add <- a + b
> result_sum <- sum(a, b)
> result_add
[1] 3 4 5 6 7
> result_sum
[1] 17
b)
> sum(result_sum, 5)
[1] 22
> a <- 1:100
> b <- a^2
> plot(a, b)
> plot(b, a)
> plot(x=a, y=b)
> plot(y=b, x=a)
Challenge 12
Solution
> plot(a, b)
> plot(b, a)
The shape of the plot has changed: the order of the arguments is important.
> plot(x=a, y=b)
Same as plot(a, b).
> plot(y=b, x=a)
Same as plot(a, b). The argument name is provided, so the order is not important.
Some common functions
sqrt
log
exp
max
min
sum
mean
sd
var
summary
plot
par
paste
format
head
length
str
names
typeof
class
attributes
library
ls
rm
setwd
getwd
file.choose
c
seq
rep
tapply
lapply
aggregate
merge
cbind
rbind
unique

help ?
help.search ??
help.start
Packages
Packages are a grouping of functions and/or datasets that share a similar theme.
Ex: statistics, spatial analysis, plotting...

Everyone can develop packages and make them available to others.

They are usually available through the Comprehensive R Archive Network (CRAN)
http://cran.r-project.org/web/packages/

Currently, more than 5877 packages are publicly available.
> result_add
The operation on the vector adds 2 to each element. The result is a vector.
> result_sum
The function sum() adds all values of a and b. It is the same as doing 1 + 2 + 3 + 4 + 5 + 2. The result is a number.
Packages
To install packages on your computer, use the function install.packages()
> install.packages("ggplot2")
Installing a package is not enough to use it. You need to load it in each new R session using the library() function.
> qplot(1:10, 1:10)
Error: could not find function "qplot"
> library("ggplot2")
> qplot(1:10, 1:10)
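One way to avoid the "could not find function" error is to check whether a package is installed before loading it. The sketch below uses requireNamespace(), which is not covered in this workshop, so treat it as an optional pattern rather than the workshop's method; pkg and is.installed are illustrative names.

```r
pkg <- "ggplot2"

# TRUE if the package is installed on this computer, FALSE otherwise
is.installed <- requireNamespace(pkg, quietly = TRUE)

if (is.installed) {
  # Load the package so that its functions can be called
  library(pkg, character.only = TRUE)
  message(pkg, " is loaded")
} else {
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") first")
}
```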
WOW!!! R is SO great! So many functions to do what I want!!!

But... how do I find them?

To find a function that does something specific in your installed packages, you can use ?? followed by a search term.
Let's say we want to create a sequence of odd numbers between 0 and 10. We will search our packages for all functions with the word "sequence" in them.

> ??sequence
Functions list
package_name::function_name
Function description
Getting help with functions
OK! So let's use the seq() function!!

But how does it work? What arguments does it need?

To find information about a function, use ? followed by the function name.

> ?seq
Function name
Package name
Short description
How to call the function
List of arguments

If a name = value pair is present in the list of arguments, that value is used as a default when the argument is missing. The argument becomes optional.

Other functions described in this help page
Description of all the arguments and what they are used for
Detailed description of how the functions work and their characteristics
Description of the return value
Other related functions that can be useful
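Once the help page has shown which arguments exist, they can be tried directly. The sketch below builds the sequence of odd numbers mentioned earlier with seq(); the object names are illustrative.

```r
# Odd numbers between 0 and 10: start at 1, stop at 9, in steps of 2
odd.numbers <- seq(from = 1, to = 9, by = 2)
print(odd.numbers)

# "by" is optional: when it is left out here, seq() counts in steps of 1
default.step <- seq(from = 1, to = 5)
print(default.step)

# Because the argument names are given, the order can be changed
reordered <- seq(by = 2, to = 9, from = 1)
print(reordered)
```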
Challenge 13
a) Create a sequence of even numbers from 0 to 10 using the seq function

b) - Create an unsorted vector of your favourite numbers.
- Find out how to sort it using ?sort.
- Sort your vector in reverse order.

Challenge 13
Solution
a)
> seq(from=0, to=10, by=2)
[1] 0 2 4 6 8 10
We could also have written
> seq(0, 10, 2)
[1] 0 2 4 6 8 10
b)
> numbers <- c(4, 55, 6, 22, 3)
> sort(numbers, decreasing=TRUE)
[1] 55 22 6 4 3
Other ways to get help
Usually, your best source of information will be your favorite search engine (Google, Bing, Yahoo, etc.)

Here are some tips on how to use them efficiently:
Search in English
Use the keyword "R" at the beginning of your search
Define precisely what you are looking for
Learn to read discussion forums. Chances are that other people already had your problem and asked about it.
Don't hesitate to search again with different keywords!
Challenge 14
Find the appropriate functions to perform the following operations

a) Square root
b) Calculate the mean of numbers
c) Combine two data frames by columns
d) List available objects
Challenge 14
Solution
a) sqrt

b) mean

c) cbind

d) ls
Vectors
Numeric vector
> num.vector<-c(1,2,5,3,6,-2,4)
Character vector
> col.vector<-c("blue","red","green")
Logical vector
> logic.vector<-c(TRUE,TRUE,FALSE)
Mode
Numeric: only numbers
Character: text or a mix of text and other modes
Logical: True/False entries
An entity consisting of a list of related values
A single value is a vector of length one (an atomic vector)
All values of a vector must have the same mode
Creating vectors usually requires the c() function
c stands for combine or concatenate
The format is: vector.name <- c(value1, value2, ...)
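The rule that all values share one mode can be checked with the mode() function. This is a small sketch; mixed.vector is a hypothetical example showing that combining text with other modes gives a character vector.

```r
num.vector <- c(1, 2, 5, 3, 6, -2, 4)
logic.vector <- c(TRUE, TRUE, FALSE)

# Mixing modes: every value is converted to text
mixed.vector <- c(1, "blue", TRUE)

print(mode(num.vector))
print(mode(logic.vector))
print(mode(mixed.vector))
print(mixed.vector)
```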
Challenge 7
Create a vector containing the first five odd numbers (starting from 1) and name it odd.n.
Challenge 7
Solution
> odd.n <- c(1,3,5,7,9)
> odd.n
[1] 1 3 5 7 9
A quick note on logical statements
Operator   Description
<          less than
<=         less than or equal to
>          greater than
>=         greater than or equal to
==         exactly equal to
!=         not equal to
x | y      x OR y
x & y      x AND y
Examples of logical statements
> x2 <- c(1:5)
> y2 <- c(1,2,-7,4,5)
> x2 >= 3
[1] FALSE FALSE TRUE TRUE TRUE
> x2 == y2
[1] TRUE TRUE FALSE TRUE TRUE
> 3 != 4
[1] TRUE
> x2 > 2 & x2 < 5
[1] FALSE FALSE TRUE TRUE FALSE
R allows testing of logical statements, i.e. testing whether a statement is true or false.
You need to use logical operators for that.
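The OR and AND operators combine two logical statements element by element. The sketch below reuses x2 and y2 from the examples; at.the.ends and both.match are names invented for the illustration.

```r
x2 <- c(1:5)
y2 <- c(1, 2, -7, 4, 5)

# OR: TRUE where at least one of the two statements is TRUE
at.the.ends <- x2 < 2 | x2 > 4
print(at.the.ends)

# AND: TRUE only where both statements are TRUE
both.match <- x2 == y2 & x2 > 1
print(both.match)

# A logical vector can be used to index the original vector
print(x2[at.the.ends])
```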
> my.first.df
In x2 > 2 & x2 < 5, you have to repeat x2 on both sides of the & operator!
> logic.vector<-c(T,T,F)
Same thing as c(TRUE,TRUE,FALSE)!
Challenge 4
Alternative solution
> pi*5^2
[1] 78.53982
*Note that R has many built-in constants you can use, such as "pi"
Note for Windows users
If the restriction "unable to write on disk" appears when you try to open R-Studio, don't worry, we have a solution:
Right-click on your R-Studio icon and choose "Run as administrator" to open the program.

We want your feedback:
https://docs.google.com/spreadsheet/ccc?key=0AhCQzc0AsZ0OdHZoWE1PUi1kNmttZV96VEViY0sxVEE#gid=0

Thank you for attending!