만나보세요 

Prezi AI.

새로운 프레젠테이션 도우미가 기다리고 있어요.

뎌욱 빠르게 컨텐츠를 다듬고, 보강하고, 편집하고, 원하는 이미지를 찾고, 시각자료를 편집하세요.

로딩중
스크립트

4.3. Principal Coordinates Analysis

  • The Shepard diagram and stress values can be obtained from:

Overview

0. Intro

Methods for scientific research

Follow along

Required packages

Intro

Ordination in reduced space

4, 5, 6 + Dimensions

Famous 1D Example

2 Dimensions

3 Dimensions

Ordination in reduced space

Questions / Hypothesis

Experimental design

Data collection

> install.packages("vegan")

> install.packages("gclus")

> install.packages("ape")

> require(vegan)

> require(gclus)

> require(ape)

> source(file.choose()) #coldiss.R

Ordination?

...

All the cool kids are doing it, what does it mean?

Transformation / Distance

?

  • Have you ever "ordinated"?
  • What kind of data?
  • Community composition?
  • Genetic?
  • What software?
  • Canoco?
  • R? vegan? ecodist? ape?

Recommendations:

  • create your own new script
  • refer to provided code only if needed
  • avoid copy pasting or running the code directly from script

Analysis

Data and R code available at:

qcbs.ca/wiki/r/r_workshop9

Navigate to the wiki page and download:

  • Rscript
  • DoubsEnv.csv
  • DoubsSpe.csv
  • coldiss.R

Redaction

  • Matrix algebra is complex and hard to understand
  • A global understanding is enough in order to use ordination methods adequately

You !

# Remember the directory!

Communication...

ape

gclus

vegan

What if we are interested in this response for different species of algae involved in the algal bloom density?

1. Exploring Data

2.2 Transforming community data

decostand()

Doubs Environmental Data

Doubs River Fish Dataset

Explore Doubs Dataset

Species Frequencies

Total Species Richness

Species Frequencies

Transforming your community data - considerations

Transforming your community data - examples

Understand your data to choose the appropriate transformation and distance

Visualize how many species are present at each site:

Explore the content of the environmental data:

> ?decostand

Load the Doubs River species data (DoubsSpe.csv):

Take a look at the distribution of species frequencies:

How many zeros?

  • Transforming counts into presence-absence

Explore the content of the fish community dataset:

> sum(spe==0)

> spe <- read.csv(file.choose(), row.names=1)

> spe <- spe[-8,] #remove site with no data

> site.pres <- rowSums(spe>0)

> barplot(site.pres, main="Species richness",

las=1, xlab="Sites",

ylab="Number of species", col="grey")

Standardizing environmental variables is crucial as you cannot compare the effects of variables with different units:

> (ab=table(unlist(spe)))

> barplot(ab, las=1, xlab=”Abundance class”,

ylab=”Frequency”, col=grey(5:0/5))

Are there many zeros ?

What do they mean ?

> spe.pa <- decostand(spe, method="pa")

It is important that you understand the strengths and limitations of your community data prior to computing ordinations, and standardize and/or transform accordingly:

> names(env) #names of objects

> dim(env) #dimensions

> str(env) #structure of objects

> head(env) #first 5 rows

> summary(env) #summary statistics

408!

MUST BE STANDARDIZED !

> env.z <- decostand(env, method="standardize")

  • Reducing the weight of rare species:

Load the Doubs River environmental data (DoubsEnv.csv):

> names(spe) #names of objects

> dim(spe) #dimensions

> str(spe) #structure of objects

> head(spe) #first 5 rows

> summary(spe) #summary statistics

1D represention

?rarefy()

What proportion of zeros?

> spe.hel <- decostand(spe, method="hellinger")

> spe.chi <- decostand(spe, method="chi.square")

Explore collinearity by visualizing correlations between variables:

> env <- read.csv(file.choose(), row.names=1)

> env <- env[-8,] #remove site with no data

Verneaux (1973) dataset:

  • characterization of fish communities
  • 27 different species
  • 30 different sites
  • 11 environmental variables

?diversity()

This centers and scales the variables to make your downstream analysis more appropriate:

A measured 0 (i) is not the same than a 0 representing an absence (ii).

(i): 0 mg/L, 0°C ...

(ii): real absence, phenology disturbance, season ...

> sum(spe==0)/(nrow(spe)*ncol(spe))

  • relative abundances/counts/presence-absence?
  • asymmetrical distributions?
  • many rare species?
  • overabundance of dominant species?
  • double zero problem?

> pairs(env, main="Bivariate Plots of the Environmental Data" )

  • Reducing the weight of very abundant species:

> apply(env.z, 2, mean)

> apply(env.z, 2, sd)

# PROCEED WITH CAUTION ONLY EXECUTE ONCE

>50% !

Distance metrics are very useful for exploring your data, but ordinations preserve more information contained in your dataset

> spe.pa <- decostand(spe, method="log")

2. Similarity/Dissimilarity/Clustering

Schematic diagram of multivariate analysis

Breaking Out of 1D

Similarity/Dissimilarity

Solution #1

2.1 Association measures

Challenge #1

Comparing Doubs Sites

Community distance measures

Matrix X

(envi var.)

Matrix Y

(communauty)

And what about ordination ?

Comparing Doubs Sites

Visualization of distance matrices

standardized / transformed / normalized data

  • Matrix algebra is at the heart of all ordinations

Suggestions?

  • Useful to understand your dataset
  • Appropriate measure required by some types of ordinations

The vegdist() function contains all common distances:

  • Euclidean
  • Manhattan
  • Chord
  • Hellinger
  • Chi-square
  • Bray-Curtis

[Workshop 10]

[Workshop 9]

Discuss with your neighbour: How can we tell how similar objects are when we have multivariate data?

Make a list of all your suggestions.

Indirect

(unconstrained)

-exploratory-

Direct

(constrained)

-analytic-

> ?vegdist

Similarity: S = 1 - D

Distance: D = 1 - S

Y and/or X

Y ~ X

?

  • As you have seen, ecological datasets can sometimes be very be large matrices

  • Ordinations compute the relationships between species or between sites.

  • We can simplify these relationships using measures of dissimilarity

# Each of these will be useful in different situations

How different is the community composition across the 30 sites of the Doubs River?

With ordination methods, we order our objects (site) according to their similarity.

The more the sites are similar, the closer they are in the ordination space (small distance).

In ecology, we usually calculate the similarity between sites according to their species composition or their environmental conditions.

PCA

CA

RDA

CCA

Clustering

MRT

  • Exploring various measures of distance between objects provides some understanding of the engine under the hood

> spe.db.pa <- vegdist(spe,method="bray")

PCoA

nMDS

LDA

Quebec Centre for Biodiversity Science

R Workshop Series

3. Clustering

Workshop 9: Multivariate analyses

How to choose the right method?

Comparison

Ward's method

Complete linkage clustering

Single linkage clustering

Comparison

Hierarchical methods

Clustering

Ward's minimum variance method

Overview of 3 hierarchical methods

  • A distance matrix is first sorted in increasing distance order
  • Compute the Ward's minimum variance clustering:
  • Create a distance matrix from Hellinger transformed Doubs river data
  • the two closest objets merge
  • the next two objects/cluster will agglomerate when linked to the furthest element of the group
  • the two closest objects merge
  • the next two closest objects/clusters merge
  • and so on.

> spe.dhel.ward <- hclust(spe.dhel, method="ward.D2")

  • Elements of lower clusters become members of larger, higher ranking clusters
  • e.g. species, genus, family, order...

> spe.dhel <- vegdist(spe.hel, method="euclidean")

  • Single linkage agglomerative clustering
  • Complete linkage agglomerative clustering
  • Ward's minimum variance clustering
  • To highlight structures in the data by partitioning either objects or the descriptors
  • Results are represented as dendrograms (trees)
  • Not a statistical method!
  • Uses the criterion of least squares to cluster objects into groups
  • at each step, the pair of clusters merging is the one leading to the minimum increase in total within-group sum of squares
  • Plot the dendrogram by using the square root of the distances:
  • Depends on the objective:
  • highlight gradients? contrasts?
  • If more than one method seems appropriate --> compare dendrograms
  • Again: this is not a statistical method
  • But! It is possible to:
  • determine the optimal number of interpretable clusters
  • compute clustering statistics
  • combine clustering to ordination to distinguish groups of sites
  • Compute the single linkage clustering:

> spe.dhel.single <- hclust(spe.dhel, method="single")

> plot(spe.dhel.single)

> spe.dhel.ward$height <- sqrt(spe.dhel.ward$height)

> plot(spe.dhel.ward, hang=-1)

Complete linkage

Contrasted groups are formed

  • Compute the complete linkage clustering:

hang=-1 aligns objects at the same level (at distance 0)

Single linkage

Chains of objects occur (e.g. 19, 29, 30, 20, 26, etc.)

Clusters generated using this method tend to be more spherical and to contain similar numbers of objects

> spe.dhel.complete <- hclust(spe.dhel, method="complete")

> plot(spe.dhel.complete)

Website: http://qcbs.ca/wiki/r_workshop9

4. Unconstrained ordination

Définitions

Schematic diagram of multivariate analysis

More definitions

Unconstrained Ordination

Matrix X

(envi var.)

Matrix Y

(communauty)

Eigenvalues : Proportion of variance (dispersion) represented by one ordination axe.

standardized / transformed / normalized data

Variance : measure of a variable yj dispersion from its mean

Orthogonality = right angle between 2 axis or 2 arrows which means that these 2 are independent = non correlated.

[Workshop 10]

[Workshop 9]

Indirect

(unconstrained)

-exploratory-

Direct

(constrained)

-analytic-

Covariance : measure of co-dispersion of 2 variables yj et yk from their means

Y and/or X

Y ~ X

Score = position of a dot on an axis. All the scores of a dot give its coordinates in the multidimensional space. They can be used as new variable for other analyses (e.g. linear combination of measured variables).

  • Assess relationships within a set of variables (species or environmental variables, not between sets, i.e. constrained analysis).

  • Find key components of variation between samples, sites, species, etc...

  • Reduce the number dimensions in multivariate data without substantial loss of information.

  • Create new variables for use in subsequent analysis (such as regression)

PCA

CA

RDA

CCA

Correlation : measure of the link strength between 2 variables : rij = (σij / σj . σk)

Dispersion (inertia) : Measure of the total variability of the scatter plot (descriptors) in the multidimensional space with regards to its center of gravity.

Clustering

MRT

PCoA

nMDS

LDA

QUIZ TIME !!

4.1. Principal Components Analysis

Prime time 4 quiz time

1. What does PCA stand for?

a) Principal Coordinate Analysis

b) Principle Compound Analysis

c) Principal Component Analysis

d) Principle Component Analysis

You !

3.1 Principal Component Analysis

PCA - What you need

PCA - Walkthrough

PCA - Let's try it on Fish Species!

PCA - Interpretation of Output

PCA - Multidimensional case

PCA - Walkthrough

PCA - Interpretation of Output

  • A set of variables that are response variables (e.g. community composition) OR explanatory variables (e.g. environmental variables)
  • For both PCA and RDA, we will be using the rda() function in the vegan package:

NOT BOTH!

> ?rda

  • PC1 --> axis that maximizes the variance of the points that are projected perpendicularly onto the axis.
  • PC2 --> must be perpendicular to PC1, but the direction is again the one in which variance is maximized when points are perpendicularly projected.
  • PC3 --> and so on: perpendicular to the first two axes.
  • List of the eigenvalues associated to each Principal Component (in this output there are 27 PCs, as this is the number of dimensions in the data)
  • Run a PCA on the Hellinger-transformed fish data

> spe.h.pca <- rda(spe.hel)

  • Preserves, in 2D, the maximum amount of variation in the data
  • The resulting synthetic variables are orthogonal (and therefore uncorrelated),

*An eigenvalue is the value of the change in the length of a vector, and for our purposes is the amount of variation explained by Principal Component.

  • List of the proportion of variance explained by each Principal Component (as well as cumulative)
  • Samples that are measured for the same set of variables
  • Generally a dataset that is longer than it is wide is preferred
  • Extract the results:
  • Total variance explained by the descriptors (here the fish species)
  • In PCA, note that the "Total" and "Unconstrained" portion of the explained variance is identical

When there are more than two dimensions, PCA produces a new space in which all PCA axes are orthogonal (i.e. correlation among any two axes is null) and where the PCA axes are ordered according to the percent of the variance of the original data they explain.

# NOTE THE SPELLING ;)

The final plot then is the two PC axes rotated where the axes are now principal components as opposed to species

51.3% of 0.5025 is 0.258

0.258 + 0.064 + .... = 0.5025 Total explained variance

> summary(spe.h.pca)

In 2D, we would plot the sites like this... Notice the dispersion in the scatterplot

Our first component is essentially drawn through the maximum amount of observed variation... or the best fit line through the points

A simplified example...

A second principal component is then added perpendicular (90 degrees in 2D) to the first axis...

Accessing Parts of the Output

Selecting Significant PCs

Kaiser-Guttman criterion

PCA - environmental variables

PCA - Interpretation of Output

PCA - Visualization

Prime time 4 quiz time

PCA - environmental variables

Kaiser-Guttman criterion -visualization

PCA - Interpretation of Output

  • Extract the eigenvalues associated to the PCs:

2. Which of the following is the best way to visualize the distances between the community composition of many sites?

a) Principal Component Analysis

b) Principal Coordinate Analysis

c) Principal Response Curves

d) Redundancy Analysis

You !

Selects principal components which capture more variance than the average of all PCs.

We can also run PCAs on standardized environmental variables, to compare sites for example, or how variables are correlated...

You may wish to extract the scores (either from species or sites) for use in subsequent analysis or for plotting purposes :

The output is very dense, but you can access specific information if needed:

> ev <- env.pca$CA$eig

The strength of PCA is that we can condense the variance contained in a huge dataset into a set of synthetic variables that is manageable.

The abundance of information produced by PCA is easier to understand and interpret using biplots to visualize patterns.

  • Extract the eigenvalues associated to the PCs:
  • Select the all eigenvalues above average:

> n <- length(ev)

> barplot(ev, main="Eigenvalues",

col="grey", las=2)

> abline(h=mean(ev), col="red")

> legend("topright", "Average eigenvalue",

lwd=1, col=2, bty="n")

  • Access the species scores along the 1st and 2nd PC:
  • Run a PCA on the standardized environmental variables
  • Access the eigenvalues associated contribution to variance:

> ev <- spe.h.pca$CA$eig

> ev[ev>mean(ev)]

> env.pca <- rda(env.z)

  • We can produce a quick biplot of the species data using function "plot()" in base R:

> spe.scores <- scores(spe.h.pca,

display="species",

choices=c(1,2))

> summary(spe.h.pca, display=NULL)

  • Plot:

In our case there are still 27 Principal Components, but only the first few account for any significant amount of variance, while the rest can simply be discarded as noise...

  • Select the all eigenvalues above average:
  • There are two ways to represent an ordination in 2D, here the output is informing us that it used the default scaling, which is type 2...
  • Species refers to your descriptors (i.e. the columns in your dataset), which here are the Fish species
  • Scores refer to the position of every species in the PC, they are essentially the coordinates of each species along a principal component...

> ev[ev>mean(ev)]

  • Extract the results
  • Calculate the eigenvalues from scratch:

> plot(spe.h.pca)

  • Calculate the eigenvalues from scratch:

> n <- length(ev)

> barplot(ev, main="Eigenvalues",

col="grey", las=2)

> abline(h=mean(ev), col="red")

> legend("topright", "Average eigenvalue",

lwd=1, col=2, bty="n")

  • Site refers to your the rows in your dataset, which here are the different sites along the Doubs river (but it can be points in time, etc...)
  • Scores are essentially the coordinates of each species along a principal component...

> eigen(cov(spe.hel))

> summary(env.pca)

> summary(env.pca, scaling=2) #default

How do we manage this?

More on this later!

> site.scores <- scores(spe.h.pca,

display="sites",

choices=c(1,2))

PCA basic biplot with plot()

PCA basic biplot with biplot()

PCA scaling types

Scaling 2 - Scaling 1

Advanced "biplotting"

Solution #3

EASTER EGG: ggvegan

Solution #3

EASTER EGG: rgl and vegan3d

WARNINGS

Solution #3

Challenge #3

  • A set of tools for producing biplots using ggplot2:
  • Interactive 3D biplots using rgl
  • Hellinger transform the species data:
  • Using the function biplot() from base R, arrows are plotted to show the directionality and angle of the descriptors in the ordination:
  • By extracting specific parts of the PCA output, we can build more detailed and aesthetic plots:
  • Type 1 scaling: attempts to preserves the Euclidean distance (in multidimensional space) among objects (sites); the angles among descriptor (species) vectors are meaningless. Best for interpreting relationships among objects (sites)!

> mite.spe.hel <- decostand(mite.spe, method="hellinger")

> require(rgl)

> require(vegan3d)

> ordirgl(spe.h.pca)

Using everything you have learned to execute a PCA on the mite species abundance data:

> install.packages("devtools")

> require("devtools")

> install_github("ggvegan", "gavinsimpson")

> require(ggvegan)

> autoplot()

  • PCA is a linear method and thus relies on a few assumptions:
  • multinormal distribution of the data (only if you wish to make inferences)
  • not too many zeros
  • the gradient of interest is causing the majority of the variance in the dataset

> biplot(spe.h.pca)

  • Compute PCA:

> data(mite)

> mite.spe.h.pca <- rda(mite.spe.hel)

Prime time 4 quiz time

> plot(spe.h.pca, scaling=1, type="none",

xlab = c("PC1 (%)", round((spe.h.pca$CA$eig[1]/sum(spe.h.pca$CAeig))*100,2)),

ylab = c("PC2 (%)", round((spe.h.pca$CA$eig[2]/sum(spe.h.pca$CAeig))*100,2)))

> points(scores(spe.h.pca, display="sites", choices=c(1,2), scaling=1),

pch=21, col="black", bg="steelblue", cex=1.2)

> text(scores(spe.h.pca, display="species", choices=c(1), scaling=1),

scores(spe.h.pca, display="species", choices=c(2), scaling=1),

labels=rownames(scores(spe.h.pca, display="species", scaling=1)),

col="red", cex=0.8)

Prime time 4 quiz time

  • Check significant axes using the Guttman-Kaiser criterion:

4. Spot what is sketchy!

3. What does an eigenvalue represent in

PCA?

a) A measure of statistical significance

b) The proportion of variance explained by a principal component

c) Coordinates of objects in reduced space

d) The answer to life, the universe, and everything else (aka 42)

  • THE BASIC PLOT FUNCTION IS QUICK BUT IT IS HARD TO INTERPRET THE ANGLES BETWEEN THE SPECIES

You !

  • Type 2 scaling (DEFAULT): distances among objects are not approximations of Euclidean distances; angles between descriptor (species) vectors reflect their correlations. Best for interpreting relationships among descriptors (species)!

You !

  • Descriptors at 180 degrees of each other are negatively correlated
  • Descriptors at 90 degrees of each other have zero correlation
  • Descriptors at 0 degrees of each other are positively correlated

Prime time 4 quiz time

5. Spot what is sketchy!

# see the arrows() function in {graphics} to add vectors

> ev <- mite.spe.h.pca$CA$eig

> ev[ev>mean(ev)]

> n <- length(ev)

> barplot(ev, main="Eigenvalues", col="grey", las=2)

> abline(h=mean(ev), col="red")

> legend("topright", "Average eigenvalue",

lwd=1,col=2, bty="n")

# Violation of these can cause a horseshoe shape in your biplots, where opposite ends of the horseshoe are close together but in reality represent opposite ends of a gradient

You !

> spe.sc<-scores(spe.h.pca,choices=1:2,scaling=,display="sp")

> arrows(0,0,spe.sc[,1],spe.sc[,2],length=0)

  • What are the significant axes?
  • Which groups of sites can you identify?
  • Which groups of species are related to these groups of sites?

WARNINGS

Live Long and Ordinate

  • We can avoid some of these problems in PCA by choosing appropriate transformations for your community composition data or environmental data
  • In some cases, such as studies that cover very large environmental gradients, it may be appropriate to use other types of unconstrained ordinations

4.2. Correspondence Analysis

Challenge #4

Solution #4

CA: R output

CA: interpretation of results

CA: interpretation of biplots

CA: biplots

Solution #4

How to run a CA ?

Euclidean vs Chi2 distances

Principles of CA

  • Compute PCA:
  • CA results are presented in the same way as PCA results and call be called using:

Using everything you have learned to execute a CA on the mite species abundance data:

> mite.spe.ca <- rda(mite.spe)

  • CA is implemented in the vegan package using function cca:

> data(mite)

  • In such cases, CA should be prefered compared to PCA as it preserves Chi2 distances between sites...and thus better represents unimodal relationships
  • Check significant axes using the Guttman-Kaiser criterion:
  • 26 CA axes identified

  • % CA1= 51.50%

  • %CA2=12.37%
  • These biplots show that a group of sites located in the left part with similar fish community characterized by numerous species such as GAR, TAN, PER, ROT, PSO and CAR; in the upper right corner, an other site cluster characterized by the species LOC, VAI and TRU is identified; the last site cluster in the lower right corner of the biplot is characterized by the species BLA, CHA and OMB.

> spe.ca <- cca(spe)

> summary(spe.ca)

  • PCA preserves euclidean distances between objects, and thus postulates linear relationships between species, and between species and environmental gradients

  • ...but in some cases, species instead present unimodal responses to environmental gradients

> ev <- mite.spe.ca$CA$eig

> ev[ev>mean(ev)]

> n <- length(ev)

> barplot(ev, main="Eigenvalues", col="grey", las=2)

> abline(h=mean(ev), col="red")

> legend("topright", "Average eigenvalue",

lwd=1,col=2, bty="n")

  • CA on fish species abundances
  • What are the significant axes?
  • Which groups of sites can you identify?
  • Which groups of species are related to these groups of sites?

PCoA and non-metric distances

PCoA biplot with biplot.pcoa()

Solution #5

Challenge #5

Solution #5

Principal Coordinate Analysis

PCoA - Let's try it on Fish Species!

PCoA - Interpretation of Output

Solution #5

  • Build a biplot to visualize the data:
  • PCoA can also be used to capture information contained in non-metric distances, such as the popular Bray-Curtis distance. Let's give it a try:

... 27

  • Now let's visualize this using a biplot without the species (more common approach for PCoA):
  • We can display, in 2D, the distances between sites using the biplot.pcoa() function, as well as the species associated to each site (though this is less common):
  • For computing PCoA, we can use the cmdscale() or the pcoa() functions from the {stats} and {ape} packages:

> biplot.pcoa(mite.spe.h.pcoa, mite.spe.hel)

  • Hellinger transform the species data:

> biplot.pcoa(spe.bray.pcoa)

Execute a PCoA on the Hellinger-transformed mite species abundance data:

> spe.bray.pcoa <- pcoa(spe.db)

> spe.bray.pcoa

> biplot.pcoa(spe.h.pcoa, spe.hel)

> mite.spe.hel <- decostand(mite.spe, method="hellinger")

> ?cmdscale

> ?pcoa

... 27

... 30

  • Compute PCoA:
  • Run a PCoA on the Hellinger-distances of the fish dataset:

> mite.spe.h.pcoa<-pcoa(dist(mite.spe.hel))

  • Vectors: the eigenvectors associated to each eigenvalue contain the coordinates, in Euclidean space, of each site.
  • Examine the output and notice the negative eigenvalues. This is because non-metric distances cannot be represented in Euclidean space without corrections (see Legendre & Legendre 2012 for more details on this):

> spe.h.pcoa <- pcoa(dist(spe.hel))

  • What are the significant axes?
  • Which groups of sites can you identify?
  • Which groups of species are related to these groups of sites?
  • How do the PCoA results compare with the PCA results?
  • In PCA, we preserve the maximum amount of variation in the data
  • In PCoA, we preserve as best we can in 2D the (Euclidean) distances between each object in multidimensional space

# These are the most useful for subsequent analysis as they capture the distance among objects very well.

  • Eigenvalues
  • Relative Eigenvalues
  • Broken stick model: to evaluate which axes are significant
  • Cumulative eigenvalues: cumulative value of the relative eigenvalues
  • Cumulative broken stick: cumulative value of the broken stick model
  • Extract the results:

> spe.bray.pcoa <- pcoa(spe.db,

correction="cailliez")

> spe.bray.pcoa

> spe.h.pcoa

# PCoA CAN BE ESPECIALLY USEFUL WHEN

THE DATASET IS WIDER THAN IT IS LONG

(TYPICAL PROBLEM IN GENETICS)

4.4. Nonmetric Multidimensional Scaling

Challenge #6

Solution #6

Principles of NMDS

Solution #6

NMDS

NMDS on fish abundances

NMDS: goodness-of-fit

NMDS on fish abundances

  • Construct the biplot
  • Run the NMDS
  • Run the NMDS

  • nMDS is implemented in the vegan package using function metaMDS:

> mite.spe.nmds<-metaMDS(mite.spe, distance='bray', k=2)

> spe.nmds<-metaMDS(spe, distance='bray', k=2)

  • NMDS
  • is the non-metric counterpart of PCoA
  • uses an iterative optimization algorithm to find the best representation of distances in reduced space
  • increasingly popular

  • In NMDS, users can thus specify;
  • the number of dimensions
  • the distance measure
  • Extract the results

Run the NMDS of the mite species abundances in 2 dimensions based on a Bray-Curtis distance. Assess the goodness-of-fit of the ordination and interpret the biplot

> spe.nmds$stress

> stressplot(spe.nmds, main='Shepard plot')

  • In PCA, CA and PCoA, objects are ordinated in a few number of dimensions (i.e. axis) generally > 2

  • Consequently, 2D-biplots can fail to represents all the variation of the dataset

  • In some cases, the objective is however to represent the data in a specified small number of dimensions.

  • Then, how do you plot the ordination space to represent all the variation in the data ?

> spe.nmds <- (spe, distance='bray', k=2)

  • The Shepard plot identifies a strong correlation between observed dissimilarity and ordination distance (R2 > 0.95), highlighting a high goodness-of-fit of the NMDS.
  • The biplot of the NMDS shows a group of closed sites characterized by the species BLA, TRU, VAI, LOC, CHA and OMB, while the other species form a cluster of sites in the upper right part of the graph. Four sites in the lower part of the graph are strongly different from the others.

> spe.nmds

> windows()

> plot(spe.nmds, type="none", main=paste('NMDS/Bray - Stress=', round(spe.nmds$stress, 3)), xlab=c("NMDS1"),ylab=c("NMDS2"))

> points(scores(spe.nmds, display="sites", choices=c(1,2)),pch=21, col="black", bg="steelblue", cex=1.2)

> text(scores(spe.nmds, display="species", choices=c(1)), scores(spe.nmds, display="species", choices=c(2)), labels=rownames(scores(spe.nmds, display="species")),col="red", cex=0.8)

  • Assess the goodness-of-fit

  • The correlation between observed dissimilarity and ordination distance (R2 > 0.91) and the stress value relatively low, showing together a good accuracy of the NMDS ordination.

  • No cluster of sites can be precisely defined from the NMDS biplot showing that most of the species occurred in most of the sites, i.e. a few sites shelter specific communities.

> mite.spe.nmds$stress

> stressplot(mite.spe.nmds, main='Shepard plot')>

  • Assess the goodness-of-fit
  • distance specifies the distance to use
  • k specifies the number of dimensions

> spe.nmds$stress

> stressplot(spe.nmds, main='Shepard plot')

Conclusion

Many ordination techniques exist, but their specificity should guide your choices on which methods to use

Species 1 Species 2

Site 1

Site 2

Var 1 Var 2

Site 1

Site 2

Function rda()

  • RDA is in 2 steps:

- multiple regressions

- PCA on regressed values

-> If we give only one table to the function rda()

it does directly a PCA without doing regression

rda(Y~X) ->RDA

rda(Y) or rda(X) -> PCA

How to run a NMDS ?

NMDS: goodness-of-fit

  • NMDS applies an iterative procedure that tries to position the objects in the requested number of dimensions in such a way as to minimize a stress function (scaled from 0 to 1) which measure the goodness-of-fit of the distance adjustment in the reduced-space configuration. Consequently, the lower the stress value, the better the representation of objects in the ordination-space is.

2 FIRST PCs EXPLAIN

100% OF THE VARIATION

  • Distances between objets
  • Correlations between species

PCA basic biplot with plot()

HOW TO INTERPRET THE POSITIONS ?

Species scores

2nd Principal Component

Site scores

1st Principal Component

Notice the abundance of 0s

  • Diagonal is zero
  • Site 2 and 3 most similar
  • Site 1 and 3 most different

DATA NOT CENTERED, YIKES!

프레지로 더욱 인상깊고 역동적인 프레젠테이션을 만들어 보세요