Throughout the workshop there will be a series of challenges denoted by a Rubik's cube.
During challenges, collaborate with your neighbors!
Solution
1. Separate:
A. Allow intercepts and/or slopes to vary for a given factor.
2. Lump:
Assume intercepts come from a normal distribution (ND)
Same thing for lake
Same concept for slopes, just harder to visualize
Only need to estimate the mean and standard deviation of the ND instead of 3 intercepts
Fixed effect
But how?
A. Intercepts and/or slopes are allowed to vary by lake and species.
B. Intercepts, slopes, and associated confidence intervals are adjusted to account for the structure of data.
Estimate 2 parameters (mean and SD) instead of 6 intercepts - saves degrees of freedom
LMMs are a balance between separating and lumping. They:
1. Estimate slope and intercept parameters for each species and lake (separating).
2. Use all the data available (lumping) while accounting for pseudoreplication and controlling for differences among lakes and species.
Website: http://qcbs.ca/wiki/r/workshop6
Fixed VS Random effect
In the LMM literature, you will often encounter these terms.
There are many possible definitions of fixed and random effects; we chose to present definitions that are easy to apply to your data.
Random effect
Again, only estimate mean and SD instead of 3 slopes
This is done including species from all lakes
This is done including individuals of all species
B. Adjust intercepts and slopes to account for the structure of data.
B. Adjust confidence intervals of intercepts and slopes to account for pseudoreplication.
Smaller effective sample size --> larger confidence intervals
Larger effective sample size --> smaller confidence intervals
If a certain species or lake is data-poor, its intercept and slope estimates will rely more heavily on the group-level model.
Based on:
Intraclass correlation coefficient (ICC) - How much variation is there among groups versus within groups?
Low ICC: will treat individual observations more like they are independent
High ICC: will treat observations within a group more like one observation
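As a minimal sketch of the idea, the ICC for a varying-intercept model is commonly computed as the among-group variance divided by the total (among-group plus within-group) variance. The function name `icc` and the variance values below are hypothetical and only illustrate the two extremes described above.

```r
# Hypothetical helper: ICC as the share of total variance found among groups
icc <- function(var_among, var_within) {
  var_among / (var_among + var_within)
}

# Made-up variance components, for illustration only
icc_high <- icc(var_among = 9, var_within = 1)  # most variation is among groups
icc_low  <- icc(var_among = 1, var_within = 9)  # most variation is within groups

print(c(high = icc_high, low = icc_low))
```

With a high ICC, observations from the same group carry largely redundant information; with a low ICC, they behave more like independent observations.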
--> Data exploration
Step 1: A priori model building and data exploration
Check for collinearity between variables
Look at the distribution of continuous variables
What we know:
Look at the distribution of samples across factors:
Always make sure you've done "Housekeeping" on your data before you start building models!
Step 2: Coding potential models and model selection
We are interested in finding out if trophic position can be predicted by length, while accounting for variation between species and lakes
Is the data in the right structure?
collinear with length?
Step 3: Model validation
Step 4: Interpreting results and visualizing the model
So we now want a model that looks something like this:
Trophic Position_ijk ~ Fish Length_i + Lake_j + Species_k
Major skews can lead to problems with model homogeneity down the road, so make transformations if necessary. In this case, the data seem OK.
*This data set is perfectly balanced, but mixed models can deal with unbalanced designs (like we so often have in Ecology!).
Consider the scale of your data
How?
Trophic position: short scale
Length: long scale
#Species Effect
#Lake Effect
To know if a mixed model is necessary for your data set, you must establish whether it is actually important to account for variation in the factors that might be affecting the relationship you're interested in.
Z-correct length:
Z-correct trophic position:
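A Z-correction (standardization) subtracts the mean and divides by the standard deviation, which puts both variables on a comparable scale. The sketch below uses a small made-up data frame; the column names `Fish_Length` and `Trophic_Pos` are assumptions, while `Z_Length` follows the naming used later in the workshop.

```r
# Made-up example data; column names are assumed, not taken from the workshop file
fish <- data.frame(
  Fish_Length = c(105, 195, 310, 250, 400, 180),
  Trophic_Pos = c(2.1, 2.6, 3.4, 3.0, 3.9, 2.5)
)

# Z-correction "by hand": (value - mean) / standard deviation
fish$Z_Length <- (fish$Fish_Length - mean(fish$Fish_Length)) / sd(fish$Fish_Length)

# The same correction using base R's scale(), converted back to a plain vector
fish$Z_TP <- as.numeric(scale(fish$Trophic_Pos))

print(round(fish, 3))
```

After the correction, each variable has a mean of 0 and a standard deviation of 1, so slopes are expressed in standard deviation units.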
Step 2: Coding potential models and model selection
Trophic Position_ijk ~ Fish Length_i + Lake_j + Species_k
"(1 | Lake)": the "1 |" indicates varying intercepts
"lmer": the "linear mixed model" function
Re-write the following code so that the slopes of the trophic position/length relationship also vary by lake and species
But what if we also want the slopes to vary?
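One way this could look with the lme4 package is sketched below: a varying-intercept model next to a model where slopes also vary by lake and species. The data are simulated here purely so the example runs on its own (3 species in 6 lakes); the column names `Z_TP`, `Lake`, and `Species` and all simulated values are assumptions, not the workshop data.

```r
library(lme4)

# Simulated stand-in data: 10 fish for each of 3 species in each of 6 lakes
set.seed(1)
fish <- expand.grid(Rep = 1:10,
                    Species = c("S1", "S2", "S3"),
                    Lake = paste0("L", 1:6))
sp_int   <- c(S1 = -1.0, S2 = 0.0, S3 = 1.0)   # species intercepts (made up)
sp_slope <- c(S1 = 0.2, S2 = 0.4, S3 = 0.6)    # species slopes (made up)
lake_int <- setNames(rnorm(6, sd = 0.5), paste0("L", 1:6))
sp <- as.character(fish$Species)
lk <- as.character(fish$Lake)
fish$Z_Length <- as.numeric(scale(rnorm(nrow(fish))))
fish$Z_TP <- as.numeric(scale(sp_int[sp] + lake_int[lk] +
                              sp_slope[sp] * fish$Z_Length +
                              rnorm(nrow(fish), sd = 0.3)))

# Varying intercepts only: (1 | factor)
M_int <- lmer(Z_TP ~ Z_Length + (1 | Lake) + (1 | Species),
              data = fish, REML = TRUE)

# Varying intercepts and slopes: (1 + covariate | factor)
M_slope <- lmer(Z_TP ~ Z_Length + (1 + Z_Length | Lake) + (1 + Z_Length | Species),
                data = fish, REML = TRUE)

print(fixef(M_int))
print(fixef(M_slope))
```

With so few groups, lme4 may report a singular fit for the varying-slope model on simulated data like this; that is a message about the estimated variance structure, not an error.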
Note on estimation methods
Estimation method
REML (Restricted Maximum Likelihood) is the default estimation method in the "lmer" function. Generally, REML is preferred over Maximum Likelihood (ML) for comparing nested models that differ in their random effects, but it is safest to use ML when comparing models that differ in their fixed effects (which we will do shortly!).
#full model with varying intercepts
To find the AICc value of a model, use:
#full model with varying intercepts and slopes
Make a list of 7 alternative models to the following model that can be built and compared:
#no Lake, varying intercepts only
#no Species, varying intercepts only
To group the AICc values of the different models into one easy-to-read table, use:
#No Lake, varying intercepts and slopes
#No Species, varying intercepts and slopes
#Full model with varying intercepts and slopes only varying by lake
#Full model with varying intercepts and slopes only varying by species
*Note - If we had different fixed effects among models we would have to indicate “REML=FALSE” to compare with likelihood methods like AIC.
#Bonus Model!
It is always useful to build the simple linear model without any varying intercept and slope factors to see the difference in AICc values. Although "lm" does not use the same estimation method as "lmer", the AICc values can be compared between the two if REML = FALSE in all "lmer" models.
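The comparison described above could be sketched as follows, for a subset of the candidate models. The data are simulated so the example is self-contained, the model numbering (M0, M1, M2, M8) is inferred from the order of the comments above, and `aicc` is a hypothetical helper that applies the standard small-sample correction to AIC by hand. Packages such as AICcmodavg also provide an AICc() function that could be used instead.

```r
library(lme4)

# Simulated stand-in data (3 species in 6 lakes); all values are made up
set.seed(1)
fish <- expand.grid(Rep = 1:10,
                    Species = c("S1", "S2", "S3"),
                    Lake = paste0("L", 1:6))
sp_int   <- c(S1 = -1.0, S2 = 0.0, S3 = 1.0)
sp_slope <- c(S1 = 0.2, S2 = 0.4, S3 = 0.6)
lake_int <- setNames(rnorm(6, sd = 0.5), paste0("L", 1:6))
sp <- as.character(fish$Species)
lk <- as.character(fish$Lake)
fish$Z_Length <- as.numeric(scale(rnorm(nrow(fish))))
fish$Z_TP <- as.numeric(scale(sp_int[sp] + lake_int[lk] +
                              sp_slope[sp] * fish$Z_Length +
                              rnorm(nrow(fish), sd = 0.3)))

# Bonus model: simple linear model, no varying intercepts or slopes
M0 <- lm(Z_TP ~ Z_Length, data = fish)
# Full model with varying intercepts
M1 <- lmer(Z_TP ~ Z_Length + (1 | Species) + (1 | Lake),
           data = fish, REML = FALSE)
# Full model with varying intercepts and slopes
M2 <- lmer(Z_TP ~ Z_Length + (1 + Z_Length | Species) + (1 + Z_Length | Lake),
           data = fish, REML = FALSE)
# Full model with varying intercepts, slopes varying only by species
M8 <- lmer(Z_TP ~ Z_Length + (1 + Z_Length | Species) + (1 | Lake),
           data = fish, REML = FALSE)

# Hypothetical helper: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc <- function(model) {
  k <- attr(logLik(model), "df")  # number of estimated parameters
  n <- nobs(model)                # number of observations
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

# Collect the AICc values into one table, best (lowest) model first
models <- list(M0 = M0, M1 = M1, M2 = M2, M8 = M8)
aic_table <- data.frame(Model = names(models),
                        AICc = sapply(models, aicc))
aic_table <- aic_table[order(aic_table$AICc), ]
print(aic_table)
```

All "lmer" models are fitted with REML = FALSE here so that their AICc values can be compared with the "lm" model. The ranking on simulated data is not meant to reproduce the workshop results.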
What is the structure of the best-fit model?
What do these AICc values tell us?
Take 2 minutes with your neighbour to draw out the model structure of M2. Biologically, how does it differ from M8? Why is it not surprising that its AICc value was 2nd best?
In-class group discussion
This tells us that both the slopes and intercepts of the relationship between TP and Length vary at the species level in the best-fit model.
Step 3: Model validation
A. Look at homogeneity
-plot the model's fitted values vs. residual values
B. Look at independence
-plot residuals vs. each covariate in the model
-plot residuals vs. each covariate not in the model
C. Look at normality
-histogram of the residuals
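The three checks above might be sketched in base R graphics as follows. The data are simulated so the example runs on its own, and the M8 structure (slopes varying by species, intercepts varying by species and lake) is inferred from the surrounding text; raw residuals from resid() are used here rather than normalized residuals.

```r
library(lme4)

# Simulated stand-in data (3 species in 6 lakes); all values are made up
set.seed(1)
fish <- expand.grid(Rep = 1:10,
                    Species = c("S1", "S2", "S3"),
                    Lake = paste0("L", 1:6))
sp_int   <- c(S1 = -1.0, S2 = 0.0, S3 = 1.0)
sp_slope <- c(S1 = 0.2, S2 = 0.4, S3 = 0.6)
lake_int <- setNames(rnorm(6, sd = 0.5), paste0("L", 1:6))
sp <- as.character(fish$Species)
lk <- as.character(fish$Lake)
fish$Z_Length <- as.numeric(scale(rnorm(nrow(fish))))
fish$Z_TP <- as.numeric(scale(sp_int[sp] + lake_int[lk] +
                              sp_slope[sp] * fish$Z_Length +
                              rnorm(nrow(fish), sd = 0.3)))

M8 <- lmer(Z_TP ~ Z_Length + (1 + Z_Length | Species) + (1 | Lake),
           data = fish, REML = FALSE)

res <- resid(M8)   # residuals of the fitted model
fit <- fitted(M8)  # fitted values

# A. Homogeneity: fitted values vs. residuals (look for an even spread)
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# B. Independence: residuals vs. each covariate (look for the absence of patterns)
plot(fish$Z_Length, res, xlab = "Z_Length", ylab = "Residuals")
boxplot(res ~ fish$Species, xlab = "Species", ylab = "Residuals")
boxplot(res ~ fish$Lake, xlab = "Lake", ylab = "Residuals")

# C. Normality: histogram of the residuals
hist(res, main = "", xlab = "Residuals")
```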
*An even spread of the residuals (normalized residuals vs. fitted values) suggests that the model is a good fit for the data.
No visible patterns are observed, which suggests there are no problems with independence in relation to the Lake and Species variables.
*A plot like this would suggest that there is variation in the dataset that the model was unable to account for.
Step 4: Interpreting results and visualizing the model
a) What is the slope and confidence interval of the Z_Length variable in the M8 model?
-Slope = 0.422
-CI = 0.09*2 = 0.18, so the interval is 0.422 +/- 0.18 (0.242 to 0.602)
b) Is the Z_Length slope significantly different from 0?
-Yes, because the CI does not overlap with 0
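The arithmetic behind this answer can be checked with a few lines of R, using the slope (0.422) and standard error (0.09) given above and the approximation that the 95% confidence interval is the estimate +/- 2 standard errors. The variable names are illustrative only.

```r
slope <- 0.422  # estimated Z_Length slope from the answer above
se    <- 0.09   # its standard error

# Approximate 95% confidence interval: estimate +/- 2 * SE
ci <- slope + c(-2, 2) * se
names(ci) <- c("lower", "upper")

# The slope is considered significantly different from 0
# when the interval does not include 0
significant <- ci["lower"] > 0 || ci["upper"] < 0

print(ci)
print(unname(significant))
```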
SD^2
<-- Variation in Lake intercepts
<-- Variation in Species intercepts
<-- Variation in Species slopes
<-- Residual variation
The estimated slope +/- the 95% confidence interval (SE*2)
Non-significant slope: the confidence interval overlaps 0
Significant slope: the confidence interval does not overlap 0
a) All data grouped
1-Obtain coefficients of interest
Intercept = -0.0009059
Slope = 0.4222697
2- Illustrate coefs in a figure
#Plot all data into one figure
Take two minutes to sketch out different ways you could plot out the results of M8.
*hint: consider the different "levels" of the model
b) The species level figure
1-Obtain coefficients of interest
2-Plot the data color coded by species and add regression lines for each species using extracted coefs
*See code for details concerning the figure
c) The lake level figure
1-Obtain coefficients of interest
2-Plot the data color coded by lake and add regression lines for each lake using extracted coefs
*See code for details concerning the figure
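The figures described above might be built along these lines, using lme4's fixef() for the overall coefficients and coef() for the group-level coefficients. This is a sketch on simulated data, not the workshop's own figure code; the M8 structure is inferred from the surrounding text and all simulated values are made up.

```r
library(lme4)

# Simulated stand-in data (3 species in 6 lakes); all values are made up
set.seed(1)
fish <- expand.grid(Rep = 1:10,
                    Species = c("S1", "S2", "S3"),
                    Lake = paste0("L", 1:6))
sp_int   <- c(S1 = -1.0, S2 = 0.0, S3 = 1.0)
sp_slope <- c(S1 = 0.2, S2 = 0.4, S3 = 0.6)
lake_int <- setNames(rnorm(6, sd = 0.5), paste0("L", 1:6))
sp <- as.character(fish$Species)
lk <- as.character(fish$Lake)
fish$Z_Length <- as.numeric(scale(rnorm(nrow(fish))))
fish$Z_TP <- as.numeric(scale(sp_int[sp] + lake_int[lk] +
                              sp_slope[sp] * fish$Z_Length +
                              rnorm(nrow(fish), sd = 0.3)))

M8 <- lmer(Z_TP ~ Z_Length + (1 + Z_Length | Species) + (1 | Lake),
           data = fish, REML = FALSE)

# a) All data grouped: overall intercept and slope from the fixed effects
fixed <- fixef(M8)
plot(fish$Z_Length, fish$Z_TP, xlab = "Z_Length", ylab = "Z_TP")
abline(a = fixed["(Intercept)"], b = fixed["Z_Length"], lwd = 2)

# b) Species level: one intercept and one slope per species
sp_coefs <- coef(M8)$Species
plot(fish$Z_Length, fish$Z_TP, col = as.integer(fish$Species),
     xlab = "Z_Length", ylab = "Z_TP")
for (i in seq_len(nrow(sp_coefs))) {
  abline(a = sp_coefs[i, "(Intercept)"], b = sp_coefs[i, "Z_Length"], col = i)
}

# c) Lake level: one intercept per lake, with a slope shared by all lakes
lake_coefs <- coef(M8)$Lake
plot(fish$Z_Length, fish$Z_TP, col = as.integer(fish$Lake),
     xlab = "Z_Length", ylab = "Z_TP")
for (i in seq_len(nrow(lake_coefs))) {
  abline(a = lake_coefs[i, "(Intercept)"], b = lake_coefs[i, "Z_Length"], col = i)
}
```

Because slopes vary only by species in this model structure, the lake-level lines are parallel and differ only in their intercepts.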
Mixed models are really good at accounting for variation in ecological data while not losing too many degrees of freedom.
Situation:
You collected biodiversity estimates from 1000 quadrats that are within 10 different sites which are within 10 different forests. You measured the productivity in each quadrat as well. You are mainly interested in knowing if productivity is a good predictor of biodiversity.
-What mixed model could you use for this data set?
Classroom discussion of examples
Situation:
You collected 200 fish from 12 different sites evenly distributed across 4 habitat types found within 1 lake. You measured the length of each fish and the quantity of mercury in its tissue. You are mainly interested in knowing if habitat is a good predictor of mercury concentration.
-What mixed model could you use for this data set?
Discuss the data set you're currently working on with your neighbour and determine if a mixed model would be appropriate for it.
If so, work with your neighbour to write out the code you would use in R for this model.
If not, come up with a fictitious ecological data set for which a mixed model could be useful and write out the model.
#Add an abline with coefs