
0. The linear model... and where it fails

1. Introduction to GAMs

2. Multiple smooth terms

3. Interactions

4. Changing basis

5. Other distributions

6. Quick intro to GAMMs

7. GAMs behind the scenes

Outline

Topics

  • What is non-linearity?
  • What are GAMs and how do they allow us to handle non-linear relationships?
  • How to understand and plot non-linear relationships
  • How to include interaction terms

Learning objectives

  • Use the mgcv package to fit non-linear relationships
  • Understand the output of a GAM to help you understand your data
  • Use tests to determine if a non-linear model fits better than a linear one
  • Include smooth interactions between variables
  • Understand the idea of a basis function, and why it makes GAMs so powerful
  • Account for dependence in data (autocorrelation, hierarchical structure) using GAMMs

Prerequisites

  • Some experience in R; enough to be able to run a script and examine data and R objects
  • A basic knowledge of regression; you should know what we mean by linear regression and ANOVA

What do we mean by the linear model?

  • Regression is the workhorse of statistics
  • It allows us to model a response variable as a function of predictors plus error
  • Linear regression is what most people encounter first
  • It makes four major assumptions:
      • normally distributed error
      • the variance of the errors doesn't change (homoscedasticity)
      • each error is independent of the others
      • the response is linear: y = a*x + b

There's only one way for the linear model to be right:

And yet so many ways for it to fail:

So how do we fix it?

To fix non-linearity, we must know what the regression model is trying to do:

  • fit a line that goes through the middle of the data
  • do this without overfitting the data, e.g. by simply joining each point to its neighbours

Linear models do this by finding the best-fit straight line that passes through the data.

Additive models do this by fitting a curve through the data, while controlling how wiggly it can get (more on this later).

Using GAMs to test for linearity

Here's how we'd do it using a smoothed term in the GAM (section 1.3):

Challenge 1

Let's look at an example. First, we'll generate some data, and plot it.

Run section 1.1 in the code:

Solution

mgcv also includes a default plot to look at the smooths (section 1.4):

Here's how we'd fit this if we were using linear regression (section 1.2):

We can use the gam() and anova() commands to formally test whether an assumption of linearity is justified. We simply need to set it up so that our linear model is nested in our smoothed model (section 1.5):

> plot(gam_model)

> gam_model = gam(y_obs~s(x))

> summary(gam_model)

> data_plot = data_plot + geom_line(colour="blue", aes(y=fitted(gam_model)))

> print(data_plot)

Now we're going to try this with some more made-up data, just to get a handle on it. First we'll generate the data (section 1.6):

> linear_model_test <- gam(y_test_obs~x_test)

> nested_gam_model_test <- gam(y_test_obs~s(x_test)+x_test)

> print(anova(linear_model_test, nested_gam_model_test, test="Chisq"))

> linear_model = gam(y_obs~x)

> nested_gam_model = gam(y_obs~s(x)+x)

> print(anova(linear_model, nested_gam_model, test="Chisq"))

> library(mgcv)

> linear_model = gam(y_obs~x)

> model_summary=summary(linear_model)

> print(model_summary)

> data_plot = data_plot + geom_line(colour="red", aes(y=fitted(linear_model)))

> print(data_plot)

Family: gaussian
Link function: identity

Formula:
y_obs ~ x

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.765606   0.021181   36.15   <2e-16 ***
x           0.157892   0.007515   21.01   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.639   Deviance explained = 64%
GCV score = 0.026765  Scale est. = 0.026551  n = 250

> library(ggplot2)

> set.seed(10)

> n = 250

> x = runif(n,0,5)

> y_model = 3*x/(1+2*x)

> y_obs = rnorm(n,y_model,0.1)

> data_plot = qplot(x, y_obs) + geom_line(aes(y=y_model)) + theme_bw()

> print(data_plot)

Family: gaussian
Link function: identity

Formula:
y_obs ~ s(x)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.154422   0.006444   179.1   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
       edf Ref.df     F p-value
s(x) 8.317  8.872 170.9  <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.859   Deviance explained = 86.3%
GCV score = 0.010784  Scale est. = 0.010382  n = 250

> n= 250

> x_test = runif(n,-5,5)

> y_test_fit = 4*dnorm(x_test)

> y_test_obs = rnorm(n,y_test_fit, 0.2)

Think of this as a plot of the effect (or contribution) of the smooth term.

Or, said differently, how the coefficient varies over the range of xi.

Model 1: y_test_obs ~ x_test
Model 2: y_test_obs ~ s(x_test) + x_test
  Resid. Df Resid. Dev     Df Deviance  Pr(>Chi)
1     248.0      81.09
2     240.5       7.46 7.5012   73.629 < 2.2e-16 ***

Note that with GAMs, the relationship between each individual predictor and the dependent variable is estimated by a non-linear smooth function:

g(E(y)) = β0 + s(x1) + s(x2, x3) + ...

> summary(nested_gam_model_test)$s.table

Analysis of Deviance Table

Model 1: y_obs ~ x
Model 2: y_obs ~ s(x) + x
  Resid. Df Resid. Dev     Df Deviance  Pr(>Chi)
1    248.00     6.5846
2    240.68     2.4988 7.3168   4.0858 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we will see on the next slide, the advantage of GAMs over manual specification of non-linearities is that the optimal shape is determined automatically.

  • Appropriate degree of smoothness is determined on the basis of cross-validation to prevent overfitting (see the sketch below)
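For instance (a minimal sketch, assuming gam_model has been fitted with gam(y_obs ~ s(x)) as in section 1), the smoothing parameter selected by cross-validation, and the minimized GCV score, are stored on the fitted object:

> gam_model$sp        # smoothing parameter(s) chosen by GCV
> gam_model$gcv.ubre  # the minimized GCV score (also reported by summary())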

Fit a linear model and a smoothed GAM to the relationship between x_test and y_test_obs.

What is the effective degrees of freedom of the smoothed term? Determine if linearity is justified for this data.

               edf   Ref.df        F p-value
s(x_test) 7.602145 8.029057 294.0944       0

Note: as opposed to one fixed coefficient (beta in linear regression), the smooth function can continually change over the range of xi

Yes, non-linearity is justified: the edf (effective degrees of freedom) is >> 1. (We will come back to this in the next section.)


Or once again using vis.gam:

Finally, we'll look at interactions between smooth terms (section 3.2):

> plot(smooth_interact,page=1)

Alternatively, we can use the vis.gam function to create a 2D plot

The 'by' argument lets you have a smooth term vary between different levels of a factor (section 3.1):

How to include interactions in a GAM

> plot(categorical_interact,page=1)

We will use an ANOVA to test if the interaction term is necessary:

> anova(two_smooth_model,smooth_interact,test="Chisq")

> vis.gam(smooth_interact,view=c("x1","x2"),theta=40, n.grid=500, border=NA)

> vis.gam(categorical_interact,view=c("x2","x0"),theta=40,n.grid=500, border=NA)

> anova(two_smooth_model, categorical_interact,test="Chisq")

> smooth_interact = gam(y~x0+s(x1,x2),data= gam_data)

> smooth_interact_summary = summary(smooth_interact)

> print(smooth_interact_summary$s.table)

> categorical_interact = gam(y~x0+s(x1)+s(x2,by=x0), data=gam_data)

> categorical_interact_summary = summary(categorical_interact)

> print(categorical_interact_summary$s.table)

There are two ways to include interactions between variables:

  • if one variable is continuous and the other is categorical: use the 'by' argument, s(x, by=factor)
  • if both variables are continuous: put both terms inside a single smooth function, s(x1, x2)

Analysis of Deviance Table

Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1, x2)
  Resid. Df Resid. Dev     Df Deviance Pr(>Chi)
1    385.73     1839.5
2    370.08     1710.3 15.644    129.2  0.02804 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              edf   Ref.df        F       p-value
s(x1,x2) 25.91803 28.42892 36.39357 9.315854e-165

Analysis of Deviance Table

Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1) + s(x2, by = x0)
  Resid. Df Resid. Dev    Df Deviance Pr(>Chi)
1    385.73     1839.5
2    368.67     1740.9 17.06   98.608   0.2347

Start with a basic model, with one smoothed term (x1) and one categorical predictor (x0, which has 4 levels) (section 2.2).

> plot(two_term_model)

Multiple variables

Or a second smoothed term (section 2.4):

A brief note on edf...

Now add a second, linear, term (section 2.3):

Let's plot the model to see the shape of the smoother:

Challenge 2

> plot(two_smooth_model,page=1)

As before, we can use an ANOVA to test if the smoothed term is necessary (section 2.5):

Solution

> basic_model = gam(y~x0+s(x1), data= gam_data)

               edf   Ref.df        F      p-value
s(x1)     2.401350 2.996664 67.18567 6.325091e-38
s(x2):x01 6.606775 7.708292 21.40363 2.572371e-27
s(x2):x02 6.436261 7.571201 19.96801 5.469274e-25
s(x2):x03 5.467172 6.618716 28.73636 2.938870e-32
s(x2):x04 6.422564 7.574388 26.43148 2.092391e-33

> plot(basic_model)

GAMs make it easy to include both smooth and linear terms, multiple smoothed terms, and smoothed interactions. For this section, we'll be using a dataset automatically generated by mgcv (section 2.1):

> two_term_model = gam(y~x0+s(x1)+x2, data= gam_data)

> two_term_summary = summary(two_term_model)

> print(two_term_summary$p.table)

> two_smooth_model = gam(y~x0+s(x1)+s(x2), data= gam_data)

> two_smooth_summary = summary(two_smooth_model)

> print(two_smooth_summary$p.table)

From here we can look at the coefficient tables for this model:

> anova(basic_model,two_term_model,two_smooth_model, test="Chisq")

Visually, we saw that the shapes of the smooth terms were largely comparable among the four levels of x0. The ANOVA test confirms this as well (deviance = 98.6, p = 0.2347).

> basic_summary = summary(basic_model)

> print(basic_summary$p.table)

> gam_data = gamSim(eg=5)

> anova(two_smooth_model,three_term_model,test="Chisq")

We will revisit the interpretation of edf later in the workshop, but for now keep in mind that the higher the edf, the more non-linear the smoothing spline is.

That is, a high value (8-10 or higher) means that the curve is highly non-linear, whereas a smoother with 1 effective degree of freedom is a straight line.

In contrast, in OLS regression the model degrees of freedom are equivalent to the number of free parameters, p, in the model (and the residual df = n - p).

In our basic model the edf of smooth function s(x1) is ~2, which suggests a non-linear curve.

             Estimate Std. Error   t value      Pr(>|t|)
(Intercept) 8.937862  0.2217506 40.305927 2.373755e-140
x02         2.008045  0.3137690  6.399756  4.518133e-10
x03         3.832496  0.3143049 12.193562  3.758930e-29
x04         6.041521  0.3145299 19.208098  3.520507e-58

             Estimate Std. Error    t value     Pr(>|t|)
(Intercept) 11.400658  0.4177614  27.289879 9.396207e-93
x02          2.314405  0.4552138   5.084216 5.723467e-07
x03          4.487653  0.4543299   9.877520 1.063008e-20
x04          6.596149  0.4585778  14.383925 5.468771e-38
x2          -5.825948  0.5436671 -10.716021 1.114046e-23

Additive model + factor

> three_term_model <- gam(y~x0+s(x1)+s(x2)+x3, data=gam_data)

> three_smooth_model <- gam(y~x0+s(x1)+s(x2)+s(x3), data=gam_data)

> three_smooth_summary <- summary(three_smooth_model)

> print(three_smooth_summary$s.table)

> plot(three_smooth_model,page=1)

            Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 8.550030  0.3655849 23.387258 1.717989e-76
x02         2.418682  0.5165515  4.682364 3.908046e-06
x03         4.486193  0.5156501  8.700072 9.124666e-17
x04         6.528518  0.5204234 12.544629 1.322632e-30

Create two new models, with x3 as a linear and smoothed term.

Use plots, coefficient tables and the anova() function to determine if x3 is an important term to include.

Note: plot(smooth_interact, page=1, scheme=1) gives a similar plot.

> head(gam_data)

Analysis of Deviance Table

Model 1: y ~ x0 + s(x1)
Model 2: y ~ x0 + s(x1) + x2
Model 3: y ~ x0 + s(x1) + s(x2)
  Resid. Df Resid. Dev      Df Deviance  Pr(>Chi)
1    394.08     5231.6
2    393.10     4051.3 0.97695   1180.2 < 2.2e-16 ***
3    385.73     1839.5 7.37288   2211.8 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1) + s(x2) + x3
  Resid. Df Resid. Dev      Df Deviance Pr(>Chi)
1    385.73     1839.5
2    384.73     1839.3 0.99997  0.18818   0.8427

> print(two_smooth_summary$s.table)

> print(basic_summary$s.table)

> print(two_term_summary$s.table)

           edf   Ref.df           F       p-value
s(x1) 2.542296 3.170441 67.75922314  1.573017e-39
s(x2) 7.731424 8.584355 80.90188472 1.745178e-119
s(x3) 1.000000 1.000000  0.02039697  8.865087e-01

           y x0         x1         x2         x3
1   4.723147  1 0.02573032 0.70706571 0.69248543
2   8.886671  2 0.83272144 0.84997218 0.88974095
3  11.196905  3 0.66302652 0.88025265 0.08469529
4  10.886068  4 0.11126873 0.80087554 0.15109792
5  12.270534  1 0.87969756 0.37692184 0.51467778
6   9.020910  2 0.12441532 0.05154493 0.86526950

           edf   Ref.df        F      p-value
s(x1) 1.923913 2.406719 42.84268 1.076396e-19

           edf   Ref.df        F      p-value
s(x1) 1.900864 2.377544 49.85908 1.938884e-22

           edf   Ref.df        F       p-value
s(x1) 2.546757 3.175726 68.08822  9.272491e-40
s(x2) 7.726989 8.582003 81.42428 2.845063e-120

The term x3 is not significant. It should be dropped from the model.

Here, the wiggliness of the smoothed term s(x1) is indicated by the edf parameter (effective degrees of freedom).

The edf = 1, therefore the term is linear (there is no need for the smooth term). Let's next evaluate the significance of the linear relationship between y and x3...

We'll look at how we can predict y based on the other variables.

Workshop 8: Generalized additive models

5. Other distributions

Website: http://qcbs.ca/wiki/r_workshop8

> plot(year_gam,page=1, scale=0)

Visualizing the trend over time

Number of successes relative to failures as x1 increases.

Parameters of smooth functions

Expanding the basic GAM

Monthly data from the years 1920 through to 1940

Smooth terms

Let's test this using a logistic GAM (section 5.3).

Lastly, to help interpret the results, we could transform the summed effects back to proportions with the plot_smooth() function (section 5.4)

How to define the distribution for non-normal or count data?

To model this, we need to use what's called a cyclical cubic spline, or "cc", to model month effects as well as a term for year (section 4.2):

seasonal term

(cyclic cubic spline)

trend term

Lastly, we will see the different ways this relationship could be represented graphically (section 5.4).

Smooth functions have many parameters that could change their behavior. The most often-used parameters are:

> emptyPlot(range(gam_data3$x1), c(0,1), h=.5)  # emptyPlot() is from the itsadug package
> avg <- aggregate(prop ~ x1, data=gam_data3, mean, na.rm=TRUE)
> lines(avg$x1, avg$prop, col="orange", lwd=2)

Recall that the model uses count data to calculate the logit, which is the log odds ratio between successes and failures (a quick numeric check follows the list below):

- If successes = failures, the ratio is 1 and the logit is 0 (log(1) = 0).

- If successes > failures, the ratio is > 1 and the logit has a positive value (e.g., log(2) = 0.69).

- If successes < failures, the ratio is < 1 and the logit has a negative value (e.g., log(0.5) = -0.69).
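As a check of this arithmetic (a small sketch using base R's built-in logit functions, not part of the original workshop code):

> qlogis(0.5)  # successes = failures: log(1) = 0
> qlogis(2/3)  # successes twice failures: log(2) ≈ 0.69
> qlogis(1/3)  # failures twice successes: log(0.5) ≈ -0.69
> plogis(0.69) # inverse logit, back to a proportion: ≈ 2/3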

To model a nonlinear smooth or surface, three different smooth functions are available:

Let's look at one place where knowing how to change the basis can help: cyclical data. That is, data where you want the predictor to match at the ends.

Imagine you have a time series of climate data, broken down by monthly measurement, and you want to see if there's a time-trend in yearly temperature.

We'll use the Nottingham temperature time series for this (section 4.1):

Fitted values, intercept included: roughly equal numbers of successes and failures up to about x1 ≈ 400.

> prop_model <- gam(prop~ s(x1), data=gam_data3, weights=total, family="binomial")

> prop_summary <- summary(prop_model)

> print(prop_summary$p.table)

> print(prop_summary$s.table)

> plot(prop_model)

We will now briefly look at how to implement other distribution families (e.g., Gamma, binomial, Poisson, negative binomial) in GAMs, using data where a binomial family distribution is needed (measuring the number of successes vs. failures of a given event; section 5.1).

> year_gam = gam(nottem~s(nottem_year) + s(nottem_month, bs = "cc"))

> summary(year_gam)$s.table

Here, the intercept is positive, which means that on average there are more successes than failures.

We won't go too much further into this today, but you should know that you can expand on the basic model we've looked at today:

  • more complicated smooths, by changing the basis
  • other distributions: anything you can do with a GLM using the family argument
  • mixed effect models, using gamm or the gamm4 package

We'll first have a look at different smooth functions and changing the basis, followed by a quick intro on using other distributions and then mixed additive models (GAMMs)...

Would you say that this trend is linear or non-linear?

Contribution/ partial effect:

Over time log odds increases (successes increase, failures decrease).

                     edf   Ref.df          F    p-value
s(nottem_year)  2.333879 2.906998   2.528043 0.06007933
s(nottem_month) 4.923943 8.000000 390.029032 0.00000000

s():

  • for modeling a 1-dimensional smooth, or for modeling interactions among variables measured on the same scale

te():

  • for modeling 2- or n-dimensional interaction surfaces of variables that are not on the same scale. Includes ‘main’ effects.

ti():

  • for modeling 2- or n-dimensional interaction surfaces that do not include the ‘main effects’.

k: the number of 'knots' - determines the upper bound on the number of basis functions used to build the curve (constrains wiggliness)

  • the number of basis functions is reflected in the edf reported in the summary
  • the default upper bound of k for s() is ~9, and for te() and ti() it is 5 per dimension (k should be < the number of unique data points)

d: specifies which predictors in an interaction are on the same scale; only used in te() and ti()

  • For example, te(time, width, height, d=c(1,2)) indicates that width and height are on the same scale, but time is not

bs: specifies the type of underlying basis function

  • For s() this defaults to "tp" (thin plate regression spline), and for te() and ti() it defaults to "cr" (cubic regression spline)
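To make these options concrete, here is a short sketch (using a dataset simulated with mgcv's gamSim(); the choice of variables x1 and x2 and the specific k, d, bs values are illustrative assumptions, not from the original slides):

> library(mgcv)
> gam_data <- gamSim(eg=5)  # simulated data with response y and predictors x0-x3

> mod_s  <- gam(y ~ s(x1, k=5, bs="cr"), data=gam_data)        # cap basis size at k=5, cubic regression spline
> mod_te <- gam(y ~ te(x1, x2, d=c(1,1)), data=gam_data)       # tensor product of two 1-D marginals, includes main effects
> mod_ti <- gam(y ~ s(x1) + s(x2) + ti(x1, x2), data=gam_data) # ti(): interaction only, main effects added explicitly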

> gam_data3 <- read.csv("other_dist.csv")

> summary(gam_data3)

> str(gam_data3)

Logit

            Estimate Std. Error  z value Pr(>|z|)
(Intercept) 1.173978 0.02709613 43.32641        0

           edf   Ref.df   Chi.sq       p-value
s(x1) 4.591542 5.615235 798.9407 2.027751e-164

> data(nottem)

> n_years = length(nottem)/12

> nottem_month = rep(1:12, times= n_years)

> nottem_year = rep(1920:(1920+n_years-1),each=12)

> nottem_plot = qplot(nottem_month, nottem, colour=factor(nottem_year), geom="line")

> print(nottem_plot)

smooth functions for the trend and seasonal features

'data.frame': 514 obs. of 4 variables:
 $ prop : num 1 1 1 1 0 1 1 1 1 1 ...
 $ total: int 4 20 20 18 18 18 20 20 20 20 ...
 $ x1   : int 550 650 750 850 950 650 750 850 950 550 ...
 $ fac  : Factor w/ 4 levels "f1","f2","f3",..: 1 1 1 1 1 1 1 1 1 1 ...

What does the intercept represent in this model? What about the smooth term?

There is about a 1-1.5 degree rise in temperature over the period, but within a given year there is about 20 degrees of variation in temperature, on average.

The actual data vary around these values; that is the unexplained variance.

What do these plots tell us about successes versus failures?

As we already derived from the logit plot, we see that at around x1=400 the proportion of successes increases significantly above 0.5

> summary(gamm_intercept)$s.table

Model with correlated errors

Here, if the random slope varied along x0, we would see different curves for each level:

Mixed modeling

Note that the random slope is static in this case:

It looks like a low-order AR model is needed, so we can fit two models: an AR(1) and an AR(2) (section 6.2).

Dealing with non-independence

We will now include both a random intercept and slope (section 6.5).

Next we will run a model with a random slope (section 6.4).

Plot the summed effects for x0 without random effects, and for all four levels of fac:

Lastly, we will examine a model with a random smooth (section 6.6).

We will first examine a GAMM with just a random intercept.

As before, we will use gamSim() to automatically generate a dataset, this time with a random effect component generated, then run a model with a random intercept using fac as the random factor (section 6.3)

To start, we will revisit the Nottingham temperature model and examine the residuals using the (partial) autocorrelation function (section 6.1)

            edf   Ref.df         F      p-value
s(x0)  3.044725 3.776643  2.811258 2.835663e-02
s(fac) 2.960269 3.000000 95.146996 2.696335e-54

Plot the summed effects for x0 without random effects, and for the four levels of fac:

plot(gamm_smooth, select=1)

Plot the summed effects for x0 without random effects, and for each level of fac:

plot(gamm_int_slope, select=3)


As we saw in the section on changing basis, bs specifies the type of underlying base function.

For random intercepts and linear random slopes we use bs="re", but for random smooths we use bs="fs".

> gamm_smooth <- gam(y ~ s(x0) + s(x0, fac, bs="fs", m=1), data=gam_data2)

> summary(gamm_smooth)

> par(mfrow=c(1,2))

> acf(resid(year_gam), lag.max = 36, main = "ACF")

> pacf(resid(year_gam), lag.max = 36, main = "pACF")

> gamm_int_slope <- gam(y ~ s(x0) + s(fac, bs="re") + s(fac, x0, bs="re"), data=gam_data2)

> summary(gamm_int_slope)

Note the random intercept term in the summary table. You can view it as follows:

> gamm_slope <- gam(y ~ s(x0) + s(x0, fac, bs="re"), data=gam_data2)

> summary(gamm_slope)


> gam_data2 <- gamSim(eg=6)

> year_gam <- gamm(nottem~s(nottem_year)+ s(nottem_month, bs="cc"))

> year_gam_AR1 <- gamm(nottem ~ s(nottem_year) + s(nottem_month, bs="cc"),
                       correlation = corARMA(form = ~ 1|nottem_year, p = 1))

> year_gam_AR2 <- gamm(nottem ~ s(nottem_year) + s(nottem_month, bs="cc"),
                       correlation = corARMA(form = ~ 1|nottem_year, p = 2))

> anova(year_gam$lme,year_gam_AR1$lme,year_gam_AR2$lme)


4 term additive + random effect

A closer look at GAMs

Resources

When observations are not independent, GAMs can be extended to incorporate either:

  • a serial correlation structure to model the residual autocorrelation (autoregressive AR, moving average MA, or a combination of the two, ARMA)
  • random effects that model the non-independence among observations from the same site, group, subject, etc.

Illustration of the principle behind cross validation

Here's a representation of a smooth function using a rank 5 cubic spline basis with knot locations at increments of 0.2:

And plot it:

A simple example: a polynomial basis

Another example: a cubic spline basis

The basis functions are each multiplied by a real valued parameter, βi, and are then summed to give the final curve f(x)

Choosing lambda: cross validation

A note on effective degrees of freedom (edf)

Controlling the degree of smoothing with penalized regression splines

> head(gam_data2)

plot_smooth(gamm_int_slope, view="x0", rm.ranef=TRUE, main="intercept + s(x0)", rug=FALSE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="1"), main="... + s(fac) + s(fac, x0)", col='orange', ylim=c(7,22), rug=FALSE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="2"), add=TRUE, col='red', xpd=TRUE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="3"), add=TRUE, col='purple', xpd=TRUE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="4"), add=TRUE, col='turquoise', xpd=TRUE)


par(mfrow=c(1,2), cex=1.1)

plot_smooth(gamm_intercept, view="x0", rm.ranef=TRUE, main="intercept + s(x1)", rug=FALSE)
plot_smooth(gamm_intercept, view="x0", cond=list(fac="1"), main="... + s(fac)", col='orange', ylim=c(8,21), rug=FALSE)
plot_smooth(gamm_intercept, view="x0", cond=list(fac="2"), add=TRUE, col='red')
plot_smooth(gamm_intercept, view="x0", cond=list(fac="3"), add=TRUE, col='purple')
plot_smooth(gamm_intercept, view="x0", cond=list(fac="4"), add=TRUE, col='turquoise')


plot_smooth(gamm_slope, view="x0", rm.ranef=TRUE, main="intercept + s(x0)", rug=FALSE)
plot_smooth(gamm_slope, view="x0", cond=list(fac="1"), main="... + s(fac)", col='orange', ylim=c(7,22), rug=FALSE)
plot_smooth(gamm_slope, view="x0", cond=list(fac="2"), add=TRUE, col='red')
plot_smooth(gamm_slope, view="x0", cond=list(fac="3"), add=TRUE, col='purple')
plot_smooth(gamm_slope, view="x0", cond=list(fac="4"), add=TRUE, col='turquoise')

[Figure: cross-validation illustration. A data point omitted from fitting (•) and the penalized regression spline fit to the remaining data (º); the GCV function, and the fitted model which minimizes the GCV score.]

A cubic spline is a curve constructed from sections of cubic polynomial joined together so that they are continuous in value (as well as in their first and second derivatives).

                 Model df      AIC      BIC    logLik   Test  L.Ratio p-value
year_gam$lme         1  5 1109.908 1127.311 -549.9538
year_gam_AR1$lme     2  6 1101.218 1122.102 -544.6092 1 vs 2 10.68921  0.0011
year_gam_AR2$lme     3  7 1101.598 1125.962 -543.7988 2 vs 3  1.62082  0.2030

Suppose that f is believed to be a 4th order polynomial, so that the space of polynomials of order 4 and below contains f

A basis for this space
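One such basis (a reconstruction of the slide's formula, which is an image in the original, following the standard polynomial example in Wood's book) is:

b1(x) = 1, b2(x) = x, b3(x) = x², b4(x) = x³, b5(x) = x⁴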

Three different types of random effects are distinguished when using GAMMs:

  • random intercepts adjust the height of other model terms with a constant value: s(Subject, bs="re")
  • random slopes adjust the slope of the trend of a numeric predictor: s(Subject, Time, bs="re")
  • random smooths adjust the trend of a numeric predictor in a nonlinear way: s(Time, Subject, bs="fs", m=1), where the argument m=1 sets a heavier penalty for the smooth moving away from 0, causing shrinkage to the mean.

We will now take a few minutes to look at what GAMs are actually doing behind the scenes.

Let's first consider a model containing one smooth function of one covariate, xi:
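In standard notation (the equation itself is an image in the original slides, so this is a reconstruction):

yi = f(xi) + εi, where the εi are independent, normally distributed errors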

If lambda is too high, then the data will be over smoothed, and if it is too low then the data will be under smoothed.

It would be good to choose lambda so that the estimated f̂ is as close as possible to the true f. A suitable criterion might be to choose lambda to minimize:
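The criterion shown here is an image in the original; in Wood's notation, a reconstruction is the average squared error of the estimate:

M = (1/n) Σi (f̂i − fi)²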

Instead of controlling smoothness by altering the number of knots, we keep the number of knots fixed at a size a little larger than reasonably necessary, and control the model's smoothness by adding a "wiggliness" penalty.

So, rather than fitting the model by minimizing (as with OLS):
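In standard notation (reconstructing the omitted formula), the OLS objective is:

‖y − Xβ‖²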

           y         x0          x1         x2         x3 [...] fac
1   4.925515 0.04879793 0.273053853 0.59921214 0.18689231          1
2  10.352629 0.73410097 0.312996866 0.99805171 0.03375583          2
3  13.527934 0.32637799 0.165864279 0.52143350 0.99192673          3
4  15.511255 0.33131391 0.009219346 0.07767554 0.61888402          4
5  11.455404 0.58911290 0.469944164 0.48423685 0.22028845          1
6  10.124126 0.04054477 0.009816532 0.42261209 0.28148881          2

> plot_smooth(gamm_smooth, view="x0", rm.ranef=TRUE, main="intercept + s(x0)", rug=FALSE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="1"), main="... + s(x0, fac)", col='orange', ylim=c(7,21), rug=FALSE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="2"), add=TRUE, col='red', xpd=TRUE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="3"), add=TRUE, col='purple', xpd=TRUE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="4"), add=TRUE, col='turquoise', xpd=TRUE)


The AR(1) provides a significant increase in fit over the naive model, but there is very little improvement in moving to the AR(2).

so that f(x) becomes

> gamm_intercept <- gam(y ~ s(x0) + s(fac, bs="re"), data=gam_data2)

How many degrees of freedom does a fitted GAM have?

Instead of reporting the output of the cross-validation in terms of lambda (model complexity), GAM summaries use a quantity called the effective degrees of freedom (edf), since degrees of freedom are more intuitive to interpret than a penalty.

Because the number of free parameters in GAMs is difficult to define, the edf are related to lambda, such that the greater the penalty, the smaller the edf.

For example: if the arbitrarily large number of knots is k=10, then k-1 sets the upper limit on the df, which decreases as lambda increases to its best-fit penalty value (found by cross-validation).
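As a small sketch of how to inspect this on a fitted model (assuming a gam object such as gam_model from earlier in the workshop):

> sum(gam_model$edf)          # total effective degrees of freedom of the fit
> summary(gam_model)$s.table  # per-smooth edf, as reported throughout this workshop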

There's a great deal more out there on GAMs; this workshop has only scratched the surface. For example, you could test for concurvity among variables. See this workshop on model selection:

http://distancesampling.org/workshops/duke-spatial-2015/practicals/3-advanced-dsms-solutions.html

  • or use 'select = TRUE' when fitting the model (puts an additional penalty on each spline to prune out terms by shrinking flat/linear basis functions)

Simon Wood, the author of the mgcv package, has a very useful website with introductory talks and notes on how to use GAMs:

http://people.bath.ac.uk/sw283/mgcv/

He has also written a book, Generalized Additive Models: An Introduction with R, which we used as a reference for this workshop.

Material from this workshop was also drawn from the following fantastic blogs and tutorials:

http://www.fromthebottomoftheheap.net/blog/

http://www.sfs.uni-tuebingen.de/~jvanrij/Tutorial/GAMM.html

http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/AnswersLab2.html

Finally, the help pages, available through ?gam in R, are an excellent resource.

Finally, all of the previous mixed models can be compared using anova(), as in the previous sections, to determine the best-fitting model.

The ACF (and pACF) give the autocorrelation (and partial autocorrelation) of a time series with itself at different time lags, and are used to identify after how many time steps observations start to be independent.

To estimate f, we must represent it in such a way that the equation above becomes a linear model. This can be done by choosing a basis, bi(x), defining the space of functions of which f is an element:
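As stated earlier, each basis function bi(x) is multiplied by a parameter βi and the products are summed (a reconstruction of the slide's formula):

f(x) = Σi bi(x) βi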

it could be fit by minimizing:
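Reconstructing the omitted formula in standard notation, the penalized objective is:

‖y − Xβ‖² + λ∫f″(x)² dx

where the integral measures the wiggliness of f and λ controls the trade-off between goodness of fit and smoothness.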

knots

Lambda

Scaled capacity

and the full model now becomes
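In matrix form (a reconstruction of the slide's formula): y = Xβ + ε, where each column of the model matrix X holds one basis function evaluated at the observed x values.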

Since f is unknown, M is estimated using a generalized cross validation technique that leaves out each datum from the data in turn and considers the average ability of models fitted to the remaining data to predict the left out datum.

The knots are evenly spaced through the range of observed x values. But the degree of model smoothness is then controlled by the number of knots, which was chosen arbitrarily.

Fits many of the data poorly and does no better with the missing point.

Fits the underlying signal quite well, smoothing through the noise, and the missing datum is reasonably well predicted.

Fits the noise as well as the signal and the extra variability induced causes it to predict the missing datum rather poorly.

By varying the βi we can vary the form of f(x) to produce any polynomial function of order 4 or lower

As lambda goes to infinity, the model becomes linear...

Each section of cubic has different coefficients

Is there a better way to select the knot locations?

The smooth term represents how the log odds of successes vs. failures changes over x1. Since edf > 1, the smooth is non-linear: the proportion of successes does not increase at a constant rate over x1.

> plot(gamm_intercept, select=2)