There's only one way for the linear model to be right:
And yet so many ways for it to fail:
To fix non-linearity, we must know what the regression model is trying to do:
Linear models do this by finding the best fit straight line that passes through the data.
Additive models do this by fitting a curve through the data, but controlling
how wiggly it can get (more on this later).
Here's how we'd do it using a smoothed term in the GAM (section 1.3):
Let's look at an example. First, we'll generate some data, and plot it.
Run section 1.1 in the code:
mgcv also includes a default plot to look at the smooths (section 1.4):
Here's how we'd fit this if we were using linear regression (section 1.2):
We can use the gam() and anova() commands to formally test whether an assumption of linearity is justified. We simply need to set it up so our smoothed model is nested in our linear model (section 1.5):
> plot(gam_model)
> gam_model = gam(y_obs~s(x))
> summary(gam_model)
> data_plot = data_plot +
geom_line(colour="blue",aes(y=fitted(gam_model)))
> print(data_plot)
Now we're going to try this with some more made-up data, just to get a handle on it. First we'll generate the data (section 1.6):
> linear_model_test <- gam(y_test_obs~x_test)
> nested_gam_model_test <- gam(y_test_obs~s(x_test)+x_test)
> print(anova(linear_model_test, nested_gam_model_test, test="Chisq"))
> linear_model = gam(y_obs~x)
> nested_gam_model = gam(y_obs~s(x)+x)
> print(anova(linear_model, nested_gam_model, test="Chisq"))
> library(mgcv)
> linear_model = gam(y_obs~x)
> model_summary=summary(linear_model)
> print(model_summary)
> data_plot = data_plot+
geom_line(colour="red",
aes(y=fitted(linear_model)))
> print(data_plot)
Family: gaussian
Link function: identity
Formula:
y_obs ~ x
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.765606 0.021181 36.15 <2e-16 ***
x 0.157892 0.007515 21.01 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.639 Deviance explained = 64%
GCV score = 0.026765 Scale est. = 0.026551 n = 250
> library(ggplot2)
> set.seed(10)
> n = 250
> x = runif(n,0,5)
> y_model = 3*x/(1+2*x)
> y_obs = rnorm(n,y_model,0.1)
> data_plot = qplot(x, y_obs) +
geom_line(aes(y=y_model)) +
theme_bw()
> print(data_plot)
Family: gaussian
Link function: identity
Formula:
y_obs ~ s(x)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.154422 0.006444 179.1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 8.317 8.872 170.9 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.859 Deviance explained = 86.3%
GCV score = 0.010784 Scale est. = 0.010382 n = 250
> n= 250
> x_test = runif(n,-5,5)
> y_test_fit = 4*dnorm(x_test)
> y_test_obs = rnorm(n,y_test_fit, 0.2)
Think of this as a plot of the effect (or contribution) of the smooth term.
Or, said differently, how the coefficient varies over the range of xi.
Model 1: y_test_obs ~ x_test
Model 2: y_test_obs ~ s(x_test) + x_test
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 248.0 81.09
2 240.5 7.46 7.5012 73.629 < 2.2e-16 ***
Note that with GAMs the relationship between individual predictors and dependent variable is estimated by a non-linear smooth function:
g(y) = s(x1) + s(x2, x3) + ...
> summary(nested_gam_model_test)$s.table
Analysis of Deviance Table
Model 1: y_obs ~ x
Model 2: y_obs ~ s(x) + x
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 248.00 6.5846
2 240.68 2.4988 7.3168 4.0858 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As we will see in the next slide, the advantage of GAM over manual specification of non-linearities is that the optimal shape is determined automatically.
Fit a linear and smoothed GAM model to the relation between x_test and y_test_obs.
What is the effective degrees of freedom of the smoothed term? Determine if linearity is justified for this data.
edf Ref.df F p-value
s(x_test) 7.602145 8.029057 294.0944 0
Note: as opposed to one fixed coefficient (beta in linear regression), the smooth function can continually change over the range of xi
Yes, non-linearity is justified. The edf (effective degrees of freedom) is >> 1. (we will get back to this in the next section...)
Or once again using vis.gam:
Finally, we'll look at interactions between smooth terms (section 3.2):
> plot(smooth_interact,page=1)
Alternatively, we can use the vis.gam function to create a 2D plot
The 'by' argument lets you have a smooth term vary between different
levels of a factor (section 3.1):
> plot(categorical_interact,page=1)
We will use an ANOVA to test if the interaction term is necessary:
> anova(two_smooth_model,smooth_interact,test="Chisq")
> vis.gam(smooth_interact,view=c("x1","x2"),theta=40, n.grid=500, border=NA)
> vis.gam(categorical_interact,view=c("x2","x0"),theta=40,n.grid=500, border=NA)
> anova(two_smooth_model, categorical_interact,test="Chisq")
> smooth_interact = gam(y~x0+s(x1,x2),data= gam_data)
> smooth_interact_summary = summary(smooth_interact)
> print(smooth_interact_summary$s.table)
> categorical_interact = gam(y~x0+s(x1)+s(x2,by=x0),
data= gam_data)
> categorical_interact_summary = summary(categorical_interact)
> print(categorical_interact_summary$s.table)
There are two ways to include interactions between variables:
Analysis of Deviance Table
Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1, x2)
Resid. Df Resid.Dev Df Deviance Pr(>Chi)
1 385.73 1839.5
2 370.08 1710.3 15.644 129.2 0.02804 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
edf Ref.df F p-value
s(x1,x2) 25.91803 28.42892 36.39357 9.315854e-165
Analysis of Deviance Table
Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1) + s(x2, by = x0)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 385.73 1839.5
2 368.67 1740.9 17.06 98.608 0.2347
Start with a basic model, with one smoothed term (x1) and one categorical
predictor (x0, which has 4 levels) (section 2.2).
> plot(two_term_model)
Or a second smoothed term (section 2.4):
Now add a second, linear, term (section 2.3):
Let's plot the model to see the shape of the smoother:
> plot(two_smooth_model,page=1)
As before, we can use an ANOVA to test if the smoothed term is necessary (section 2.5):
> basic_model = gam(y~x0+s(x1), data= gam_data)
edf Ref.df F p-value
s(x1) 2.401350 2.996664 67.18567 6.325091e-38
s(x2):x01 6.606775 7.708292 21.40363 2.572371e-27
s(x2):x02 6.436261 7.571201 19.96801 5.469274e-25
s(x2):x03 5.467172 6.618716 28.73636 2.938870e-32
s(x2):x04 6.422564 7.574388 26.43148 2.092391e-33
> plot(basic_model)
GAMs make it easy to include both smooth and linear terms, multiple smoothed terms, and smoothed interactions. For this section, we'll be using a dataset automatically generated by mgcv (section 2.1):
> two_term_model = gam(y~x0+s(x1)+x2, data= gam_data)
> two_term_summary = summary(two_term_model)
> print(two_term_summary$p.table)
> two_smooth_model = gam(y~x0+s(x1)+s(x2), data= gam_data)
> two_smooth_summary = summary(two_smooth_model)
> print(two_smooth_summary$p.table)
From here we can look at the coefficient tables for this model:
> anova(basic_model,two_term_model,two_smooth_model, test="Chisq")
Visually, we saw that the shape of smooth terms were largely comparable among the four levels of x0. The anova test confirms this as well (Deviance = 98.6, p = 0.2347).
> basic_summary = summary(basic_model)
> print(basic_summary$p.table)
> gam_data = gamSim(eg=5)
> anova(two_smooth_model,three_term_model,test="Chisq")
We will revisit the interpretation of edf later in the workshop, but for now keep in mind that the higher the edf, the more non-linear is the smoothing spline.
That is, a high value (8–10 or higher) means that the curve is highly non-linear, whereas a smoother with 1 degree of freedom is a straight line.
In contrast, in OLS regression the model degrees of freedom is equivalent to the number of free parameters, p, in the model (and the residual df = n - p)
In our basic model the edf of smooth function s(x1) is ~2, which suggests a non-linear curve.
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.937862 0.2217506 40.305927 2.373755e-140
x02 2.008045 0.3137690 6.399756 4.518133e-10
x03 3.832496 0.3143049 12.193562 3.758930e-29
x04 6.041521 0.3145299 19.208098 3.520507e-58
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.400658 0.4177614 27.289879 9.396207e-93
x02 2.314405 0.4552138 5.084216 5.723467e-07
x03 4.487653 0.4543299 9.877520 1.063008e-20
x04 6.596149 0.4585778 14.383925 5.468771e-38
x2 -5.825948 0.5436671 -10.716021 1.114046e-23
Additive model + factor
> three_term_model <- gam(y~x0+s(x1)+s(x2)+x3, data=gam_data)
> three_smooth_model <- gam(y~x0+s(x1)+s(x2)+s(x3), data=gam_data)
> three_smooth_summary <- summary(three_smooth_model)
> print(three_smooth_summary$s.table)
> plot(three_smooth_model,page=1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.550030 0.3655849 23.387258 1.717989e-76
x02 2.418682 0.5165515 4.682364 3.908046e-06
x03 4.486193 0.5156501 8.700072 9.124666e-17
x04 6.528518 0.5204234 12.544629 1.322632e-30
Create two new models, with x3 as a linear and smoothed term.
Use plots, coefficient tables and the anova() function to determine if x3 is an important term to include.
*plot(smooth_interact,page=1,scheme=1) gives a similar plot
> head(gam_data)
Analysis of Deviance Table
Model 1: y ~ x0 + s(x1)
Model 2: y ~ x0 + s(x1) + x2
Model 3: y ~ x0 + s(x1) + s(x2)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 394.08 5231.6
2 393.10 4051.3 0.97695 1180.2 < 2.2e-16 ***
3 385.73 1839.5 7.37288 2211.8 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model 1: y ~ x0 + s(x1) + s(x2)
Model 2: y ~ x0 + s(x1) + s(x2) + x3
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 385.73 1839.5
2 384.73 1839.3 0.99997 0.18818 0.8427
> print(two_smooth_summary$s.table)
> print(basic_summary$s.table)
> print(two_term_summary$s.table)
edf Ref.df F p-value
s(x1) 2.542296 3.170441 67.75922314 1.573017e-39
s(x2) 7.731424 8.584355 80.90188472 1.745178e-119
s(x3) 1.000000 1.000000 0.02039697 8.865087e-01
y x0 x1 x2 x3
1 4.723147 1 0.02573032 0.70706571 0.69248543
2 8.886671 2 0.83272144 0.84997218 0.88974095
3 11.196905 3 0.66302652 0.88025265 0.08469529
4 10.886068 4 0.11126873 0.80087554 0.15109792
5 12.270534 1 0.87969756 0.37692184 0.51467778
6 9.020910 2 0.12441532 0.05154493 0.86526950
edf Ref.df F p-value
s(x1) 1.923913 2.406719 42.84268 1.076396e-19
edf Ref.df F p-value
s(x1) 1.900864 2.377544 49.85908 1.938884e-22
edf Ref.df F p-value
s(x1) 2.546757 3.175726 68.08822 9.272491e-40
s(x2) 7.726989 8.582003 81.42428 2.845063e-120
The term x3 is not significant. It should be dropped from the model.
Here, the wiggleness of the smoothed term s(x1) is indicated by the edf parameter (effective degrees of freedom).
The edf = 1 therefore term is linear (no need for the smooth term). Let's next evaluate the significance of the linear relationship between y and x3...
We'll look at how we can predict y based on the other variables.
Website: http://qcbs.ca/wiki/r_workshop8
> plot(year_gam,page=1, scale=0)
Number of successes relative to failures as x1 increases.
Monthly data from the years 1920 through to 1940
Lets test this using a logistic GAMs (section 5.3).
Lastly, to help interpret the results, we could transform the summed effects back to proportions with the plot_smooth() function (section 5.4)
To model this, we need to use what's called a cyclical cubic spline, or "cc", to model month effects as well as a term for year (section 4.2):
seasonal term
(cyclic cubic spline)
trend term
Lastly, we will see the different ways this relationship could be represented graphically (section 5.4).
Smooth functions have many parameters that could change their behavior. The most often-used parameters are:
emptyPlot(range(gam_data3$x1), c(0,1), h=.5)
avg <- aggregate(prop ~ x1, data=gam_data3, mean, na.rm=TRUE)
lines(avg$x1, avg$prop, col="orange",lwd=2)
Recall that the model uses count data to calculate the logit, which is the log odds ratio between successes and failures:
- If successes = failures, the ratio is 1 and the logit is 0 (log(1) = 0).
- If successes > failures, the ratio is > 1 and the logit has a positive value (e.g., log(2) = 0.69).
- If successes < failures, the ratio is < 1 and the logit has a negative value (e.g., log(0.5) = -0.69).
To model a nonlinear smooth or surface, three different smooth functions are available:
Let's look at one place where knowing how to change the basis can help:
cyclical data.
That is, data where you want the predictor to match at the ends.
Imagine you have a time series of climate data, broken down by monthly measurement, and you want to see if there's a time-trend in yearly temperature.
We'll use the Nottingham temperature time series for this (section 4.1):
Fitted values, intercept included:
Equal amounts of S and F up to about ~400
> prop_model <- gam(prop~ s(x1), data=gam_data3, weights=total, family="binomial")
> prop_summary <- summary(prop_model)
> print(prop_summary$p.table)
> print(prop_summary$s.table)
> plot(prop_model)
We will now briefly look at how to implement other distribution families (e.g., Gamma, binomial, Poisson, negative binomial) to GAMs, using data where a binomial family distribution is needed (measuring the number of successes vs failures of a given event; section 5.1)
> year_gam = gam(nottem~s(nottem_year) + s(nottem_month, bs = "cc"))
> summary(year_gam)$s.table
Here, the intercept is positive, which means that on average there are more successes than failures.
We won't go too much further into this today, but you should know that you can expand on the basic model we've looked at today:
We'll first have a look at different smooth functions and changing the basis, followed by a quick intro on using other distributions and then mixed additive models (GAMMs)...
Would you say that this trend is linear or non-linear?
Contribution/ partial effect:
Over time log odds increases (successes increase, failures decrease).
edf Ref.df F p-value
s(nottem_year) 2.333879 2.906998 2.528043 0.06007933
s(nottem_month) 4.923943 8.000000 390.029032 0.00000000
s():
te():
ti():
k: number of ‘knots’ - determines upper bound of # of base functions used to build the curve (constrains wiggleness)
d - specifies predictors in an interaction that are on the same scale; only used in te() and ti()
bs - specifies the type of underlying base function
> gam_data3 <- read.csv("other_dist.csv")
> summary(gam_data3)
> str(gam_data3)
Logit
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.173978 0.02709613 43.32641 0
edf Ref.df Chi.sq p-value
s(x1) 4.591542 5.615235 798.9407 2.027751e-164
> data(nottem)
> n_years = length(nottem)/12
> nottem_month = rep(1:12, times= n_years)
> nottem_year = rep(1920:(1920+n_years-1),each=12)
> nottem_plot = qplot(nottem_month,nottem, colour=factor(nottem_year),
geom="line")
> print(nottem_plot)
smooth functions for the trend and seasonal features
'data.frame': 514 obs. of 4 variables:
$ prop : num 1 1 1 1 0 1 1 1 1 1 ...
$ total: int 4 20 20 18 18 18 20 20 20 20 ...
$ x1 : int 550 650 750 850 950 650 750 850 950 550 ...
$ fac : Factor w/ 4 levels "f1","f2","f3",..: 1 1 1 1 1 1 1 1 1 1 ...
What does the intercept represent in this model? What about the smooth term?
There is about 1-1.5 degree rise in temperature over the period, but within a given year there is about 20 degrees variation in temperature, on average.
The actual data vary around these values and that is the unexplained variance.
What do these plots tell us about successes versus failures?
As we already derived from the logit plot, we see that at around x1=400 the proportion of successes increases significantly above 0.5
> summary(gamm_intercept)$s.table
Here, if the random slope varied along x0, we would see different curves for each level:
Note that the random slope is static in this case:
Looks like a low-order AR model is needed, so we can fit two models; and AR(1) and AR(2) (section 6.2)
We will now include both a random intercept and slope (section 6.5).
Next we will run a model with a random slope (section 6.4).
Plot the summed effects for the x0 without random effects and for all four levels of fac:
Lastly, we will examine a model with a random smooth (section 6.6).
We will first examine a GAMM with just a random intercept.
As before, we will use gamSim() to automatically generate a dataset, this time with a random effect component generated, then run a model with a random intercept using fac as the random factor (section 6.3)
To start, we will revisit the Nottingham temperature model and examine the residuals using the (partial) autocorrelation function (section 6.1)
edf Ref.df F p-value
s(x0) 3.044725 3.776643 2.811258 2.835663e-02
s(fac) 2.960269 3.000000 95.146996 2.696335e-54
Plot the summed effects for the x0 without random effects, and for the four levels of fac:
plot(gamm_smooth, select=1)
Plot the summed effects for x0 without random effects, and for each level of fac:
plot(gamm_int_slope, select=3)
fac 4
As we saw in the section on changing basis, bs specifies the type of underlying base function.
For random intercepts and linear random slopes we use bs="re", but for random smooths we use bs="fs".
> gamm_smooth <- gam(y ~ s(x0)
+ s(x0, fac, bs="fs", m=1), data=gam_data2)
> summary(gamm_smooth)
> par(mfrow=c(1,2))
> acf(resid(year_gam), lag.max = 36, main = "ACF")
> pacf(resid(year_gam), lag.max = 36, main = "pACF")
> gamm_int_slope <- gam(y ~ s(x0) + s(fac, bs="re")
+ s(fac, x0, bs="re"), data=gam_data2)
> summary(gamm_int_slope)
Note the random intercept term in the summary table. You can view it as follows:
> gamm_slope <- gam(y ~ s(x0) + s(x0, fac, bs="re"), data=gam_data2)
> summary(gamm_slope)
fac 3
> gam_data2 <- gamSim(eg=6)
> year_gam <- gamm(nottem~s(nottem_year)+ s(nottem_month, bs="cc"))
> year_gam_AR1 <- gamm(nottem~s(nottem_year)+
s(nottem_month, bs="cc"),correlation = corARMA(form = ~ 1|
nottem_year, p = 1))
> year_gam_AR2 <- gamm(nottem~s(nottem_year)+
s(nottem_month, bs="cc"),correlation = > corARMA(form = ~ 1|
nottem_year, p = 2))
> anova(year_gam$lme,year_gam_AR1$lme,year_gam_AR2$lme)
fac 2
4 term additive + random effect
When observations are not independent, GAMs can be used to either incorporate:
Illustration of the principle behind cross validation
Here's a representation of a smooth function using a rank 5 cubic spline basis with knot locations at increments of 0.2:
And plot it:
The basis functions are each multiplied by a real valued parameter, βi, and are then summed to give the final curve f(x)
> head(gam_data)
plot_smooth(gamm_int_slope, view="x0", rm.ranef=TRUE, main="intercept + s(x0)",
rug=FALSE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="1"), main="... + s(fac) + s(fac, x0)",
col='orange', ylim=c(7,22), rug=FALSE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="2"), add=TRUE,
col='red', xpd=TRUE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="3"), add=TRUE,
col='purple', xpd=TRUE)
plot_smooth(gamm_int_slope, view="x0", cond=list(fac="4"), add=TRUE,
col='turquoise', xpd=TRUE)
fac 1
par(mfrow=c(1,2), cex=1.1)
plot_smooth(gamm_intercept, view="x0",
rm.ranef=TRUE, main="intercept + s(x1)", rug=FALSE)
plot_smooth(gamm_intercept, view="x0", cond=list(fac="1"), main="... + s(fac)",
col='orange', ylim=c(8,21), rug=FALSE)
plot_smooth(gamm_intercept, view="x0", cond=list(fac="2"), add=TRUE,
col='red')
plot_smooth(gamm_intercept, view="x0", cond=list(fac="3"), add=TRUE,
col='purple')
plot_smooth(gamm_intercept, view="x0", cond=list(fac="4"), add=TRUE,
col='turquoise')
penalized regression spline fit to remaining data (º)
plot_smooth(gamm_slope, view="x0", rm.ranef=TRUE,
main="intercept + s(x0)", rug=FALSE)
plot_smooth(gamm_slope, view="x0", cond=list(fac="1"), main="... + s(fac)",
col='orange',ylim=c(7,22), rug=FALSE)
plot_smooth(gamm_slope, view="x0", cond=list(fac="2"), add=TRUE,
col='red')
plot_smooth(gamm_slope, view="x0", cond=list(fac="3"), add=TRUE,
col='purple')
plot_smooth(gamm_slope, view="x0", cond=list(fac="4"), add=TRUE,
col='turquoise')
omitted from fitting (•)
GCV function
Fitted model which minimizes the GCV score
A cubic spline is a curve constructed from sections of cubic polynomial joined together so that they are continuous in value
Model df AIC BIC logLik Test L.Ratio p-value
year_gam$lme 1 5 1109.908 1127.311 -549.9538
year_gam_AR1$lme 2 6 1101.218 1122.102 -544.6092 1 vs 2 10.68921 0.0011
year_gam_AR2$lme 3 7 1101.598 1125.962 -543.7988 2 vs 3 1.62082 0.2030
Suppose that f is believed to be a 4th order polynomial, so that the space of polynomials of order 4 and below contains f
A basis for this space
Three different types of random effects are distinguished when using GAMMs:
We will now take a few minutes to look at what GAMs are actually doing behind the scenes.
Lets first consider a model containing one smooth function of one covariate, xi:
If lambda is too high, then the data will be over smoothed, and if it is too low then the data will be under smoothed.
It would be good to choose lambda so that the predicted f is as close as possible to f. A suitable criterion might be to choose lambda to minimize:
Instead of controlling smoothness by altering the number of knots, we keep that fixed to size a little larger than reasonably necessary, and control the model’s smoothness by adding a “wiggleness” penalty.
So, rather than fitting the model by minimizing (as with OLS):
y x0 x1 x2 x3 [...] fac
1 4.925515 0.04879793 0.273053853 0.59921214 0.18689231 1
2 10.352629 0.73410097 0.312996866 0.99805171 0.03375583 2
3 13.527934 0.32637799 0.165864279 0.52143350 0.99192673 3
4 15.511255 0.33131391 0.009219346 0.07767554 0.61888402 4
5 11.455404 0.58911290 0.469944164 0.48423685 0.22028845 1
6 10.124126 0.04054477 0.009816532 0.42261209 0.28148881 2
> plot_smooth(gamm_smooth, view="x0", rm.ranef=TRUE,
main="intercept + s(x0)", rug=FALSE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="1"),
main="... + s(x0, fac)", col='orange',
ylim=c(7,21), rug=FALSE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="2"),
add=TRUE, col='red', xpd=TRUE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="3"),
add=TRUE, col='purple', xpd=TRUE)
> plot_smooth(gamm_smooth, view="x0", cond=list(fac="4"),
add=TRUE, col='turquoise', xpd=TRUE)
spline
GCV score
The AR(1) provides a significant increase in fit over the naive model, but there is very little improvement in moving to the AR(2).
so that f(x) becomes
> gamm_intercept <- gam(y ~ s(x0) + s(fac, bs="re"), data=gam_data2)
How many degrees of freedom does a fitted GAM have?
Instead of providing the output of the cross-validation in terms of lambda (model complexity), GAM uses a term called the effective degrees of freedom (edf) - more intuitive for us to speak of df.
Because the number of free parameters in GAMs is difficult to define, the edf are related to lambda, such that the greater the penalty, the smaller the edf.
For example: if the arbitrarily large number of knots is k=10, then k-1 sets the upper limit on the df, which decreases as lambda increases to its best-fit penalty value (found by cross-validation).
There's a great deal more out there on GAMs... this was just the very surface. Could test for concurvity of variables. See this workshop on model selection:
http://distancesampling.org/workshops/duke-spatial-2015/practicals/3-advanced-dsms-solutions.html
Simon Wood, the author of the mgcv package has a very useful website with introductory talks and notes on how to use GAMs:
http://people.bath.ac.uk/sw283/mgcv/
He's also written a book, Generalized Additive Models: An Introduction with R, which we used as reference for this workshop.
Material from this workshop was obtained from the following fantastic blogs/ tutorials:
http://www.fromthebottomoftheheap.net/blog/
http://www.sfs.uni-tuebingen.de/~jvanrij/Tutorial/GAMM.html
http://www.sfs.uni-tuebingen.de/~jvanrij/LSA2015/AnswersLab2.html
Finally, the help pages, available through ?gam in R, are an excellent resource.
Finally, all of the previous mixed models can be compared using anova() as in the previous sections to determine the best fit model.
The ACF (and pACF) provide the cross correlation (and partial correlation) of a time series with itself at different time lags, and are used to identify after how many time steps observations start to be independent.
To estimate f, we must represent it in such a way that the equation above becomes a linear model. This can be done by choosing a basis, bi(x), defining the space of functions of which f is an element:
it could be fit by minimizing:
knots
Lambda
Scaled capacity
and the full model now becomes
Since f is unknown, M is estimated using a generalized cross validation technique that leaves out each datum from the data in turn and considers the average ability of models fitted to the remaining data to predict the left out datum.
The knots are evenly spaced through the range of observed x values. But, the choice of the degree of model smoothness is controlled by the the number of knots, which was arbitrary.
Fits many of the data poorly and does no better with the missing point.
Fits the underlying signal quite well, smoothing through the noise and the missing datum is reasonably
well predicted
Fits the noise as well as the signal and the extra variability induced causes it to predict the missing datum rather poorly.
By varying the βi we can vary the form of f(x) to produce any polynomial function of order 4 or lower
As, lambda goes to infinity, the model becomes linear...
Each section of cubic has different coefficients
The smooth term represents how the log odds of successes vs failures changes over x1. Since edf>1, the proportion of successes increases faster over time.
> plot(gamm_intercept, select=2)