**Regression**

Assumptions for Regression Analysis

Regression Line

In other words, the regression line is the straight line that best describes how a response variable Y changes as an explanatory variable X changes. It allows us to predict the value of Y for a given value of X.

Coefficient of Determination

Correlation vs Regression:

"Correlation[s] are useful for describing the relative strength of a relationship between two variables" (Portney & Watkins, 2009, p. 539), but NO causation can be drawn. As you will learn in this lecture, regressions are very similar to correlations; however, their purpose is to determine to what degree one variable predicts another variable. (Keep in mind that prediction on its own does not prove that one variable causes the other.) "The ability to predict outcomes ... is crucial to effective clinical decision making and goal setting" (Portney & Watkins, 2009, p. 539). For example, it is useful to know that "early language and nonverbal skills have been shown to be important predictors of outcome in adaptive behavior in communication and socialization for children with autism" (Portney & Watkins, 2009, p. 539), as it may guide the focus of intervention.

There are different types of regression for different purposes and different types of data. These are:

- Linear regression

- Non-linear regression

- Multiple regression

- Logistic regression

Types of Regression Analysis

Linear Regression

"Linear regression involves the examination of two variables, X and Y, that are linearly related" (Portney & Watkins, 2009, p. 539).

The two variables MUST be LINEARLY related; otherwise, a different analysis must be conducted.

But wait, wasn't this just like a correlation scatterplot? I'm confused!

The visual representation of a correlation and a linear regression is the same, BUT the mathematical computation is different. In a correlation, we are simply looking at the degree of association between two variables (to what degree they go up and down together). In a regression, we are estimating the degree to which one variable predicts the other.
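The difference can be seen in a minimal Python sketch. The scores and the helper names `pearson_r` and `slope_intercept` are made up for illustration and are not taken from the lecture. The correlation is the same whichever variable is listed first, whereas the regression line changes depending on which variable is treated as the predictor.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: the degree to which xs and ys vary together."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope_intercept(xs, ys):
    """Least-squares line that predicts ys from xs: y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Correlation is symmetric: the order of the variables does not matter
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Regression is directional: predicting Y from X is a different line
# than predicting X from Y
slope_y_on_x, intercept_y_on_x = slope_intercept(x, y)
slope_x_on_y, intercept_x_on_y = slope_intercept(y, x)

print(f"r(X, Y) = {r_xy:.3f}, r(Y, X) = {r_yx:.3f}")
print(f"Y predicted from X: Y = {intercept_y_on_x:.2f} + {slope_y_on_x:.2f} * X")
print(f"X predicted from Y: X = {intercept_x_on_y:.2f} + {slope_x_on_y:.2f} * Y")
```

In a regression, we must decide which variable is the predictor and which is the outcome; in a correlation, we do not.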

The regression analysis "allows us to find the one line that best describes the orientation of all data points in the scatter plot" (Portney & Watkins, 2009, p. 542). This is the regression line.

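As you can see on the graph, even with the "line of best fit", not all points will fit directly on the line. The line of best fit does not perfectly predict the relationship between the variables for each of the data points. The vertical distance between a data point and the line is an error component called the residual. The line of best fit is the unique line that keeps these residuals, taken together, as small as possible (technically, it minimizes the sum of the squared residuals).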
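As a rough illustration of what minimizing the residuals means, the Python sketch below fits a line to a few made-up data points and compares its total squared error with that of a slightly steeper line. The data and the function names are hypothetical, and least squares is assumed to be the criterion behind the line of best fit.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def sum_squared_residuals(xs, ys, slope, intercept):
    """Total squared vertical distance between the points and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (illustration only)
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]

slope, intercept = fit_line(x, y)

# One residual per data point: observed Y minus the Y predicted by the line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

best = sum_squared_residuals(x, y, slope, intercept)
other = sum_squared_residuals(x, y, slope + 0.1, intercept)  # a slightly steeper line

print("residuals:", [round(res, 2) for res in residuals])
print(f"squared error, line of best fit: {best:.2f}")
print(f"squared error, steeper line:     {other:.2f}")
```

None of the points sit exactly on the fitted line, yet any other straight line, such as the steeper one tried here, gives a larger total squared error.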

So, there is inherent error in the regression line drawn from a sample of data. There is also inherent error when we generalize from a sample's regression to the whole population. "We recognize that the straight line we fit to [the] sample data is only an approximation of the true regression line that exists for the underlying population" (Portney & Watkins, 2009, p. 547). If our sample was drawn at random from the whole population and is sufficiently large, we reduce this error. However, if our sample selection was biased, it may not be possible to generalize our sample prediction to the whole population.

When a regression analysis is computed, we obtain several pieces of information. The first is an r value, just like in a correlation. Regression r values are challenging to interpret for lay readers of research; however, with regression, an R-square value, called the coefficient of determination, is also computed. R-square tells us the percentage of the variability in one variable that is explained by the other variable. This is much easier to interpret. Watch this short YouTube video for a more thorough explanation.

Another example could be a study that attempts to predict quality of life from recreational participation. After establishing that the variables are linearly related, a linear regression is computed and we find r = 0.45 and R-square = 0.20. This indicates that 20% of the variability in people's quality of life is explained by their recreational participation.
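The two values in this example are linked by simple arithmetic: in a linear regression with a single predictor, R-square is the square of r. A small Python sketch, with variable names chosen only for illustration:

```python
r = 0.45                    # r value reported in the example
r_squared = r ** 2          # coefficient of determination
percent_explained = round(r_squared * 100)

print(f"r = {r}, R-square = {r_squared:.2f}, about {percent_explained}% explained")
```

Squaring 0.45 gives 0.2025, which rounds to the 0.20 (20%) reported in the example.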

I like R-square... It is so easy to understand. Can I calculate R-square when I am interpreting correlations? Unfortunately, no. Correlations and regressions are not computed in the same way, and in a correlation we are not able to explain one variable from the other.

p-value

Another piece of information that is obtained from computing a regression is the p-value. As with a correlation, the p-value only tells you whether the r and R-square values you obtained were likely to have occurred by chance or not. It does not provide information about the strength of the prediction between variables.
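One way to build intuition for "likely to have occurred by chance" is a permutation test, sketched below in Python. This is a swapped-in technique chosen because it needs no extra libraries; statistical software normally derives the regression p-value from a t or F distribution instead. The data, the seed, and the function names are made up for illustration. The idea is to shuffle Y many times, which breaks any real link with X, and count how often chance alone produces an r at least as strong as the one observed.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Estimate how often shuffled data give an |r| at least as large as observed."""
    rng = random.Random(seed)           # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)           # break any real link between X and Y
        if abs(pearson_r(xs, shuffled)) >= observed:
            as_extreme += 1
    # Adding 1 to both counts avoids reporting a p-value of exactly zero
    return (as_extreme + 1) / (n_shuffles + 1)

# Hypothetical data in which Y rises steadily with X (illustration only)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]

r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.3f} (strength), p = {p:.4f} (could chance alone explain it?)")
```

The strength of the relationship is read from r and R-square; the p-value answers only the separate question of whether a result this strong could plausibly arise by chance.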

Non-linear Regression

So, can you guess what distinguishes non-linear regression from linear regression?

A non-linear regression is used when the data, once graphed on a scatter plot, do not follow a straight line. The results of a non-linear regression are interpreted in the same way as those of a linear regression.
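To see why the shape of the scatter plot matters, the Python sketch below uses made-up, U-shaped data. A straight line explains almost none of the variability in Y, while a simple curve does. Regressing Y on X squared is only one basic way of fitting a curve and is an assumption made for this illustration; non-linear regression software may use other models. The data and the helper name `r_squared` are hypothetical.

```python
def r_squared(xs, ys):
    """Share of the variability in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical U-shaped data: Y is high at both ends of X and low in the middle
x = [-3, -2, -1, 0, 1, 2, 3]
y = [9.1, 3.9, 1.2, 0.1, 0.8, 4.2, 8.9]

# Straight-line model: Y predicted directly from X
r2_line = r_squared(x, y)

# Curved model: Y predicted from X squared
x_squared = [xi ** 2 for xi in x]
r2_curve = r_squared(x_squared, y)

print(f"R-square, straight line: {r2_line:.3f}")
print(f"R-square, curve:         {r2_curve:.3f}")
```

The data are clearly related, but a straight line misses the relationship almost entirely, which is why the scatter plot should be checked before choosing the type of regression.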

Multiple Regression

"Multiple regression is an extension of simple linear regression analysis" (Portney & Watkins, 2009, p. 686). It allows the prediction of a dependent variable Y from a set of several independent variables X1, X2, etc. The dependent variable must be a continuous measure, whereas the independent variables can be continuous or categorical.

Logistic Regression

A logistic regression is used when the outcome (dependent) variable is categorical, such as having a diagnosis of ASD (or not), or having a learning disability (or not). The independent variables can be continuous, ordinal, or categorical. "In logistic regression, rather than predicting the value of an outcome variable, we are actually predicting the probability of an event occurring. Using the regression equation, we determine if the independent variables can predict whether an individual is likely to belong to" one of the two groups (Portney & Watkins, 2009, p. 696). For example, a study may want to predict whether children born at 28 weeks gestation are more likely to be diagnosed as having a learning disability at age 5 (or not).
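The idea of predicting a probability rather than a value can be sketched in a few lines of Python. The intercept and slope below are invented purely for illustration; they are NOT estimates from any real study, and the function name is hypothetical. The logistic function converts the regression equation's output into a probability between 0 and 1.

```python
from math import exp

def predicted_probability(intercept, slope, x):
    """Logistic function: turn intercept + slope * x into a probability (0 to 1)."""
    return 1 / (1 + exp(-(intercept + slope * x)))

# Invented coefficients for illustration only (not from real data)
intercept = 6.0
slope = -0.25   # per week of gestation; negative means probability falls as weeks rise

p_28_weeks = predicted_probability(intercept, slope, 28)
p_40_weeks = predicted_probability(intercept, slope, 40)

print(f"Predicted probability at 28 weeks: {p_28_weeks:.2f}")
print(f"Predicted probability at 40 weeks: {p_40_weeks:.2f}")
```

With these made-up numbers, the predicted probability of the diagnosis is higher for a child born at 28 weeks than for one born at 40 weeks. In a real study, the coefficients would be estimated from the data.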

The End!