**Simple Regression and Multiple Regression**

**Simple/Multiple Regression**

In statistics, we normally use the simple or multiple linear regression technique when the dependent variable is quantitative and the independent variable(s) are also quantitative.

Multiple regression can also be used when the dependent variable is quantitative but the independent variables (at least two of them) are nominal or ordinal.
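As an illustrative sketch (the data and coefficients below are hypothetical, not from the slides), a nominal independent variable can enter a multiple regression by encoding its levels as dummy (indicator) variables:

```python
import numpy as np

# Hypothetical data: y is quantitative, group is nominal with 3 levels.
rng = np.random.default_rng(0)
group = rng.integers(0, 3, size=30)            # nominal predictor
y = 5.0 + 2.0 * (group == 1) - 1.0 * (group == 2) + rng.normal(0, 0.1, 30)

# Dummy-code the nominal variable, dropping one level as the baseline.
X = np.column_stack([np.ones(30), group == 1, group == 2]).astype(float)

# Ordinary least-squares fit of the multiple regression model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [5.0, 2.0, -1.0]
```

The fitted coefficients recover the baseline mean and the shift for each non-baseline level of the nominal variable.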

**Multiple Regression**

Multiple Linear Regression is an approach to modeling the relationship between a scalar dependent variable y and more than one explanatory variable, denoted X.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm, or by minimizing a penalized version of the least squares loss function as in ridge regression.
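As a minimal sketch of the penalized variant mentioned above (simulated data; the penalty strength is an arbitrary illustrative choice), ridge regression adds a term λ‖β‖² to the least-squares loss, which gives the closed form β̂ = (XᵀX + λI)⁻¹Xᵀy:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.5, 50)

lam = 1.0  # penalty strength (illustrative choice)

# Ridge closed form: beta = (X'X + lam*I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Ordinary least squares for comparison (lam = 0)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the coefficients toward zero.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```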


Outline

Simple Linear Regression

Multiple Linear Regression

To fit the Regression Line

Suppose there are n data points {yi, xi}, where i = 1, 2, …, n. The goal is to find the equation of the straight line which would provide a "best" fit for the data points.

The least-squares approach chooses the line that minimizes the sum of squared residuals of the linear regression model. In other words, the numbers α (the y-intercept) and β (the slope) solve the following minimization problem:

$$\min_{\alpha,\beta}\; Q(\alpha,\beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$

To fit the Regression Line (cont.)

Test of Beta using t-Statistics

Expanding the objective function gives a quadratic in α (alpha) and β (beta), and it can be shown that the values of α and β that minimize it are

$$\hat\beta = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta\,\bar x$$
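These closed-form estimates are easy to compute directly; a short sketch with made-up data (not from the slides):

```python
import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

x_bar, y_bar = x.mean(), y.mean()

# beta_hat = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# alpha_hat = y_bar - beta_hat * x_bar
alpha_hat = y_bar - beta_hat * x_bar

print(alpha_hat, beta_hat)  # 0.11, 1.97
```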

To Fit the Regression Lines

Given a data set of n statistical units, a linear regression model assumes that the relationship between the dependent variable yi and the p-vector of regressors xi is linear.

This relationship is modelled through a disturbance term or error variable: an unobserved random variable that adds noise to the linear relationship between the dependent variable and the regressors. Thus the model takes the form

$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf T}\boldsymbol\beta + \varepsilon_i, \qquad i = 1, \dots, n$$

Often these n equations are stacked together and written in vector form as

$$\mathbf y = X\boldsymbol\beta + \boldsymbol\varepsilon,$$

where

$$\mathbf y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} \mathbf x_1^{\mathsf T} \\ \vdots \\ \mathbf x_n^{\mathsf T} \end{pmatrix}, \quad \boldsymbol\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
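As a sketch (with simulated data, assumed only for illustration), the stacked form y = Xβ + ε can be fitted directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 2
x = rng.normal(size=(n, p))               # regressors
eps = rng.normal(0, 0.1, n)               # disturbance term
beta_true = np.array([1.0, 0.5, -2.0])    # [intercept, b1, b2]

X = np.column_stack([np.ones(n), x])      # design matrix with intercept column
y = X @ beta_true + eps                   # y = X beta + eps in stacked form

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 0.5, -2.0]
```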

Assumptions

Normality

Linearity

Constant Variance

Independence

Lack of Multicollinearity

Linearity

the mean of the response variable is a linear combination of the parameters and the predictor variables

Linearity Assumption: only a restriction on the parameters, not on the predictor variables themselves, which may be nonlinear transformations of other variables

Constant Variance

different response variables have the same variance in their errors, regardless of the values of the predictor variables

To check for heterogeneous error variance, or for a pattern of residuals that violates the model's assumption of homoscedasticity, it is prudent to look for a "fanning effect" between the residual errors and the predicted values.

When this assumption is violated, the error will not be evenly distributed across the regression line.
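A minimal sketch of this check (simulated data with variance that deliberately grows with x, assumed for illustration): compare the residual spread across the range of fitted values to detect the fanning effect.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 200)
# Error s.d. grows with x: a classic "fanning" pattern
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
resid = y - fitted

# Crude check: compare residual spread in the lower vs. upper
# half of the fitted values.
order = np.argsort(fitted)
lo = resid[order[:100]].std()
hi = resid[order[100:]].std()
print(hi > lo)  # True: residual spread grows with the fitted value
```

In practice one would plot residuals against fitted values and look for the fan shape visually.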

To Fit Regression Lines (cont.)

Starting from the model above, there are numerous methods for estimating the parameters.

These methods differ in computational simplicity of algorithms, presence of a closed-form solution, robustness with respect to heavy-tailed distributions, and theoretical assumptions needed to validate desirable statistical properties such as consistency and asymptotic efficiency.

Least-Squares Estimation

1. The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter β:

$$\hat{\boldsymbol\beta} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}\mathbf y$$

The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors.

2. GLS is an extension of the OLS method that allows efficient estimation of β when heteroscedasticity and/or correlation is present among the error terms of the model, as long as the form of the heteroscedasticity and correlation is known independently of the data. GLS minimizes a weighted analogue of the sum of squared residuals from OLS regression.

The special case of GLS in which the errors are uncorrelated but have unequal variances, so that the weight matrix is diagonal, is called "weighted least squares".

The GLS solution to the estimation problem is

$$\hat{\boldsymbol\beta} = (X^{\mathsf T}\Omega^{-1}X)^{-1}X^{\mathsf T}\Omega^{-1}\mathbf y,$$

where Ω is the covariance matrix of the errors.
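A sketch of the weighted least-squares special case (simulated data; the error standard deviation is assumed known, as GLS requires):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = np.linspace(1, 10, n)
X = np.column_stack([np.ones(n), x])
sigma = 0.3 * x                        # known heteroscedastic error s.d.
y = X @ np.array([1.0, 2.0]) + rng.normal(0, sigma)

# GLS with a diagonal Omega reduces to weighted least squares:
# beta = (X' W X)^{-1} X' W y  with  W = Omega^{-1} = diag(1/sigma^2)
W = np.diag(1.0 / sigma**2)
beta_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_gls)  # close to [1.0, 2.0]
```

Observations with smaller error variance receive larger weight, which is what makes the estimate efficient under heteroscedasticity.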

3. Other techniques include iteratively reweighted least squares (IRLS), instrumental variables (IV), total least squares (TLS), optimal instruments regression, etc.

Maximum-Likelihood Estimation and Related Techniques

1. Maximum-likelihood estimation is performed when the distribution of the error terms is known to belong to a certain parametric family ƒ_θ of probability distributions. When ƒ_θ is a normal distribution with zero mean and variance θ, the resulting estimate is identical to the OLS estimate. GLS estimates are maximum-likelihood estimates when the errors follow a multivariate normal distribution with a known covariance matrix.
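As an illustrative check of that equivalence (simulated data; the error variance is fixed at 1 for simplicity), numerically maximizing the normal likelihood reproduces the OLS estimate:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([0.5, 1.5]) + rng.normal(0, 1.0, n)

# Negative log-likelihood under i.i.d. normal errors (variance fixed
# at 1): maximizing the likelihood = minimizing the sum of squares.
def nll(beta):
    r = y - X @ beta
    return 0.5 * np.sum(r**2)

ml = minimize(nll, x0=np.zeros(2)).x
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(ml, ols, atol=1e-4))  # the two estimates coincide
```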

2. Other techniques include ridge regression, least absolute deviations (LAD), and adaptive estimation.

Other Techniques in Estimation

Bayesian linear regression

Quantile regression

Mixed models

Principal Component Regression (PCR)

Least-angle regression

The Theil–Sen estimator

etc.

Independence of Errors

this assumes that the errors of the response variables are uncorrelated with each other

Lack of Multicollinearity

For standard least-squares estimation methods, the design matrix X must have full column rank p; otherwise, a condition known as multicollinearity exists in the predictor variables.

Multicollinearity can also arise if there is too little data available compared to the number of parameters to be estimated.
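A short sketch of the rank condition (constructed data, assumed for illustration): when one regressor is an exact multiple of another, the design matrix loses full column rank and the OLS closed form breaks down.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1                      # perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# Full column rank would be 3; collinearity drops it to 2.
print(np.linalg.matrix_rank(X))    # 2

# X'X is singular, so (X'X)^{-1} cannot be computed reliably.
print(np.linalg.cond(X.T @ X))     # huge condition number
```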

Linear regression is widely used in biological, behavioral and social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines.

Under the hypothesis

$$H_0: \beta = 0 \qquad \text{vs.} \qquad H_1: \beta \neq 0,$$

the test statistic is

$$t = \frac{\hat\beta}{\operatorname{se}(\hat\beta)}.$$

Reject H0 when $|t| > t_{\alpha/2,\; n-2}$.

Correlation

The Pearson correlation is +1 in the case of a perfect positive linear relationship, −1 in the case of a perfect decreasing linear relationship, and some value between −1 and 1 in all other cases, indicating the degree of linear dependence between the variables. As it approaches zero there is less of a relationship. The closer the coefficient is to either −1 or 1, the stronger the correlation between the variables.

If the variables are independent, Pearson's correlation coefficient is 0, but the converse is not true, because the correlation coefficient detects only linear dependencies between two variables.
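Both facts are easy to see numerically (simulated data, assumed for illustration): a perfect linear relationship gives correlation 1, while y = x² is fully dependent on x yet has Pearson correlation near 0.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=10000)

# Perfect increasing linear relationship -> correlation +1
print(np.corrcoef(x, 3 * x + 2)[0, 1])  # 1.0 (up to floating point)

# y = x^2 depends on x, but not linearly, so the Pearson
# coefficient is near 0 even though the variables are dependent.
print(np.corrcoef(x, x**2)[0, 1])       # near 0
```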

Test of Beta: Overall and Individual

For the overall test, under the hypothesis

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \qquad \text{vs.} \qquad H_1: \text{at least one } \beta_j \neq 0,$$

the test statistic is

$$F = \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)}.$$

Reject H0 when $F > F_{\alpha;\, p,\, n-p-1}$.

For an individual beta, under the hypothesis

$$H_0: \beta_j = 0 \qquad \text{vs.} \qquad H_1: \beta_j \neq 0,$$

the test statistic is

$$t_j = \frac{\hat\beta_j}{\operatorname{se}(\hat\beta_j)}.$$

Reject H0 when $|t_j| > t_{\alpha/2,\; n-p-1}$.
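A sketch of both tests computed from scratch (simulated data; one predictor has a true coefficient of zero, so its individual test should usually fail to reject):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, p = 80, 2
x = rng.normal(size=(n, p))
y = 1.0 + 2.0 * x[:, 0] + 0.0 * x[:, 1] + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sse = resid @ resid
ssr = np.sum((X @ beta_hat - y.mean()) ** 2)

# Overall F-test: H0: beta_1 = ... = beta_p = 0
F = (ssr / p) / (sse / (n - p - 1))
F_crit = stats.f.ppf(0.95, p, n - p - 1)
print(F > F_crit)   # reject H0: the model is significant overall

# Individual t-tests: t_j = beta_j / se(beta_j)
s2 = sse / (n - p - 1)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = beta_hat / se
t_crit = stats.t.ppf(0.975, n - p - 1)
print(np.abs(t) > t_crit)  # compare each |t_j| to the critical value
```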

Assumptions

- Independence

- Normality

- Homoscedasticity

- Linearity

**Simple Linear Regression**

In statistics, Simple Linear Regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as small as possible.

Simple Linear Regression (cont.)

| Regression Model | Independent Variables | Dependent Variables |
|---|---|---|
| Scale of measurement | O (Ordinal), N (Nominal), Scale | Scale |

Restrictions

1. Extrapolation

2. Inverse prediction
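A hypothetical sketch of the two operations (the fitted coefficients below are assumed, not from the slides). Both extrapolation beyond the observed range of x and inverse prediction should be used with caution:

```python
# Hypothetical fitted line: y = alpha_hat + beta_hat * x
alpha_hat, beta_hat = 1.0, 2.0

# Forward prediction: y for a new x (extrapolation if x_new is
# outside the range of the data used to fit the line)
x_new = 3.0
y_pred = alpha_hat + beta_hat * x_new       # 7.0

# Inverse prediction (calibration): the x that would produce y0
y0 = 9.0
x_inv = (y0 - alpha_hat) / beta_hat         # 4.0
print(y_pred, x_inv)
```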

Under the hypothesis

$$H_0: \beta = 0 \qquad \text{vs.} \qquad H_1: \beta \neq 0,$$

the test statistic is

$$t = \frac{\hat\beta}{\operatorname{se}(\hat\beta)}.$$

Reject H0 when $|t| > t_{\alpha/2,\; n-2}$.