I. Motivation for Multiple Regression

1.1 The Model with Two Independent Variables

II. Mechanics and Interpretation of Ordinary Least Squares

2.1 Obtaining the OLS Estimates

2.3 On the Meaning of "Holding Other Factors Fixed" in Multiple Regression

2.5 OLS Fitted Values and Residuals

2.8 Goodness-of-Fit

**Chapter 3**

Multiple Regression Analysis: Estimation


Multiple regression analysis

The OLS estimates are chosen to minimize the sum of squared residuals:

sum_{i=1}^{n} (Yi - ^Bo - ^B1Xi1 - ^B2Xi2)^2.   (3.10)

I. Motivation for Multiple Regression

II. Mechanics and Interpretation of Ordinary Least Squares

III. The Expected Value of the OLS Estimators

IV. The Variance of the OLS Estimators

V. Efficiency of OLS: The Gauss-Markov Theorem

The multiple regression model is still the most widely used vehicle for empirical analysis in economics and other social sciences. Likewise, the method of ordinary least squares is popularly used for estimating the parameters of the multiple regression model.

We now summarize some computational and algebraic features of the method of ordinary least squares as it applies to a particular set of data. We also discuss how to interpret the estimated equation.

We first consider estimating the model with two independent variables. The estimated OLS equation is written in a form similar to the simple regression case:

^Y = ^Bo + ^B1X1 + ^B2X2,   (3.11)

where ^Bo is the estimate of Bo,

^B1 is the estimate of B1, and

^B2 is the estimate of B2.

To understand what OLS is doing, it is important to master the meaning of the indexing of the independent variables in (3.10). The independent variables have two subscripts here: i, followed by either 1 or 2. The i subscript refers to the observation number, so the sum in (3.10) is over all i = 1 to n observations.

The second index is simply a method of distinguishing between different independent variables.

The partial effect interpretation of slope coefficients in multiple regression analysis can cause some confusion, so we provide a further discussion now.

The power of multiple regression analysis is that it allows us to do in nonexperimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.


After obtaining the OLS regression line (3.11), we can obtain a fitted or predicted value for each observation. For observation i, the fitted value is simply

^Yi = ^B0 + ^B1Xi1 + ^B2Xi2 + ... + ^BkXik,

which is just the predicted value obtained by plugging the values of the independent variables for observation i into equation (3.11). We should not forget about the intercept in obtaining the fitted value; otherwise, the answer can be very misleading.
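The computation of fitted values and residuals can be sketched numerically. The code below uses simulated data (the variables and coefficient values are made up purely for illustration) and checks two standard algebraic properties of OLS: the residuals sum to zero and are uncorrelated, in sample, with each regressor.

```python
# Sketch: OLS fitted values ^Yi and residuals ^ui for a model with two
# regressors, on simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ b_hat       # ^Yi = ^B0 + ^B1*Xi1 + ^B2*Xi2
residuals = y - fitted   # ^ui = Yi - ^Yi

# Algebraic properties of OLS: residuals sum to zero and are
# orthogonal (in sample) to each regressor.
print(np.isclose(residuals.sum(), 0.0))   # True
print(np.isclose(residuals @ x1, 0.0))    # True
```

These properties hold by construction for any OLS fit that includes an intercept, not just for this particular data set.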

Sometimes, we want to change more than one independent variable at the same time to find the resulting effect on the dependent variable.

We begin with the case of two independent variables:

Y = Bo + B1X1 + B2X2
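In the estimated equation with two regressors, the predicted change from moving both independent variables at once is simply the sum of the individual partial effects (a standard result, stated here for reference in the notation above):

```latex
\Delta\hat{Y} = \hat{\beta}_1\,\Delta X_1 + \hat{\beta}_2\,\Delta X_2
```

For example, increasing X1 by one unit and X2 by one unit changes the predicted Y by ^B1 + ^B2.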

More important than the details underlying the computation of the ^Bj is the interpretation of the estimated equation.

As with simple regression, we can define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares or sum of squared residuals (SSR).
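In symbols, using the same definitions as in the simple regression case (with ^ui denoting the OLS residuals), these sums and the resulting R-squared are:

```latex
\mathrm{SST}=\sum_{i=1}^{n}(Y_i-\bar{Y})^2,\qquad
\mathrm{SSE}=\sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2,\qquad
\mathrm{SSR}=\sum_{i=1}^{n}\hat{u}_i^2,
```

```latex
\mathrm{SST}=\mathrm{SSE}+\mathrm{SSR},\qquad
R^2=\frac{\mathrm{SSE}}{\mathrm{SST}}=1-\frac{\mathrm{SSR}}{\mathrm{SST}}.
```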

The fact that R2 never decreases when any variable is added to a regression makes it a poor tool for deciding whether one variable or several variables should be added to the model. The factor that should determine whether an explanatory variable belongs in a model is whether the explanatory variable has a nonzero partial effect on y in the population.

2.4 Changing More than One Independent Variable Simultaneously

**Introductory Econometrics**

**Professor Dr Soobong Uh**

**By: Miss Photchamane PHENGRATTANAVONH (Jean)**

11.11.2013


Contents

Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing economic theories and for evaluating policy effects when we must rely on nonexperimental data. Because multiple regression models can accommodate many explanatory variables that may be correlated, we can hope to infer causality in cases where simple regression analysis would be misleading.

The first example is a simple variation of the wage equation introduced in Chapter 2 for obtaining the effect of education on hourly wage:

wage = Bo + B1edu + B2exper + u

As a second example, consider the problem of explaining the effect of per-student spending on the average standardized test score at the high school level. Suppose that the average test score depends on funding, average family income, and other unobservables:

avgscore = Bo + B1expend + B2avginc + u

1.2 The Model with k Independent Variables

The general multiple linear regression model can be written in the population as:

y = Bo + B1X1 + B2X2 + B3X3 + ... + BkXk + u,

where

Bo is the intercept.

B1 is the parameter associated with X1.

B2 is the parameter associated with X2,

and so on.

Table 3.1 Terminology for Multiple Regression

| Y                  | X1, X2, ..., Xk       |
|--------------------|-----------------------|
| Dependent variable | Independent variables |
| Explained variable | Explanatory variables |
| Response variable  | Control variables     |
| Predicted variable | Predictor variables   |
| Regressand         | Regressors            |

No matter how many explanatory variables we include in our model, there will always be factors we cannot include, and these are collectively contained in u.

2.9 Regression through the Origin

Sometimes, an economic theory or common sense suggests that Bo should be zero, so we briefly mention OLS estimation when the intercept is zero.


Assumption MLR.3

(No Perfect Collinearity)

3.3 Omitted Variable Bias: More General Cases

Deriving the sign of omitted variable bias when there are multiple regressors in the estimated model is more difficult. We must remember that correlation between a single explanatory variable and the error generally results in all OLS estimators being biased.

**IV. The Variance of the OLS Estimators**

The Linear Relationships among the Independent Variables, R2j

**Multiple regression analysis is also useful for generalizing functional relationships between variables.**

**cons = Bo + B1inc + B2inc^2 + u**

Once we are in the context of multiple regression, there is no need to stop with two independent variables. Multiple regression analysis allows many observed factors to affect y. In the wage example, we might also include amount of job training, years of tenure with the current employer, measures of ability, and even demographic variables like the number of siblings or mother's education. In the school funding example, additional variables might include measures of teacher quality and school size.


The terminology for multiple regression is similar to that for simple regression, and is given in Table 3.1. Just as in simple regression, the variable u is the error term or disturbance. It contains factors other than X1, X2, ..., Xk that affect y.

**2.6 A "Partialling Out" Interpretation of Multiple Regression**

When applying OLS, we do not need to know explicit formulas for the ^Bj that solve the system of equations in (3.13). Nevertheless, for certain derivations, we do need explicit formulas for the ^Bj. These formulas also shed further light on the workings of OLS.
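The "partialling out" result can be illustrated numerically on simulated data (the data and coefficients below are hypothetical, chosen only for illustration): regress X1 on X2 and keep the residuals r1; the simple regression of Y on r1 then reproduces the multiple-regression coefficient on X1 exactly.

```python
# Sketch of the partialling-out (Frisch-Waugh) interpretation of ^B1,
# using simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)   # X1 correlated with X2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
# Full multiple regression of Y on (1, X1, X2).
X = np.column_stack([ones, x1, x2])
b_multi, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: regress X1 on (1, X2) and keep the residuals r1.
Z = np.column_stack([ones, x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g

# Step 2: simple regression of Y on r1 (r1 sums to zero, so the
# slope is sum(r1*y) / sum(r1^2)).
b1_partial = (r1 @ y) / (r1 @ r1)

print(np.isclose(b1_partial, b_multi[1]))   # True
```

Intuitively, r1 is the part of X1 that is uncorrelated with X2, so ^B1 measures the effect of X1 on Y after X2 has been "partialled out."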

Two special cases exist in which the simple regression of Y on X1 will produce the same OLS estimate of the coefficient on X1 as the regression of Y on X1 and X2.

2.7 Comparison of Simple and Multiple Regression Estimates

1. Simple regression of Y on X1: ~Y = ~Bo + ~B1X1.

2. Multiple regression of Y on X1 and X2: ^Y = ^Bo + ^B1X1 + ^B2X2.

Comparison between simple and multiple regression:

~B1 = ^B1 + ^B2~d1,

where ~d1 is the slope coefficient from the simple regression of X2 on X1.

III. The Expected Value of the OLS Estimators

We now turn to the statistical properties of OLS for estimating the parameters in an underlying population model. In this section, we derive the expected value of the OLS estimators.

In particular, we state and discuss four assumptions, which are direct extensions of the simple regression model assumptions, under which the OLS estimators are unbiased for the population parameters. We also explicitly obtain the bias in OLS when an important variable has been omitted from the regression.

The first assumption we make simply defines the multiple linear regression (MLR) model.

Assumption MLR.1 (linear in parameters)

Assumption MLR.2 (Random Sampling)

The model in the population can be written as

Y=Bo+B1X1+B2X2+...+BkXk+u,

where Bo, B1, ..., Bk are the unknown parameters of interest and u is an unobservable random error or disturbance term.

We have a random sample of n observations,

{(Xi1, Xi2, ..., Xik, Yi): i = 1, 2, ..., n}, following the population model in Assumption MLR.1.

In the sample, none of the independent variables is constant, and there are no exact linear relationships among the independent variables.

Assumption MLR.4

(Zero Conditional Mean)

Assumption MLR.5

(Homoskedasticity)

The error u has an expected value of zero given any values of the independent variables. In other words,

E(u|X1, X2, ..., Xk) = 0.

The error u has the same variance given any values of the explanatory variables. In other words,

Var(u|X1, X2, ..., Xk) = sigma^2.

3.1 Including Irrelevant Variables in a Regression Model

One issue that we can dispense with fairly quickly is the inclusion of an irrelevant variable, or overspecifying the model, in multiple regression analysis. This means that one of the independent variables is included in the model even though it has no partial effect on y in the population.

SUMMARY

The multiple regression model allows us to effectively hold other factors fixed while examining the effects of a particular independent variable on the dependent variable. It explicitly allows the independent variables to be correlated.

Although the model is linear in its parameters, it can be used to model nonlinear relationships by appropriately choosing the dependent and independent variables.

The method of ordinary least squares is easily applied to estimate the multiple regression model. Each slope estimate measures the partial effect of the corresponding independent variable on the dependent variable, holding all other independent variables fixed.

R2 is the proportion of the sample variation in the dependent variable explained by the independent variables, and it serves as a goodness-of-fit measure. It is important not to put too much weight on the value of R2 when evaluating econometric models.

Under the first four Gauss-Markov assumptions, the OLS estimators are unbiased. This implies that including an irrelevant variable in a model has no effect on the unbiasedness of the intercept and other slope estimators. On the other hand, omitting a relevant variable causes OLS to be biased. In many circumstances, the direction of the bias can be determined.

Adding an irrelevant variable to an equation generally increases the variances of the remaining OLS estimators because of multicollinearity.

3.2 Omitted Variable Bias: The Simple Case

Now suppose that, rather than including an irrelevant variable, we omit a variable that actually belongs in the true model. This is often called the problem of excluding a relevant variable or underspecifying the model.

Summary of bias in ~B1 when X2 is omitted in estimating the model Y = Bo + B1X1 + B2X2 + u:

|                  | B2 > 0        | B2 < 0        |
|------------------|---------------|---------------|
| Corr(X1, X2) > 0 | Positive bias | Negative bias |
| Corr(X1, X2) < 0 | Negative bias | Positive bias |
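The sign pattern can be checked by simulation. The sketch below (all parameter values are made up for illustration) takes the case B2 > 0 and Corr(X1, X2) > 0, where omitting X2 should bias the simple-regression slope upward:

```python
# Simulation sketch of omitted variable bias: with B2 > 0 and
# Corr(X1, X2) > 0, regressing Y on X1 alone overstates B1.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)   # Corr(X1, X2) > 0
b1_true, b2_true = 1.0, 2.0          # B2 > 0
y = b1_true * x1 + b2_true * x2 + rng.normal(size=n)

# Slope from the simple regression of Y on X1 (X2 omitted).
xd = x1 - x1.mean()
b1_tilde = (xd @ y) / (xd @ xd)

# Here ~B1 estimates B1 + B2 * 0.7 = 2.4, well above the true B1 = 1.
print(b1_tilde > b1_true)   # True
```

Flipping the sign of either B2 or the X1-X2 correlation flips the direction of the bias, matching the table above.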

The term R2j in the variance formula is the most difficult of the three components to understand. This term does not appear in simple regression analysis because there is only one independent variable in that case.

We now obtain the variance of the OLS estimators so that, in addition to knowing the central tendencies of the ^Bj, we also have a measure of the spread in their sampling distributions. Before finding the variances, we add a homoskedasticity assumption, as in Chapter 2.
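Under Assumptions MLR.1 through MLR.5, the sampling variance of each slope estimator takes the standard form (sigma^2 is the error variance):

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\mathrm{SST}_j\,\bigl(1 - R_j^2\bigr)},
\qquad
\mathrm{SST}_j=\sum_{i=1}^{n}(X_{ij}-\bar{X}_j)^2,
```

where R2j is the R-squared from regressing Xj on all of the other independent variables. The variance grows as R2j approaches 1, which is why strong linear relationships among the regressors (multicollinearity) inflate the variances of the OLS estimators.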

We begin with some simple examples to show how multiple regression analysis can be used to solve problems that cannot be solved by simple regression.