**Chi Square**

Presented By:

Siva Krishnamurthy Daniel Cohen

Alex Panagiotidis Michael Davies

What Is The Chi Square Test

How Do We Use Chi Square

Advantages And Disadvantages Of Using Chi Square Test

O = Observed (Original) Value

E = Expected (Estimated) Value

In simple terms:

Chi Square is used to test whether or not the observed and estimated values are related.

Definition

A measurement of how expectations compare to results. The data used in calculating a chi square statistic must be random, raw, mutually exclusive, drawn from independent variables and be drawn from a large enough sample.

Areas Chi Square Can Be Used

Actuarial Sciences – Study Of Risk Management

Bio-statistics – Statistics Used For Biological Purposes

Business Analytics – Used For Business Forecasting

Chemometrics – Statistics Used For Chemistry Purposes

Operations Research – Statistics Used To Determine Operations

Quality Control – Statistics Related To Quality And Reliability

General Relations Of Variances

The higher the value of chi square, the more variance there is between the Original value & Estimated value.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
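A minimal sketch of the formula in Python. The counts are hypothetical and `chi_square` is an illustrative name, not something from the slides.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only.
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square(observed, expected)
print(round(statistic, 2))
```

The further the O values sit from the E values, the larger the number returned.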

There are two major forms for this test:

Goodness Of Fit Test: This is a test which determines whether results from research are consistent with expected results from a hypothesis.

Test Of Association: This is a test which determines the association between two or more different variables and is used to find out whether a change in one variable would affect another.

Chi Square is a more precise method of mathematically estimating future projections than regression & correlation

Once the surveying is completed we can begin our Chi Square Testing.

A "Level Of Significance" has to be set. In this example it is 5%.

H0 (the null hypothesis): gender DOESN'T affect working conditions.

H1 (the alternate hypothesis): gender DOES affect working conditions.

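Level of Significance

| df | P = 0.05 | P = 0.01 | P = 0.001 |
|---|---|---|---|
| 1 | 3.84 | 6.64 | 10.83 |
| 2 | 5.99 | 9.21 | 13.82 |
| 3 | 7.82 | 11.35 | 16.27 |
| 4 | 9.49 | 13.28 | 18.47 |
| 5 | 11.07 | 15.09 | 20.52 |
| 6 | 12.59 | 16.81 | 22.46 |
| 7 | 14.07 | 18.48 | 24.32 |
| 8 | 15.51 | 20.09 | 26.13 |
| 9 | 16.92 | 21.67 | 27.88 |
| 10 | 18.31 | 23.21 | 29.59 |
| 11 | 19.68 | 24.73 | 31.26 |
| 12 | 21.03 | 26.22 | 32.91 |
| 13 | 22.36 | 27.69 | 34.53 |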
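One way to look up a critical value in code, sketched in Python. Only the first five rows of the table are copied in, and the names `CRITICAL_VALUES`, `LEVELS` and `critical_value` are illustrative assumptions.

```python
# Rows copied from the level of significance table (df 1 to 5 only).
LEVELS = (0.05, 0.01, 0.001)
CRITICAL_VALUES = {
    1: (3.84, 6.64, 10.83),
    2: (5.99, 9.21, 13.82),
    3: (7.82, 11.35, 16.27),
    4: (9.49, 13.28, 18.47),
    5: (11.07, 15.09, 20.52),
}


def critical_value(df, level=0.05):
    """Return the table entry for a degree of freedom and significance level."""
    if df not in CRITICAL_VALUES:
        raise KeyError(f"df={df} is not in this shortened table")
    if level not in LEVELS:
        raise KeyError(f"level={level} is not a column of the table")
    return CRITICAL_VALUES[df][LEVELS.index(level)]


print(critical_value(3))        # df = 3 at the 5% level
print(critical_value(2, 0.01))  # df = 2 at the 1% level
```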

Calculating Chi Square

What does this mean?

Example Question!

Now It's Your Turn!


Class Activity

Chi Square Terminology

Null & Alternate Hypothesis

Level Of Significance

Degree Of Freedom

DF = (N Of Columns - 1) x (N Of Rows - 1)

E.g. DF = (3-1) x (2-1)

DF = 2
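The same rule as a short Python sketch. The function name `degrees_of_freedom` is an illustrative assumption; the two calls reuse the table shapes that appear in these slides.

```python
def degrees_of_freedom(n_columns, n_rows):
    """DF = (N Of Columns - 1) x (N Of Rows - 1)."""
    if n_columns < 2 or n_rows < 2:
        raise ValueError("the table needs at least 2 columns and 2 rows")
    return (n_columns - 1) * (n_rows - 1)


print(degrees_of_freedom(3, 2))  # the E.g. above: 2
print(degrees_of_freedom(4, 2))  # a 4 x 2 table: 3
```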

Web Definition - A number representing the range of possibilities for movement.

Work out the experimental values and then apply the "Chi Square Formula"

"You have been asked to write a report on the results of a questionnaire among the employees of a multinational computer manufacturer concerning pay & conditions"

The level of significance can be described as the chance of making an error in the test: the probability of rejecting the null hypothesis when it is actually true.

The survey results are based on the replies of 400 selected employees relating gender to the question - "What is the most important thing to me about my work?"

The Survey:

The Observed Results

To obtain the estimated values:

$$E = \frac{RT \times CT}{GT}$$

RT = Row Total

CT = Column Total

GT = Grand Total
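A sketch of the estimated-value step in Python. The observed table below is hypothetical (it is NOT the survey data from these slides, which is not reproduced here), and `expected_counts` is an illustrative name.

```python
def expected_counts(observed):
    """Apply E = (RT x CT) / GT to every cell of a table of counts."""
    row_totals = [sum(row) for row in observed]        # RT for each row
    col_totals = [sum(col) for col in zip(*observed)]  # CT for each column
    grand_total = sum(row_totals)                      # GT
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


# Hypothetical 2 x 4 table: 2 rows (groups) by 4 columns (answers), 400 replies.
observed = [
    [50, 60, 40, 50],
    [30, 70, 60, 40],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Each estimated table keeps the same row totals, column totals and grand total as the observed table.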

Asymmetry Of Proof

The Survey:

The Estimated Results

Some disadvantages include:

First you need to determine how your data will be collected to ensure that it is unbiased and from a large enough sample.

Original Entry

H0 : the degree classifications at City and Kingston universities are the same

H1 : the degree classifications at City and Kingston universities are not the same

Null and Alternate Hypothesis

Estimated Entry

Chi Square

Since the test stat < critical stat, the null hypothesis is accepted (not rejected): there is no significant difference in the degree classifications between the two universities.

Class Activity Findings

This percentage generally tends to be 5% (0.05), 1% (0.01) or 0.1% (0.001).

The level of significance most commonly used by scientists and statisticians is 5%.

The level of significance table would be used to help guide us to confirm either the null or alternate hypothesis.

The table would look something like this

This will be explained later, don't worry!

Overall, helping to predict variables which may occur in the future, or to investigate potential occurrences of one set of variables when another one changes.

Our Hypothesis

Level Of Significance

We have 95% confidence in this test, which means our level of significance is 5% (0.05) (as long as we are satisfied that the data is not biased).

E = (RT x CT) / GT

E = (RT x 300) / 400 = 113

Now To Apply The Chi Square Formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

By stating a hypothesis and applying chi square to the data we can test if this survey has a degree of authority and is not a result of randomness.

Excited?

This can also be known as the test statistic and is the value which will be related to the level of significance table in order to decide whether our null hypothesis should be accepted or rejected.

This is the calculated chi square value. If the value is high, it means there is a big difference between the O and E values.

In order to test this against our level of significance table we must first calculate the degree of freedom.

df =(4-1) x (2-1)

Therefore our df = 3

So, what does this tell us?

Test Value (Chi Square) = 31.99

Critical Value at df 3 (Level of Significance Table) = 7.82

Because the chi square value is higher than our critical stat, the null hypothesis (H0) is rejected. Therefore it can be stated that the findings of this test show gender does affect working conditions.
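The comparison can be sketched in Python. The numbers are the ones quoted in these slides (31.99 and 0.156 as test values, 7.82 as the critical value at df 3); the function name `decide` and its return strings are illustrative assumptions.

```python
def decide(test_statistic, critical_value):
    """Reject H0 when the test statistic reaches the critical value."""
    if test_statistic >= critical_value:
        return "reject H0"
    return "do not reject H0"


# Survey example: chi square 31.99 against 7.82 (df = 3, 5% level).
print(decide(31.99, 7.82))

# Class activity: test statistic 0.156 against the same critical value.
print(decide(0.156, 7.82))
```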

**ANY**

**QUESTIONS?**

Data must be numerical values only - no percentages

Data must be in groups and of a reasonably high sample size

Not always easy to determine whether data is biased or not

If there is only one degree of freedom and the significance level is set at 0.01 or 0.001, the results would not be as reliable

You are provided with data collected of degree classifications from two different universities.

Your Job:

Calculate/ Answer The Following: -

The Estimated Values

The Null & Alternate Hypothesis

Chi Square/ Test Statistic

Compare the chi square value with the critical value from the level of significance (at 5%) table

Degree Of Freedom

Think about what the null hypothesis may be AND identify the degree of freedom

Next, find out the estimated values

Observed Entry

It's time to calculate chi square...

df = 3

Test statistic = 0.156

Test Stat = 0.156

Critical Stat = 7.82

SO...

This is when it is difficult to prove a general statement true.

For example

"All penguins can't fly"

This statement can never really be proved, since you cannot observe every penguin to rule out one that disproves it.

Therefore we create a null and alternate hypothesis

Null (H0): All penguins can't fly

Alternate (H1): Some penguins can fly

If test stat < critical stat: H0 is accepted (not rejected)

If test stat >= critical stat: H0 is rejected and H1 is accepted