**Correlation and Chi-Square**

**By:**

Julie Bollinger

Chad Cross

Courtney McArthur

Lay Definition

When and Why?

Examples

Technical Definition

In-Depth Example

User-Friendly Guide

Used to determine whether the breakdown of students in specific groups matches the total population of students we would expect to fall into those groups.

Determines if differences occur by chance or have meaning behind them.

Can be used with categorical data (for example, gender or race).

The chi-square test can be used to assess whether any population of students is under-represented in specific categories compared to other populations.

The chi-square statistic is used to estimate the probability that a difference this large between the observed pattern of results and the expected pattern would occur by chance.

Reflects the overall lack of fit between the expected and observed frequencies: Sum, over all of the categories or cells, of the squared difference between observed and expected frequencies divided by the expected frequency.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

χ² = chi-square

∑ = sum of

O = observed frequency

E = expected frequency

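$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

r = correlation coefficient

∑ = sum of

x = variable 1

y = variable 2

N = total number of subjects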

Correlation shows the strength of the relationship between two variables. A correlation also has a “correlation coefficient,” which is the number that tells us about the strength and direction of the relationship.

Positive (+) correlation = as one variable increases, the other increases; as one decreases, the other decreases as well

Negative (-) correlation = as one variable increases, the other decreases

Zero (0) correlation = no relationship between the variables

| Correlation Coefficient | Strength of Relationship |
| --- | --- |
| 0 | No Correlation |
| -0.2 to 0.2 | Very Weak, Very Low Correlation |
| -0.4 to -0.2 / 0.2 to 0.4 | Weak, Low Correlation |
| -0.7 to -0.4 / 0.4 to 0.7 | Moderate Correlation |
| -0.9 to -0.7 / 0.7 to 0.9 | Strong, High Correlation |
| -1.0 to -0.9 / 0.9 to 1.0 | Very Strong Correlation |
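The strength bands above can be read as a simple lookup. The sketch below is one possible way to express them in Python; the function name `describe_strength` is hypothetical, and because adjacent bands share their boundary values (for example, 0.2 appears in two rows), the choice to give each boundary to the stronger band is an assumption, not something the table specifies.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the strength labels used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # strength depends on the magnitude; the sign gives the direction
    if size == 0:
        return "No Correlation"
    # Assumption: a value exactly on a shared boundary goes to the stronger band.
    if size < 0.2:
        return "Very Weak, Very Low Correlation"
    if size < 0.4:
        return "Weak, Low Correlation"
    if size < 0.7:
        return "Moderate Correlation"
    if size < 0.9:
        return "Strong, High Correlation"
    return "Very Strong Correlation"


print(describe_strength(-0.91))
```

The sign is ignored when choosing the label because the table pairs each negative range with its positive mirror image.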

School counselors can determine if there is a relationship between two quantifiable variables.

On a scatter plot, each pair of values is plotted as a point, and the grouping and pattern of the points show whether the relationship is positive, negative, or absent.

Correlation does not imply causation and there may be additional variables in the relationship.

The correlation coefficient (r) is the measure of degree of correlation ranging from -1 (a perfect negative linear correlation) through 0 (no correlation) to +1 (a perfect positive correlation).

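The correlation coefficient describes linear relationships; a curved relationship can produce a coefficient near 0 even when the variables are related.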

A school counselor wants to know whether there is a relationship between number of office referrals and grade point average (GPA). The counselor calculated the correlation coefficient in Excel and found a negative correlation (-0.91) between number of office referrals and GPA. This shows that students who have more office referrals tend to have lower GPAs. It does not mean that being referred to the office caused lower GPAs, but it does indicate a negative relationship between the two variables.

1. Have our school demographics changed in the last 5 years? (Based on various yearly demographics)

2. Are we preparing half of the students in our Math classes to exceed the national Math requirements? We found that one male and four females exceeded the requirements, while we expected five males out of 10 total and five females out of 10 total to exceed them. Is this difference significant, or did it happen by chance?

3. The chi-square model can also be used to compare the expected frequency and observed frequency of office referrals based on gender, ethnicity, etc.

4. The chi-square model may also allow school counselors to see if there are statistically significant differences in attendance by race, gender, GPA, etc.

Calculating the Correlation coefficient using Microsoft Excel

Step 1

Enter the paired scores for each subject in your spreadsheet.

Step 2

Click in the cell where you want to do your calculation. Type:

“=CORREL(”

Highlight your first column of data without the title and insert a comma.

Highlight the second column of data without the title and insert a closing parenthesis to complete the formula.

Step 3

Press “Enter.”

This gives you the correlation coefficient, which should be between -1 and 1.

The example “-0.91” indicates a very strong negative correlation between the number of office referrals and GPA: students with more office referrals tend to have lower GPAs.

Step 4

To label the x and y axes of your scatter plot, click: “Chart Tools” > “Layout” > “Axis Titles.”

Step 5

To add a chart title, click: “Chart Tools” > “Layout” > “Chart Title.”

Step 6

This is what your final correlation Excel spreadsheet should look like.

Step 7

You can also add a trend line to indicate the relationship between the two variables.
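For readers who want to check the Excel result another way, the CORREL calculation can be sketched in Python using the raw-score form of the Pearson formula. This is a minimal illustration, not the presenters' method: the function name `pearson_r` is hypothetical, and the referral and GPA values are invented for the example rather than taken from the counselor's spreadsheet.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long columns of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long columns with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a column has no variation")
    return numerator / denominator


# Hypothetical paired scores, invented for illustration only.
office_referrals = [0, 1, 2, 4, 5, 7]
gpas = [3.8, 3.5, 3.1, 2.9, 2.2, 1.9]

r = pearson_r(office_referrals, gpas)
print(round(r, 2))  # negative: more referrals go with lower GPAs in this made-up data
```

As in Excel, the two columns must be the same length and listed in the same subject order, since each x value is paired with the y value in the same row.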

As school counselors, we look at the numbers of girls and boys in a school. Assuming the school has equal numbers of girls and boys, in a sample of 50 students we would expect 25 girls and 25 boys. However, our sample contained 30 girls and 20 boys.

$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} = 1 + 1 = 2$$

Null Hypothesis: No difference in the gender population of the sample

Test Hypothesis: There IS a difference in the probability that a girl or boy would be chosen in the sample

Degrees of Freedom: 1 (the number of categories minus one; with only two genders, only one count is free to vary)

A p-value of 0.05 or less is usually regarded as statistically significant: a difference this large between the observed and expected frequencies would be unlikely to occur by chance alone.

In this example, the p-value is between .10 and .20, so the result is not significant. We therefore fail to reject the null hypothesis and consider our sample within the range of what we would expect for a 25/25 split.
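The arithmetic in this example can be checked with a short Python sketch. It uses the observed and expected counts given above; the function names are hypothetical, and the p-value shortcut based on `math.erfc` is valid only for 1 degree of freedom, which is the case here.

```python
from math import erfc, sqrt


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))


observed = [30, 20]  # girls, boys in the sample
expected = [25, 25]  # equal split of 50 students

statistic = chi_square(observed, expected)
p = p_value_one_df(statistic)
print(statistic, round(p, 3))
```

The statistic comes out to 2 and the p-value to about 0.157, which falls between .10 and .20 and is above the 0.05 cutoff.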

Calculating Chi-Square using Microsoft Excel

Step 1

Enter your observed and expected data into a spreadsheet.

Step 2

Create two tables to do your calculations for the frequencies and Chi-Square.

Step 3

To calculate Chi-Square, you must first find the observed frequency and the expected frequency. To calculate a frequency, type:

=COUNTIFS(

Highlight your data

Type the criteria for your calculation

Ex: We only wanted the calculation to count the males for the observed gender, so we typed “M”

Step 4

Complete the frequency calculations for both males and females in the observed and expected tables.

Step 5

Calculate the sum of your observed table for both male and female answers.

To calculate the sum:

=SUM (highlight the cells you want to add together)

Step 6

Next, calculate the sum of the expected frequencies.

Step 7

Finally, complete your Chi-Square test. To calculate the chi-square, type:

=CHISQ.TEST(

Highlight your observed frequencies without the totals, insert a comma, and highlight your expected frequencies without the totals

Close the parenthesis

CHISQ.TEST works out the degrees of freedom from the ranges you highlight, so be sure to note them when you report the result.

Step 8

Completing the Chi-Square formula gives you the significance value.

If the value is greater than .05, the results are not significant. In our example, the value is greater than .05, so the number observed does not differ significantly from the number expected.
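The counting and summing steps above (COUNTIFS and SUM) can also be sketched in Python. This is only an illustration under stated assumptions: the list of gender codes is hypothetical raw data built to match the 30 girls and 20 boys in the in-depth example, and the expected frequencies assume an equal split.

```python
from collections import Counter

# Hypothetical raw data: one gender code per sampled student.
genders = ["F"] * 30 + ["M"] * 20

# Observed frequencies: the same job COUNTIFS does with the criteria "M" and "F".
counts = Counter(genders)
observed = {"M": counts["M"], "F": counts["F"]}

# Totals: the same job SUM does on each table.
total_observed = sum(observed.values())
expected = {code: total_observed / 2 for code in observed}  # assumption: equal split
total_expected = sum(expected.values())

# Chi-square statistic: sum of (O - E)^2 / E over both categories.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
print(observed, expected, total_observed, total_expected, chi_square)
```

Checking that the observed and expected totals match is a quick way to catch a miscounted category before running the test.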

Presenting the Correlation Data

When presenting your data to an audience, you can use the scatter plot created in Microsoft Excel to illustrate the relationship between two variables. Each point on the graph represents one student’s GPA and number of office referrals for that year. The trend line shows the direction of the relationship, and how closely the points cluster around it reflects the strength of the correlation. Our scatter plot shows a very strong negative correlation, with a correlation coefficient of -0.91.

Presenting the Chi-Square Data

Here is an example of what our Chi-Square analysis would look like in a bar graph.

An easy way to present the data for a Chi-Square analysis is through a bar graph. This way the audience can see the observed and expected data easily and compare them visually. If you choose to present your data this way, make sure that you understand your graph and its significance and can explain it clearly to your audience. A difference that looks large in a bar graph is not necessarily statistically significant, which is why the Chi-Square calculations are so important.

**Hands-on Activity**

| Variable 1 | Variable 2 |
| --- | --- |
| Size of School | Graduation Rate |
| Attendance | GPA |
| Free & Reduced Lunch | Office Referrals |