
# PS104 Statistics Overview

by Michelle Kelly on 2 December 2014


#### Transcript of PS104 Statistics Overview

Dr Michelle E. Kelly
PS104 Statistics Revision
Descriptive
Inferential
Central Tendency
Mean
Dispersion/Variability
Variance
Characterises the spread or dispersion of scores, i.e. how spread out they are and how close they lie to the centre of the data.
Usually the variance is reported in tables which summarise a variable along with other statistics such as the mean and range.

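There is one problem with using our variance formula to estimate the population variance.

The sample variance (dividing by N) tends to underestimate the population variance. To correct for this, we calculate the variance estimate, which divides by N - 1 instead.

Variance formula using N - 1:

$$\text{variance estimate} = \frac{\sum (x - \text{mean})^2}{N - 1}$$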
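A minimal Python sketch of the two versions of the formula, using a small set of hypothetical scores. The helper names `variance_n` and `variance_estimate` are illustrative; the standard-library functions `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N - 1) are used only as a cross-check.

```python
import statistics

def variance_n(scores):
    """Sample variance dividing by N: tends to underestimate the population variance."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def variance_estimate(scores):
    """Variance estimate dividing by N - 1: corrects the underestimation."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

scores = [1, 3, 5, 7]  # hypothetical scores, for illustration only
print(variance_n(scores))         # 5.0
print(variance_estimate(scores))  # 6.666666666666667

# Cross-check against the standard library: pvariance uses N, variance uses N - 1.
assert abs(variance_n(scores) - statistics.pvariance(scores)) < 1e-12
assert abs(variance_estimate(scores) - statistics.variance(scores)) < 1e-12
```

Dividing by the smaller number N - 1 makes the estimate slightly larger, which offsets the tendency of the N version to come out too small.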
Standard Deviation
Normal Distribution
z-scores
The most common measure of variability.
Standard deviation (SD), the square root of the variance, measures the amount of variation around the mean.

When reporting the mean, also report the SD: Mean = 4 (SD = 2.58).
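As a sketch of where such a figure comes from, the SD can be taken as the square root of the N - 1 variance estimate, which is what Python's `statistics.stdev` returns. The scores below are hypothetical, chosen only because they happen to give Mean = 4 and SD = 2.58; `describe` is an illustrative helper name.

```python
import statistics

def describe(scores):
    """Return (mean, SD), with the SD based on the N - 1 variance estimate."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical scores that reproduce "Mean = 4 (SD = 2.58)".
mean, sd = describe([1, 3, 5, 7])
print(f"Mean = {mean:g} (SD = {sd:.2f})")  # Mean = 4 (SD = 2.58)
```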
In everyday life many variables are normally distributed.

We use this information to make assumptions about the way our populations are distributed.

Many statistical tests are based on the assumption that data are normally distributed.
A z-score tells us the number of standard deviations by which a particular score lies above or below the mean of the set of scores.
Standardising scores lets them be compared meaningfully.
Z-scores allow us to determine how each score compares to the other scores in a data set, e.g.:
How well did one student perform in an English test compared to the other 50 students?
Which students came in the top 10% of the class?

$$z = \frac{x - \text{mean}}{\text{standard deviation}}$$
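A minimal sketch of the z-score formula, using hypothetical English-test marks for a small class of 10 rather than the 51-student example. The helper names are illustrative, and `top_fraction` assumes the marks are roughly normally distributed: it uses `statistics.NormalDist().inv_cdf` to find the z cut-off (about 1.28) above which the top 10% of a normal curve lies.

```python
import statistics

def z_scores(scores):
    """Convert raw scores to z-scores: (x - mean) / standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # N - 1 estimate
    return [(x - mean) / sd for x in scores]

def top_fraction(scores, fraction=0.10):
    """Scores whose z-score exceeds the normal-curve cut-off for the top `fraction`.
    Assumes the scores are approximately normally distributed."""
    cutoff = statistics.NormalDist().inv_cdf(1 - fraction)  # about 1.28 for 10%
    return [x for x, z in zip(scores, z_scores(scores)) if z > cutoff]

english = [55, 62, 48, 70, 81, 66, 59, 74, 90, 52]  # hypothetical marks
print([round(z, 2) for z in z_scores(english)])
print(top_fraction(english))
```

Because every mark is expressed in SD units from the class mean, a z of +1.5 means the same relative standing whatever the test's raw scale.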
Standard Error
Sampling distribution

The standard deviation of the sample means is the standard error of the mean: the degree to which sample means deviate from the mean of the sample means.

$$\text{standard error of the mean} = \frac{\text{estimated standard deviation}}{\sqrt{\text{sample size}}}$$
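To illustrate that the standard error is the standard deviation of the sampling distribution of means, the sketch below draws many samples from a simulated normal population and compares the SD of their means with the formula. The population (mean 100, SD 15), the sample size of 25 and the helper names are assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(scores):
    """Estimated SE of the mean: estimated SD / sqrt(sample size)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

def simulated_se(pop_mean, pop_sd, n, n_samples=5000, seed=1):
    """SD of many sample means drawn from a normal population (the sampling distribution)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
             for _ in range(n_samples)]
    return statistics.stdev(means)

# Hypothetical population: mean 100, SD 15; samples of size 25.
print(15 / math.sqrt(25))         # 3.0 (standard error from the formula)
print(simulated_se(100, 15, 25))  # close to 3.0
```

Quadrupling the sample size halves the standard error, since N sits under a square root.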
Describe data
Make inferences about population based on sample
Parametric
Non-Parametric
Pearson's r
Independent t-test
Dependent t-test
Spearman's Rho
Mann Whitney U-test
Wilcoxon Matched Pairs Test
Chi-Square Test
Measure of association between nominal variables
SCORE DATA
1. Ranked
2. Category
Violate assumptions
Correlation
Relationship
Difference

Spearman’s rho: association between two sets of ordinally ranked data.

Categorical data are analysed using chi-square (or Fisher's Exact Probability Test, Yates' correction, odds ratio).

Mann-Whitney and Wilcoxon matched-pairs tests examine differences between ranked data.
Non-parametric statistics generally test hypotheses involving ordinal rankings of data or frequencies.
They make fewer assumptions about the populations, and none about the shape of their distributions (unlike parametric tests).
‘Distribution free’
Non-Parametric Tests
Use when the assumptions of parametric tests are violated.
The formulae for parametric tests involve calculations of means and standard deviations.
Parametric tests make assumptions about the characteristics of the populations from which samples are drawn.
'Powerful'
Parametric Tests
Pearson's Product Moment Correlation Coefficient: relationship between variables
Independent t-test: difference between two groups (sets of means)
Dependent t-test: difference between related sets of means (time points 1 and 2)
Correlation
‘What is the relationship between stress and heart disease?’
Correlational studies
We want to know whether the variables vary together, i.e. is there an association or correlation between them?
'Alcohol consumption and reaction time'
'Maths scores and music ability'
Pairs of values: we want to determine whether there is a positive or negative correlation between the two.
Correlation Coefficient
Pearson’s r tells us two major pieces of information:

(1) How closely the points on a scatterplot fit the best-fitting straight line.
(2) Whether the slope of the scatterplot is positive or negative.

'Single numerical index'
'How strong is the relationship?'
Draw a scatterplot!
Interpreting and reporting results
r = -0.90 indicates a very strong negative relationship.
Mathematical scores were significantly negatively correlated with musical scores, r(8) = -0.90, p < 0.05.
Calculate Pearson’s r, note the sample size (the degrees of freedom are N - 2, the 8 in r(8)) and consult the table to obtain the corresponding critical value.

Ignoring the sign, your value must be equal to or greater than the critical value to be statistically significant.
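A minimal sketch of the calculation and the decision rule, assuming hypothetical maths and music scores for 10 students (not the data behind r(8) = -0.90). The critical value is deliberately left as an input to be read from the printed table; `pearson_r` and `is_significant` are illustrative names.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation for paired scores x[i], y[i]."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need equal-length x and y with at least 3 pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of the pairs
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, critical_value):
    """Ignoring the sign, r must equal or exceed the critical value from the table."""
    return abs(r) >= critical_value

# Hypothetical scores for 10 students.
maths = [12, 15, 9, 18, 11, 14, 20, 8, 16, 13]
music = [7, 5, 9, 2, 8, 6, 1, 10, 4, 6]
r = pearson_r(maths, music)
df = len(maths) - 2  # the number in brackets, as in r(8)
print(f"r({df}) = {r:.2f}")
```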

Identical to Pearson’s r, except that instead of taking the scores directly from your data, the scores are first ranked from smallest to largest.
Spearman’s rho is Pearson’s r calculated on ranked scores (it is the non-parametric counterpart of Pearson’s r).
Apply the same Pearson’s r formula!
Report and interpret the results the same way too (using Spearman's significance tables)!
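The "rank, then apply Pearson's formula" recipe can be sketched as follows. Tied scores are given the average of the ranks they span, which is the usual convention; the helper names are illustrative, and a small Pearson function is repeated so the block stands alone.

```python
import math

def average_ranks(scores):
    """Rank from smallest (1) to largest; tied scores share the mean of their positions."""
    first, last = {}, {}
    for position, s in enumerate(sorted(scores), start=1):
        first.setdefault(s, position)
        last[s] = position
    return [(first[s] + last[s]) / 2 for s in scores]

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the same Pearson formula, applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))                   # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))  # 1.0
```

The second example shows why ranking helps: the extreme value 100 would pull Pearson's r on the raw scores below 1, but the ranks still agree perfectly.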
Chi-square measures the relationship/association or differences between two categorical variables.
Correlation coefficient - used to assess association between two variables measured on an ordinal or interval scale.
'Differences between observed and expected frequencies...'
Calculating expected frequencies
“The chi-square value of 53.6 (df = 3) was found to have an associated probability value of < 0.05. Thus, we can accept that there is a significant difference between the observed and expected frequencies, and we can conclude that the four brands of chocolate are not equally popular. The table shows that more people (n = 60) prefer Snickers to the other brands.”
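A sketch of the goodness-of-fit calculation behind a report like this, assuming that "equally popular" means each category's expected frequency is the total divided by the number of categories. The counts are hypothetical (not the data from the report), and the function name is illustrative; the result would still be compared with the tabled critical value for the df.

```python
def chi_square_goodness_of_fit(observed, expected=None):
    """Chi-square = sum of (O - E)^2 / E over categories.
    With no expected counts supplied, assume every category is equally popular."""
    k = len(observed)
    if expected is None:
        expected = [sum(observed) / k] * k
    if min(expected) < 5:
        print("warning: an expected frequency is below 5")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1  # (chi-square value, df)

# Hypothetical preference counts for four brands (120 people, so E = 30 each).
chi2, df = chi_square_goodness_of_fit([60, 20, 25, 15])
print(f"chi-square = {chi2:.1f}, df = {df}")  # chi-square = 41.7, df = 3
```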
Alternatives to Chi-square
The expected cell frequencies rule, whereby no expected frequency should fall below 5, can be addressed in three ways:
• Combine categories
• The Fisher Exact Probability Test (!)
• Yates’ correction
Phi coefficient: standard 2x2 tables
Cramér's V: tables larger than 2x2
'Strength of association?'
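For a two-way table, each expected frequency is (row total × column total) / grand total. The sketch below, using a hypothetical 2x2 table of counts and illustrative function names, computes chi-square without Yates' correction, flags expected frequencies below 5, and gives Cramér's V as the strength of association. For a 2x2 table, V equals the absolute value of the phi coefficient.

```python
import math

def expected_frequencies(table):
    """E for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Chi-square test of association for an r x c table (no Yates' correction)."""
    expected = expected_frequencies(table)
    if min(min(row) for row in expected) < 5:
        print("warning: an expected frequency is below 5")
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (N * (k - 1))), k = smaller of rows and columns."""
    chi2, _ = chi_square_independence(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2x2 table of counts (rows: group, columns: response).
counts = [[10, 20], [20, 10]]
chi2, df = chi_square_independence(counts)
print(f"chi-square = {chi2:.2f}, df = {df}, V = {cramers_v(counts):.2f}")
# chi-square = 6.67, df = 1, V = 0.33
```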
Independent t-test
Comparisons between two groups of scores:
Each group of scores is obtained from a separate group of individuals
‘unrelated’ or ‘between-participants’
A difference is statistically significant at a given significance level (for the relevant df) only if the observed value of t equals or exceeds the table value.
"There was no significant difference between times taken to sort into two as opposed to four piles, t (17) = 1.241, p > 0.05."
Random allocation of participants into experimental vs control.
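A minimal sketch of the independent t statistic, assuming roughly equal population variances so that the two variance estimates can be pooled (Student's version). The sorting times are hypothetical, with 10 + 9 participants chosen only so that df = n1 + n2 - 2 = 17 matches the reported t(17); the function name is illustrative. The resulting t is then compared with the table value for that df.

```python
import math
import statistics

def independent_t(group1, group2):
    """Independent (between-participants) t-test with a pooled variance estimate.
    Assumes roughly equal population variances. Returns (t, df), df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # N - 1 estimates
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical sorting times (seconds): two piles vs four piles.
two_piles = [52, 48, 61, 55, 47, 58, 50, 63, 49, 57]
four_piles = [56, 60, 51, 65, 54, 59, 62, 53, 58]
t, df = independent_t(two_piles, four_piles)
print(f"t({df}) = {t:.3f}")
```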
Dependent t-test
'Related' or 'within-participants'
Scores are obtained from the same individuals on two separate occasions, or from matched pairs.
Compare scores at time 1 and time 2.
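A minimal sketch of the related t-test, which works on each participant's difference score: t is the mean difference divided by its standard error, with df = n - 1. The scores for 5 participants are hypothetical and `dependent_t` is an illustrative name.

```python
import math
import statistics

def dependent_t(time1, time2):
    """Related (within-participants) t-test computed on difference scores.
    t = mean difference / (SD of differences / sqrt(n)); df = n - 1."""
    if len(time1) != len(time2):
        raise ValueError("each participant needs a score at both time points")
    diffs = [a - b for a, b in zip(time1, time2)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical scores for 5 participants at time 1 and time 2.
t, df = dependent_t([5, 7, 8, 6, 9], [4, 5, 7, 5, 6])
print(f"t({df}) = {t:.2f}")  # t(4) = 4.00
```

Because each participant serves as their own control, stable individual differences cancel out of the difference scores, which is why related designs tend to have smaller standard errors.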
The Mann-Whitney U-test
Non-parametric alternative to the independent t-test.
Assesses whether a statistically significant difference exists between two independent samples of rank-ordered data.

It can be applied when the data are ordinal, the samples were randomly selected, and tied ranks are dealt with appropriately.
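A sketch of how U is obtained: rank all scores from both groups together (ties share average ranks), sum the ranks for group 1, and convert that rank total to U. With the printed tables, the smaller U is significant if it is equal to or less than the critical value. The ratings below are hypothetical and the function names are illustrative.

```python
def average_ranks(scores):
    """Rank from smallest (1) to largest; tied scores share the mean of their positions."""
    first, last = {}, {}
    for position, s in enumerate(sorted(scores), start=1):
        first.setdefault(s, position)
        last[s] = position
    return [(first[s] + last[s]) / 2 for s in scores]

def mann_whitney_u(group1, group2):
    """Rank both groups together, then U1 = R1 - n1(n1 + 1)/2 and U2 = n1*n2 - U1.
    Returns the smaller U, to be compared with the table's critical value."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])  # rank total for group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical ordinal ratings from two independent groups.
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))  # 3.0
```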
Wilcoxon matched-pairs test
Examines whether a difference between two dependent samples of ordinal rankings is significant.
Use it in place of the dependent t-test when:
(1) The difference scores show a very asymmetrical distribution.

(2) There are outliers in the difference scores (i.e. a small number of difference scores are very different from the majority).
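A sketch of the signed-ranks procedure: take each pair's difference (dropping zero differences), rank the absolute differences, and sum the ranks separately for positive and negative differences; T is the smaller sum, and N is the number of non-zero differences used to look up the table. The paired scores are hypothetical and the function names are illustrative.

```python
def average_ranks(scores):
    """Rank from smallest (1) to largest; tied scores share the mean of their positions."""
    first, last = {}, {}
    for position, s in enumerate(sorted(scores), start=1):
        first.setdefault(s, position)
        last[s] = position
    return [(first[s] + last[s]) / 2 for s in scores]

def wilcoxon_t(cond1, cond2):
    """Wilcoxon matched-pairs signed-ranks T. Returns (T, N), where N counts
    the non-zero differences."""
    diffs = [a - b for a, b in zip(cond1, cond2) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])           # rank absolute sizes
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(t_plus, t_minus), len(diffs)

# Hypothetical scores for 4 participants in two conditions (one tie is dropped).
print(wilcoxon_t([10, 12, 9, 15], [8, 13, 9, 11]))  # (1.0, 3)
```

Because only the ranks of the differences enter T, one very large difference counts no more than "the largest rank", which is why the test copes with asymmetry and outliers.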
T-Test Notes
Robust: guards against Type I errors even when its assumptions are moderately violated.
Smaller standard error values tend to occur when sample sizes are large; both conditions lead to larger t statistics.
The power of a t-test is influenced by: (a) the selected significance level (e.g. p < 0.01 or p < 0.05); (b) variability within the sample data; (c) the size of the sample(s); and (d) the magnitude of the difference between means.
Related designs have the distinct statistical advantage of reducing error variance; they also need fewer participants.
Dependent t-test
Counterbalance to reduce the risk of carryover effects.
In contrast to between-groups designs, related designs draw error variance from one source (e.g. one group of participants) rather than two (e.g. two independent samples of participants).

This leads to a smaller standard error in the t-test which means there is a greater likelihood of rejecting the null hypothesis (i.e. more likely to obtain statistically significant result).
Error variance: differential behaviour of participants within the samples, as well as experimental error. Dependent designs reduce this!