Nonexperimental Designs pt 2

by Micheal Sandbank on 22 February 2014

Transcript of Nonexperimental Designs pt 2

Correlation
Last week we discussed intact group designs,
which investigate the relationship between
a distinguishing construct of interest (which
categorically divides a population) and a
dependent variable, which can be either
categorical or continuous.
For example
Do parents of children with autism
exhibit more stress than parents of children
with Down syndrome?
Distinguishing Construct: Diagnosis
Dependent Variable: Valid and Reliable Stress Index
Does this research involve experimental manipulation?
Because participants existed in their groups prior to our investigation (rather than being randomly assigned), we often can't establish temporal precedence of the "IV", and we can't control for the influence of third variables, SO WE CAN
NEVER MAKE CAUSAL INFERENCES WITH THIS DESIGN.
Similarly, we can investigate
the relationship between two
continuous variables using a
correlational design.
For this design we need:
a population of interest
at least 2 continuous variables of interest
For example:
Are math age equivalency scores correlated with reading age equivalency scores for children in middle school?
What is the population?
What are the variables?
Categorical or continuous?
Can we establish a relationship between variable x and variable y?
Can we establish temporal precedence of variable x?
Can we rule out other possible explanations for the association?
Correlational designs are plagued by the same limitations as intact-group designs. Just like the saying
goes, "correlation is not causation." Of course, we
hear this all the time, but we repeatedly treat correlational data as if it DOES show causality.
Deb Roy got 24/7 video data on
his son's language development in the house.

Watch 4:20 - 7:54

What impression do you come away with?
What is "causing" his son to acquire each word?
Despite the limitations, there is an important advantage of correlational designs over intact-group. What do we gain from keeping our "independent variable" continuous?
TOWRE Scores
Bad Reader
Good Reader
When we dichotomize a continuous variable, even if our dichotomization is based on theory, we lose information. What is the true distinction between a good reader and a bad reader? Where is the line? If we use a correlational design, we never have to draw that line.
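One way to see the information loss is to compare a correlation before and after the split. This is only a sketch: the scores are made up (they are not TOWRE data), and the cutoff of 4.5 is a hypothetical line between "bad" and "good" readers.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: eight readers whose outcome rises steadily with reading score.
reading = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [1, 2, 3, 4, 5, 6, 7, 8]

# Hypothetical cutoff: above 4.5 is a "good reader" (1), otherwise a "bad reader" (0).
good_reader = [1 if score > 4.5 else 0 for score in reading]

r_continuous = pearson_r(reading, outcome)
r_dichotomized = pearson_r(good_reader, outcome)
print(round(r_continuous, 3), round(r_dichotomized, 3))  # 1.0 0.873
```

Once the line is drawn, every reader on the same side of it looks identical, so the relationship appears weaker than it is.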
We can use correlational
designs to examine
relationships between
co-occurring variables
(concurrent) or those with
temporal precedence
(longitudinal)
These relationships
are of great importance to us
in education, because we often
examine them to establish the validity
of our assessments.
Our measure of reading ability should
correlate with other validated measures
of reading, and it should correlate with
component skills of reading, such as
phonological awareness. This verifies the
idea that we are measuring what we say
we are measuring.
It's important to choose a population that is homogeneous, and to formulate falsifiable
hypotheses prior to examining the relationships
between two variables.
Here is a plot of the correlation between physical strength and math ability for middle school students. We can think of an explanation for why this relationship exists. Testosterone levels are
associated with both strength and math comprehension skills, so
perhaps that is the underlying cause of this relationship.
But here is the same plot, with sixth graders mapped in red and eighth graders mapped in blue. What's the likely third variable causing this relationship? Age. Because our population wasn't homogeneous, we saw a relationship that didn't exist. Because we didn't have a falsifiable hypothesis based on theory, once we saw that relationship, it was easy to invent an explanation.
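The same trap can be reproduced with a handful of numbers. In this sketch the scores are hypothetical (they are not the data behind the plot): within each grade, strength and math are unrelated, but the eighth graders are higher on both.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores: no relationship inside either grade.
sixth_strength, sixth_math = [1, 2, 3, 4], [10, 12, 12, 10]
eighth_strength, eighth_math = [5, 6, 7, 8], [20, 22, 22, 20]

r_sixth = pearson_r(sixth_strength, sixth_math)
r_eighth = pearson_r(eighth_strength, eighth_math)
# Pooling the two grades mixes in the age difference.
r_pooled = pearson_r(sixth_strength + eighth_strength, sixth_math + eighth_math)
print(r_sixth, r_eighth, round(r_pooled, 3))  # 0.0 0.0 0.856
```

The pooled correlation comes entirely from the gap between the grades, which is why checking the relationship inside each subgroup is a useful habit.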
We can also establish the validity of a measure
by showing that it predicts something of great importance to us.
Autism is usually diagnosed between ages
2-4. We know that early intervention can positively affect outcomes, so early identification of treatment needs is of great interest to us. If our measure at 6 months is highly correlated with severity of autism at age 4, it would have predictive validity.
How are we analyzing the
relationship between our
variables of interest?
+
I can plot univariate data on a line to get a sense of the spread, or I can combine them to get...
Bivariate data! Now we see a visual representation of the relationship between both of these variables.
The slope of the line indicates the direction of the relationship, and the distance of the data points from the line indicates the strength of the relationship.
r(x,y) = cov(x,y) / (s(x) × s(y)) = .112
We can calculate the
correlation, which tells
us the magnitude and
direction of the relationship.
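Here is a minimal sketch of that calculation: the sample covariance divided by the product of the two sample standard deviations. The five pairs of scores are hypothetical and are not the data behind the .112 above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: sample covariance over the product of sample SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations (n - 1 in the denominator).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (s_x * s_y)

# Hypothetical scores for five students.
math_scores = [1, 2, 3, 4, 5]
reading_scores = [2, 1, 4, 3, 5]

r = pearson_r(math_scores, reading_scores)
print(round(r, 3))  # 0.8
```

The sign of r gives the direction of the relationship, and its absolute value (between 0 and 1) gives the magnitude.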
If we re-examine our earlier data, we see
that our line is tilted up slightly, so the
relationship is positive, but the average
distance of our data points from the line of
best fit is pretty large, and so the magnitude
of our correlation is not very big.
When the data lines up perfectly,
then we have a perfect correlation.
This means that if we know an
individual's score on a single
variable, then we can know her
exact score on the correlated
variable. It gives us 100% prediction
power. (In reality, this never
happens).
Reading Scores and Math Scores (same rank order on both measures):
1. Bobby, 2. Billy, 3. Jenny, 4. Jesse, 5. Amber, 6. Molly, 7. George, 8. Esther, 9. Johnny, 10. Jason, 11. Julie, 12. Lauren, 13. Jimmy, 14. Carrie, 15. Samantha, 16. Jonah, 17. Heartley, 18. Emma, 19. Matt, 20. Brent, 21. Drew, 22. Sarah, 23. Rosey, 24. Jane, 25. Michael
Again, this is the same as getting a 100% stable ranking. This is why reliability in group studies is often measured as an intraclass correlation (ICC). It's the extent
to which both raters ranked the participants in a similar order.
We can square the correlation (r)
to get r^2, an index of the proportion of
variance explained.
Let's say Micheal tests pretty poorly
on a standardized measure of math.
(Univariate Descriptive)
Good at smiling.
Bad at math.
At this point, we don't know why
Micheal has done more poorly than expected
(the mean). Perhaps there is something
influencing her score. Perhaps there are multiple things. Perhaps we have some measurement error.
But what if we find
that there is a pretty
strong negative correlation
between the number of
days premature at birth
and standardized math
scores. Micheal was 1
month premature. When
we graph the bivariate
data, Micheal has a new
distance from a new
expected value.
In this scenario, Micheal has two estimates of variance: her distance from the mean and her distance from the line of best fit. We can examine these estimates of variance for each person in the study, and make a ratio of unexplained variance to total variance. When we subtract that ratio from 1, we get the proportion of variance in y that is accounted for by x.
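That ratio can be sketched in a few lines. The data are hypothetical (five children, days premature against a math score), and `fit_line` is an illustrative helper, not something from the slides.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data.
days_premature = [0, 10, 20, 30, 40]
math_score = [100, 95, 92, 83, 80]

slope, intercept = fit_line(days_premature, math_score)
mean_y = sum(math_score) / len(math_score)

# Unexplained variation: squared distances from the line of best fit.
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(days_premature, math_score))
# Total variation: squared distances from the mean.
ss_total = sum((y - mean_y) ** 2 for y in math_score)

r_squared = 1 - ss_residual / ss_total
print(round(slope, 2), round(r_squared, 3))  # -0.52 0.973
```

With a single predictor, this value is the same number we would get by squaring the correlation r.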
We want to explain
all of our variance to
show that none of it is
resultant from random
error. How can we do that?
Multiple regression
allows us to examine the correlation between a criterion variable and two or more predictor variables. We can explain the relationship with the formula for a plane. Because it is multidimensional, it will have multiple slopes.
When we use multiple regression to analyze the relationship between variables, we can calculate beta weights (which help us understand the magnitude and direction of the effect for each variable) and an R value, which we can square to get the proportion of variance in y explained by our predictors.
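As a rough sketch, a regression with two predictors can be fit by solving the least-squares equations directly. The function name and the data are hypothetical. It returns unstandardized slopes; beta weights would be those same slopes after each variable has been standardized.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r_squared)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of the mean-centered variables.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if the two predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_res = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    ss_tot = sum((c - my) ** 2 for c in y)
    return b0, b1, b2, 1 - ss_res / ss_tot

# Hypothetical data for six students.
hs_gpa = [2.5, 3.0, 3.2, 3.6, 3.8, 4.0]
sat_total = [1000, 1100, 1050, 1250, 1300, 1400]
college_gpa = [2.4, 2.9, 2.8, 3.3, 3.5, 3.6]

b0, b1, b2, r_squared = fit_two_predictors(hs_gpa, sat_total, college_gpa)
print(round(r_squared, 3))
```

Each slope describes the expected change in the criterion for a one-unit change in that predictor while the other predictor is held constant.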
an example
Research question: Does high school GPA, total SAT score, or quality of letters of recommendation significantly predict college GPA?
R squared = .425
So, do these variables predict college GPA?
Assumption of Linearity
Regression assumes there is a linear relationship
between the predictors and the predicted.
Nonlinear relationships will change your results.
Confidence Intervals
Recall that in order to estimate confidence intervals
around the mean of a sample, or a mean difference between two samples, we needed a distribution of means from repeated samples of a population.
In regression, we want to estimate confidence
intervals around our slopes. Consequently, we
need a distribution of slopes.
Here is data on two variables of interest from the National Longitudinal Survey of Freshmen (NLSF; Singer, 2006). Let's call this our population, and the red line our population regression line.
We can repeatedly sample this population and get a slope each time.
Here's one.
Here's another.
And another.
Here's a bunch more.
Here they're graphed
together: estimated regression
lines for 10, 100, and 1000
samples.
And here they're plotted as a distribution of slopes.
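The repeated-sampling idea can be sketched with a simulated population. Everything here is an assumption for illustration: the population is generated at random (it is not the NLSF data), and the sample size of 30 and the 1000 repetitions are arbitrary choices.

```python
import random

def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated stand-in population: y rises about 2 points per unit of x, plus noise.
population = [(x, 2 * x + random.gauss(0, 20)) for x in range(200)]
population_slope = slope(population)

# Draw 1000 samples of 30 cases each and keep the slope from every sample.
sample_slopes = [slope(random.sample(population, 30)) for _ in range(1000)]

mean_slope = sum(sample_slopes) / len(sample_slopes)
sd_slope = (sum((b - mean_slope) ** 2 for b in sample_slopes)
            / (len(sample_slopes) - 1)) ** 0.5
print(round(population_slope, 2), round(mean_slope, 2), round(sd_slope, 3))
```

The sample slopes cluster around the population slope, and the standard deviation of that cluster is the standard error of the slope.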
In other words, if we ranked
participants by math score,
and again by reading score,
would the rankings be similar?
If the measures are perfectly
correlated, then our rank order
will be the same.
[Figure: Math Scores and Reading Scores rankings for the same 25 students, 1. Bobby through 25. Michael]
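The ranking idea can be checked by turning each set of scores into ranks and correlating the ranks. This is a sketch with hypothetical scores for five students; `ranks` is an illustrative helper that assumes no tied scores.

```python
from math import sqrt

def ranks(scores):
    """Rank each score from 1 (highest) downward; assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five students.
math_scores = [91, 80, 72, 60, 55]
reading_scores = [95, 70, 50, 45, 40]
reading_swapped = [95, 70, 50, 40, 45]  # the last two students trade places

same_order = pearson_r(ranks(math_scores), ranks(reading_scores))
one_swap = pearson_r(ranks(math_scores), ranks(reading_swapped))
print(round(same_order, 3), round(one_swap, 3))  # 1.0 0.9
```

Identical rank order is necessary for a perfect positive correlation on the raw scores, but the raw scores must also fall on a straight line, which is a stricter requirement.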
This individual has a
score of 19 on the x
measure and 50 on the
y measure.
These distances outlined in pink, between each data
point and the line of best fit (expected value) are the
residuals. Residuals, like deviations from the mean, can be squared and averaged to provide a sample estimate of error.

We always estimate our error with some measure of
the extent to which our values have strayed from the
expected value (variance).
Total Variance in y
Explained Variance
The idea is that some of our variance, which could have been due to random error, is actually not random, and explained by our scores on another variable.
y = b0 + b1(x1) + b2(x2)
y: predicted variable
b0: intercept
b1: slope (effect of) predictor 1
b2: slope (effect of) predictor 2
In this case, despite a clear
relationship between the variables,
your fit will be poor.
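A small sketch makes the point. The data are hypothetical: y is completely determined by x, but the relationship is curved rather than linear.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a perfect U-shaped relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(round(r, 3))  # 0.0
```

A correlation of zero here does not mean the variables are unrelated, which is why it helps to plot the data before fitting a line.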
Other things to remember:
Beta weights should be interpreted conservatively; their magnitude depends on the order in which you enter the variables.
The more variables you add to a regression equation, the more variance you will explain in your sample, but if this isn't based on theory, it's not likely to replicate.
So now we can calculate our confidence interval as
slope estimate ± (critical value of the distribution × standard error of the slope)
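Here is a sketch of that interval for a simple regression, using the usual formula for the standard error of the slope. The data and the function name are hypothetical, and the critical value is passed in by hand; 3.182 is the two-sided 95% value of the t distribution with 3 degrees of freedom.

```python
from math import sqrt

def slope_confidence_interval(x, y, critical_value):
    """Return (slope, standard_error, lower, upper) for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance divides by n - 2 because the intercept and slope were both estimated.
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    standard_error = sqrt(ss_residual / (n - 2) / sxx)
    margin = critical_value * standard_error
    return slope, standard_error, slope - margin, slope + margin

# Hypothetical data: days premature and math score for five children.
days_premature = [0, 10, 20, 30, 40]
math_score = [100, 95, 92, 83, 80]

# With n = 5 there are n - 2 = 3 degrees of freedom, so the t critical value is 3.182.
slope, se, lower, upper = slope_confidence_interval(days_premature, math_score, 3.182)
print(round(slope, 2), round(se, 3), round(lower, 2), round(upper, 2))  # -0.52 0.05 -0.68 -0.36
```

Because this interval does not include zero, the sample would be consistent with a negative slope in the population.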
What do we do to deal with the
effects of nuisance variables in
correlational studies?
also, this.
For example, what if we want to
examine the effects of nutrition
on academic achievement?
We could use a measure of the nutritional environment,
such as the Nutrition Environment Measures Survey.
And we could measure academic achievement with
standardized end-of-year test scores.
But we know that the nutritional environment
is also correlated with SES, which can also influence
academic achievement.
How do we control
for the effect of this
nuisance variable?
Part and partial correlations help
us remove the effects of variables
that are correlated with our
constructs of interest.
We calculate them by examining the shared variance. Remember our different residuals?
We can visualize the residuals (variance)
of all three variables to look like this. Some
variance is associated with only one variable.
Other variance is shared between two
variables. Some variance is shared by
all three variables.
The part correlation
helps us express a
more conservative
estimate of the
relationship between
nutrition and
academic achievement.
Part correlation for nutrition is
b / (a + b + c + e)

See how we took the shared residuals
between Nutrition and SES out of our
correlation metric?
The partial correlation is even more conservative.
Partial correlation for nutrition is b / (b + e)
We're removing ALL of the residuals shared by SES.
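Both statistics can also be computed from the three pairwise correlations, using the standard first-order formulas. The three correlation values below are hypothetical. Assuming the lettered areas follow the usual Venn-diagram convention, the area ratios above correspond to the squares of these two correlations.

```python
from math import sqrt

def part_correlation(r_y1, r_y2, r_12):
    """Correlation of y with the part of variable 1 that is unrelated to variable 2."""
    return (r_y1 - r_y2 * r_12) / sqrt(1 - r_12 ** 2)

def partial_correlation(r_y1, r_y2, r_12):
    """Correlation of y and variable 1 after variable 2 is removed from both."""
    return (r_y1 - r_y2 * r_12) / sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

# Hypothetical pairwise correlations.
r_achievement_nutrition = 0.5
r_achievement_ses = 0.6
r_nutrition_ses = 0.4

part = part_correlation(r_achievement_nutrition, r_achievement_ses, r_nutrition_ses)
partial = partial_correlation(r_achievement_nutrition, r_achievement_ses, r_nutrition_ses)
print(round(part, 3), round(partial, 3))  # 0.284 0.355
```

Both values fall below the raw correlation of 0.5 once SES is taken into account. The partial correlation has the smaller denominator, so its magnitude is never smaller than that of the part correlation.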
Let's process these concepts
by designing a correlational
study together.
*image adapted from slides created by James Steiger
*image adapted from slides created by Jennifer Gilbert