
Audio Transcript (Auto-generated)
- 00:01 - 00:04
Hi. Welcome back. Today what I want to talk to you
- 00:04 - 00:08
about is analysis of variance and that's typically known as
- 00:08 - 00:12
ANOVA. ANOVA actually stands for analysis of variance.
- 00:13 - 00:16
The AN is analysis, the O from of, and the VA
- 00:17 - 00:19
from variance. And today we'll take a look at the
- 00:19 - 00:21
simple single-factor ANOVA.
- 00:22 - 00:24
As we move through this course, you'll be able to
- 00:24 - 00:30
do more complicated analyses with more factors involved
- 00:30 - 00:30
in them.
- 00:31 - 00:33
Yeah, so let me give you a little bit of
- 00:33 - 00:37
background on ANOVA. In stats one,
- 00:38 - 00:40
what you did is you compared a sample mean to a
- 00:41 - 00:43
hypothesized value, and your hypothesis
- 00:43 - 00:46
probably looked like this, where you had H0: mu
- 00:47 - 00:50
equal to some sort of hypothesized value. Earlier on
- 00:50 - 00:50
in this course,
- 00:51 - 00:55
in the earlier chapters, we did two samples against
- 00:55 - 00:55
each other.
- 00:56 - 00:58
Either two variances or two means, and the hypothesis
- 00:59 - 01:00
looked a little bit different.
- 01:00 - 01:02
We had two means listed in it.
- 01:02 - 01:05
What ANOVA does is it compares the means of
- 01:05 - 01:09
three or more samples, so H0 is equal
- 01:09 - 01:14
to mu A equals mu B equals mu C, depending on
- 01:14 - 01:17
how many populations we're looking at, and the alternative hypothesis
- 01:18 - 01:21
is that at least one mean is different.
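Just to put the notation in one place, here is a rough sketch of that hypothesis pair; the labels A, B, and C simply stand for whatever groups are being compared:

```latex
H_0:\ \mu_A = \mu_B = \mu_C
\qquad
H_a:\ \text{at least one } \mu \text{ is different}
```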
- 01:21 - 01:24
Now, ANOVA won't tell us directly which of the means
- 01:25 - 01:25
is different.
- 01:25 - 01:27
It'll just tell us whether they are all the
- 01:27 - 01:29
same or whether there's at least one that's different.
- 01:30 - 01:32
There could be more than one, but ANOVA will only
- 01:32 - 01:32
tell us
- 01:33 - 01:36
whether there is at least one. You may be thinking
- 01:36 - 01:39
it might be easier to just compare pairs of means.
- 01:39 - 01:42
In other words, compare A to B and A
- 01:42 - 01:43
to C and B to C.
- 01:43 - 01:49
Unfortunately, that doesn't work very well, because each pairwise
- 01:49 - 01:50
test adds its own chance of error, which reduces your overall accuracy.
- 01:50 - 01:54
It creates a situation where you're not going to get
- 01:54 - 01:55
the correct answer.
- 01:56 - 01:58
So ANOVA is definitely the best way to
- 01:58 - 02:04
do this operation. A little bit of theoretical background for
- 02:05 - 02:05
ANOVA.
- 02:05 - 02:10
And again, the null hypothesis is that we are
- 02:10 - 02:14
going to be seeing whether the means are equal and
- 02:14 - 02:17
technically, what it says is whether the samples are
- 02:17 - 02:21
drawn from populations that all have the same mean.
- 02:23 - 02:26
ANOVA produces an F statistic similar to what we
- 02:26 - 02:28
did with variances, and again,
- 02:28 - 02:28
it's a ratio.
- 02:29 - 02:34
It's a ratio between the variance between the means and the variance
- 02:35 - 02:40
within the samples. And you'll see the words
- 02:41 - 02:46
between and within several times in this discussion, and I'll
- 02:46 - 02:49
explain in a little bit of detail as we move through this
- 02:49 - 02:52
section. ANOVA relies on three assumptions.
- 02:53 - 02:57
The first assumption is that the samples drawn
- 02:58 - 03:03
are independent from each other, so they're not matched
- 03:03 - 03:04
or paired.
- 03:06 - 03:11
The second assumption is that each population has a distribution
- 03:11 - 03:15
that's approximately normal, and the third assumption is that the
- 03:16 - 03:18
populations have the same variance.
- 03:19 - 03:22
Now we'll assume in this course that we don't have
- 03:22 - 03:23
to check those assumptions.
- 03:23 - 03:28
They'll all be presumed to be valid for all our
- 03:28 - 03:30
problems. But in the real world, you have to check
- 03:30 - 03:31
these in order to make ANOVA work.
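The video doesn't show how to check them, but as a rough sketch, here is one common way to test the normality and equal-variance assumptions in Python with SciPy; the three groups below are made-up numbers, not data from the course:

```python
from scipy import stats

# Made-up measurements for three independent groups (illustration only)
group_a = [62, 58, 71, 55, 64, 60]
group_b = [66, 59, 70, 63, 61, 57]
group_c = [60, 65, 58, 62, 67, 59]

# Normality check: Shapiro-Wilk test on each group (small p suggests non-normality)
for name, data in (("A", group_a), ("B", group_b), ("C", group_c)):
    stat, p = stats.shapiro(data)
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")

# Equal-variance check: Levene's test across the groups (small p suggests unequal variances)
stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene's test p = {p:.3f}")
```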
- 03:34 - 03:37
So let's do an example, and the example I want to
- 03:37 - 03:41
do is a drug trial. So we
- 03:41 - 03:43
give the drug to a person and we ask them
- 03:43 - 03:44
how they're feeling blah, blah, blah.
- 03:45 - 03:46
And whatever they say, well, that's our result.
- 03:47 - 03:50
Now, in theory, you give the same drug to three
- 03:50 - 03:53
different candidates, and they should react the same way.
- 03:53 - 03:56
So in theory, we'll get this kind of result.
- 03:57 - 04:00
But in reality, we won't get that result; we'll get
- 04:00 - 04:03
something that varies quite a bit between candidates.
- 04:04 - 04:06
So this is what the actual data would look
- 04:06 - 04:09
like, and what we're gonna do is we're gonna take
- 04:10 - 04:14
the average of each of those candidates, and we're gonna find
- 04:16 - 04:17
the variance between the means.
- 04:18 - 04:22
So that's what between refers to: the variance between these
- 04:22 - 04:26
averages. And then we're gonna take the variance within each
- 04:27 - 04:32
sample. So again, the words between and within show up
- 04:32 - 04:33
quite a bit. Between means
- 04:33 - 04:38
between these three samples, and within is within
- 04:39 - 04:39
each of the samples individually.
- 04:42 - 04:45
So let's take a look at some ANOVA calculations now.
- 04:45 - 04:48
Before I do these calculations, I wanted to say
- 04:48 - 04:52
that the spreadsheets do a really good job of doing
- 04:52 - 04:55
calculations, and you're not gonna actually have to do them
- 04:55 - 04:57
per se in their entirety.
- 04:57 - 05:00
But when you're being introduced to
- 05:00 - 05:03
ANOVA, I think it's important for you to take a
- 05:03 - 05:05
look at the calculations and see how they're done.
- 05:05 - 05:07
It gives you a better understanding of what is going
- 05:08 - 05:08
on in the problem.
- 05:09 - 05:11
So let's take a look
- 05:11 - 05:11
at the calculations.
- 05:12 - 05:16
So ANOVA calculations result in a table, or two tables,
- 05:17 - 05:17
that look like this.
- 05:18 - 05:21
The upper table is referred to as a summary, and the lower
- 05:21 - 05:23
table is referred to as the ANOVA table.
- 05:24 - 05:26
The summary table is pretty simple.
- 05:27 - 05:30
It lists the groups in your analysis, the number
- 05:31 - 05:36
of items in each of the samples, the sum of
- 05:36 - 05:40
those items, and it lists the mean of the samples
- 05:41 - 05:43
and also the variance of the samples.
- 05:44 - 05:47
So that's pretty simple stuff that we kind of
- 05:47 - 05:49
did in first level of stats.
- 05:49 - 05:51
But it's important because those numbers are involved in the
- 05:52 - 05:53
calculation of the lower table.
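As a rough sketch of how that summary table comes together, the per-group count, sum, mean, and sample variance can be computed like this; the data is hypothetical, not the values shown in the video:

```python
import statistics

# Hypothetical data for three groups (illustration only)
groups = {
    "Group A": [55, 63, 61, 58, 66, 60],
    "Group B": [64, 59, 62, 67, 61, 63],
    "Group C": [58, 60, 65, 57, 62, 64],
}

# One summary row per group: count, sum, average, and sample variance
for name, data in groups.items():
    print(name, len(data), sum(data),
          statistics.mean(data), statistics.variance(data))
```

The sample variance here uses the n - 1 denominator, which is what spreadsheet ANOVA summaries typically report.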
- 05:54 - 05:56
Okay, so let's take a look at the lower table.
- 05:57 - 05:59
And the first entry in the lower table is referred to
- 05:59 - 06:00
as between groups.
- 06:01 - 06:04
So taking a look at the MS column
- 06:05 - 06:06
first, the
- 06:07 - 06:13
MS column is the variance of the
- 06:13 - 06:17
averages. So with those three numbers, the 60.75, 62.58, and 61.17, it's the
- 06:20 - 06:25
variance of those numbers multiplied by the n in
- 06:25 - 06:25
each group.
- 06:25 - 06:28
So in this case, we have 12, and that variance will
- 06:28 - 06:30
be multiplied by 12.
- 06:31 - 06:33
This calculation is a little bit more involved if your
- 06:34 - 06:36
sample sizes are not equal, but we'll worry about that
- 06:36 - 06:37
in the upcoming class.
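To make that concrete, here is a small sketch using the three group averages read off the summary table (60.75, 62.58, and 61.17) and n = 12 per group; since those averages are rounded, the result only approximately matches the table's MS value:

```python
import statistics

group_means = [60.75, 62.58, 61.17]  # group averages from the example (rounded)
n_per_group = 12                     # observations in each group

# Between-groups MS: the sample variance of the group means, scaled by n
ms_between = n_per_group * statistics.variance(group_means)
print(ms_between)  # roughly 11 with these rounded means
```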
- 06:39 - 06:42
The number next to that is the degrees of freedom,
- 06:43 - 06:45
which is the number of groups minus one.
- 06:46 - 06:49
So we have three groups in this problem, and
- 06:49 - 06:49
three minus
- 06:50 - 06:53
one would be two. Okay, the number under the SS
- 06:53 - 06:55
column is known as the sum of the
- 06:56 - 07:00
squares, and that is the mean square, or MS,
- 07:00 - 07:00
times two.
- 07:01 - 07:04
So if you take 2 multiplied by 11.08, you get
- 07:04 - 07:11
22.16 or 22.17. Now, the next line is
- 07:12 - 07:14
called within groups. Again,
- 07:15 - 07:18
what we have here is an MS number, which
- 07:18 - 07:20
is the mean of the variances.
- 07:21 - 07:23
So if you take those three variances in the upper
- 07:23 - 07:27
table and you find their average, what you'll get
- 07:27 - 07:31
is 620.63. Now,
- 07:31 - 07:35
in this case, the degrees of freedom is the total
- 07:35 - 07:36
number of samples minus the number of groups.
- 07:37 - 07:42
So we have 36 samples in total, minus three groups,
- 07:43 - 07:43
which is 33.
- 07:44 - 07:48
And once again, the sum of the squares in
- 07:48 - 07:52
this case is the mean square, 620.63, multiplied by 33,
- 07:54 - 07:54
the degrees of freedom.
- 07:56 - 07:59
The total row is
- 07:59 - 08:02
just simply the total of the sum of the
- 08:02 - 08:03
squares and degrees of freedom.
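Putting those pieces together, here is a hedged sketch of how the whole lower table can be built for equal group sizes, using the same formulas just described (SS is MS times df in each row); the data is again hypothetical:

```python
import statistics

# Hypothetical data: three groups of equal size (illustration only)
groups = [
    [55, 63, 61, 58, 66, 60],
    [64, 59, 62, 67, 61, 63],
    [58, 60, 65, 57, 62, 64],
]

k = len(groups)        # number of groups
n = len(groups[0])     # observations per group (equal sizes assumed)
N = k * n              # total number of observations

group_means = [statistics.mean(g) for g in groups]
group_vars = [statistics.variance(g) for g in groups]

# Between groups: MS is n times the variance of the group means
ms_between = n * statistics.variance(group_means)
df_between = k - 1
ss_between = ms_between * df_between

# Within groups: MS is the average of the group variances (equal n)
ms_within = statistics.mean(group_vars)
df_within = N - k
ss_within = ms_within * df_within

# Total row: the SS and df columns simply add up
print("Between:", ss_between, df_between, ms_between)
print("Within: ", ss_within, df_within, ms_within)
print("Total:  ", ss_between + ss_within, df_between + df_within)
```

In practice, scipy.stats.f_oneway(*groups) returns the matching F statistic and P value in one call, which is a handy way to check a hand-built table.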
- 08:05 - 08:07
Okay.
- 08:08 - 08:08
I want to talk a little bit more about
- 08:09 - 08:12
the sum of the squares numbers here.
- 08:12 - 08:15
So the top number, between groups, is sometimes referred
- 08:15 - 08:17
to as SS treatment.
- 08:18 - 08:22
And technically, if the sample means
- 08:23 - 08:29
were all equal, then SS treatment would be
- 08:29 - 08:33
zero. So SS treatment tells us how the
- 08:34 - 08:36
sample means are different from each other.
- 08:37 - 08:41
And the bottom number, under within groups, is sometimes
- 08:42 - 08:44
referred to as SS error.
- 08:44 - 08:47
You'll see that in some tables when the ANOVA calculations
- 08:47 - 08:48
are done.
- 08:49 - 08:52
And that's the amount of variation within the groups.
- 08:54 - 08:56
So we'll see more examples of this as we
- 08:56 - 08:57
move forward.
- 08:58 - 09:00
And certainly when we get to class, I'll show you
- 09:00 - 09:00
what that actually looks like.
- 09:02 - 09:08
Okay, but the F number, our test statistic,
- 09:09 - 09:13
is a ratio of the
- 09:13 - 09:17
variance between the groups and the variance within the groups.
- 09:18 - 09:21
So in other words, it's a ratio of
- 09:21 - 09:30
the two numbers, 11.08 and 620.63. Okay, the P
- 09:31 - 09:34
value is the P value that we've seen before.
- 09:35 - 09:39
So it's based on the test statistic, with
- 09:39 - 09:41
the two degrees of freedom, 2 and 33.
- 09:43 - 09:46
And if we're using the critical value, just for your
- 09:46 - 09:48
interest's sake, we would find that by taking the
- 09:48 - 09:52
alpha value and then 2 and 33 again for the
- 09:52 - 09:52
degrees of freedom.
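Here is a sketch of those last two columns in Python with SciPy, plugging in the MS values and degrees of freedom as read from this example's table (the MS numbers are approximate transcriptions, and alpha = 0.05 is just an assumed significance level):

```python
from scipy import stats

ms_between, ms_within = 11.08, 620.63  # MS values as read from the example table (approximate)
df_between, df_within = 2, 33          # degrees of freedom from this example
alpha = 0.05                           # assumed significance level

# F statistic: ratio of the between-groups MS to the within-groups MS
f_stat = ms_between / ms_within

# P value: area to the right of the F statistic under F(df_between, df_within)
p_value = stats.f.sf(f_stat, df_between, df_within)

# Critical value: the F value that cuts off the upper alpha tail
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)

print(f_stat, p_value, f_crit)
```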
- 09:55 - 09:59
So this is the complete ANOVA table, and I want
- 10:00 - 10:01
you to go back and review this video a couple
- 10:01 - 10:05
of times so you understand how each number was calculated
- 10:06 - 10:09
and the meaning of MS and how degrees
- 10:09 - 10:10
of freedom is calculated.
- 10:10 - 10:14
And what SS means, how F was calculated, and
- 10:15 - 10:17
how the P value was calculated.
- 10:20 - 10:22
Lastly, what we need to do is make a
- 10:22 - 10:25
decision and a conclusion, which is something that, you know,
- 10:25 - 10:27
we do in every ANOVA project.
- 10:28 - 10:32
So we compare the P value to alpha, we reject
- 10:32 - 10:34
or fail to reject H0,
- 10:35 - 10:37
and we say there's either going to be significant
- 10:38 - 10:42
or insignificant evidence at a certain percent to support the
- 10:42 - 10:44
claim that at least one mean is different.
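As a tiny sketch of that decision step (the P value below is a placeholder, and alpha = 0.05 is just an assumed level):

```python
alpha = 0.05      # assumed significance level
p_value = 0.98    # placeholder P value from the ANOVA table

if p_value <= alpha:
    print("Reject H0: significant evidence that at least one mean is different.")
else:
    print("Fail to reject H0: insufficient evidence that at least one mean is different.")
```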
- 10:45 - 10:46
And that's the key behind an ANOVA
- 10:47 - 10:50
conclusion: you say that at least one mean is
- 10:50 - 10:54
different. Now again, as I said earlier, it doesn't
- 10:54 - 10:57
tell you which mean is different, only that
- 10:58 - 10:59
one of them is different.
- 11:01 - 11:03
So that's an introduction to ANOVA.
- 11:04 - 11:07
The table can look a little daunting, but if you
- 11:07 - 11:10
go through it one step at a time to
- 11:10 - 11:14
understand how those numbers are arrived at, it becomes pretty
- 11:15 - 11:16
easy to understand.
- 11:17 - 11:18
And we're gonna go through a couple more of these
- 11:18 - 11:21
examples in class, and I'll show you some ways
- 11:21 - 11:23
to understand this a little bit better.
- 11:25 - 11:26
So that's it for the introduction.
- 11:26 - 11:27
to ANOVA.
- 11:27 - 11:29
We'll see you in class and we'll talk further about
- 11:30 - 11:30
this topic.
- 11:30 - 11:31
Bye for now.