S&T 04 Statistics
by Ale Ibarra
on 25 June 2014
Transcript of S&T 04 Statistics
Bridget surveyed the price of eggs at stores in Guadalajara and Toluca. The data, in pesos per kilogram, are given below.
a) Find the mean price in each city and then state which city has the lower mean.
b) Find the standard deviation of each city's prices.
c) Which city has the more consistently priced egg? Give reasons for your answer.
Statistics
Basic Concepts
Population: the universal set in the analysis; the set that includes all elements of interest for the case.
Sample: a representative part (subset) of a population (universe) that is analyzed to learn the characteristics of that population.
The sample must be random to ensure that it is the closest representation of the population.
Discrete variable: always an integer (never fractions or decimals).
Examples: number of people, number of classrooms, number of classes, etc.
Continuous variable: any value inside a defined interval; it can be fractional or have a decimal point.
Examples: times, grades, prices, weight, etc.
Quantitative variables: they are discrete (integer) or continuous (integer, fractional, or with decimals).
Qualitative variables: they are measured using nominal scales, where there is no order for the categories. For example, hair color, eye color, etc.
Mean
13, 18, 13, 14, 13, 16, 14, 21, 13
The mean is basically the average:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) / 9 = 15
Mode
13, 18, 13, 14, 13, 16, 14, 21, 13
The mode is the number that is repeated more often than any other, so 13 is the mode.
13, 18, 13, 14, 13, 16, 14, 21, 13
13 is repeated 4 times, more than any other value.
Median
13, 18, 13, 14, 13, 16, 14, 21, 13
The median is the middle value, so the list has to be rewritten in order.
13, 13, 13, 13, 14, 14, 16, 18, 21
The middle (5th) value is 14, so the median is 14.
1, 2, 4, 7
In this case there is no single middle value, so we take the average of the two middle values.
(2 + 4) ÷ 2 = 3
Get the mean, mode, and median
2, 9, 6, 2, 4, 6, 8, 9, 3, 6, 8, 9, 6, 7, 7, 1, 5, 6
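One way to check the exercise above is a short Python sketch using the standard `statistics` module. The variable names are illustrative and not part of the slides.

```python
import statistics

# Data from the exercise above.
values = [2, 9, 6, 2, 4, 6, 8, 9, 3, 6, 8, 9, 6, 7, 7, 1, 5, 6]

# Mean: the sum of the values divided by how many there are.
mean_value = statistics.mean(values)

# Median: the middle of the sorted list; with an even count,
# the average of the two middle values.
median_value = statistics.median(values)

# Mode: the value that occurs most often.
mode_value = statistics.mode(values)

print(round(mean_value, 2), median_value, mode_value)
```

With 18 values the median is the average of the 9th and 10th sorted values, which is the even-count case described above.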
Weighted Mean
Used when the values from which the mean is obtained have different weights.
Example:
For the final grade of a class, the following weights are considered:
1st Partial 10%
2nd Partial 20%
3rd Partial 30%
Final Exam 40%
If George has the grades 75, 95, 100, and 80 respectively, find the weighted mean for his semester.
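The calculation for George's grades can be sketched in Python as follows. The helper name `weighted_mean` is hypothetical, chosen only for illustration; it multiplies each value by its weight, adds the products, and divides by the total weight.

```python
def weighted_mean(values, weights):
    """Return sum(weight * value) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# George's grades and the weight (in percent) of each evaluation.
grades = [75, 95, 100, 80]
weights = [10, 20, 30, 40]

george = weighted_mean(grades, weights)
print(george)  # 88.5
```

Because the function divides by the total weight, the weights do not have to add up to 100.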
Example:
A student has to take a selection exam, which consists of two parts: Math, which counts for 70% of the exam, and Physics, which counts for 30%.
If the student got 60 on the Math part and 70 on the Physics part, what is the student’s grade on the exam?
What would the student’s grade be if Math and Physics had the same weight in the exam (50% each)?
The mean annual wage for a company’s employees is $5,000. For men, it was $5,200, and for women it was $4,200. Find the percentage of men and women that work at this company.
Work in teams of 3, show me your answer before the time's done!
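The wage problem runs the weighted mean backwards: the overall mean is known and the weights are unknown. If p is the share of men, then overall = p × (men's mean) + (1 − p) × (women's mean), so p = (overall − women's mean) / (men's mean − women's mean). A minimal sketch, with a hypothetical function name:

```python
def share_of_first_group(overall_mean, mean_first, mean_second):
    """Solve overall = p * mean_first + (1 - p) * mean_second for p."""
    return (overall_mean - mean_second) / (mean_first - mean_second)

# Overall mean wage 5,000; men 5,200; women 4,200.
men_share = share_of_first_group(5000, 5200, 4200)
women_share = 1 - men_share

print(f"men: {men_share:.0%}, women: {women_share:.0%}")
```

This only works for two groups with different means; with equal means the overall mean says nothing about the proportions.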
Variance
It can be considered as the average squared distance between the observed values (the variable) and the expected value of those values taken together (the mean).
In a population:
σ² = Σ(x − μ)² / N
μ = mean of the population
N = number of scores
σ² = variance
In a sample:
s² = Σ(x − x̄)² / (n − 1)
x̄ = mean of the sample
n = the total number of values in the sample
The (n − 1) in the denominator gives an unbiased estimate of the population variance.
Standard Deviation
It is a measure of the dispersion of a collection of values, measured relative to the mean.
The standard deviation is simply the square root of the variance.
It is the most commonly used measure of spread.
Take the dogs' heights:
600, 470, 170, 430, 300
Find the mean of the heights.
Calculate the difference between each height and the mean.
Take each difference and square it.
Find the sum of the squared values.
Divide the sum by (n − 1).
The result is the variance of the sample.
Now we can judge which values are acceptable or normal by finding the standard deviation (the square root of the variance).
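The steps above can be followed one by one in Python. This is a sketch with illustrative variable names; each line corresponds to one step of the procedure for the dogs' heights.

```python
import math

heights = [600, 470, 170, 430, 300]
n = len(heights)

# Step 1: mean of the heights.
mean_height = sum(heights) / n

# Step 2: difference between each height and the mean.
differences = [h - mean_height for h in heights]

# Step 3: square each difference.
squared = [d ** 2 for d in differences]

# Step 4: sum of the squared values.
sum_squared = sum(squared)

# Step 5: divide by (n - 1) to get the sample variance.
sample_variance = sum_squared / (n - 1)

# Step 6: the standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(mean_height, sample_variance, round(sample_sd, 2))
```

Heights between mean − sd and mean + sd could then be treated as the "normal" range, which is one common reading of the last step.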
Find which movies are within an acceptable range for this group.
Find which movies are within an acceptable range if all the movies in theaters have an average of $200M at the box office.
Normal Distribution
When you want to describe the probability for a continuous variable, you do so by describing the area under a "bell-shaped" curve.
Since the bell-shaped curve is symmetric, deviations from the mean are equally probable in either direction.
Standard Normal Distribution
The standard normal distribution is defined as the normal distribution with
mean = zero (μ = 0)
standard deviation = one (σ = 1).
The normal random variable of a standard normal distribution is called a “Z” score or “Z” distance.
Under the “One-Sigma Rule” (Z from −1 to 1, or within σ of the mean), the probability (area under the standard normal curve) is about 68%.
Under the “Two-Sigma Rule” (Z from −2 to 2, or within 2σ of the mean), the probability (area under the standard normal curve) is about 95%.
Under the “Three-Sigma Rule” (Z from −3 to 3, or within 3σ of the mean), the probability (area under the standard normal curve) is about 99.7%.
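The three sigma rules can be checked numerically. The sketch below uses the identity that the area of the standard normal curve within k standard deviations of the mean equals erf(k / √2); the function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.4f}")
```

The printed areas round to the 68%, 95%, and 99.7% figures quoted in the rules.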
1. Find the area under the standard normal curve between z = 0 and z = 2.
2. Find the area under the standard normal curve between z = 0 and z = −1.8.
3. Find the area under the standard normal curve to the right of z = 1.5.
4. Find the area under the standard normal curve to the left of z = −1.75.
5. Find the area under the standard normal curve between z = 1.5 and z = 2.5.
6. Find the area under the standard normal curve between z = −2.78 and z = 1.66.
Exercise 1, read from the table: at z = 0, p = 0.500; at z = 2.00, p = 0.9773.
0.9773 − 0.500 = 0.4773
[Figures: shaded standard normal curves for exercises 2 to 6]
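Instead of a printed table, the areas can be computed with `statistics.NormalDist` from the Python standard library, whose `cdf(z)` method gives the area to the left of z. The three helper functions below are hypothetical names used for illustration, applied to exercises 1, 3, and 5.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist()

def area_left(z):
    """Area under the curve to the left of z."""
    return Z.cdf(z)

def area_right(z):
    """Area under the curve to the right of z."""
    return 1 - Z.cdf(z)

def area_between(a, b):
    """Area under the curve between a and b, in either order."""
    low, high = sorted((a, b))
    return Z.cdf(high) - Z.cdf(low)

print(round(area_between(0, 2), 4))      # exercise 1
print(round(area_right(1.5), 4))         # exercise 3
print(round(area_between(1.5, 2.5), 4))  # exercise 5
```

Results can differ from table lookups in the last decimal place because tables round z and p.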
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 5, what is the probability that a person chosen at random will have an IQ score greater than 110?
Z = (X − μ)/σ
Z = (110 − 100)/5
Z = 2
P(Z > 2) = 1 − 0.9772 = 0.0228
Suppose family incomes in a town are normally distributed with a mean of $1,200 and a standard deviation of $600 per month. What is the probability that a family has an income between $1,400 and $2,250?
We wish to determine the proportion of light bulbs produced with lifetimes between 1,400 and 1,520 hours. The average lifetime of a bulb is 1,000 hours, with a standard deviation of 200 hours.
A four-year college will accept any student ranked in the top 60% on a national examination. If the test scores are normally distributed with a mean of 500 and a standard deviation of 100, what is the cutoff score for acceptance?
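IQ score greater than 110
x = 110.00
μ = 100.00
σ = 5.00
Z = 2.00
P = 0.9772; area to the right = 1.0000 − 0.9772 = 2.28%
Family income between $1,400 and $2,250
x = 1400.00 and 2250.00
μ = 1200.00
σ = 600.00
Z = 0.33 and 1.75
P = 0.6306 and 0.9599; area between = 32.94%
Bulb lifetime between 1,400 and 1,520 hours
x = 1400 and 1520
μ = 1000
σ = 200
Z = 2.00 and 2.60
P = 0.9772 and 0.9953; area between = 1.81%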
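Cutoff score for the top 60%
The top 60% lies to the right of the cutoff, so 40% lies to its left.
μ = 500
σ = 100
Z = −0.25, since P(Z ≤ −0.25) = 1 − 0.5987 = 0.4013 (≈ 40%)
x = μ + Zσ = 500 − 0.25 × 100 = 475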
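The four word problems can also be worked without a table. This sketch assumes each quantity is normally distributed with the stated mean and standard deviation, and uses `NormalDist.cdf` for areas and `NormalDist.inv_cdf` for the cutoff. For the college problem, accepting the top 60% means 40% of the scores fall below the cutoff, so the cutoff is the 40th percentile.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 5; probability of a score above 110.
iq = NormalDist(mu=100, sigma=5)
p_iq_above_110 = 1 - iq.cdf(110)

# Family incomes: mean 1,200, standard deviation 600; between 1,400 and 2,250.
income = NormalDist(mu=1200, sigma=600)
p_income = income.cdf(2250) - income.cdf(1400)

# Bulb lifetimes: mean 1,000 h, standard deviation 200 h; between 1,400 and 1,520 h.
bulbs = NormalDist(mu=1000, sigma=200)
p_bulbs = bulbs.cdf(1520) - bulbs.cdf(1400)

# College cutoff: top 60% accepted, so the cutoff is the 40th percentile.
scores = NormalDist(mu=500, sigma=100)
cutoff = scores.inv_cdf(0.40)

print(f"{p_iq_above_110:.4f} {p_income:.4f} {p_bulbs:.4f} {cutoff:.1f}")
```

The computed cutoff is slightly below 475 because the table value Z = −0.25 is a rounded lookup.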
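Exercise 1: between z = 0 and z = 2
x = 2 and 0
μ = 0
σ = 1
Z = 2.00 and 0.00
P = 0.9772 and 0.5000; area between = 47.72%
Exercise 2: between z = 0 and z = −1.8
x = 0.0 and −1.8
μ = 0
σ = 1
Z = 0.00 and −1.80
P = 0.5000 and 0.0359; area between = 46.41%
Exercise 3: to the right of z = 1.5
x = 1.5
μ = 0
σ = 1
Z = 1.50
P = 0.9332; area to the right = 1.0000 − 0.9332 = 6.68%
Exercise 4: to the left of z = −1.75
x = −1.75
μ = 0
σ = 1
Z = −1.75
P = 0.0401; area to the left = 0.0401 − 0.0000 = 4.01%
Exercise 5: between z = 1.5 and z = 2.5
x = 1.5 and 2.5
μ = 0
σ = 1
Z = 1.50 and 2.50
P = 0.9332 and 0.9938; area between = 6.06%
Exercise 6: between z = −2.78 and z = 1.66
x = −2.78 and 1.66
μ = 0
σ = 1
Z = −2.78 and 1.66
P = 0.0027 and 0.9515; area between = 94.88%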
Determination of Sample Size
How large should a sample be in a specific situation?
If a larger sample than necessary is used, resources are wasted; if the sample is too small, the objectives of the analysis may not be achieved.
What degree of precision is desired?
The greater the desired precision, the larger the necessary sample size.
Suppose we would like to conduct a poll among eligible voters in a city in order to determine the percentage who intend to vote for the Democratic candidate in an upcoming election.
We specify that we want the probability to be 95.5% that we will estimate the percentage that will vote Democratic within ±1 percentage point.
What is the required sample size?
n = required sample size
Pc = confidence level
e = margin of error, or desired level of precision, expressed as a percentage or a decimal.
p = estimate of the proportion
The variability p(1 − p) is greatest at p = 0.5, so using p = 0.5 gives the most conservative (largest) required sample size (n).
Thus, if not stated directly, the estimate of the proportion (p) can be assumed to be 0.5 for maximum variability.
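A sketch of the sample-size calculation for a proportion, applied to the voter poll. It assumes the usual formula n = z² · p(1 − p) / e², which is not spelled out in the transcript, and the function names are hypothetical. The z-score is obtained two ways: the rounded table value z = 2, and the value computed from P(z) = Pc/2 + 0.5.

```python
import math
from statistics import NormalDist

def z_for_confidence(pc):
    """z-score whose left-tail area is pc/2 + 0.5 (two-sided confidence pc)."""
    return NormalDist().inv_cdf(pc / 2 + 0.5)

def sample_size_proportion(z, e, p=0.5):
    """Assumed formula: n = z^2 * p * (1 - p) / e^2, rounded up."""
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round first so floating-point noise cannot push the result up by one.
    return math.ceil(round(raw, 9))

# Voter poll: Pc = 95.5%, e = 1 percentage point, p assumed to be 0.5.
n_table = sample_size_proportion(z=2, e=0.01)
z_exact = z_for_confidence(0.955)
n_exact = sample_size_proportion(z=z_exact, e=0.01)

print(n_table, round(z_exact, 4), n_exact)
```

The two results differ slightly because the table lookup rounds z to 2; either way the poll needs roughly ten thousand voters.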
A random sample is to be selected to estimate the proportion of citizens of a large Texan city who favor federal price controls on natural gas that is transported interstate. The range of the estimate is to be kept within 4% with a confidence level of 96%. How large a simple random sample is required?
A consumer research group is surveying the consumer population of the New England region to estimate the proportion of consumers who use biodegradable laundry detergent.
How large a random sample should be drawn if the objective is to estimate the proportion of consumers using biodegradable detergent within four percentage points with 98% confidence? Assume that the proportion of consumers using this type of detergent is estimated from past experience to be 0.20.
Suppose we wanted to estimate the arithmetic mean hourly wage rate for a group of skilled workers in a certain industry.
Let us further assume that from prior studies we estimate that the population standard deviation of the hourly wage rates of these workers is about $0.15. How large a sample would be required to yield a probability of 99.7% that we will estimate the mean wage rate of these workers within ±$0.03?
Pc = 95.5% = 0.955
Sample size for estimation of a proportion
n = z² · p(1 − p) / e²
Sample size for estimation of a mean
n = (z · σ / e)²
n = required sample size
z = z-score found in the normal distribution table
σ = population standard deviation
e = margin of error, or desired level of precision, expressed in the units of the variable.
Pc = 99.7% = 0.997
P(z) = (0.997/2) + 0.5
P(z) = 0.9985
Look in the normal distribution table: P(3) ≈ 0.9985
z = 3
e = ±$0.03 = 0.03
σ = 0.15
n = (z · σ / e)² = (3 × 0.15 / 0.03)² = 225
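The wage-rate example can be sketched in Python as well. It assumes the usual formula for estimating a mean, n = (z · σ / e)², with the values z = 3, σ = 0.15, and e = 0.03 from the worked example; the function name is hypothetical.

```python
import math

def sample_size_mean(z, sigma, e):
    """Assumed formula: n = (z * sigma / e)^2, rounded up."""
    raw = (z * sigma / e) ** 2
    # Round first so floating-point noise cannot push the result up by one.
    return math.ceil(round(raw, 9))

# Hourly wage example: z = 3 (99.7%), sigma = $0.15, e = $0.03.
n_wage = sample_size_mean(z=3, sigma=0.15, e=0.03)
print(n_wage)  # 225
```

Halving the margin of error e quadruples the required sample, since e enters the formula squared.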
A publishing company wants to know what percent of the population might be interested in a new magazine on making the most of your retirement. Secondary data (which is several years old) indicates that 22% of the population is retired. They are willing to accept an error rate of 5%, and they want to be 95% certain that their finding does not differ from the true rate by more than 5%. What is the required sample size?
A fast food company wants to determine the average number of times that fast food users visit fast food restaurants per week. They have decided that their estimate needs to be accurate within plus or minus one-tenth of a visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more than one-tenth of a visit. Previous research has shown that the standard deviation is 0.7 visits. What is the required sample size?
Confidence Interval Estimate for a Mean
A random sample of 16 public school teachers in a particular state has a mean salary of $33,000 with a standard deviation of $1,000. Construct a 99 percent confidence interval estimate for the true mean salary for public school teachers for the given state.
For example, you want to know the average amount of time a student at Ohio State University spends listening to music per day using an MP3 player. The average time for the entire population of OSU students who are MP3-player users is the parameter you’re looking for. Since you certainly can’t ask every student who uses an MP3 player at OSU this question, you take a random sample of students and find the average from there.
Suppose the average time a student uses an MP3 player per day to listen to music, based on a random sample of 1,000 OSU students, is 2.5 hours, and the standard deviation is 0.5 hours.
Is it right to say that the population of all OSU-student MP3-player owners use their players an average of 2.5 hours per day for music listening? No.
You hope and may assume that the average for the whole population is close to 2.5, but it probably isn’t exact. After all, you’re only sampling a tiny fraction of the 60,000-member population of all OSU students. The fact is that sample results vary from sample to sample.
The solution is to not only report the average from your sample, but along with it, report some measure of how much you expect that sample average to vary from one sample to the next, with a certain level of confidence.
You want to cover your bases, so to speak (at least most of the time). The number that you use to represent this level of precision in your results is called the margin of error.
You take your sample average and add and subtract the margin of error (to get that plus-or-minus factor going), which gives you a confidence interval for the average time all OSU students use their MP3 players.
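n = sample size
df = degrees of freedom
Pc = confidence level
t = critical value of "t"
s = sample standard deviation
x̄ = sample mean
Confidence interval = x̄ ± t · s/√n
n = 16
df = n − 1 = 15
Pc = 99%
Look in the t-distribution table: t = 2.947
s = 1,000
x̄ = 33,000
Confidence interval = 33,000 ± 2.947 × 1,000/√16 = 33,000 ± 736.75, that is, from 32,263.25 to 33,736.75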
Suppose we conduct a survey of 19 millionaires to find out what percent of their income the average millionaire donates to charity. We discover that the mean percent is 15 with a standard deviation of 5 percent. Find a 95% confidence interval for the mean percent.
The president of a small community college wishes to estimate the average distance commuting students travel to the campus. A sample of 12 students was randomly selected and yielded the following distances in miles: 27, 35, 33, 30, 39, 25, 38, 22, 27, 37, 33, 40. Construct a 95% confidence interval estimate for the true mean distance commuting students travel to the campus.
n = 12
df = 11
Pc = 95%
t = 2.201
s = 5.94
x̄ = 32.17
CIE = 28.395 to 35.939
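The commuting-distance interval can be reproduced with a few lines of Python. The standard library has no t-distribution, so this sketch takes the critical value t = 2.201 (df = 11, 95%) from the table as quoted above and assumes the interval has the form x̄ ± t · s/√n.

```python
import math
import statistics

# Distances in miles for the 12 sampled commuting students.
distances = [27, 35, 33, 30, 39, 25, 38, 22, 27, 37, 33, 40]

# Critical value read from a t table (df = 11, 95% confidence), as quoted above.
t_critical = 2.201

n = len(distances)
x_bar = statistics.mean(distances)  # sample mean
s = statistics.stdev(distances)     # sample standard deviation (n - 1 denominator)

# Margin of error and the two ends of the interval.
margin = t_critical * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 2), round(s, 2), round(lower, 3), round(upper, 3))
```

Swapping in a different table value for `t_critical` gives the interval for another confidence level with the same data.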
Student's t-Test
When the variance is not known but has to be estimated from sample data, you should use the t-distribution rather than the standard normal distribution.
When the sample size (n) is large, say 100 or above, the t-distribution is very similar to the standard normal distribution.
However, with smaller sample sizes (e.g., n = 5), the t-distribution has relatively more “scores” in its tails than does the standard normal distribution.
As a result, you have to extend farther from the mean to contain a given proportion of the area.
Establishing Test Hypotheses (Null Hypothesis and Alternative Hypothesis)
In statistics, the only way of supporting your hypothesis (H1) is to reject the null hypothesis (H0).
Rather than trying to show that the alternative hypothesis (H1) is correct, we must show that the null hypothesis (H0) is likely to be wrong; we have to “reject” or “nullify” the null hypothesis (H0).
Resultant (t)
t = (x̄ − μ0) / (s / √n)
x̄ = sample mean
μ0 = population mean (expected mean value)
s = standard deviation of the sample
n = sample size
A teachers' union would like to establish that the average salary for high school teachers in a particular state is less than $35,500. A random sample of 25 public high school teachers in the particular state has a mean salary of $34,578 with a standard deviation of $910. Test to establish whether the union's claim is correct at the 5 percent level of significance.
Resultant (t)
x̄ = 34,578
μ0 = 35,500
s = 910
n = 25
t = (34,578 − 35,500) / (910 / √25) = −5.07
Critical (t)
α = significance level (criterion stated or given)
p = p-value (found in the t-distribution table)
df = n − 1 (degrees of freedom)
The dean of students of a private college claims that the average distance commuting students travel to the campus is less than 35 miles. The commuting students feel otherwise. A sample of 16 students was randomly selected and yielded a mean of 36 miles and a standard deviation of 5 miles. Test the dean's claim at the 5 percent level of significance.
An advertising agency would like to create an advertisement for a fast food restaurant claiming that the average waiting time from ordering to receiving your order at the restaurant is less than 5 min. The agency measured the time from ordering to delivery of the order for 25 customers and found that the average time was 4.7 min with a standard deviation of 0.6 min. At the 5 percent level of significance, test the claim.
A sales manager claims that his salesmen can sell an average of more than 9.3 computers per week. The CEO of the company would like to verify this, so he reviewed 15 salesmen’s sales records, finding an average of 7.6 sales with a standard deviation of 0.9. Test the manager’s claim with 95% confidence.
A movie theater manager claims that he can get more than 500 customers in one night. The box office clerks have registered an average attendance of 463.5 per night during 9 straight days, with a standard deviation of 36.2. With 95% confidence, determine whether the manager can get at least 500 customers to attend.
A pizza place has been selling an average of 5,632.6 pizzas per month for the last year and a half. Given the lack of materials, they need to sell 5,300 pizzas at the most for the next month. With 90% confidence and a standard deviation of 135.2, find out if that is possible.
H0: μ ≥ 35,500
H1: μ < 35,500
Critical (t)
α = 0.05
df = 25 − 1 = 24
t(α, n−1) = 1.711
Left-tailed
H0: μ ≥ μ0
H1: μ < μ0
It’s left-tailed because the direction of the inequality in H1 is to the left.
We reject H0 if the resultant value t is less than −t(α, n−1).
Right-tailed
H0: μ ≤ μ0
H1: μ > μ0
It’s right-tailed because the direction of the inequality in H1 is to the right.
We reject H0 if the resultant value t is greater than +t(α, n−1).
Two-tailed
H0: μ = μ0
H1: μ ≠ μ0
It’s two-tailed because we have an unequal sign in H1.
We reject H0 if the resultant value t is greater than +t(α/2, n−1) or less than −t(α/2, n−1).
Therefore, we have a “left-tailed” test, and we check whether t ≤ −t(α, n−1) in order to reject the null hypothesis.
From the chart:
−5.07 ≤ −1.711
Hence, at the 5% significance level, we can go with the alternative hypothesis (H1) and state that the union’s claim is correct.
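The teachers' union test can be sketched in Python. The code assumes the test statistic t = (x̄ − μ0) / (s / √n) and takes the critical value 1.711 (α = 0.05, df = 24) from the table as quoted above, since the standard library has no t-distribution. The function name is hypothetical.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """Resultant t: distance of the sample mean from mu_0 in standard errors."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Teachers' union example: H0: mu >= 35,500 versus H1: mu < 35,500.
t = t_statistic(x_bar=34578, mu_0=35500, s=910, n=25)

# Critical value read from a t table (alpha = 0.05, df = 24), as quoted above.
t_critical = 1.711

# Left-tailed test: reject H0 when t is below -t_critical.
reject_h0 = t < -t_critical

print(round(t, 2), reject_h0)
```

For a right-tailed test the comparison becomes `t > t_critical`, and for a two-tailed test `abs(t)` is compared with the table value for α/2.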
Each year, car manufacturers perform mileage tests on new car models and submit the results to the EPA. The EPA then tests the cars to determine whether the manufacturer's claims are correct. In 1998, Mercedes Benz reported that the SLK averaged 29 mpg on the highway. Suppose the EPA tested 15 of these cars and obtained an average of 28.75 mpg with a standard deviation of 1.6 mpg. At 5% significance, test to see if Mercedes Benz's claim is correct.
In 2001, a study done in Mexico found that the average height of males 50+ years of age was 1.63 meters. A random sample of 12 Mexican citizens (50+ years of age) was found to have an average height of 1.68 meters with a standard deviation of 0.2 meters. With 95% certainty, test to see if the study done in 2001 is true.
ANOVA
A one-way ANOVA (analysis of variance) is a way to test the equality of three or more means at one time by using variances.
Assumptions when using ANOVA:
The populations from which the samples were obtained must be approximately normally distributed.
The samples must be independent.
The variances of the populations must be equal.
The null hypothesis (H0) for ANOVA is that the mean is the same for all groups.
H0: μ1 = μ2 = μ3
The alternative or research hypothesis (H1) is that the mean is not the same for all groups.
H1: at least one of μ1, μ2, μ3 differs from the others
A math teacher predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all.
She randomly divides 24 students into 3 groups of 8. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all.
After studying, all students take a test on the material. Their scores were:
Constant sound 7 4 6 8 6 6 2 9
Random sound 5 5 3 4 4 7 2 2
No sound 2 4 7 1 2 1 5 5
1. Put the raw data, according to group, in "x1", "x2", and "x3".
2. Calculate the sum for group 1.
3. Calculate Σx² for group 1.
5. Calculate (Σx)² for group 1.
6. Repeat steps 2 to 5 for groups 2 and 3.
7. Calculate SSamong.
8. Calculate SSwithin.
9. Complete the table by calculating dfamong, dfwithin, MSamong, MSwithin, and F.
10. Check whether F is statistically significant in an F-distribution table with the appropriate degrees of freedom and p < .05.
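The computational steps above can be sketched in Python for the three study groups. The sketch assumes the usual sum-of-squares shortcuts, SSamong = Σ[(Σx)²/n per group] − (grand total)²/N and SSwithin = Σ[Σx² − (Σx)²/n per group]; the group labels and variable names are illustrative.

```python
# Test scores for each study condition (8 students per group).
groups = {
    "constant sound": [7, 4, 6, 8, 6, 6, 2, 9],
    "random sound": [5, 5, 3, 4, 4, 7, 2, 2],
    "no sound": [2, 4, 7, 1, 2, 1, 5, 5],
}

all_scores = [x for scores in groups.values() for x in scores]
N = len(all_scores)  # total number of students
k = len(groups)      # number of groups

# Correction term: (grand total)^2 / N.
correction = sum(all_scores) ** 2 / N

# SS among: sum over groups of (sum of x)^2 / n, minus the correction term.
ss_among = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction

# SS within: for each group, sum of x^2 minus (sum of x)^2 / n.
ss_within = sum(
    sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups.values()
)

df_among = k - 1
df_within = N - k
ms_among = ss_among / df_among
ms_within = ss_within / df_within
F = ms_among / ms_within

print(round(ss_among, 3), round(ss_within, 3), df_among, df_within, round(F, 3))
```

The resulting F is then compared with the critical value from an F table for (dfamong, dfwithin) degrees of freedom, which is the last step of the list.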
Stem-and-Leaf Plot, Frequency, and Histogram
12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41
For the data above, find: mode, mean, median, max, min, range, stem-and-leaf plot, table of frequency (x4), histogram
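A stem-and-leaf plot for the eleven values can be built with a short Python sketch. It assumes the tens digit is the stem and the units digit is the leaf, which suits two-digit data like this; the variable names are illustrative.

```python
from collections import defaultdict

data = [12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41]

# Group the units digits (leaves) under their tens digit (stem).
stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)

# One text line per stem, for example "3 | 3 4 5 7".
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))

# The number of leaves per stem is also the frequency of each class of ten.
frequencies = {stem: len(leaves) for stem, leaves in sorted(stems.items())}
```

Read sideways, the rows of leaves already give the shape of the histogram for classes 10 to 19, 20 to 29, and so on.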
Correlation
Pearson correlation coefficient
A parking lot manager in upstate New York is considering expanding his lot, for which he’d have to buy more land. The purchase would be worth it if he gets at least 1,500 cars weekly. To help his decision, he’s looking into the records of the last 2 months, during which he has gotten an average of 1,479.4 cars weekly. Given a standard deviation of 33.6 and a confidence level of 90%, would the purchase be worth it?
Back in the 90’s, a photo shop announced they could print all your photos in 1 hour. Their maximum capacity was 420 photos per hour. If during 23 hours they printed an average of 411 photos per hour with a standard deviation of 36.7 photos, find out if their demand would allow them to print all photos in one hour. Use a 99% confidence level.
A night club has been trying to bring in a new DJ for a Friday night. The DJ will only agree to play there if the club is at full capacity, which is 526 people. To learn whether they can manage it, the owners have been measuring attendance during the last 20 Fridays, for which they’ve gotten an average attendance of 510 people with a standard deviation of 25. Using 90% confidence, statistically evaluate whether they’ll be able to pull it off.
The first class ticket on the Titanic was $1,200, and the second class was $700. What would be the proportion of first and second class passengers if the overall mean price for a ticket was $1,000?
From 2001 to 2012, the winning scores for a golf tournament were 276, 279, 279, 277, 278, 278, 280, 282, 285, 272, 279, and 278. Using the standard deviation for this sample, Sx, find the percent of these winning scores that fall within one standard deviation of the mean.
From 1984 to 1995, the winning scores for a golf tournament were 276, 279, 279, 277, 278, 278, 280, 282, 285, 272, 279, and 278. Using the standard deviation for this sample, Sx, find the percent of these winning scores that fall within one standard deviation of the mean.
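The golf-score question can be sketched in Python. The code computes the sample standard deviation Sx (n − 1 denominator) and counts the scores whose distance from the mean is at most Sx; the variable names are illustrative.

```python
import statistics

scores = [276, 279, 279, 277, 278, 278, 280, 282, 285, 272, 279, 278]

mean_score = statistics.mean(scores)
s_x = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

# Scores that fall within one standard deviation of the mean.
within = [s for s in scores if abs(s - mean_score) <= s_x]
percent_within = 100 * len(within) / len(scores)

print(round(mean_score, 2), round(s_x, 2), percent_within)
```

Here 9 of the 12 scores fall inside the interval, a little above the 68% that the one-sigma rule predicts for normally distributed data.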
How probable do you want it to be that the desired precision will be obtained?
z = z-score found in the normal distribution table
P(z) = probability from the normal distribution table
P(z) = (Pc/2) + 0.5 (only use this formula when applying the standard normal distribution table)
P(z) = (0.955/2) + 0.5
P(z) = 0.9775
Look in the normal distribution table: P(2) ≈ 0.9775
z = 2
e = ±1% = 0.01
p = estimate of the proportion (not given)
p = 0.5
n = z² · p(1 − p) / e² = 2² × 0.5 × 0.5 / 0.01² = 10,000
The owner of the Chez Maurice restaurant is interested in how much people spend at the restaurant. He examines 10 randomly selected receipts for parties of four and writes down the following data.
96, 53, 39, 64, 57, 71, 47, 99, 62, 68
Take the receipts:
Find the mean of the receipts.
Calculate the difference between each receipt and the mean.
Take each difference and square it.
Find the sum of the squared values.
Divide the sum by (n − 1).
The result is the variance of the sample.
Now we can judge which values are acceptable or normal by finding the standard deviation (the square root of the variance).
Full transcripta) Find the mean price in each city and then state which city has the lower mean.
b) Find the standard deviation of each city's prices.
c) Which city has the more consistently priced egg? Give reasons for your answer.
Statistics
Basic Concepts
Is the universal set in the analysis; is the set that includes all interesting elements for the case.
It is a representative quantity of one of the parts (subset) of a population (universe) that is analyzed to know the characteristics of this population.
The sample must be random to ensure that it is the closest representation of the population.
is always integer (never fractions or decimals).
Examples: number of people, number of classrooms, number of classes, etc.
any value inside a defined interval. Could be fractional or with decimal point.
Example: times, grades, prices, weight, etc.
They are discrete (integer) or continuous (integer, fractional or with decimals).
They are measured using nominal scales where there is no order for the categories. For example, hair color, eye colors, etc.
Mean
13, 18, 13, 14, 13, 16, 14, 21, 13
The mean is basically the average:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) / 9 = 15
Mode
13, 18, 13, 14, 13, 16, 14, 21, 13
The mode is the number that is repeated more often than any other, so 13 is the mode.
13, 18, 13, 14, 13, 16, 14, 21, 13
13 is repeated 4 times, more than any other value.
Median
13, 18, 13, 14, 13, 16, 14, 21, 13
The median is the middle value, so I'll have to rewrite the list in order.
13, 13, 13, 13, 14, 14, 16, 18, 21
1, 2, 4, 7
In this case, there’s no middle value implicit, so we have to take the average of the two middle values.
1, 2, 4, 7
(2+4) ÷ 2 = 3
Get the mean, mode, and median
2, 9, 6, 2, 4, 6, 8, 9, 3, 6, 8, 9, 6, 7, 7, 1, 5, 6
Ponderated Mean
When the values from which the mean is obtained have different weights or ponderations.
Example:
For the final grade of a class, the following ponderation is considered:
1st Partial 10%
2nd Partial 20%
3rd Partial 30%
Final Exam 40%
If George has the grades 75, 95, 100, and 80 respectively, find the ponderated mean of his semester.
Example:
A student has to present a selection exam, which consists of two parts: Math, with a value of 70% in the exam; and Physics, with a value of 30% in the exam.
If the student got 60 on the Math exam, and 70 in Physics, what is the student’s grade in the exam?
What would be the student’s grade if Math and Phyiscs had the same value in the exam (50% each)?
The mean annual wage for a company’s employees is $5,000. For men, it was $5,200, and for women it was $4,200. Find the percentage of men and women that work at this company.
Work in teams of 3, show me your answer before the time's done!
Variance
It can be considered as the square distance between the values observed (variable), and the expected value of those values altogether (mean).
μ = mean of the population,
N = number of scores
= variance.
In a population
= mean of the sample
n = the total number of values of the sample.
In a sample
The (n – 1) in the denominator gives an unbiased estimate of the population variance,
Standard Deviation
Is a measure of the dispersion of a collection of values; it’s measured against the mean.
The standard deviation is simply the square root of the variance.
It is the most commonly used measure of spread.
Take the dogs heights:
600 , 470, 170, 430, 300
Find the mean of the heights.
Calculate the difference between each height and the mean.
Take each difference and square it.
Find the sum of the squared values.
Divide the sum by (n1)
The result is the variance of the sample.
Now, we can know which values are acceptable or normal by finding the standard deviation (square root of the variance).
`
Get which movies are within an acceptable range for this group.
Get which movies are within an acceptable if all the movies in theaters have an average $200M in box office.
Normal Distribution
Determination of Sample Size
ANOVA
Since the bellshaped curve is symmetric, the probability of deviations from the mean are comparable in either direction.
When you want to describe the probability for a continuous variable, you do so by describing the area under a "bellshaped" curve.
Standard Normal Distribution
is defined as the normal distribution with a
mean = zero (μ = 0)
standard deviation = one (σ= 1).
The normal random variable of a standard normal distribution is called a “Z” score or “Z” distance.
In a “One  Sigma Rule”, (Z = 1, 1) or σ from the mean, the probability (area under the standard normal curve) is 68%.
In a “Two  Sigma Rule”, (Z = 2, 2) or 2σ from the mean, the probability (area under the standard normal curve) is 95%.
In a “Three  Sigma Rule”, (Z = 3, 3) or 3σ from the mean, the probability (area under the standard normal curve) is 99.7%.
1. Find the area under the standard normal curve between z = 0 and z = 2.
2. Find the area under the standard normal curve between z = 0 and z = 1.8.
3. Find the area under the standard normal curve to the right of z = 1.5.
4. Find the area under the standard normal curve to the left of z = 1.75.
5. Find the area under the standard normal curve between z = 1.5 and z = 2.5.
6. Find the area under the standard normal curve between z = 2.78 and z=1.66.
z=0
p=0.500
z=2.00
p=0.9773
0.9773  0.500 = 0.4773
z=0
p=0.500
0.9773  0.500 = 0.4773
z=1.8
z=1.80
z=0
p=0.500
z=1.50
z=0
p=0.500
z=1.75
z=2.5
0.9773  0.500 = 0.4773
z=1.5
z=1.66
z=2.78
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 5, what is the probability that a person chosen at random will have an IQ score greater than 110?
Z=(Xμ)/σ
Z=(110100)/5
Z=2
0.9773  0.500 = 0.4773
z=2.0
Suppose family incomes in a town are normally distributed with a mean of $1,200 and a standard deviation of $600 per month. What is the probability that a family has an income between $1,400 and$2,250?
We wish to determine the proportion of light bulbs produced with lifetimes between 1400 and 1520 hrs. The average lifetime of a bulb is 1000 hours with a deviation of 200.
A fouryear college will accept any student ranked in the top 60% on a national examination. If the test score is normally distributed with a mean of 500 and a standard deviation of 100, what is the cutoff score for acceptance?
x 110.00
μ 100.00
σ 5.00
Z 2.00
P 0.9772 1.0000 2.28%
x 1400.00 2250.00
μ 1200.00 1200.00
σ 600.00 600.00
Z 0.33 1.75
P 0.6306 0.9599 32.94%
x 1400 1520
μ 1000 1000
σ 200 200
Z 2.00 2.60
P 0.9772 0.9953 1.81%
x 525
μ 500
σ 100
Z 0.25
P 0.5987(=60%)
x 2 0
μ 0 0
σ 1 1
Z 2.00 0.00
P 0.9772 0.5000 47.72%
x 0.0 1.8
μ 0 0
σ 1 1
Z 0.00 1.80
P 0.5000 0.0359 46.41%
x 1.5
μ 0
σ 1
Z 1.50
P 0.9332 1.0000 6.68%
x 1.8
μ 0
σ 1
Z 1.75[
P 0.0401 0.0000 4.01%
x 1.5 2.5
μ 0 0
σ 1 1
Z 1.50 2.50
P 0.9332 0.9938 6.06%
x 2.78 1.66
μ 0 0
σ 1 1
Z 2.78 1.66
P 0.0027 0.9515 94.88%
How large should a sample be in a specific situation?
If a larger sample than necessary is used, resources are wasted; if the sample is too small, the objectives of the analysis may not be achieved.
What degree of precision is desired?
the greater the degree of desired precision, the larger will be the necessary sample size.
Suppose we would like to conduct a poll among eligible voters in a city in order to determine the percentage who intend to vote for the Democratic candidate in an upcoming election.
We specify that we want the probability to be
95.5%
that we will estimate the percentage that will vote Democratic within
+/ 1
percentage point.
What is the required sample size?
n = required sample size
Pc = confidence level
e = confidence interval or desired level of precision expressed in percentage or decimals.
p = estimation of proportion
If we observe from the chart, the estimate of proportion (p) yields a maximum variability at (p = 0.5). This provides a more conservative approach when obtaining our required sample size (n).
Thus, if not stated directly, the estimation of proportion (p) can be assumed to be 0.5 for maximum variability.
A random sample is to be selected to estimate the proportion of citizens of a large Texan city who favor federal price controls on natural gas that is transported interstate. The range of the estimate is to be kept within 4% with a confidence level of 96%. How large a simple random sample is required?
A consumer research group is surveying the consumer population of the New England region to estimate the proportion of consumers who use biodegradable laundry detergent.
How large a random sample should be drawn if the objective is to estimate the proportion of consumers using biodegradable detergent within four percentage points with 98% confidence? Assume that the proportion of consumers using this type of detergent is estimated from past experiences to be 0.20.
Suppose we wanted to estimate the arithmetic mean hourly wage rate for a group of skilled workers in a certain industry.
Let us further assume that from prior studies we estimate that the population standard deviation of the hourly wage rates of these workers is about $0.15. How large a sample size would be required to yield a probability of 99.7% that we will estimate the mean wage rate of these workers within ±$0.03?
Pc = 95.5% = 0.955
Sample size for estimation of a proportion
Sample size for estimation of a mean
n = required sample size
z = z-score found in normal distribution table
σ = population standard deviation
e = margin of error (desired level of precision), expressed in the same units as the mean.
Pc = 99.7% = 0.997
P(z) = (0.997/2) + 0.5
P(z) = 0.9985
look on the normal distribution table...
P(3) ≈ 0.9985
z = 3
e = ±$0.03 = 0.03
σ = 0.15
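With z, e and σ in hand, the last step is the sample size itself. A minimal Python sketch, assuming the usual formula for estimating a mean, n = (z·σ/e)², which the transcript does not show; the function name is made up for illustration:

```python
import math

def sample_size_for_mean(z, sigma, e):
    """Required sample size to estimate a mean within +/- e.

    Assumes n = (z * sigma / e) ** 2, rounded up to a whole observation.
    """
    raw = (z * sigma / e) ** 2
    # Round first so floating-point noise (e.g. 225.0000000001)
    # does not push the answer up by one.
    return math.ceil(round(raw, 6))

# Hourly wage example: z = 3, sigma = $0.15, e = $0.03
n = sample_size_for_mean(z=3, sigma=0.15, e=0.03)
print(n)
```

Rounding up rather than to the nearest integer keeps the precision at least as good as requested.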
A publishing company wants to know what percent of the population might be interested in a new magazine on making the most of your retirement. Secondary data (that is several years old) indicates that 22% of the population is retired. They are willing to accept an error rate of 5% and they want to be 95% certain that their finding does not differ from the true rate by more than 5%. What is the required sample size?
A fast food company wants to determine the average number of times that fast food users visit fast food restaurants per week. They have decided that their estimate needs to be accurate within plus or minus one-tenth of a visit, and they want to be 95% sure that their estimate does not differ from the true number of visits by more than one-tenth of a visit. Previous research has shown that the standard deviation is 0.7 visits. What is the required sample size?
Confidence Interval Estimate for a Mean
A random sample of 16 public school teachers in a particular state has a mean salary of $33,000 with a standard deviation of $1,000. Construct a 99 percent confidence interval estimate for the true mean salary for public school teachers for the given state.
For example, you want to know the average amount of time a student at Ohio State University spends listening to music per day, using an MP3 player. The average time for the entire population of OSU students that are MP3-player users is the parameter you’re looking for. Since you can’t ask every student who uses an MP3 player at OSU this question, you take a random sample of students and find the average from there.
Suppose the average time a student uses an MP3 player per day to listen to music, based on a random sample of 1,000 OSU students, is 2.5 hours, and the standard deviation is 0.5 hours.
Is it right to say that the population of all OSU-student MP3-player owners use their players an average of 2.5 hours per day for music listening? No.
You hope and may assume that the average for the whole population is close to 2.5, but it probably isn’t exact. After all, you’re only sampling a tiny fraction of the 60,000-member population of all OSU students. The fact is that sample results vary from sample to sample.
The solution is to not only report the average from your sample, but along with it, report some measure of how much you expect that sample average to vary from one sample to the next, with a certain level of confidence.
You want to cover your bases, so to speak (at least most of the time). The number that you use to represent this level of precision in your results is called the margin of error.
You take your sample average and add and subtract the margin of error (to get that plus-or-minus factor going), which gives you a confidence interval for the average time all OSU students use their MP3 players.
n = sample size
df = degrees of freedom
Pc = confidence level
t = critical value of "t"
s = standard deviation
x = mean
n = 16
df = n − 1 = 15
Pc = 99%
look on the t-distribution table...
t = 2.947
s = 1,000
x = 33,000
Suppose we conduct a survey of 19 millionaires to find out what percent of their income the average millionaire donates to charity. We discover that the mean percent is 15 with a standard deviation of 5 percent. Find a 95% confidence interval for the mean percent.
The president of a small community college wishes to estimate the average distance commuting students travel to the campus. A sample of 12 students was randomly selected and yielded the following distances in miles: 27, 35, 33, 30, 39, 25, 38, 22, 27, 37, 33, 40. Construct a 95% confidence interval estimate for the true mean distance commuting students travel to the campus.
n = 12
df = 11
Pc = 95%
t = 2.201
s = 5.94
x = 32.17
CIE = 28.395 to 35.939
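The commuting-distance interval can be reproduced from the raw data. A minimal Python sketch, assuming the usual interval x ± t·s/√n and taking the critical value t = 2.201 from the slide instead of computing it; the function name is illustrative only:

```python
import math
import statistics

def t_confidence_interval(data, t_crit):
    """Confidence interval for the mean: x +/- t * s / sqrt(n).

    t_crit is read from a t-distribution table for df = n - 1.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin

distances = [27, 35, 33, 30, 39, 25, 38, 22, 27, 37, 33, 40]
low, high = t_confidence_interval(distances, t_crit=2.201)
print(round(low, 3), round(high, 3))
```

`statistics.stdev` is used on purpose: it divides by n − 1, which matches the sample standard deviation used with the t-distribution.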
Student t-Test
When the variance is not known but has to be estimated from sample data, you should use the t-distribution rather than the standard normal distribution.
When the sample size (n) is large, say 100 or above, the t-distribution is very similar to the standard normal distribution.
However, with smaller sample sizes (e.g. n = 5), the t-distribution has relatively more “scores” in its tails than does the standard normal distribution.
As a result, you have to extend farther from the mean to contain a given proportion of the area.
Establishing Test Hypotheses (Null Hypothesis and Alternative Hypothesis)
In statistics, the only way of supporting your hypothesis, (H1), is to reject the null hypothesis, (H0).
Rather than trying to show that (H1), or the alternative hypothesis, is correct, we must show that the null hypothesis, (H0), is likely to be wrong; we have to “reject” or “nullify” the null hypothesis, (H0).
Resultant (t)
x = sample mean or average of sample
μ0 = population mean or average of population (expected mean value)
s = standard deviation of sample
n = sample size
A teachers' union would like to establish that the average salary for high school teachers in a particular state is less than $35,500. A random sample of 25 public high school teachers in the particular state has a mean salary of $34,578 with a standard deviation of $910. Test to establish whether the union's claim is correct at the 5 percent level of significance.
Resultant (t)
x = 34,578
μ0 = 35,500
s = 910
n = 25
t = −5.07
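The resultant t for the teachers' union example can be checked directly. A minimal Python sketch, assuming the usual one-sample formula t = (x − μ0) / (s / √n), which the transcript does not show; the function name is made up for illustration:

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t: distance of the sample mean from mu0 in standard errors."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Teachers' union example
t = t_statistic(sample_mean=34578, mu0=35500, s=910, n=25)
print(round(t, 2))
```

The value comes out negative because the sample mean lies below μ0, and the sign matters when it is compared with the critical value in a one-tailed test.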
Critical (t)
α = significance level (criterion stated or given)
p = p-value (found in t-distribution table)
df = n – 1 (degrees of freedom)
The dean of students of a private college claims that the average distance commuting students travel to the campus is less than 35 mi. The commuting students feel otherwise. A sample of 16 students was randomly selected and yielded a mean of 36 miles and a standard deviation of 5 miles. Test the dean's claim at the 5 percent level of significance.
An advertising agency would like to create an advertisement for a fast food restaurant claiming that the average waiting time from ordering to receiving your order at the restaurant is less than 5 min. The agency measured the time from ordering to delivery of order for 25 customers and found that the average time was 4.7 min with a standard deviation of 0.6 min. At the 5 percent level of significance, test the claim.
A sales manager claims that his salesmen can sell an average of more than 9.3 computers per week. The CEO of the company would like to test this, so he reviewed the sales records of 15 salesmen, getting an average of 7.6 sales with a std. deviation of 0.9. Use 95% certainty to test the manager’s claim.
A movie theater manager claims that he can get more than 500 customers in one night. The box office clerks have registered an average attendance of 463.5 per night during 9 straight days, with a deviation of 36.2. With a certainty of 95%, find out if the manager can get at least 500 customers to attend.
A pizza place has been selling an average of 5,632.6 pizzas per month for the last year and a half. Given the lack of materials, they need to sell 5,300 pizzas at the most for the next month. With 90% accuracy, and a deviation of 135.2, find out if that is possible.
H0: μ ≥ 35,500
H1: μ < 35,500
Critical (t)
α = 0.05
df = 25 − 1 = 24
t(α, n−1) = 1.711
Left-tailed
H0: μ ≥ μ0
H1: μ < μ0
It’s left-tailed because the direction of the inequality in H1 is to the left.
We reject H0 if the resultant value t is less than −t(α, n−1).
Right-tailed
H0: μ ≤ μ0
H1: μ > μ0
It’s right-tailed because the direction of the inequality in H1 is to the right.
We reject H0 if the resultant value t is greater than +t(α, n−1).
Two-tailed
H0: μ = μ0
H1: μ ≠ μ0
It’s two-tailed because we have an unequal sign, which is in H1.
We reject H0 if the resultant value t is greater than +t(α/2, n−1) or less than −t(α/2, n−1).
Therefore, we have a “left-tailed” test, and to reject the null hypothesis we check whether t ≤ −t(α, n−1).
From the chart:
−5.07 ≤ −1.711
Hence, at the 5% significance level, we can go with the alternative hypothesis, (H1), and state that the union’s claim is correct.
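The three rejection rules can be collected into one small function. A minimal Python sketch; the function name and the "left" / "right" / "two" labels are illustrative only, and the critical value is assumed to be looked up in a t table beforehand (at α for one-tailed tests, at α/2 for a two-tailed test):

```python
def reject_null(t, t_crit, tail):
    """Apply the rejection rule for a one-sample t-test.

    t      -- resultant t computed from the sample
    t_crit -- positive critical value from the t table
    tail   -- "left", "right" or "two"
    """
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    if tail == "two":
        return t > t_crit or t < -t_crit
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Teachers' union example: left-tailed, resultant t = -5.07, critical t = 1.711
print(reject_null(-5.07, 1.711, "left"))
```

Note the sign of the resultant t: in the union example the sample mean is below μ0, so t is −5.07 and it falls below −1.711.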
Each year, car manufacturers perform mileage tests on new car models and submit the results to the EPA. The EPA then tests the cars to determine whether the manufacturer's claims are correct. In 1998, Mercedes Benz reported that the SLK averaged 29 mpg on the highway. Suppose the EPA tested 15 of these cars and obtained an average of 28.75 mpg with a standard deviation of 1.6 mpg. At 5% significance, test to see if Mercedes Benz's claim is correct.
In 2001, a study done in Mexico found that the average height of males 50+ years of age was 1.63 meters. A random sample of 12 Mexican citizens (50+ years of age) was found to have an average height of 1.68 meters with a standard deviation of 0.2 meters. With 95% certainty, test to see if the study done in 2001 is true.
A one-way ANOVA (Analysis of Variance) is a way to test the equality of three or more means at one time by using variances.
Assumptions when using ANOVA:
The populations from which the samples were obtained must be approximately normally distributed.
The samples must be independent.
The variances of the populations must be equal.
The null hypothesis (H0) for ANOVA is that the mean is the same for all groups.
H0: μ1 = μ2 = μ3
The alternative or research hypothesis (H1) is that the mean is not the same for all groups.
H1: at least one of μ1, μ2, μ3 differs from the others
A math teacher predicts that students will learn most effectively with a constant background sound, as opposed to an unpredictable sound or no sound at all.
She randomly divides 24 students into 3 groups of 8. All students study a passage of text for 30 minutes. Those in group 1 study with background sound at a constant volume. Those in group 2 study with noise that changes volume periodically. Those in group 3 study with no sound at all.
After studying, all students take a test on the material. Their scores were:
Constant sound 7 4 6 8 6 6 2 9
Random sound 5 5 3 4 4 7 2 2
No sound 2 4 7 1 2 1 5 5
1. Put the raw data, according to group, in "x1", "x2", and "x3".
2. Calculate the sum for group 1.
3. Calculate Σx² for group 1.
5. Calculate (Σx)² for group 1.
6. Repeat steps 2-5 for groups 2 and 3.
7. Calculate SSamong.
8. Calculate SSwithin.
9. Complete the table by calculating dfamong, dfwithin, MSamong, MSwithin, and F.
10. Check to see if F is statistically significant on the probability table with the appropriate degrees of freedom and p < .05.
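The steps above can be sketched in Python for the three sound groups. This is a minimal sketch that assumes the usual computational formulas (SSamong from the group sums, SSwithin from Σx² minus (Σx)²/n per group); the function and key names are made up for illustration:

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a list of score lists (one list per group)."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    k = len(groups)

    # (sum of all scores)^2 / N, the correction term
    correction = sum(all_scores) ** 2 / n_total
    # Among groups: sum over groups of (sum x)^2 / n, minus the correction term
    ss_among = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    # Within groups: sum over groups of (sum of x^2) - (sum x)^2 / n
    ss_within = sum(sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups)

    df_among = k - 1
    df_within = n_total - k
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within
    return {
        "ss_among": ss_among,
        "ss_within": ss_within,
        "df_among": df_among,
        "df_within": df_within,
        "ms_among": ms_among,
        "ms_within": ms_within,
        "f": ms_among / ms_within,
    }

constant_sound = [7, 4, 6, 8, 6, 6, 2, 9]
random_sound = [5, 5, 3, 4, 4, 7, 2, 2]
no_sound = [2, 4, 7, 1, 2, 1, 5, 5]

result = one_way_anova([constant_sound, random_sound, no_sound])
for name, value in result.items():
    print(name, round(value, 3))
```

For 2 and 21 degrees of freedom, a typical F table lists a critical value of roughly 3.47 at p < .05; that figure is worth confirming against the table used in class before comparing it with the computed F.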
Stem-and-Leaf plot
frequency
histogram
12, 13, 21, 27, 33, 34, 35, 37, 40, 40, 41
mode
mean
median
max
min
range
stem-and-leaf
table of frequency (x4)
histogram
Correlation
Pearson correlation coefficient
A parking lot manager in upstate New York is considering expanding his lot, for which he’d have to buy more land. The purchase would be worth it if he gets at least 1,500 cars weekly. To help his decision, he’s looking into the records of the last 2 months, for which he has gotten an average of 1,479.4 cars weekly. Given a standard deviation of 33.6 and an accuracy of 90%, would the purchase be worth it?
Back in the 90’s, a photo place announced they could print all your photos in 1 hour. Their maximum capacity was 420 photos per hour. If during 23 hours, they printed an average amount of 411 photos with a standard deviation of 36.7 photos, find out if their demand would allow them to print all photos in one hour. Use 99% accuracy.
A night club has been trying to bring a new DJ for a Friday night. The DJ will only agree to play there if the club is at full capacity, which is 526 people. To learn if they’ll be capable, the owners have been measuring attendance during the last 20 Fridays, for which they’ve gotten an average attendance of 510 people with a standard deviation of 25. Using 90% certainty, statistically evaluate if they’ll be able to pull it off.
The first class ticket on the Titanic was $1,200, and the second class was $700. What would be the proportion of first and second class passengers if the overall mean price for a ticket was $1,000?
From 2001 to 2012, the winning scores for a golf tournament were 276, 279, 279, 277, 278, 278, 280, 282, 285, 272, 279, and 278. Using the standard deviation for this sample, Sx, find the percent of these winning scores that fall within one standard deviation of the mean.
From 1984 to 1995, the winning scores for a golf tournament were 276, 279, 279, 277, 278, 278, 280, 282, 285, 272, 279, and 278. Using the standard deviation for this sample, Sx, find the percent of these winning scores that fall within one standard deviation of the mean.
How probable do you want it to be that the desired precision will be obtained?
z = z-score found in normal distribution table
P(z) = probability from normal distribution table
P(z) = (Pc/2) + 0.5
(only use this formula when applying the standard normal distribution table)
P(z) = (0.955/2) + 0.5
P(z) = 0.9775
look on the normal distribution table...
P(2) ≈ 0.9775
z = 2
e = ±1% = 0.01
p = estimation of proportion (?)
p = 0.5
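The voter-poll values can now be turned into a sample size. A minimal Python sketch, assuming the usual formula for a proportion, n = z²·p·(1 − p)/e², which the transcript does not show, and using `statistics.NormalDist` only to cross-check the table lookup; the function names are illustrative:

```python
import math
from statistics import NormalDist

def z_for_confidence(pc):
    """z whose left-tail area is (pc / 2) + 0.5, as in the table lookup."""
    return NormalDist().inv_cdf(pc / 2 + 0.5)

def sample_size_for_proportion(z, p, e):
    """Required sample size to estimate a proportion within +/- e.

    Assumes n = z^2 * p * (1 - p) / e^2, rounded up.
    """
    raw = z ** 2 * p * (1 - p) / e ** 2
    # Round first so floating-point noise does not add a spurious extra unit.
    return math.ceil(round(raw, 6))

z_exact = z_for_confidence(0.955)   # close to the table value z = 2
n = sample_size_for_proportion(z=2, p=0.5, e=0.01)
print(round(z_exact, 3), n)
```

The slide rounds z to 2 after the table lookup, so the sketch passes z = 2 to stay consistent with it; the unrounded z would give a slightly larger n.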
Take the receipts:
96, 53, 39, 64, 57, 71, 47, 99, 62, 68
Find the mean of the receipts.
Calculate the difference between each receipt and the mean.
Take each difference and square it.
Find the sum of the squared values.
Divide the sum by (n − 1).
The result is the variance of the sample.
Now, we can know which values are acceptable or normal by finding the standard deviation (square root of the variance).
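The same steps can be followed one by one in Python for the ten receipts. A minimal sketch using plain arithmetic so that each line matches a step; the variable names are illustrative:

```python
import math

receipts = [96, 53, 39, 64, 57, 71, 47, 99, 62, 68]
n = len(receipts)

# Find the mean of the receipts.
mean = sum(receipts) / n

# Difference between each receipt and the mean, then square each one.
differences = [x - mean for x in receipts]
squared = [d ** 2 for d in differences]

# Sum of the squared values, divided by (n - 1), gives the sample variance.
variance = sum(squared) / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

`statistics.variance` and `statistics.stdev` from the standard library give the same results in one call each, since they also divide by n − 1.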
The owner of the Chez Maurice restaurant is interested in how much people spend at the restaurant. He examines 10 randomly selected receipts for parties of four and writes down the following data.
96, 53, 39, 64, 57, 71, 47, 99, 62, 68