**Data Analytics**

**Business Analytics**

**Descriptive Analytics**

**Predictive Analytics**

**Prescriptive Analytics**

**© 2016 Cengage**

**Descriptive Statistics**

Data Visualization


**Statistical Inference**

**(Statistical) Inference**

Inference: a conclusion reached on the basis of evidence and reasoning.

Statistical Inference: the process of deducing properties of an underlying probability distribution by analysis of data.

Inferential statistical analysis infers properties about a population: this includes testing hypotheses and deriving estimates.

**Deriving Estimates**

**Testing Hypotheses**

More Sophisticated

But how do we select a sample?

Scenario 1:

The director of personnel for Electronics Associates, Inc. has been assigned the task of developing a profile of the company’s 2,500 employees. The characteristics to be identified include the mean annual salary and the proportion of employees having completed the company’s management training program.

Scenario 2:

Consider the population of customers arriving at a McDonald’s. An employee is asked to select and interview a sample of customers in order to develop a profile of customers who visit the restaurant.

Step 1: Assign a random number to each element of the population.

Step 2: Select the n elements corresponding to the n smallest random numbers.
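The two steps above can be sketched in a few lines of Python. This is only an illustration: the employee IDs 1 to 2,500 are a hypothetical stand-in for the EAI employee list, and the function name `simple_random_sample` is made up for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n elements by tagging every element with a random number
    and keeping the n elements with the smallest tags."""
    rng = random.Random(seed)
    # Step 1: assign a random number to each element of the population.
    tagged = [(rng.random(), element) for element in population]
    # Step 2: select the n elements corresponding to the n smallest random numbers.
    tagged.sort(key=lambda pair: pair[0])
    return [element for _, element in tagged[:n]]

# ASSUMPTION: hypothetical employee IDs standing in for the 2,500 EAI employees.
employee_ids = list(range(1, 2501))
sample = simple_random_sample(employee_ids, n=30, seed=42)
print(sample)
```

Every employee has the same chance of landing among the 30 smallest random numbers, which is what makes the sample "simple random".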

McDonald's Solution:

The sampling procedure was based on the fact that some customers presented discount coupons. Whenever a customer presented a discount coupon, the next customer served was asked to complete a customer profile questionnaire.

Mean

Interval Estimation

Proportion

Electronics Associates, Inc. (EAI) Data

Example:

Sample

Not Very Accurate?!

What if we select another sample?

Let's select 500 more samples of 30

Is this normal? (pun intended)

Does this only apply to sample mean?

The sample mean is a random variable!

So it has an expected value (mean), a standard deviation, and a characteristic shape.

so, what about the shape?

Population has a normal distribution

Population does not have a normal distribution

When the population has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.

When the population does not have a normal distribution, the sampling distribution of x̄ is approximately normally distributed for large sample sizes.

General statistical practice is to assume that, for most applications, the sampling distribution of x̄ can be approximated by a normal distribution whenever the sample size is 30 or more. In cases in which the population is highly skewed or outliers are present, sample sizes of 50 may be needed.
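A quick simulation can make this rule of thumb concrete. The sketch below assumes a made-up, right-skewed population (it is not the EAI data) and draws 500 simple random samples of 30 from it, then checks how the 500 sample means behave.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: a made-up, right-skewed population of 2,500 values
# (exponential with mean 50,000); it is not the EAI data.
population = [rng.expovariate(1 / 50_000) for _ in range(2500)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

def draw_sample_means(population, n, repetitions, rng):
    """Return the mean of each of `repetitions` simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

n = 30
means = draw_sample_means(population, n, repetitions=500, rng=rng)
standard_error = pop_sd / n ** 0.5

# Share of sample means within two standard errors of the population mean;
# a normal shape would put roughly 95% of them there.
within_two_se = sum(abs(m - pop_mean) <= 2 * standard_error for m in means) / len(means)

print(f"population mean        : {pop_mean:,.0f}")
print(f"mean of sample means   : {statistics.fmean(means):,.0f}")
print(f"sigma / sqrt(n)        : {standard_error:,.0f}")
print(f"std dev of sample means: {statistics.stdev(means):,.0f}")
print(f"share within 2 SE      : {within_two_se:.1%}")
```

Even though the population itself is far from normal, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).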

How many is large?

So does the sample size matter?

The answer is yes, but it affects the variance, not the mean!

Suppose that in the EAI sampling problem we select a simple random sample of 100 EAI employees instead of the 30 originally considered.
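The effect of moving from 30 to 100 employees can be sketched with the standard error formula sigma / sqrt(n). The population standard deviation used here is an assumed placeholder, since its value is not stated in these slides.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# ASSUMPTION: placeholder population standard deviation of annual salary.
assumed_sigma = 4000

se_30 = standard_error_of_mean(assumed_sigma, 30)
se_100 = standard_error_of_mean(assumed_sigma, 100)
print(f"n = 30 : standard error = {se_30:.2f}")
print(f"n = 100: standard error = {se_100:.2f}")
```

The expected value of the sample mean is the same for both sample sizes; only the spread around it shrinks as n grows.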

The sample proportion is a random variable!

So it has an expected value (mean), a standard deviation, and a characteristic shape.

How about the characteristic shape?

For a simple random sample from a large population, the number of elements in the sample with the characteristic of interest, x, is a binomial random variable, and the sample proportion is p̄ = x/n.

Does the sample size matter?
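One way to explore that question is to compare the standard error of the sample proportion, sqrt(p(1 - p)/n), for two sample sizes and check it against a simulation. The population proportion of 0.60 below is an assumed value chosen only for illustration.

```python
import math
import random
import statistics

def proportion_standard_error(p, n):
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def simulate_sample_proportions(p, n, repetitions, rng):
    """Simulate sample proportions x / n, where x is a binomial count."""
    proportions = []
    for _ in range(repetitions):
        x = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(x / n)
    return proportions

# ASSUMPTION: hypothetical population proportion.
assumed_p = 0.60
rng = random.Random(1)

for n in (30, 100):
    simulated = simulate_sample_proportions(assumed_p, n, repetitions=2000, rng=rng)
    print(f"n = {n:3d}: formula = {proportion_standard_error(assumed_p, n):.4f}, "
          f"simulated = {statistics.stdev(simulated):.4f}, "
          f"mean of proportions = {statistics.fmean(simulated):.3f}")
```

As with the sample mean, a larger sample leaves the expected value unchanged and reduces the spread.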

An interval estimate provides information about how close the point estimate is to the value of the population parameter.

Point Estimate

Let's put everything in one place!

What is the problem with this estimation?

We know

We do not know

This introduces an additional source of uncertainty

t-Distribution

How do we deal with this uncertainty?

Which distribution represents less certainty?

degrees of freedom = n - 1

Example: in our EAI example, n = 30. Thus we have 29 degrees of freedom.
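To see what 29 degrees of freedom means in practice, the sketch below computes the t value for a 95% confidence level and compares it with the corresponding standard normal value. It uses only the Python standard library and integrates the t density numerically, so treat it as an illustration rather than a replacement for a statistics package; the helper names are made up for this sketch.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of the density."""
    upper = abs(x)
    if upper == 0:
        return 0.5
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_29 = t_critical(0.95, df=29)
z = NormalDist().inv_cdf(0.975)
print(f"t (29 df): {t_29:.3f}")
print(f"z        : {z:.3f}")
```

The t value is larger than the z value, so an interval built with t is wider. That extra width is how the additional uncertainty from estimating the standard deviation shows up.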

Let's put everything in one place again!

Before

After

In Class Exercise

A study was designed to estimate the mean credit card debt for the population of U.S. households. A sample of 70 households provided the credit card balances shown in Table 6.5.

Data: NewBalance

Compute an interval estimate of the population mean with 95% confidence.
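A possible starting point for the exercise is sketched below. The five balances are made up for illustration and are not the NewBalance data; the function `t_interval` is a hypothetical helper that expects the t table value to be supplied by the caller.

```python
import math
import statistics

def t_interval(data, t_critical):
    """Return (lower, upper) for the mean: x_bar +/- t * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# ASSUMPTION: five made-up balances, not the NewBalance data.
balances = [1200, 3400, 560, 7800, 2300]
# t table value for 95% confidence and n - 1 = 4 degrees of freedom.
lower, upper = t_interval(balances, t_critical=2.776)
print(f"95% interval: {lower:,.0f} to {upper:,.0f}")
```

For the 70-household sample, the degrees of freedom are 69 and the t table value for 95% confidence is about 1.995.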

Why 90%

A hypothesis is an assumption we make about a population parameter, that is, any fixed quantity or measurement of the population that we can use as the value of a distribution parameter. Typical examples of parameters are the mean and the variance.

How many e-mails should I send out before someone signs up for our service?

Example:

10?

5?

What if we instead ask: What is the mean number of e-mails that we need to send before someone signs up for our product? We can define the population here as the recipients of our offer.

Null Hypothesis (H0): a tentative conjecture about a population parameter

Alternative Hypothesis (Ha): the exact opposite of the tentative conjecture

counter claim

claim

hard to state

easy to state

hard to reject

easy to reject

Research Hypothesis

so... which one to begin with?

Start with H0

Start with

Several new fuel injection units will be manufactured, installed in test automobiles, and subjected to research-controlled driving conditions. The new system provides more than 24 miles per gallon.

Example:

Hypothesis Test

The label on a soft drink bottle states that it contains 67.6 fluid ounces.

Example:

Hypothesis Test(s)

Challenging the Null hypothesis

Ha

Summary

one-tailed

one-tailed

two-tailed
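The three labels above presumably correspond to the standard forms of a hypothesis test about a population mean, where \mu_0 denotes the hypothesized value (notation assumed here, since the slide images are not available):

```latex
\[
\begin{array}{lll}
\text{One-tailed (lower tail):} & H_0\colon \mu \ge \mu_0 & H_a\colon \mu < \mu_0 \\
\text{One-tailed (upper tail):} & H_0\colon \mu \le \mu_0 & H_a\colon \mu > \mu_0 \\
\text{Two-tailed:}              & H_0\colon \mu = \mu_0   & H_a\colon \mu \ne \mu_0
\end{array}
\]
```

In every form the equality sits in the null hypothesis.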

Hypothesis Test of the Population Mean

Example

The label on a large can of Hilltop Coffee states that the can contains 3 pounds of coffee.

The FTC (Federal Trade Commission) interprets the label information on a large can of coffee as a claim by Hilltop that the population mean filling weight is at least 3 pounds per can.

Hypothesis

A sample of 36 cans of coffee is selected.

what does an average of 2.92 mean?

How much less than 3 pounds is significant?

Level of significance: the probability of making a Type I error, that is, rejecting the null hypothesis when it is actually true.

Test statistic

t = ?

How small must the test statistic t be before we choose to reject the null hypothesis?
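A rough sketch of the lower-tail decision for the Hilltop example follows. The sample size (36), sample mean (2.92), and hypothesized mean (3) come from the example; the sample standard deviation and the level of significance are assumptions made only so the sketch can run.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """t = (x_bar - mu_0) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

# From the example: H0: mu >= 3, Ha: mu < 3, n = 36, sample mean = 2.92.
n, x_bar, mu_0 = 36, 2.92, 3.0

# ASSUMPTION: sample standard deviation, not stated in these slides.
assumed_sd = 0.17
# ASSUMPTION: level of significance 0.01; 2.438 is the t table value
# for an upper-tail area of 0.01 with n - 1 = 35 degrees of freedom.
assumed_t_alpha = 2.438

t = t_statistic(x_bar, mu_0, assumed_sd, n)
# Lower-tail test: reject H0 when t is at or below -t_alpha.
reject_h0 = t <= -assumed_t_alpha
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

Under these assumed values, t falls below the cutoff, so the sample mean of 2.92 would count as significantly less than 3 pounds. A different standard deviation or level of significance could change that conclusion.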

Summary of one-tailed test

Example

Holiday Toys manufactures and distributes its products through more than 1,000 retail outlets.

Holiday’s marketing director is expecting demand to average 40 units per retail outlet

Holiday decided to survey a sample of 25 retailers to gather more information.

Hypothesis

The sample of 25 retailers provided a mean of 37.4

The sample has a standard deviation of 11.79

p-value is a probability used to determine whether the null hypothesis should be rejected

For a two-tailed test, values of the test statistic in either tail provide evidence against the null hypothesis.
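The Holiday Toys numbers (n = 25, sample mean 37.4, sample standard deviation 11.79, hypothesized mean 40) can be run through a two-tailed t test as sketched below. The t distribution is integrated numerically with the standard library only, and the level of significance of 0.05 is an assumption for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of the density."""
    upper = abs(x)
    if upper == 0:
        return 0.5
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

# From the example: H0: mu = 40, Ha: mu != 40.
n, x_bar, s, mu_0 = 25, 37.4, 11.79, 40.0

t = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1
# Two-tailed p-value: area in both tails beyond |t|.
p_value = 2 * t_cdf(-abs(t), df)

assumed_alpha = 0.05  # ASSUMPTION: level of significance
reject_h0 = p_value <= assumed_alpha
print(f"t = {t:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because the p-value is well above the assumed 0.05, the sample does not give enough evidence to reject the expectation of 40 units per outlet.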

Summary of two-tailed test

Summary

Hypothesis Test of the Population Proportion

one-tailed

one-tailed

two-tailed

Example

Pine Creek implemented a special promotion designed to attract women golfers.

Over the past year, 20% of the players at Pine Creek were women.

One month after the promotion was implemented, the course manager requested a statistical study to determine whether the proportion of women players at Pine Creek had increased.

A random sample of 400 players was selected.

How much more than 0.20 is significant?

Test statistic:

Summary of one-tailed and two-tailed test (proportion)

what does sample proportion of 0.25 mean?

Lower tail

Upper tail

Excel

sample proportion is 0.25
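The Pine Creek numbers can be checked with a short upper-tail test for a proportion. This sketch assumes the usual normal approximation for the sample proportion and uses the values from the example: a hypothesized proportion of 0.20, a sample of 400 players, and a sample proportion of 0.25.

```python
import math
from statistics import NormalDist

def proportion_z_statistic(sample_proportion, hypothesized_proportion, n):
    """z = (p_bar - p_0) / sqrt(p_0 * (1 - p_0) / n)."""
    standard_error = math.sqrt(hypothesized_proportion * (1 - hypothesized_proportion) / n)
    return (sample_proportion - hypothesized_proportion) / standard_error

# From the example: H0: p <= 0.20, Ha: p > 0.20.
p_0, n, p_bar = 0.20, 400, 0.25

z = proportion_z_statistic(p_bar, p_0, n)
# Upper-tail p-value: area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A p-value this small means a sample proportion of 0.25 would be very unlikely if the true proportion of women players were still 0.20.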