# M3341S18 Chap 3 Describing Numerical Data

## Alan Burstein

on 8 February 2018

MGNT 3341
Spring 2018

The Georgia County Dataset
Describing Distributions of Numerical Data
Exploring Distributions of the Individual Variables
Alan's First Rule of Examining Data:
If at all possible, Draw a Picture
Where's the center?
What's the "shape" of the distribution?
How spread out are the data?
Where do the numbers come from?
Mind the unit of observation!
Looking at Distributions
Birth Weight and Infant Mortality
Measuring the Relationship between Two Variables
Population versus Sample

Statistics
Center

Shape
Standard Deviation
What Is It?
The standard deviation measures roughly how far, on average, cases lie from the mean.
Notation
Computation
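The slide headings name the notation and computation without recording the formulas themselves. As a reference sketch, the standard textbook definitions are given below, assuming a population of $N$ values or a sample of $n$ values $x_1, \dots, x_n$ (the symbols $N$, $n$, and $x_i$ are introduced here for illustration and do not appear on the slides):

```latex
% Population: mean \mu and standard deviation \sigma
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}

% Sample: mean \bar{x} and standard deviation s (note the n - 1 divisor)
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```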
Mean: 21.4%
s: 3.1%

How "typical" or "unusual" is each of the following:
Fayette: 12.9%
Lamar: 19.9%
Clayton: 27.2%
How You Can Use It
"Standard deviations from the mean" is the same as a z-score, which is the same as a standardized score.
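To make the z-score idea concrete, here is a minimal sketch that applies it to the three counties listed above, using the mean (21.4%) and s (3.1%) from the slide. The function name `z_score` and the dictionary layout are illustrative choices, not part of the original material.

```python
# Values taken from the slide: mean and standard deviation (in percent),
# plus the three counties whose values are being judged.
mean = 21.4
s = 3.1
counties = {"Fayette": 12.9, "Lamar": 19.9, "Clayton": 27.2}


def z_score(x, mean, s):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / s


z_scores = {name: z_score(value, mean, s) for name, value in counties.items()}

for name, z in z_scores.items():
    print(f"{name}: z = {z:+.2f}")
```

Read this way, Fayette sits about 2.7 standard deviations below the mean, Lamar about half a standard deviation below, and Clayton about 1.9 above, so Lamar is the most "typical" of the three and Fayette the most unusual.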
Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.
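The bound in Chebyshev's Theorem is easy to evaluate for a few values of z. The sketch below does only that arithmetic; the function name `chebyshev_lower_bound` is a hypothetical helper introduced for illustration.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of any data set within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z > 1")
    return 1 - 1 / z**2


for z in (1.5, 2, 3):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%} of the data")
```

For z = 2 the guarantee is at least 75% of the data, and for z = 3 at least about 88.9%, whatever the shape of the distribution.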

"Empirical Rule"
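The slide names the Empirical Rule without stating it. In its usual textbook form, the rule says that for a roughly bell-shaped distribution about 68%, 95%, and 99.7% of the values fall within 1, 2, and 3 standard deviations of the mean. The sketch below reproduces those figures under the assumption of an exactly normal distribution; `normal_fraction_within` is an illustrative helper name.

```python
import math


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} s.d.: {normal_fraction_within(k):.1%}")
```

Unlike Chebyshev's bound, these percentages are approximations that hold only when the data are close to bell-shaped.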
r = 0.43
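The value r is a correlation coefficient, which measures the strength and direction of a linear relationship between two variables on a scale from -1 to +1. The slide does not record how r = 0.43 was computed or the underlying data, so the sketch below shows the standard Pearson formula on made-up numbers only; `pearson_r`, `xs`, and `ys` are hypothetical names and values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration values, NOT the Georgia county data.
xs = [7.0, 8.5, 9.0, 10.5, 12.0]
ys = [6.0, 9.5, 7.0, 8.0, 11.0]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

A value such as 0.43 indicates a positive but only moderate linear association: higher values of one variable tend to go with higher values of the other, with considerable scatter.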
The dataset shows the ten-year infant mortality rate per 1,000 births for 154 counties.
The infant mortality rate ranges from a minimum of 3.4 to a maximum of 25.8.
Mean = 8.8
Median = 8.3
Standard Deviation = 3.4
The distribution of infant mortality rates is skewed to the right with three outlying counties.
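The summary figures above can be checked against the description of the shape. As a rough sketch using only the numbers on the slide, the code below expresses the minimum and maximum as distances from the mean in standard deviations and compares the mean with the median; the variable names are illustrative.

```python
# Summary statistics from the slide (infant mortality rate per 1,000 births).
mean = 8.8
median = 8.3
sd = 3.4
minimum = 3.4
maximum = 25.8

# Distance of each extreme from the mean, in standard deviations.
z_min = (minimum - mean) / sd
z_max = (maximum - mean) / sd
mean_minus_median = mean - median

print(f"z for minimum: {z_min:+.2f}")
print(f"z for maximum: {z_max:+.2f}")
print(f"mean - median: {mean_minus_median:+.1f}")
```

The maximum lies about 5 standard deviations above the mean while the minimum is only about 1.6 below it, and the mean exceeds the median by 0.5. Both observations are consistent with a right-skewed distribution with high outliers.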

Infant Mortality versus Low Birth Weight Rate
Where's the center?
How spread out are the heights?
What's the shape of the Distribution?
Mean = 64.4
s.d. = 1.8
Mean = 70.4
s.d. = 2.7
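One way to compare the two height distributions is to compute the range that lies within two standard deviations of each mean. This is a sketch under the assumption that both distributions are roughly bell-shaped, in which case about 95% of the values would fall in that range; the slides do not label the two groups, so they are simply numbered here, and `interval_around_mean` is a hypothetical helper.

```python
def interval_around_mean(mean, sd, k=2):
    """Return (low, high) for the range within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


# (mean, s.d.) pairs from the slides; the group labels are not given in the source.
groups = {"group 1": (64.4, 1.8), "group 2": (70.4, 2.7)}

intervals = {name: interval_around_mean(m, sd) for name, (m, sd) in groups.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.1f} to {high:.1f}")
```

The first group spans roughly 60.8 to 68.0 and the second roughly 65.0 to 75.8, so the second group is both centered higher and more spread out, with the two ranges overlapping between about 65 and 68.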
https://edpuzzle.com/media/588faa7a25a3743e1bb4b58d