
# Statistics

Ch. 1 and 2 Summary

by

## Ivanti Galloway

on 5 September 2012

#### Transcript of Statistics

What is Statistics? How to Describe and Summarize Statistics

population: total set of observations that can be made

census: study that obtains data from every member of a population

sample: subset of a population

units: things in a population

subject: units that are people

variable: characteristic of a unit

dichotomous variable: variable with only two possible outcomes

quantitative variable: variable that assigns a number to each individual

statistic: number derived from a sample

parameter: number derived from the population (denoted with Greek letters)

random (sampling) error: difference between sample value and true population value

experiment: procedure which results in a measurement or observation

random experiment: experiment whose outcome depends on chance

execution/trial: each repetition of an experiment

Parameter vs. Statistic

population parameter: exact number

sample statistic: estimated number, so there can be error

Say there's a box of cards with either 1 or 0 written on each card. You pull out 25 cards in each random experiment to figure out what percentage of the cards in the box are 1s. You pull out 14 1s and 11 0s. This gives the sample percentage 14/25 = .56 = 56%. However, if the true population percentage of 1s is 60%, then 56% - 60% = -4% gives us our random sampling error.
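The arithmetic of the card example can be checked with a short Python sketch. The helper name `sampling_error` is hypothetical (it does not come from the presentation), and the 60% population value is the one supposed in the text.

```python
def sampling_error(ones_drawn: int, sample_size: int, population_pct: float) -> float:
    """Random sampling error in percentage points: sample value minus true value."""
    sample_pct = 100 * ones_drawn / sample_size
    return sample_pct - population_pct


# 14 ones out of 25 cards, with an assumed true population percentage of 60%
error = sampling_error(14, 25, 60.0)
print(f"sample percentage: {100 * 14 / 25:.0f}%")
print(f"random sampling error: {error:+.0f}%")
```

A negative error means the sample underestimated the population percentage.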

Suppose you performed this experiment 3 more times. The second time the random sampling error is +4%, the third time -8%, and the last +20%. The average comes out to:

$$\text{AV} = \frac{-4\% + 4\% - 8\% + 20\%}{4} = +3\%$$

The cancellation of negative and positive errors results in a small number. To get a "typical size" of a random sampling error, ignore the signs and take the mean of the absolute values (MA):

$$\text{MA} = \frac{4\% + 4\% + 8\% + 20\%}{4} = 9\%$$

MA is difficult to deal with theoretically because the absolute value function is not differentiable at 0. So generally the root mean square (RMS) is used:

$$\text{RMS} = \sqrt{\frac{(-4\%)^2 + (+4\%)^2 + (-8\%)^2 + (+20\%)^2}{4}} = \sqrt{124}\,\% \approx 11.14\%$$
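The three ways of averaging the four sampling errors can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the example; the variable names are illustrative.

```python
import math

# The four random sampling errors from the example, in percentage points
errors = [-4.0, 4.0, -8.0, 20.0]

av = sum(errors) / len(errors)                              # plain average: signs cancel
ma = sum(abs(e) for e in errors) / len(errors)              # mean of absolute values
rms = math.sqrt(sum(e * e for e in errors) / len(errors))   # root mean square

print(f"AV  = {av:+.2f}%")
print(f"MA  = {ma:.2f}%")
print(f"RMS = {rms:.2f}%")
```

The RMS is larger than the MA here because squaring gives the single large error (+20%) more weight.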

The RMS of all possible random sampling errors is called the standard error (SE). If we use a random sample of size n and its percentage p to estimate the population percentage $\pi$ of a dichotomous population, we show that, with $\pi$ written as a proportion,

$$\text{SE} = \sqrt{\frac{\pi(1-\pi)}{n}}$$

inferential statistic: estimate of an unknown parameter made by examining a random sample
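For the card example, the usual standard error of a sample proportion, sqrt(pi(1 - pi)/n), can be compared with the RMS of every possible sampling error. The sketch below assumes the cards are drawn with replacement (or from a very large box), so that the number of 1s in a sample follows a binomial distribution; that assumption, the symbol `pi` for the population proportion, and both function names are illustrative and not taken from the slides.

```python
import math


def standard_error(pi: float, n: int) -> float:
    """Standard error of a sample proportion for a dichotomous population."""
    return math.sqrt(pi * (1 - pi) / n)


def rms_of_all_errors(pi: float, n: int) -> float:
    """RMS of (k/n - pi) over every possible count k, weighted by its binomial probability."""
    mean_square = sum(
        math.comb(n, k) * pi**k * (1 - pi) ** (n - k) * (k / n - pi) ** 2
        for k in range(n + 1)
    )
    return math.sqrt(mean_square)


se = standard_error(0.60, 25)  # 60% ones in the box, samples of 25 cards
print(f"SE by formula:        {100 * se:.2f}%")
print(f"RMS over all samples: {100 * rms_of_all_errors(0.60, 25):.2f}%")
```

Under these assumptions both lines give about 9.80%, which is the "typical size" of the sampling error for samples of 25 cards.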

sampling theory: examining the different samples that are possible and likely from a population with a known parameter

2.1 Variables and Data Sets

observations: actual values of variables

measurement: observations that are numbers

data set: the collection of observations or measurements of a variable forms a data set

* The goal of statistics is to use the information provided by a data set to study the population from which it came.

* Arranging data into charts, tables, and graphs, along with computations of various descriptive numbers about the data.

* Reasoning in an environment where one does not know, or cannot know, all the facts needed to reach a conclusion with complete certainty.

2.2 Categorical Data

| Categorical variables | Quantitative variables |
|---|---|
| gender | height |
| religion | weight |
| race | age |
| occupation | years of education |
| blood type | |

2.3 Ordinal Data

ordinal: categorical variables that can be ranked in a meaningful order

nominal: categorical data that are not ordinal

Example 2.3: grades of 20 students in an STA 290 class

D B D B F C A B B D

A B B C+ C C+ B A A B+
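The 20 grades listed above can be tallied with a short Python sketch. The ranking in `scale` is an assumption (the usual letter-grade order from A down to F); grades that do not occur, such as D+, are still printed with a frequency of 0.

```python
from collections import Counter

grades = "D B D B F C A B B D A B B C+ C C+ B A A B+".split()
scale = ["A", "B+", "B", "C+", "C", "D+", "D", "F"]  # assumed ranking of the ordinal variable

counts = Counter(grades)
n = len(grades)

# One row per grade: grade, frequency, relative frequency
for grade in scale:
    f = counts[grade]
    print(f"{grade:<3}{f:>3}{f / n:>6.2f}")
```

Keeping the rows in ranked order is what makes the table usable for ordinal summaries such as the median.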

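*ordinal data

| Grade | Tally | Frequency | Relative frequency |
|---|---|---|---|
| A | llll | 4 | .2 |
| B+ | l | 1 | .05 |
| B | llll ll | 7 | .35 |
| C+ | ll | 2 | .1 |
| C | ll | 2 | .1 |
| D+ | | 0 | 0 |
| D | lll | 3 | .15 |
| F | l | 1 | .05 |
| Total | | 20 | 1 |

median: divides the ordered list into two equal parts; here it lies halfway between the 10th and 11th grades, both of which are B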

mode: most frequently occurring value, B

2.4 Ratio Data

ratio data: quantitative data for which it's meaningful to form quotients

ex. a person's age

ratio v. ordinal: The grades in the previous chart are ordinal. If the grades were assigned point values (A=4, B=3, C=2, etc.), then the set would be ratio.

A ratio variable can be discrete or continuous.

Discrete data: There are gaps between possible values. A variable that can take only integer values is discrete.

Continuous data: The data can be described by points in an interval of the line. There are no gaps between possible values.

2.5 Frequency Tables and Histograms

The STA grades chart was an example of a tally and frequency table.

A histogram is a graphical display based on the frequency table.

Unlike a bar graph, in which the height of each bar is proportional to the frequency, a histogram has bars whose areas are proportional to the frequency.

[Histogram: horizontal scale from 2 to 7, vertical scale from 1 to 4]

The values 5 and 6 have no height on this graph but aren't left out of the horizontal scale.

The horizontal scale is uniform.

The histogram fits the horizontal scale and not the other way around.

In this case the center of each bar is a possible value (2 through 7) and the boundaries occur at impossible values like 2.5.
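The points about the horizontal scale can be illustrated with a small text histogram. The data below are hypothetical (the values behind the slide's histogram are not given in the transcript); they are chosen only so that 5 and 6 never occur.

```python
from collections import Counter

values = [2, 2, 3, 3, 3, 3, 4, 4, 4, 7]  # hypothetical observations, not from the slides
counts = Counter(values)

# Uniform scale: every integer from min to max gets a bar, even with frequency 0
for v in range(min(values), max(values) + 1):
    left, right = v - 0.5, v + 0.5  # boundaries fall at impossible values such as 2.5
    print(f"[{left:.1f}, {right:.1f})  {'#' * counts[v]}")
```

The rows for 5 and 6 are printed with no bar, which keeps the scale uniform instead of letting the data decide which positions appear.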

2.6 Grouped Data and Sturges' Rule

For continuous data sets, frequency tables put the values in groups called intervals or classes.

Sturges' Rule: choose the number of classes K so that $2^{K-1} \approx n$ (*only a guide)

absolute density: $d = f/w$

Height of rectangle = Density = Frequency / Class Width
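A minimal sketch of both ideas follows. It reads Sturges' rule as "the smallest K with 2^(K-1) >= n", which is one common reading and an assumption here, and the class limits and frequencies are hypothetical, chosen only to show unequal class widths.

```python
def sturges_classes(n: int) -> int:
    """Smallest K such that 2**(K - 1) >= n (one reading of Sturges' rule; only a guide)."""
    k = 1
    while 2 ** (k - 1) < n:
        k += 1
    return k


def density(frequency: int, width: float) -> float:
    """Absolute density d = f / w, the height of the histogram rectangle."""
    return frequency / width


print("suggested classes for n = 20:", sturges_classes(20))

# Hypothetical grouped data: (lower limit, upper limit, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 40, 6)]
for low, high, f in classes:
    d = density(f, high - low)
    print(f"[{low}, {high}): f = {f}, d = {d:.2f}, area = {d * (high - low):.0f}")
```

Because height is frequency divided by width, the area of each rectangle equals the class frequency even when the widths differ.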

relative density: $d\% = d/n = f\%/w$

Height of rectangle = Relative Density = Relative Frequency / Class Width

2.7 Stem and Leaf Plot

way of organizing numbers that makes them easier to read

Data

12 45 43 25 13 22 63 29 34

56 14 34 11 23 13 39 25 23

Stem and Leaf Plot

| Stem | Leaves |
|---|---|
| 1 | 1 2 3 3 4 |
| 2 | 2 3 3 5 5 9 |
| 3 | 4 4 9 |
| 4 | 3 5 |
| 5 | 6 |
| 6 | 3 |
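The plot above can be rebuilt from the raw data with a few lines of Python. This is a sketch that assumes two-digit values, so the tens digit serves as the stem and the units digit as the leaf.

```python
from collections import defaultdict

data = [12, 45, 43, 25, 13, 22, 63, 29, 34,
        56, 14, 34, 11, 23, 13, 39, 25, 23]

plot = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit is the stem, units digit is the leaf
    plot[stem].append(leaf)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the data first is what puts the leaves of each stem in ascending order.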

2.8 Five-Number Summary

| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Value | 2 | 3 | 4 | 4 | 5 | 5 | 6 | 10 |

The values range from the minimum, min=2, to the maximum, max=10. These numbers can be used as the starting point of a five-number summary of a data set. The average rank is (1+2+...+8)/8=4.5. No value has rank 4.5, so the median is the average of the fourth and fifth values, (4+5)/2=4.5. The median separates the values into two equal parts.

lower-ranking values: {2,3,4,4}

higher-ranking values: {5,5,6,10}

If there is an odd number of values in the set, include the median in both parts.

The median of the lower half is called the first quartile, Q1=(3+4)/2=3.5. The median of the higher half is called the third quartile, Q3=(5+6)/2=5.5. Sometimes Q2 is used for the median.

The five-number summary includes the min, Q1, med, Q3, and max in ascending order. In this case: 2, 3.5, 4.5, 5.5, 10.
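The procedure described above (split at the median, include the median in both halves when the count is odd, then take the median of each half) can be sketched in Python. The function name `five_number_summary` is illustrative. Library quantile functions may use other conventions and give slightly different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[: half + n % 2]  # includes the median when n is odd
    upper = ordered[half:]           # includes the median when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


summary = five_number_summary([2, 3, 4, 4, 5, 5, 6, 10])
print(summary)  # (2, 3.5, 4.5, 5.5, 10)
```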

Range = max - min

Interquartile range (IQR) = Q3 - Q1

2.9 Box Plot

[Box plot on a horizontal scale from 2 to 10]

2.10 The Mean

The mean is the average of all the values in the set.

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Grouped mean: $\bar{x} = \sum x\,(f/n)$

2.11 Variance

i=1 2.11 Variance

Data set 1 48 49 50 50 51 52

Data set 2 0 10 50 50 90 100

These data sets have the same mean, median, and mode. In order to describe a data set well, we also need to measure its variability.

Mean absolute deviation:

$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - m|$$

Sample standard deviation:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Population standard deviation:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
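The two data sets can be compared numerically with Python's standard `statistics` module. In this sketch the center m in the MAD is assumed to be the mean. For these two sets the mean and median are both 50, so that choice does not change the result. The helper `mean_absolute_deviation` is illustrative.

```python
from statistics import mean, stdev


def mean_absolute_deviation(values, center=None):
    """MAD = (1/n) * sum(|x_i - m|); m defaults to the mean (an assumption)."""
    m = mean(values) if center is None else center
    return sum(abs(x - m) for x in values) / len(values)


set1 = [48, 49, 50, 50, 51, 52]
set2 = [0, 10, 50, 50, 90, 100]

for name, data in (("Data set 1", set1), ("Data set 2", set2)):
    # stdev() uses the n - 1 divisor, i.e. the sample standard deviation
    print(f"{name}: mean={mean(data)}, MAD={mean_absolute_deviation(data):.1f}, "
          f"s={stdev(data):.2f}")
```

Both sets share a mean of 50, yet the second set's MAD and sample standard deviation are roughly thirty times larger, which is the variability the center alone does not show.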
