AP Statistics Topic Outline


Andrew Knauft

on 7 May 2015


Major Ideas
Looking at Data
Gathering Data
Using Data
Exploring Data
Describing patterns and departures from patterns
(I.A) Constructing and Interpreting graphical displays of distributions of univariate data
Dotplot, stemplot, histogram, cumulative frequency plot
Anticipating Patterns
Exploring random phenomena using probability and simulation
(III.C) The Normal Distribution
Statistical Inference
Estimating population parameters and testing hypotheses
(I.B) Summarizing distributions of univariate data
Measuring center (mean, median), spread (IQR, range, sd)
Measuring position (quartiles, percentiles, standardized scores)
Effect of changing units
(I.C) Comparing distributions of univariate data
Dotplots, back-to-back stemplots, parallel boxplots
Comparing shape, center and spread
Comparing clusters, gaps, outliers
Sampling and Experimentation
Planning and conducting a study
(I.D) Exploring bivariate data
Analyzing patterns in scatterplots
Correlation and linearity
Least-squares regression line
Residual plots, outliers, influential points
Transformations to achieve linearity: logarithmic and power transformations
(I.E) Exploring categorical data
Frequency tables and bar charts
Marginal and joint frequencies (two-way tables)
Conditional relative frequencies and association
Comparing distributions using bar charts
(II.A) Overview of methods for data collection
Sample Survey
Observational Study
(II.B) Planning and conducting surveys
Characteristics of good surveys
Populations, samples, and random selection
Sources of bias
SRS, Stratified, Cluster
(II.C) Planning and conducting experiments
Characteristics of good experiments
Treatments, control groups, experimental units, random assignments and replication
Sources of bias and confounding, including placebo effect and blinding
Completely randomized design
Randomized block design, including matched pairs design
(II.D) Generalizability
Results and types of conclusions
from observational studies
from experiments
from surveys
(III.C.1) Theory
Shape, center and spread
Empirical rule
Finding probabilities from standardized scores using tables
Finding probabilities from standardized scores using calculators
(III.C.2) Applications
Model for measurements
Argue whether a sample came from a Normal population
(III.A) Probability
"Law of large numbers"
Conditional probability
Random Variables
Expected value, Standard Deviation
Linear combinations
Probability Distributions
(III.B) Combining Random Variables
Independence vs. Dependence
Mean, SD for sums and differences of independent RV
(III.D) Sampling Distributions
Proportion, Mean
Central Limit Theorem
Difference in proportions
Difference in means
chi-square distribution
(IV.A) Estimation
Population parameters
Margins of error
Logic of confidence intervals
CI for proportion, difference in proportion
CI for mean, difference in mean (paired and unpaired)
CI for slope of least-squares regression line
(IV.B) Tests of Significance
Logic of significance testing
Null and alternative hypotheses
one-sided vs. two-sided
Type I and Type II errors
Test for proportion, difference in proportion
Test for mean, difference in mean (paired and unpaired)
Chi-square tests
Goodness of fit
Test for slope of best-fit line
The AP Exam
Section I
40 multiple choice {90 min}
~10-minute break
Section II
6 Free Response {90 min}
q1-q5: ~12 min each
q6: ~30 min
q6 is an "Investigative Task": integrate topics and apply them to new contexts or in a non-routine way
Wednesday, May 13 at 12:00
Mean of Sum / Difference
Probability of A and B
When two events are independent, the probability of both occurring is the product of the probabilities of the individual events. More formally, if events A and B are independent, then the probability of both A and B occurring is:
P(A and B) = P(A) x P(B)
If you flip a coin twice, what is the probability that it will come up heads both times? Event A is that the coin comes up heads on the first flip and Event B is that the coin comes up heads on the second flip. Since both P(A) and P(B) equal 1/2, the probability that both events occur is:
1/2 x 1/2 = 1/4.
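The two-flip calculation above can be sketched in Python, using exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

# P(A): heads on the first flip; P(B): heads on the second flip.
p_a = Fraction(1, 2)
p_b = Fraction(1, 2)

# For independent events, P(A and B) = P(A) * P(B).
p_both = p_a * p_b
print(p_both)  # 1/4
```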
Variance of Sum / Difference
A cluster is formed when several data points lie in a small interval. A gap is an interval that contains no data. An outlier has a value that is much greater than or much less than other data in the set. An outlier may significantly affect the mean of a data set. A single outlier will not affect the mode(s) and is likely to affect the median only slightly. Features such as clusters, gaps, and outliers are more easily seen when the data are shown on a line plot.
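The outlier's effect on mean versus median can be checked directly; the data set below is made up for illustration:

```python
import statistics

data = [10, 11, 12, 13, 14]
with_outlier = data + [100]  # a single value far above the rest

# The mean shifts substantially when the outlier is added,
# while the median barely moves.
print(statistics.mean(data), statistics.mean(with_outlier))      # 12 vs ~26.67
print(statistics.median(data), statistics.median(with_outlier))  # 12 vs 12.5
```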
Data: A set of measurements or observations taken on a group of objects.
Variable: A characteristic of an object.
Two types of data:
• Quantitative variables
– Weight, family income, number of cups of coffee on a given day.
• Categorical variables
– Gender, college major, satisfaction response on a survey (poor, fair, good, excellent)
Reading Box Plots
The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred. This probability is written P(B|A), notation for the probability of B given A. In the case where events A and B are independent (where event A has no effect on the probability of event B), the conditional probability of event B given event A is simply the probability of event B, that is P(B).
If events A and B are not independent, then the probability of the intersection of A and B (the probability that both events occur) is defined by
P(A and B) = P(A)P(B|A).

From this definition, the conditional probability P(B|A) is easily obtained by dividing by P(A): P(B|A) = P(A and B) / P(A).
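A small worked example of dividing P(A and B) by P(A), using a hypothetical card draw (the events here are made up for illustration):

```python
from fractions import Fraction

# One card is drawn from a standard 52-card deck.
# A: the card is a heart; B: the card is a face card (J, Q, K).
p_a = Fraction(13, 52)       # 13 hearts in the deck
p_a_and_b = Fraction(3, 52)  # 3 face cards are hearts

# Rearranging P(A and B) = P(A) * P(B|A) gives P(B|A) = P(A and B) / P(A).
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 3/13
```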
A histogram is a graphical representation of the distribution of numerical data using bars of different heights
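The binning behind a histogram can be sketched without any plotting library; the data and bin width below are made up:

```python
# Bin a small data set into intervals of width 10 and count each bin.
# The bin counts are the bar heights of the histogram.
data = [3, 7, 12, 15, 18, 22, 24, 25, 31]

counts = {}
for x in data:
    bin_start = (x // 10) * 10  # 0-9 -> 0, 10-19 -> 10, ...
    counts[bin_start] = counts.get(bin_start, 0) + 1

print(counts)  # {0: 2, 10: 3, 20: 3, 30: 1}
```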
Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

Back to Back Stem plot example
Central Limit theorem
If there is an outlier, it would be more accurate to use the median rather than the mean to describe the center, because the median is resistant to extreme values.
Example of a plot for which a linear regression is inadequate.
Interquartile Range (IQR)



Simple Random Sampling — every possible sample of the same size is equally likely to be chosen
Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity. Systematic sampling is frequently used to select a specified number of records from a computer file.
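The every-Nth-record selection can be sketched as follows; the population and sample size are hypothetical:

```python
import random

# Hypothetical population of 100 records; we want a sample of size 10,
# so we take every N = 100 // 10 = 10th record after a random start.
population = list(range(100))
sample_size = 10
step = len(population) // sample_size

start = random.randrange(step)   # random starting point in [0, step)
sample = population[start::step] # every step-th record from there

print(len(sample))  # 10
```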

Stratified Random Sampling: a method of sampling that involves the division of a population into smaller groups known as strata. In stratified random sampling, the strata are formed based on members' shared attributes or characteristics. A random sample from each stratum is taken in a number proportional to the stratum's size when compared to the population. These subsets of the strata are then pooled to form a random sample.
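The proportional-allocation idea can be sketched in Python; the strata names and sizes here are made up for illustration:

```python
import random

# Hypothetical strata of unequal sizes (total population = 100).
strata = {
    "freshman": list(range(50)),
    "sophomore": list(range(30)),
    "junior": list(range(20)),
}
total = sum(len(members) for members in strata.values())
sample_size = 10

# Draw from each stratum in proportion to its share of the population,
# then pool the per-stratum draws into one combined sample.
sample = []
for name, members in strata.items():
    k = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample))  # 10 (5 freshmen + 3 sophomores + 2 juniors)
```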
Means one type of data
1. Linearity refers to whether a data pattern is linear (straight) or nonlinear (curved).
2. Slope refers to the direction of change in variable Y when variable X gets bigger. If variable Y also gets bigger, the slope is positive; but if variable Y gets smaller, the slope is negative.
3. Strength refers to the degree of "scatter" in the plot. If the dots are widely spread, the relationship between variables is weak. If the dots are concentrated around a line, the relationship is strong.
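Strength is commonly summarized by the correlation coefficient r (between -1 and 1); a minimal computation from the definitional formula, with made-up data:

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # generally increasing, so r should be positive

mx, my = statistics.mean(x), statistics.mean(y)
sx, sy = statistics.stdev(x), statistics.stdev(y)
n = len(x)

# r = sum of products of standardized deviations, divided by n - 1.
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
print(round(r, 3))  # 0.775: a fairly strong positive linear relationship
```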
Measures of spread describe how similar or varied the set of observed values are for a particular variable

There are many reasons why the measure of the spread of data values is important, but one of the main reasons regards its relationship with measures of central tendency. A measure of spread gives us an idea of how well the mean, for example, represents the data.

Common measures of spread include the range, the interquartile range (IQR), the variance, and the standard deviation.
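A few of these spread measures computed on a small made-up data set:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

# Range: distance between the largest and smallest values.
print(max(data) - min(data))             # 38
# Sample variance: average squared deviation from the mean (n - 1 divisor).
print(statistics.variance(data))         # 182
# Standard deviation: square root of the variance, in the data's units.
print(round(statistics.stdev(data), 2))  # 13.49
```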
Linear transformation. A linear transformation preserves linear relationships between variables. Therefore, the correlation between x and y would be unchanged after a linear transformation. Examples of a linear transformation to variable x would be multiplying x by a constant, dividing x by a constant, or adding a constant to x.
Nonlinear transformation. A nonlinear transformation changes (increases or decreases) linear relationships between variables and, thus, changes the correlation between variables. Examples of a nonlinear transformation of variable x would be taking the square root of x or the reciprocal of x.
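Both claims can be checked numerically; the data below (y = x squared) is made up so that a square-root transformation visibly changes r:

```python
import math
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # a curved (quadratic) relationship

def corr(a, b):
    """Sample correlation coefficient r from the definitional formula."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sa, sb = statistics.stdev(a), statistics.stdev(b)
    n = len(a)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / ((n - 1) * sa * sb)

r_original = corr(x, y)
# Linear transformation of x (multiply by 3, add 7): r is unchanged.
r_linear = corr([3 * xi + 7 for xi in x], y)
# Nonlinear transformation (square root of y): r changes (here it rises to 1.0,
# because sqrt straightens this particular curve exactly).
r_sqrt = corr(x, [math.sqrt(yi) for yi in y])

print(round(r_original, 4) == round(r_linear, 4))  # True
```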
A good experimental design serves three purposes.

Causation. It allows the experimenter to make causal inferences about the relationship between independent variables and a dependent variable.

Control. It allows the experimenter to rule out alternative explanations due to the confounding effects of extraneous variables (i.e., variables other than the independent variables).

Variability. It reduces variability within treatment conditions, which makes it easier to detect differences in treatment outcomes.
Two events are mutually exclusive or disjoint if they cannot occur at the same time.

The complement of an event is the event not occurring. The probability that Event A will not occur is denoted by P(A').
The mean of the discrete random variable X is also called the expected value of X. Notationally, the expected value of X is denoted by E(X). Use the following formula to compute the mean of a discrete random variable.

E(X) = μx = Σ [ xi * P(xi) ]

Discrete. Within a range of numbers, discrete variables can take on only certain values. Suppose, for example, that we flip a coin and count the number of heads. The number of heads will be a value between zero and plus infinity. Within that range, though, the number of heads can be only certain values. For example, the number of heads can only be a whole number, not a fraction. Therefore, the number of heads is a discrete variable. And because the number of heads results from a random process - flipping a coin - it is a discrete random variable.
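The expected-value formula above, applied to the coin-flip example in this section (X = number of heads in two flips):

```python
# Discrete distribution of X = number of heads in two fair coin flips:
# outcomes 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
distribution = {0: 0.25, 1: 0.5, 2: 0.25}

# E(X) = sum over all outcomes of x_i * P(x_i).
expected = sum(x * p for x, p in distribution.items())
print(expected)  # 1.0
```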
Univariate means single variable.
Univariate data doesn't look at causes or relationship.
