**Data Analysis, Modelling, and Decision-Making**

Data

Data: n. (pl.) more than one realization of measured characteristic(s) of an element/unit

Probability Distributions

Probability: n. the numeric value representing the chance, likelihood, or possibility that a particular event will occur.

"The logic of science." -- E. T. Jaynes

Statistical Inference

Statistical inference: n. the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling.

Evidence-based Management

Evidence-based management: the combination of research and relevant data in making managerial decisions.

Graphical Summary

Single Variable

Box-and-whisker plot

Histogram

Dot Plot

Density plot

Pie/bar chart

Numerical Summary

Single Variable

Metric summary

Percentile summary

Two variables

Covariance/correlation

Cross-tabulation

Linear Regression

Slope(s)

Intercept

R-squared

Probability Theory

Discrete Distributions

Binomial Distribution (n,p)

Poisson Distribution (lambda)

Hypergeometric (n,k,N,K)

Geometric (k,p)

Continuous Distributions

Normal Distribution (mu,sigma)

Uniform Distribution (a,b)

Exponential Distribution (lambda)
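
As a rough sketch of what the parameters listed above do, the probability functions can be written with only the Python standard library. The function names and argument orders here are illustrative choices, not part of the course materials. The geometric form assumes k counts the trial on which the first success occurs, and the hypergeometric form assumes k successes in n draws from a population of N containing K successes.

```python
import math

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(k events) when events arrive at an average rate of lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def hypergeometric_pmf(k, n, K, N):
    """P(k successes in n draws without replacement; K successes among N units)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def geometric_pmf(k, p):
    """P(first success occurs on trial k), k = 1, 2, ... (assumed convention)."""
    return (1 - p)**(k - 1) * p

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) variable at x; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def uniform_pdf(x, a, b):
    """Density of a Uniform(a, b) variable at x."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    """Density of an Exponential variable with rate lam at x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Illustration: chance that exactly 3 of 10 funds drawn without replacement
# charge fees, when 54 of the 184 funds in the sample charge fees.
p_three_fee_funds = hypergeometric_pmf(k=3, n=10, K=54, N=184)
```

A discrete distribution's probabilities should sum to one over its support, which is a quick way to check any of these functions.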

Hypothesis Testing

Expense ratios are higher for intermediate government bond funds.

Confidence Intervals

With 95% confidence, the difference in average expense ratios is 0.012 to 0.163 higher for intermediate government funds.

Sound argumentation

Relevance of evidence

Bayes' Rule and the role of belief.

Probabilistic statements occupy the role -- in science -- of stating uncertainty in a common language.

Bayesian decision-making is the essence of evidence-based management.

Decision criteria and transparency

Careful attention to process renders clarity in the decision-making process.

Problem definition and careful attention to sequence and logical structure permit disagreement in an explicable framework, e.g. we cannot dispute the facts but we may well dispute their relevance.

Two variables

Mosaic plots

Scatterplots

Samples

Populations

Is the treatment worth the expense?

Little evidence to suggest that car seats are effective for two- to six-year-olds.

Data

**Pertinent evidence, examples, and facts.**

Warrants

The reason that the evidence supports the claim.

Backing

Rebuttals

Qualifiers

**Claim**

**A statement of opinion to be supported.**

Stephen Toulmin's Model of Argumentation

Backing: Evidence supporting the application of a warrant.

Rebuttals: Mitigating factors in the disqualification of warrants.

Qualifiers: Limits on the strength or applicability of a warrant.

The most significant cause of death among American children.

Use data to investigate the efficacy of child safety seats.

Claim: Car safety seats do not improve outcomes for children aged 2 through 6.

Generalization warrants:

The evidence from a sample implies truth in a population.

Composition warrants:

The evidence contains signs, clues, symptoms, or components of the claim.

Authority warrant:

The evidence is linked to authoritative source interpretations.

Analogy warrant:

Evidence is connected by analogy, event, or precedent.

Causality warrant:

The evidence is caused by, or is a result of, the claim.

Principle warrant:

The evidence is indicative of a broader, relevant principle.

Just the facts on the size of the problem and the "cures"

The data on all reported crashes with fatalities.

Experimental evidence does not show the advantage of car seats either.

**Robert W. Walker, Ph.D.**

Associate Professor of Quantitative Methods

BondFunds.xls

A random sample of 184 bond funds.

Fund Number: Identification number unique to each fund.

Type: Intermediate government / short term corporate

Assets: In millions of dollars

Fees: Sales charges (yes or no)

Expense ratio: Ratio of expenses to net assets

Return 2009: Twelve-month return in 2009

3-year return: Annualized returns 2007-2009

5-year return: Annualized returns 2005-2009

Risk: Risk-of-loss classification for the mutual fund.

Data can be acquired from http://www.willamette.edu/~rwalker/GSM5103/data/BEARXSP500.xlsx

The elements of a sound argument are statements of logic replete with bounding conditions.

This is also true of the statements of statistics.

And of science -- when probability enters.

Definitions

Arithmetics

Addition

Multiplication

Union and intersection/joint probability

Conditional probability and independence

Bayes' Rule

P(Fees=Yes|Type=IG)=34/87

P(Type=IG|Fees=Yes)=34/54

P(Type=IG and Fees=Yes)=34/184
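
The three probabilities above come from the same four counts (184 funds, 87 intermediate government, 54 charging fees, 34 in both groups). A minimal sketch of how they fit together, including Bayes' Rule and a check for independence, might look like this. The variable names are illustrative only.

```python
from fractions import Fraction

# Counts implied by the three probabilities on the slide.
n_total = 184        # all bond funds in the sample
n_ig = 87            # intermediate government (IG) funds
n_fees = 54          # funds with sales charges (Fees=Yes)
n_ig_and_fees = 34   # IG funds that also charge fees

# Marginal and joint probabilities.
p_ig = Fraction(n_ig, n_total)
p_fees = Fraction(n_fees, n_total)
p_joint = Fraction(n_ig_and_fees, n_total)

# Conditional probability: joint divided by the probability of the condition.
p_fees_given_ig = p_joint / p_ig      # equals 34/87
p_ig_given_fees = p_joint / p_fees    # equals 34/54

# Bayes' Rule reverses the conditioning:
# P(IG | Fees) = P(Fees | IG) * P(IG) / P(Fees)
bayes_ig_given_fees = p_fees_given_ig * p_ig / p_fees

# Independence would require P(IG and Fees) = P(IG) * P(Fees).
independent = p_joint == p_ig * p_fees

print(p_fees_given_ig, p_ig_given_fees, bayes_ig_given_fees, independent)
```

Using exact fractions keeps the arithmetic identical to the hand calculation, so the Bayes' Rule result matches 34/54 without rounding error.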

What do we know?

These programs cost money per participant.

They recover money in unpaid benefits now and over time.

The failure rate is low.

Generally, the benefits ($860 on average) are less than the cost.

BUT

What are the goals? What quantities are we minimizing/maximizing?

How do we translate the quantities that we can measure and assumptions about those that we can't into decisions?

In USA Today (March 18, 2012)

Drug testing welfare applicants nets little.

Net effect is minimal if not costly. Must assume saved benefits.

No drug test, no welfare: Program protects taxpayer dollars.

Even if costly, funds are only disbursed to those with negative tests.

Remaining Uncertainties

How do we account for those not taking the test?

Some drop from fear and some from not needing TANF, but this matters for accounting.

For state employees, if performance metrics can't establish reasonable suspicion, then there may be better uses of resources.

An Extended Example using Juries [Resources > Probability > Juries.xlsx]

Problem setup: A murder

Three pieces of evidence:

Blood type: P(E1|NG)=0.45

Fingerprints: P(E2|NG)=0.21

DNA: P(E3|NG)=0.017

Bayes' rule gives us: P(G|E1,E2,E3)

It all depends on the prior...
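
One way to see the dependence on the prior is to compute the posterior for several priors. The sketch below is not the Juries.xlsx calculation. It rests on two assumptions that the slide does not state: the three pieces of evidence are conditionally independent given guilt and given non-guilt, and each piece of evidence is certain to appear if the defendant is guilty, P(Ei|G)=1. The function name and the list of priors are hypothetical.

```python
def posterior_guilt(prior, likelihoods_ng, likelihoods_g=None):
    """P(G | E1, ..., Ek) by Bayes' rule.

    Assumes the pieces of evidence are conditionally independent given G
    and given NG. If likelihoods_g is omitted, P(Ei | G) = 1 is ASSUMED.
    """
    if likelihoods_g is None:
        likelihoods_g = [1.0] * len(likelihoods_ng)  # assumption, not from the slide

    like_g = 1.0
    for p in likelihoods_g:
        like_g *= p

    like_ng = 1.0
    for p in likelihoods_ng:
        like_ng *= p

    numerator = like_g * prior
    return numerator / (numerator + like_ng * (1.0 - prior))

# P(E | NG) for blood type, fingerprints, and DNA, as given on the slide.
p_e_given_ng = [0.45, 0.21, 0.017]

# Hypothetical priors, chosen only to show how much the answer moves.
for prior in (0.001, 0.01, 0.1, 0.5):
    print(prior, round(posterior_guilt(prior, p_e_given_ng), 4))
```

Under these assumptions the same evidence yields very different posteriors depending on where the juror starts, which is the point of the slide.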

Drug test:

Two outcomes: + and -

Two statuses: User and non-user
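
With two outcomes and two statuses, Bayes' rule turns the test's error rates into the probability that a person who tests positive is actually a user. The sketch below uses made-up numbers: the prevalence, sensitivity, and specificity are hypothetical and do not come from the slides or from any real testing program.

```python
def p_user_given_positive(prevalence, sensitivity, specificity):
    """P(User | +) by Bayes' rule.

    prevalence  = P(User)
    sensitivity = P(+ | User)
    specificity = P(- | Non-user)
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)

# Hypothetical inputs for illustration only.
example = p_user_given_positive(prevalence=0.05, sensitivity=0.95, specificity=0.95)
print(round(example, 3))
```

With these illustrative inputs, true positives and false positives are equally common, so a positive result is no better than a coin flip as evidence of use. When prevalence is low, even an accurate test produces many false positives.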

Claim: Car seats do not improve crash outcomes for 2- to 6-year-olds.

Evidence: No difference in the death outcome from broad observational data.

Warrant: Fatal crashes are just like others and the death outcome is just like injuries in relation to car seats.

Probability is:

(1) a priori (known),

(2) empirical (data), or

(3) subjective.

What do we learn given the data?

Because what we are learning from data is almost always a probability distribution. How likely is a variable to take on particular values?