**Statistics Training**

**Level 1**

Overview of Level 1:

* Why do we need statistics?

* Observational versus Designed Experiment

* Examples

* Confounding Factors

* Correlation versus Causation

* Blocking

**NOW IT IS TIME...**

...TO COMPLETE YOUR LEVEL 1 TEST!

Aim of Training:

To provide basic through to advanced training on statistics for those who read the research (but don't perform the trials or statistics).

Level 1 (basic): fundamental outline and understanding of observational versus experimental designs, confounding and blocking;

Level 2 (intermediate): explanation of terms; outline of hypothesis testing, p-values, statistical significance and statistical error;

Level 3 (advanced): attempting the statistics involved and putting it all together to understand what it means.

WHY DO WE NEED STATISTICS?

The world is full of variability and you need to be able to make decisions in the presence of that variability. This is what statistics helps you to do.

If there were no variability:

* any two people receiving a drug would have exactly the same response to that drug. To compare the effects of two drugs, A and B, you'd need only two people - one to receive drug A and one to receive drug B.

* However, as there is variability, you need a reasonable number of people to receive drug A, and a reasonable number of people to receive drug B. It is hoped that you will get enough information from these two groups to be able to decide whether either drug is better.

Because of the variability that is present, the information from these two groups of people will not be exactly the same as you would get from different groups of people, or even from the same two groups at another time!
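The point above can be illustrated with a small simulation. This is only a sketch: the "true effect" of 10, the spread of 3, the group size of 20 and the function name `simulate_group_mean` are all made-up assumptions, not values from any real trial. It gives the same drug to three different groups and shows that each group produces a slightly different average response.

```python
import random
import statistics

def simulate_group_mean(n_people, true_effect, spread, rng):
    """Mean response of n_people whose individual responses vary around true_effect."""
    responses = [rng.gauss(true_effect, spread) for _ in range(n_people)]
    return statistics.mean(responses)

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Same drug, same (assumed) true effect, three different groups of 20 people.
group_means = [
    simulate_group_mean(20, true_effect=10.0, spread=3.0, rng=rng)
    for _ in range(3)
]

for number, mean in enumerate(group_means, start=1):
    print(f"Group {number}: mean response = {mean:.2f}")
```

Each group mean lands near the assumed true effect, but no two are identical. That difference between groups is the variability statistics has to account for.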

MOST IMPORTANT ASPECTS

It doesn't matter whether you're comparing the effects of two drugs, surveying the attitudes of car owners, or observing the number of naturopaths in different locations; in any scientific investigation, you will encounter variability in what you measure.

Three of the most important aspects of statistics:

* Estimation

* Hypothesis Testing

* The collection of the data

THE COLLECTION OF DATA:

Two types of data - qualitative and quantitative:

Qualitative data consists of attributes, labels, or non-numerical entries. These may be written as 'numbers', but they aren't 'real numbers', e.g. political party (Liberal, Labour), postcode.

Quantitative data consists of numerical measurements or counts. e.g. how tall, number of children.

In Level 2, we will look at appropriate ways to display this data.

OBSERVATIONAL VERSUS DESIGNED EXPERIMENT

REFERENCES

Fisch, K. & Mcleod, S. (2010, March 27). Amazing Statistics. Viewed 14 June 2016, <http://www.youtube.com/watch?v=oGGYIw_pIj8>

Khan Academy. (2015). Statistical questions: Data and statistics 6th grade. Retrieved from <https://www.youtube.com/watch?v=OjzfQDFf7Uk>

Margna, M. (2014). Infinity. Accessed 03/11/2016, from <http://www.orangefreesounds.com/meditation-music-free>

Moore, D., Notz, W., & Fligner, M. (2013). The Basic Practice of Statistics, 6th Edition. New York: W.H. Freeman and Company.

Oehlert, G. (2000). A first course in design and analysis of experiments. New York: W.H. Freeman and Company.

Russell, K. (2016). STA404 Statistical Reasoning [STA404 201630 Study Guide]. Retrieved from Charles Sturt University website: https://interact2.csu.edu.au/bbcswebdav/pid-735871-dt-content-rid-1653688_1/courses/S-STA404_201630_W_D/All_Chapters_symbols.pdf

Designed Experiments:

Researchers manipulate the explanatory variables (treatments) while holding other variables constant, and observe the effect on the response variable.

Observational Experiments:

Researchers observe differences in the explanatory variables and see whether these are related to differences in the response variable.

Example 1: In a study, researchers divide a field into four equal areas. In each area, a certain amount of nutrients is applied. Heights of trees in the four areas are recorded.

Example 2: In a study, researchers measure the amount of nutrients in soil and the heights of trees in different areas.

Is Example 1 an experiment or an observational study? What about Example 2?

Can you draw any cause-and-effect relationships in Example 2?

EXAMPLES & QUESTIONS

CONFOUNDING FACTORS

ILLUSTRATING CONFOUNDING

BLOCKING

A **block** is a group of individuals that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.

In a **block design**, the random assignment of individuals to treatments is carried out separately within each block.

Blocks control the effects of some outside variables by bringing those variables into the experiment.

(Moore, Notz & Fligner, 2013) (Oehlert, 2000)

EXAMPLES

Example: If three concentrations of nutrients are applied to six land areas, with each concentration applied to two areas, there are two replicates. If some areas are closer to water while some are not, then we should divide the areas into two blocks. Why?
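For the land-area example, randomising separately within each block might look like the sketch below. The area names, the split of three areas per block, and the function name `assign_within_blocks` are assumptions made for illustration; they are not part of the original example.

```python
import random

def assign_within_blocks(blocks, treatments, rng):
    """Randomly assign treatments separately inside each block.

    blocks: dict mapping a block name to its list of individuals.
    treatments: list of treatment labels; each block's size must be a
    multiple of the number of treatments so every treatment is used equally.
    """
    assignment = {}
    for block_name, individuals in blocks.items():
        if len(individuals) % len(treatments) != 0:
            raise ValueError(f"Block '{block_name}' cannot be split evenly")
        repeats = len(individuals) // len(treatments)
        labels = treatments * repeats  # new list, one label per individual
        rng.shuffle(labels)            # the random step happens inside the block
        for individual, label in zip(individuals, labels):
            assignment[individual] = (block_name, label)
    return assignment

# Hypothetical layout: six land areas, three near water and three far from it.
blocks = {
    "near water": ["area 1", "area 2", "area 3"],
    "far from water": ["area 4", "area 5", "area 6"],
}
treatments = ["low", "medium", "high"]  # three nutrient concentrations

assignment = assign_within_blocks(blocks, treatments, random.Random(42))
for area, (block_name, concentration) in assignment.items():
    print(f"{area} ({block_name}): {concentration}")
```

Because the shuffle is done inside each block, every concentration appears once near water and once far from water, so distance from water cannot line up with any one concentration.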

Example: In an experiment, you randomly recruit 24 students (male and female). The aim is to show whether different levels of fibre affect hunger levels. You will test over four sessions, and each student will receive two high-fibre muffins and two low-fibre muffins across these four sessions. At the start of each session, students will be asked to rate their hunger out of 100. After eating the muffin, at the end of the 15-minute session, they will again be asked to rate their hunger out of 100. The response variable is the decrease in hunger level. What potential block would you apply?

Statistics is the science of learning from data.

Data are numbers, but they are not 'just numbers'.

Data are numbers with a context.

The number 10.5, for example, carries no information by itself. But if we hear that a friend's new baby weighed 10.5 pounds at birth, we have an understanding of the size of the child. The context engages our background knowledge and allows us to make judgments. We know that a baby weighing 10.5 pounds is quite large, and that a human baby is unlikely to weigh 10.5 ounces or 10.5 kilograms. The context makes the number informative.

(Moore, Notz & Fligner, 2013)

BEWARE THE LURKING VARIABLE

Almost all relationships between two variables are influenced by other variables lurking in the background.

To understand the relationship between two variables, you must often look at other variables. Careful statistical studies try to think of and measure possible lurking variables in order to correct for their influence.

News reports often ignore possible lurking variables that might ruin a good headline like "Playing soccer can improve your grades", rather than looking at education and affluence as lurking variables, i.e. background factors that help explain the relationship between soccer and good grades.

The habit of asking, "What might lie behind this relationship?" is part of thinking statistically.

(Moore, Notz & Fligner, 2013)

Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other.

Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with lurking variables.

Well-designed experiments take steps to prevent confounding.

(Moore, Notz & Fligner, 2013)

The relationship between coffee drinking and pancreatic cancer is confounded by cigarette smoking. The relationship between the confounder and the explanatory variable, and between the confounder and the response variable, can each be either positive or negative.

(Russell, 2016)
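The coffee example can be explored with a small simulation. This is a sketch under loudly stated assumptions: every probability below is invented purely to illustrate confounding and is not a real smoking, coffee or cancer rate. In the simulated world, smoking raises both the chance of drinking coffee and the chance of cancer, while coffee itself has no effect at all.

```python
import random

rng = random.Random(2016)  # fixed seed so the sketch is repeatable

# Made-up probabilities, chosen only to illustrate confounding (not real rates).
P_SMOKER = 0.3
P_COFFEE = {True: 0.8, False: 0.3}    # smokers are more likely to drink coffee
P_CANCER = {True: 0.10, False: 0.02}  # risk depends on smoking only, never on coffee

people = []
for _ in range(200_000):
    smoker = rng.random() < P_SMOKER
    coffee = rng.random() < P_COFFEE[smoker]
    cancer = rng.random() < P_CANCER[smoker]
    people.append((smoker, coffee, cancer))

def cancer_rate(group):
    """Proportion of a group of (smoker, coffee, cancer) tuples with cancer."""
    return sum(cancer for _, _, cancer in group) / len(group)

def coffee_gap(group):
    """Cancer rate among coffee drinkers minus the rate among non-drinkers."""
    drinkers = [p for p in group if p[1]]
    non_drinkers = [p for p in group if not p[1]]
    return cancer_rate(drinkers) - cancer_rate(non_drinkers)

crude_gap = coffee_gap(people)                                # ignores smoking
gap_smokers = coffee_gap([p for p in people if p[0]])         # smokers only
gap_non_smokers = coffee_gap([p for p in people if not p[0]]) # non-smokers only

print(f"Everyone:         {crude_gap:+.3f}")
print(f"Smokers only:     {gap_smokers:+.3f}")
print(f"Non-smokers only: {gap_non_smokers:+.3f}")
```

When smoking is ignored, coffee drinkers appear to have a clearly higher cancer rate. When smokers and non-smokers are looked at separately, the gap shrinks to roughly zero, because the apparent coffee effect was produced by smoking.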

ESTIMATION:

Typically we want to obtain information (a 'data set') about something; e.g. weights (in kg) of 12 month old babies; voting intention of electors in QLD.

The collection of weights for all babies, or all voting intentions, is called a **population**.

Note: it is **not** the babies, or electors, under study who form the population.

Each individual has characteristics of interest. If we measure some characteristic, its value varies from case to case. Each characteristic is called a **variable**, e.g. who you will vote for in the election.

Usually it is not practical (expense / ethics / time) to measure the characteristic for the entire population. So we collect data from a subset of the population under study: a **sample**.

The characteristics of the sample are used to estimate the characteristics of the population.
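As a rough illustration of estimating from a sample, the sketch below builds a hypothetical population of 10,000 baby weights and then measures only 50 of them. The population size, the average of 9.5 kg, the spread of 1.1 kg and the sample size are all made-up assumptions, not real growth data.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: the weights (kg) of 10,000 babies.
# Note the population is the collection of weights, not the babies themselves.
population = [rng.gauss(9.5, 1.1) for _ in range(10_000)]

# In practice we could not weigh every baby, so we take a random sample of 50.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # normally unknown
sample_mean = statistics.mean(sample)          # our estimate of it

print(f"Population mean (usually unknown): {population_mean:.2f} kg")
print(f"Sample mean (our estimate):        {sample_mean:.2f} kg")
```

The sample mean will rarely equal the population mean exactly, but with random sampling it is usually close, and a different sample would give a slightly different estimate.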

HYPOTHESIS TESTING:

Often we want to know something about the value of a parameter of a population. Recall that a population is the entire (actual or theoretical) set of values of some variable, and a parameter is some numerical characteristic of that population.

Sometimes we do not know the value of a population parameter, and we want to estimate its value. However, in other circumstances, we know a value that the parameter is claimed to have, and we want to check whether this claim is true. Effectively we are using sample evidence to check a claim about the value of an unknown population parameter.

Example: A packet of muesli states that it contains 900 g. Although it does not say so, this is almost certainly a claim about the average contents of these packets, since not every packet will contain exactly the same amount of muesli.

Example: A paper in the literature states that, on a certain feeding regime, adults will lose an average of 1 kg in weight per week.

In level 2, we will look into how we can test if these claims are true.

There is a relationship between coffee and cigarettes

There is a relationship between cigarettes and pancreatic cancer