Taha Yasseri

The double-edged sword of statistical significance

"Big Data" vs "p-value"

[Figure: "Data" and "Sex" panels; y-axis: Number of papers in]

p-value

In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1%, and denoted α.

If the p-value is equal to or smaller than the significance level α, the observed data are taken to be inconsistent with the assumption that the null hypothesis is true, and the null hypothesis is rejected (but this does not automatically mean the alternative hypothesis can be accepted as true). When the p-value is calculated correctly, such a test is guaranteed to keep the Type I error rate no greater than α.
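The Type I error guarantee can be checked by simulation. The sketch below uses a hypothetical setup that is not from the talk: a fair coin tossed 20 times, H0 "the coin is fair" tested one-sided (more heads counts as more extreme) with an exact binomial p-value, and the experiment repeated many times while H0 is actually true. Because the binomial test is discrete, the rejection rate should land below α rather than exactly at it. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def one_sided_p(heads: int, n: int) -> float:
    """Exact one-sided p-value P(X >= heads) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(heads, n + 1)) / 2 ** n


def type_i_error_rate(n: int = 20, alpha: float = 0.05,
                      trials: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    # Precompute the p-value for every possible number of heads.
    p_table = [one_sided_p(k, n) for k in range(n + 1)]
    rejections = 0
    for _ in range(trials):
        # n fair tosses encoded as n random bits (1 = heads).
        heads = bin(rng.getrandbits(n)).count("1")
        if p_table[heads] <= alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.4f} (alpha = 0.05)")
```

With n = 20 the test only rejects at 15 or more heads, whose exact probability under H0 is about 0.021, so the simulated rate should come out near that value.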

Put directly, the p-value is the probability of obtaining the observed sample results, or "more extreme" results, when the null hypothesis is actually true (what counts as "more extreme" depends on how the hypothesis is tested).
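To make "the probability of the observed or more extreme results under H0" concrete, here is a minimal sketch with a hypothetical example (60 heads in 100 tosses, tested one-sided, so "more extreme" means more heads). It computes the p-value exactly from binomial counts and also estimates it by replaying the experiment many times under H0; the two numbers should agree closely. The function names are illustrative.

```python
import math
import random


def exact_p_at_least(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5): chance of a result at least as extreme as k heads."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def simulated_p_at_least(k: int, n: int, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo version: rerun the experiment under H0 and count results >= k."""
    rng = random.Random(seed)
    hits = sum(bin(rng.getrandbits(n)).count("1") >= k for _ in range(trials))
    return hits / trials


# Hypothetical observation: 60 heads in 100 tosses; H0: fair coin; "more extreme" = more heads.
exact = exact_p_at_least(60, 100)
approx = simulated_p_at_least(60, 100)
print(f"exact p = {exact:.4f}, simulated p = {approx:.4f}")
```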

Publication bias

p-hacking

0.05 is not a very good choice!

David Colquhoun (using simulation):
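A simulation in this spirit can be sketched as follows. This is not Colquhoun's code: it swaps his two-sample t-test for a simpler z-test with known σ = 1, and the settings (10% of experiments with a real effect of 1 SD, 16 observations per group, α = 0.05) are assumptions chosen for illustration. It counts how many of the "significant" results come from experiments where there was no real effect.

```python
import math
import random
import statistics


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_false_discovery_rate(n_tests: int = 10_000, prior_real: float = 0.1,
                                  effect: float = 1.0, n_per_group: int = 16,
                                  alpha: float = 0.05, seed: int = 42) -> float:
    """Share of significant results that are false positives (z-test, sigma = 1 known)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of the difference in means
    true_pos = false_pos = 0
    for _ in range(n_tests):
        real = rng.random() < prior_real  # does this experiment have a real effect?
        shift = effect if real else 0.0
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        z = (statistics.fmean(treated) - statistics.fmean(control)) / se
        if two_sided_p(z) < alpha:
            if real:
                true_pos += 1
            else:
                false_pos += 1
    return false_pos / (true_pos + false_pos)


fdr = simulate_false_discovery_rate()
print(f"False discoveries among p < 0.05: {fdr:.0%}")
```

With these assumed settings the power is roughly 0.8, and roughly a third of the "discoveries" should turn out to be false. Changing `prior_real` shows how strongly the answer depends on how plausible the tested hypotheses were to begin with.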

de Winter, Joost CF, and Dimitra Dodou. "A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too)." PeerJ 3 (2015): e733.
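One form of p-hacking is optional stopping: run the test, add more data, and test again until p dips below the threshold. The sketch below illustrates it under assumptions of its own, not the method of the cited study: data are drawn with H0 true (mean 0, σ = 1 known), a z-test is run after every 10 observations up to 100, and the experiment stops at the first p < 0.05. It compares that with a single test at the fixed sample size.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek: bool, max_n: int = 100, step: int = 10,
                        alpha: float = 0.05, trials: int = 5_000, seed: int = 7) -> float:
    """Fraction of null experiments declared significant, with or without peeking."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(max_n)]  # H0 is true: the mean is 0
        looks = range(step, max_n + 1, step) if peek else [max_n]
        for n in looks:
            z = sum(data[:n]) / math.sqrt(n)  # z = sample mean * sqrt(n) / sigma
            if two_sided_p(z) < alpha:
                hits += 1
                break  # stop collecting as soon as the result looks "significant"
    return hits / trials


print(f"fixed n = 100:         {false_positive_rate(peek=False):.3f}")
print(f"test every 10 samples: {false_positive_rate(peek=True):.3f}")
```

The fixed-sample test should stay near 0.05, while peeking ten times should push the false-positive rate well above it, even though no effect exists.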

**Thank you!**

p=0.03

Weissgerber et al.

Oxford Internet Institute

University of Oxford

The five papers

p-value

Mosquitoes

(annoying and impossible to swat away)

The emperor's new clothes

(fraught with obvious problems that everyone ignores)

The “Statistical Hypothesis Inference Testing”

Lambdin, Charles. "Significance tests as sorcery: Science is empirical—significance tests are not." Theory & Psychology 22.1 (2012): 67-90.

Teenage Sex

Sir Ronald Aylmer Fisher

(1890 – 1962)

English statistician, evolutionary biologist, mathematician, geneticist, and eugenicist

1925

"0.05"

H0: The coin is fair

H1: The coin always lands tails up

| Number of tosses | Observed | p |
|---|---|---|
| 1 | T | 0.5 |
| 2 | TT | 0.25 |
| 3 | TTT | 0.12 |
| 4 | TTTT | 0.06 |
| 5 | TTTTT | 0.03 |
| 7 | TTTTTTT | 0.008 |
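The p values above follow from the fact that, under H0, n tails in a row has probability 0.5^n, and with H1 "always tails" no outcome is more extreme than all tails. A minimal sketch (function name illustrative) that recomputes them, unrounded, and finds the first run length that crosses the 0.05 line:

```python
def p_all_tails(n_tosses: int) -> float:
    """P(n_tosses tails in a row) for a fair coin, i.e. the p-value under H0."""
    return 0.5 ** n_tosses


for n in (1, 2, 3, 4, 5, 7):
    print(n, "T" * n, f"{p_all_tails(n):.3g}")

# Smallest number of consecutive tails that is "significant" at 0.05.
first_significant = next(n for n in range(1, 50) if p_all_tails(n) < 0.05)
print("first p < 0.05 after", first_significant, "tails")
```

Four tails (p = 0.0625) is not enough, five (p = 0.03125) is: the verdict flips between two nearly identical pieces of evidence.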

If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time.

Colquhoun, David. "An investigation of the false discovery rate and the misinterpretation of p-values." Royal Society Open Science 1.3 (2014): 140216.
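The arithmetic behind claims like this can be sketched with a simple tree: of all hypotheses tested, some fraction are real effects, detected with probability equal to the power; the rest are null and still produce "significant" results at rate α. The inputs below (10% real effects, power 0.8) are assumptions for illustration, and the answer depends strongly on them.

```python
def false_discovery_rate(prior_real: float, power: float, alpha: float) -> float:
    """Fraction of significant results that are false positives (tree-diagram arithmetic)."""
    true_pos = prior_real * power          # real effects that reach significance
    false_pos = (1 - prior_real) * alpha   # null effects that reach significance anyway
    return false_pos / (false_pos + true_pos)


fdr = false_discovery_rate(prior_real=0.1, power=0.8, alpha=0.05)
print(f"{fdr:.0%} of 'discoveries' are false")  # prints "36% of 'discoveries' are false"
```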

Head, Megan L., et al. "The Extent and Consequences of P-Hacking in Science." PLoS Biol 13.3 (2015): e1002106.
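Another common form of p-hacking is measuring several outcomes and reporting whichever one reaches significance. A minimal sketch, assuming k independent outcomes with no real effect (so each z statistic is standard normal), compares the simulated chance of at least one p < 0.05 with the analytic value 1 - (1 - α)^k. The setup and names are illustrative assumptions.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def chance_of_any_significant(k: int, alpha: float = 0.05,
                              trials: int = 20_000, seed: int = 3) -> float:
    """Simulated probability that at least one of k null outcomes has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under H0 each outcome's z statistic is standard normal.
        p_values = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(k)]
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


for k in (1, 5, 20):
    simulated = chance_of_any_significant(k)
    analytic = 1 - (1 - 0.05) ** k
    print(f"k = {k:2d}: simulated {simulated:.3f}, analytic {analytic:.3f}")
```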

Effect size

Significant effect but small

Significant correlation but weak
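A weak correlation becomes "significant" once the sample is large enough. The sketch below uses Fisher's z-transform, a standard large-sample approximation (an assumption here; an exact t-based test gives similar values), to compute the two-sided p-value for the same hypothetical r = 0.05 at two sample sizes. The r² column shows how little variance is explained in both cases.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0, via Fisher's z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.05  # hypothetical weak correlation
for n in (100, 10_000):
    print(f"n = {n:>6}: p = {correlation_p_value(r, n):.2g}, r^2 = {r * r:.4f}")
```

The effect is equally negligible in both rows (r² = 0.0025); only the sample size changed the verdict.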

What to do?

Clear description of

sample selection,

data manipulation,

cleaning,

methods used to calculate p-values, etc.

Provide distributions and data!