The double-edged sword of statistical significance

#FAIL! at WEBSCI15, 29 June 2015, Oxford.
by Taha Yasseri

Transcript of The double-edged sword of statistical significance

Taha Yasseri
The double-edged sword of statistical significance
"Big Data" vs "p-value"
"Data"
"Sex"
Number of papers in
p-value
In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1%, and denoted as α.

If the p-value is equal to or smaller than the significance level (α), it suggests that the observed data are inconsistent with the assumption that the null hypothesis is true, and thus that hypothesis must be rejected (but this does not automatically mean the alternative hypothesis can be accepted as true). When the p-value is calculated correctly, such a test is guaranteed to control the Type I error rate to be no greater than α.

An equivalent interpretation is that the p-value is the probability of obtaining the observed sample results, or "more extreme" results, when the null hypothesis is actually true (here, "more extreme" depends on the way the hypothesis is tested).
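To illustrate the Type I error guarantee, here is a minimal simulation (Python, standard library only; the 20-toss fair-coin setup and α = 0.05 are illustrative assumptions, not from the slides): when the null hypothesis is true, rejecting whenever p ≤ α produces false positives at a rate no greater than α.

import math
import random

def p_value_tails(n_tails, n_flips):
    # One-sided p-value: probability of seeing at least n_tails tails
    # in n_flips tosses of a fair coin (the null hypothesis).
    return sum(math.comb(n_flips, k) for k in range(n_tails, n_flips + 1)) / 2 ** n_flips

alpha, n_flips, trials = 0.05, 20, 100_000
rejections = 0
for _ in range(trials):
    tails = sum(random.random() < 0.5 for _ in range(n_flips))  # data generated under H0
    if p_value_tails(tails, n_flips) <= alpha:
        rejections += 1

# The rejection rate stays at or below alpha (around 0.02 here, because the
# binomial distribution is discrete and 0.05 cannot be hit exactly).
print(f"Type I error rate: {rejections / trials:.3f}")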
Biased publication
p-hacking
0.05 is not a very good choice!

David Colquhoun (using simulation):

de Winter, Joost CF, and Dimitra Dodou. "A surge of p-values between 0.041 and 0.049 in recent decades (but negative results are increasing rapidly too)." PeerJ 3 (2015): e733.
Thank you!
p=0.03
Weissgerber et al.
Oxford Internet Institute
University of Oxford
The five papers
p-value
Mosquitoes
(annoying and impossible to swat away)


The emperor's new clothes
(fraught with obvious problems that everyone ignores)


The “Statistical Hypothesis Inference Testing”
Lambdin, Charles. "Significance tests as sorcery: Science is empirical—significance tests are not." Theory & Psychology 22.1 (2012): 67-90.
Teenage Sex
Sir Ronald Aylmer Fisher
(1890 – 1962)
English statistician, evolutionary biologist, mathematician, geneticist, and eugenicist
1925
"0.05"
H0: The coin is fair
H1: The coin always lands tails up
# of tosses   Observation   p
1             T             0.5
2             TT            0.25
3             TTT           0.12
4             TTTT          0.06
5             TTTTT         0.03
7             TTTTTTT       0.008
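A quick sketch of where these numbers come from (Python; this assumes the one-sided test "the coin shows tails on every toss", as in the table):

# Under H0 (fair coin), the probability of n tails in a row is 0.5**n,
# which is the one-sided p-value for observing nothing but tails.
for n in (1, 2, 3, 4, 5, 7):
    print(f"{n} tosses, all tails: p = {0.5 ** n:.3g}")

Only at five consecutive tails does p drop below Fisher's 0.05 threshold.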
If you use p=0.05 to suggest that you have made a discovery, you will be wrong at least 30% of the time.
Colquhoun, David. "An investigation of the false discovery rate and the misinterpretation of p-values." Royal Society Open Science 1.3 (2014): 140216.
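A rough simulation in the spirit of Colquhoun's argument (Python with NumPy/SciPy; the prior of 10% true effects, 17 observations per group, and an effect of one standard deviation giving roughly 80% power are illustrative assumptions, not the paper's exact settings):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed scenario: only 10% of tested hypotheses are real effects,
# and the t-test has roughly 80% power to detect them.
n_tests, prior_real, n_per_group, effect, alpha = 20_000, 0.10, 17, 1.0, 0.05

false_pos = true_pos = 0
for _ in range(n_tests):
    real = rng.random() < prior_real
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(effect if real else 0.0, 1.0, n_per_group)
    if stats.ttest_ind(a, b).pvalue < alpha:
        if real:
            true_pos += 1
        else:
            false_pos += 1

# Fraction of "discoveries" at p < 0.05 that come from true nulls.
print(f"False discovery rate: {false_pos / (false_pos + true_pos):.2f}")

With these assumptions, roughly a third of the significant results come from true nulls, in line with the "at least 30%" figure.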
Head, Megan L., et al. "The Extent and Consequences of P-Hacking in Science." PLoS Biol 13.3 (2015): e1002106.
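One flavour of p-hacking, optional stopping, is easy to demonstrate (Python with NumPy/SciPy; the batch size of 10, the cap of 100 observations per group, and the repeated two-sample t-tests are hypothetical choices): even with no true effect, testing after every new batch and stopping at the first p < 0.05 pushes the false positive rate well above the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_sims, batch, max_n, alpha = 5_000, 10, 100, 0.05
false_positives = 0
for _ in range(n_sims):
    a, b = [], []
    while len(a) < max_n:
        # Both groups are drawn from the same distribution: H0 is true.
        a.extend(rng.normal(0, 1, batch))
        b.extend(rng.normal(0, 1, batch))
        # Peek at the data after every batch and stop as soon as it "works".
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1
            break

# The rate comes out well above 0.05 because of the repeated looks.
print(f"False positive rate with optional stopping: {false_positives / n_sims:.2f}")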
Effect size
Significant effect, but small

Significant correlation, but weak
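A toy illustration of a significant-but-weak correlation (Python with NumPy/SciPy; the sample size of one million and the true correlation of about 0.02 are made-up numbers): with enough data, statistical significance says nothing about whether the effect is large enough to matter.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n = 1_000_000
x = rng.normal(size=n)
y = 0.02 * x + rng.normal(size=n)   # true correlation of roughly 0.02
r, p = stats.pearsonr(x, y)
# The effect is negligible, yet the p-value is astronomically small.
print(f"r = {r:.3f}, p = {p:.1e}")

Reporting the effect size (here r ≈ 0.02) alongside the p-value makes the distinction obvious.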
What to do?
Clear description of:
sample selection,
data manipulation,
cleaning,
methods used to calculate the p-value, etc.
Provide distributions and data!