Strata 2016
by Marton Trencseni, 5 June 2016

Transcript of Strata 2016

Beautiful A/B testing:
The missing talk...
... that would have helped me 3 years ago
Marton Trencseni
@mtrencseni
mtrencseni@gmail.com
2012-2015: Prezi
2016: Facebook
O'Reilly Strata London, 2016
Why `beautiful`?
        A        B
conv.   1,123    1,056
impr.   9,919    10,033
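To make the point concrete, here is a minimal frequentist check of the example counts above (assuming the pairing in the reconstructed table: A converts 1,123 of 9,919 impressions, B converts 1,056 of 10,033), as a plain two-proportion z-test in Python:

```python
from math import sqrt, erfc

# Example counts from the slide above (pairing assumed from the table)
conv_a, impr_a = 1123, 9919
conv_b, impr_b = 1056, 10033

p_a, p_b = conv_a / impr_a, conv_b / impr_b            # ~11.3% vs ~10.5%
p_pool = (conv_a + conv_b) / (impr_a + impr_b)         # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / impr_a + 1 / impr_b))
z = (p_a - p_b) / se                                    # ~1.8
p_value = erfc(abs(z) / sqrt(2))                        # two-sided p-value, ~0.07

print(f"CTR A={p_a:.4f}  CTR B={p_b:.4f}  z={z:.2f}  p={p_value:.3f}")
```

Despite the visible gap in the raw counts, p ≈ 0.07, so the difference would not clear the usual 5% bar: reading off conversions and impressions by eye is not enough.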
there's more to it!
Title
Beauty
logging
hashing
multivariate tests
statistical tests
bayesian testing
confidence intervals
credible intervals
stopping
peeking
stat. significance
stat. power
daily variability
metric mining
multi-armed bandits
A/B/C/n
group sequential testing
p-values
projected results
regression to mean
tracking
winsorization
information economics
Definition
HiPPO (highest paid person's opinion)
vs
data-driven
scientific
Table of contents
1. Should you even do an experiment?
2. What metric to look at?
3. Test early, fail early!
4. Logging and hashing
BEFORE
DURING
AFTER
5. Don't change the experiment
6. Population
7. The maths
8. How to report?
9. Remember!
Flowchart
References
Q&A
Evan Miller's blog
Ron Kohavi's papers
David Robinson's blog
Optimizely, VWO
`The A/B Testing Book` remains to be written!
(countless stats books)
(medical literature)
1. Should you even do an experiment?
Do you already know?
Did your organization already perform this A/B test a hundred times?
2. What metric to look at?
Your metric should be what matters to your business (possibly not raw CTRs)
fallacy: data mining metrics
3. Test early, fail early!
Don't spend too much time building the feature, you'll get attached to it
4. Logging and hashing
complete, connect & attribute
fallacy: mod 100
md5(test_name+user_id)
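The slide only gives md5(test_name+user_id); the separator, the 100-bucket granularity, and the function names below are illustrative guesses at what such a bucketing helper could look like in Python. The reason to hash the test name in is that a bare user_id % 100 (the "mod 100" fallacy) puts the same users into the same buckets for every experiment.

```python
import hashlib

def assign_bucket(test_name: str, user_id: str, n_buckets: int = 100) -> int:
    """Deterministic bucket in [0, n_buckets), independent across tests
    because the test name is hashed together with the user id."""
    digest = hashlib.md5(f"{test_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def variant(test_name: str, user_id: str, treatment_buckets: int = 50) -> str:
    # buckets below the cutoff get B (treatment), the rest get A (control)
    return "B" if assign_bucket(test_name, user_id) < treatment_buckets else "A"

print(variant("green_button_test", "user_42"))
```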
5. Don't change the experiment
(or, if you do, then it's a new one)
Don't deploy code while the experiment is running
6. Population effects
Test on your target demographic
Daily/weekly/monthly seasonality
7. The maths
your statistical engine
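The transcript doesn't spell out the engine, so as one hedged example of a question it has to answer ("How long?" in the flowchart further down), here is the standard normal-approximation sample-size formula for a two-proportion test; the baseline and lift values are made up.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_group(p_base, lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect an absolute `lift`
    over baseline conversion rate `p_base` with a two-sided test."""
    p_b = p_base + lift
    z_alpha = norm.ppf(1 - alpha / 2)      # e.g. 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)               # e.g. 0.84 for 80% power
    var = p_base * (1 - p_base) + p_b * (1 - p_b)
    return ceil((z_alpha + z_beta) ** 2 * var / lift ** 2)

# Detecting a 1 percentage point lift on a 10% baseline needs ~14,700 users/group
print(sample_size_per_group(0.10, 0.01))
```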
8. How to report
story: raw vs projected
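The projection method behind the "raw vs projected" story isn't given in the transcript, so the sketch below is only an illustration of the idea: translate the per-user lift measured on the exposed slice into business terms for the full population. All numbers are hypothetical.

```python
# Hypothetical numbers, purely to illustrate turning a raw reading into a projection
test_share    = 0.10          # the experiment ran on ~10% of users
monthly_users = 1_000_000     # full population the winner would roll out to
raw_lift      = 0.008         # absolute conversion-rate lift measured in the test

during_test  = monthly_users * test_share * raw_lift   # extra conversions while testing
full_rollout = monthly_users * raw_lift                # extra conversions/month at 100%

print(f"raw: +{raw_lift:.1%} conversion on the exposed {test_share:.0%}")
print(f"projected: ~{during_test:,.0f}/month during the test, "
      f"~{full_rollout:,.0f}/month at full rollout")
```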
9. Remember!
Keep track, keep score in a standardized tool
Do you already know?
Did your organization already perform this A/B test a hundred times?
Logging good? Are we using hashing?
Can we connect exposure logs to business metrics?
Is it worth it?
Do we know?
Standardized reporting in place?
Conflicting tests?
Pick metric!
Deploy code!
How long?
Sanity check
Patience
Read off raw
Data mining metrics
Calculate projected
Record
Make biz/product decision
Profit!
~10%
Key = representative sample
2009-2012: Scalien
JOIN
vs
large numbers -> significance
example: exposure based dashboard split
inexpensive early on, much more expensive later
pricing page example
green button example
chargeback/refunds example
Etsy example
let's "prove that it works"
build mock versions early: quick Build-Measure-Learn (B-M-L) cycles
uneven
memory
non-random cross-effects
random cross-effects
product team example
keep old one running
just measure metrics
tune parameters
fast => good
free vs paying
MT
frequentist
bayesian
stat. significance, power
prior, posterior,
P(CTR_B > CTR_A)
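For the Bayesian side, P(CTR_B > CTR_A) can be read straight off the posterior. A minimal Monte Carlo sketch with uniform Beta(1, 1) priors, reusing the example counts from the opening slide (the prior choice and sample count are assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(1, 1) prior + binomial likelihood => Beta(1 + conv, 1 + non-conv) posterior
post_a = rng.beta(1 + 1123, 1 + (9919 - 1123), size=100_000)
post_b = rng.beta(1 + 1056, 1 + (10033 - 1056), size=100_000)

print("P(CTR_B > CTR_A) =", (post_b > post_a).mean())   # small here: B converted worse
```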
large org => data science team => handles it
custom emails vs standardized reporting
keeps us honest
independent
"Maserati problem"
multi-armed bandit
pricing pages
template ordering
your page
bytepawn.com