On Being a Data Skeptic


on 4 November 2015

Transcript of On Being a Data Skeptic

Cathy O'Neil

Skeptic, not cynic
the data evangelist
the data doomsayer

Trusting data too much
How bad does it get?
What can we do?
People get addicted to metrics
People focus on numbers, not behaviors
People frame the problem incorrectly
Death Spiral
Most models are
I'll make you click
I'll make you buy
I'll make you stay
I'll decide if you're smart
I'll decide whether to hire you
I'll optimize you for ROI
Political modeling
Set standards for models
Set standards for modeling
Data Privacy
Update of Susan Webber's 2006 piece "Management's Great Addiction: It's time we recognized that we just can't measure everything"
What are the systemic risks and who's keeping track?
Is this an intractable, inevitable problem of modern life?
Don't overestimate
Twitter/ obesity
Don't underestimate
Predator/ prey
Thank you!
Are models racist?
- Google search
- Job hiring
- Peer-to-peer lending
- Invisible failures
feedback loop
long term systemic changes
winner as witness
beyond the filter bubble
Is Segmentation "Good"?

- insurance
- screening at the airport
- generalized surveillance
Quantify this?
- How?
- Modeling the model
- Monte Carlo?
What are the long-term effects?
Rich get richer
Poor get poorer
less mobility
increased inequality
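
One way to make the "Monte Carlo?" question concrete is a toy simulation of a scoring model whose decisions feed back into the scores it later sees. The sketch below is an illustration only, not something taken from the talk: the population size, the approve-the-top-half rule, and the gain and penalty amounts are all hypothetical assumptions, and the Gini coefficient is used here as one possible measure of inequality.

```python
import random


def gini(values):
    """Gini coefficient of positive values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def simulate(n_people=200, rounds=50, seed=0):
    """One run of a hypothetical feedback loop.

    Each round the model approves everyone at or above the median score.
    Approval raises a person's next score and denial lowers it, so the
    model's own output shapes its future input.
    """
    rng = random.Random(seed)
    # Assumption: everyone starts out nearly identical.
    scores = [100 + rng.gauss(0, 5) for _ in range(n_people)]
    start = gini(scores)
    for _ in range(rounds):
        cutoff = sorted(scores)[n_people // 2]
        for i in range(n_people):
            if scores[i] >= cutoff:
                scores[i] += 2 + rng.gauss(0, 1)  # hypothetical benefit
            else:
                # hypothetical penalty, floored so scores stay positive
                scores[i] = max(1.0, scores[i] - 1 + rng.gauss(0, 1))
    return start, gini(scores)


def monte_carlo(runs=100):
    """Average starting and ending inequality over many random runs."""
    results = [simulate(seed=s) for s in range(runs)]
    avg_start = sum(r[0] for r in results) / runs
    avg_end = sum(r[1] for r in results) / runs
    return avg_start, avg_end


if __name__ == "__main__":
    start, end = monte_carlo()
    print(f"average Gini at start: {start:.3f}, after feedback: {end:.3f}")
```

Under these assumptions the average Gini coefficient rises over the rounds, which is the "rich get richer, poor get poorer" pattern in miniature. A real assessment would need parameters estimated from data rather than chosen by hand.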
The economics
- Short term gains from private co's
- Long term negative effects
- Completely unregulated
- Fiercely lobbied even in Europe
First it was Obama
Next it's everyone
Personal messaging
Personal offers
Is this democratic?
Related but different:
Poll models
- Feedback loop here too
- Might cause weird voting behavior
- But doesn't directly pervert issues
How it worked
- Individual appeals
- Facebook graph etc.
- Linking databases
- Money, then votes
- Targeted phone calls, emails
- Ads and Reddit
- Now used by Caesars
Tons of data out there
European attempts
Start with kids?
Scrutiny for: high impact, high stakes, and widespread models
What would that look like?
Hippocratic Oath of modeling
Data skepticism
Data standards
Story telling
Reproducibility and beyond
Modeling is hard
Need better tools
Wakari, IPython notebook, and beyond
Public access
Robustness tests
Open Models
On Being a Data Skeptic

Let's put the science in data science
People ignore perverse incentives
data is "objective"
sanitizing effect/ control
measure love?
uncertainty is ignored
proxies and power
causation is a bitch
translation phase
evaluation metric
ads & A/B tests
Netflix prize
gaming (FICO)
credit rating agencies
high stakes testing
Trusting data too little
People don't use math to estimate value
People stick the quant in the back room
People interpret skepticism as negativity
People ignore wider consequences
ballpark estimates for models
including business models
efficient meetings?
nerd dialect
unedited view into biz
co's as kool-aid parties
VC drip-feed
"not my problem"
feedback loops
public as stakeholder
who is left out?
It's sometimes convenient to downplay our effects.