
Data Science at Kreditech

by Jose Garcia Moreno-Torres on 14 September 2015



Data Science at Kreditech
Jose Garcia Moreno-Torres

CDO at

Prelude: tonight's menu
Entrée:

The real-world issues no one told me about in school
Main course A: A binary (yes/no) classification problem whose solution created a whole new set of problems
Main course B: A pair of regression problems where data is missing left and right
Dessert: A short overview of other problems we work on
All sprinkled with a brief look at the history of Data Science at Kreditech
Entrée:
It's a crazy world out there
...and he found it was BAD. He investigated and investigated, only to find that his data contained too much information: information that was not available in the production environment.
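One guard against this kind of leakage can be sketched as follows, assuming every stored feature value carries the time it became known (the `FeatureSnapshot` type and its field names are hypothetical, not Kreditech's schema): when building a training row, keep only the values that already existed at the moment the loan decision was made.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FeatureSnapshot:
    # Hypothetical record: one feature value plus the time it became available
    name: str
    value: float
    recorded_at: datetime


def production_safe_features(snapshots, decision_time):
    """Return only the features that were already known at decision time,
    i.e. the ones the production system could actually have seen."""
    return {s.name: s.value for s in snapshots if s.recorded_at <= decision_time}


decision = datetime(2015, 9, 1, 12, 0)
row = production_safe_features(
    [
        FeatureSnapshot("declared_income", 2500.0, datetime(2015, 9, 1, 11, 58)),
        # Written after the decision (e.g. by a later process): must not be used
        FeatureSnapshot("days_past_due", 12.0, datetime(2015, 10, 15, 0, 0)),
    ],
    decision,
)
print(row)  # {'declared_income': 2500.0}
```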
Main course A: Credit Scoring
Core problem at Kreditech

Classic step-by-step approach:

– Data cleaning and preparation

– Feature selection and construction

– Modeling

– Evaluation
The only performance that matters is the one you achieve in the production environment
Our step-by-step approach:

– Data cleaning and preparation

– Evaluation metric definition

– Feature selection and construction

– Modeling

(Before we solve a problem, we need to understand what we are solving)
Simple binary classification problem

Why is it complex?

– Reject inference: we only get feedback on “yes” cases (the applicants we approved)

– Feedback delay: we give a loan today and only learn a few months later whether it was the right decision

– Unstable conditions: a volatile environment (e.g., website changes or technical errors) and seasonality never seen before (e.g., Christmas)

How to evaluate? Statistical metrics vs. business goals
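To make the contrast concrete, this small sketch scores the same predictions two ways: a statistical metric (accuracy at a cutoff) and a business metric (realized profit of the loans that cutoff would approve). The margin and loss rate are illustrative assumptions, and `p_default` stands for any model's predicted default probabilities.

```python
def accuracy(p_default, defaulted, threshold):
    # Predict "default" when the predicted probability reaches the threshold
    correct = sum((p >= threshold) == d for p, d in zip(p_default, defaulted))
    return correct / len(defaulted)


def realized_profit(p_default, defaulted, amounts, threshold,
                    margin=0.2, loss_rate=1.0):
    # Approve applicants below the threshold; a repaid loan earns margin * amount,
    # a defaulted one loses loss_rate * amount (both rates are assumptions)
    profit = 0.0
    for p, d, amount in zip(p_default, defaulted, amounts):
        if p < threshold:
            profit += -loss_rate * amount if d else margin * amount
    return profit


p = [0.1, 0.4, 0.6]
y = [False, True, False]
amounts = [100, 200, 300]
for t in (0.3, 0.5):
    print(t, accuracy(p, y, t), realized_profit(p, y, amounts, t))
```

Choosing the cutoff (or the model) by the second number rather than the first is what aligning evaluation with the business goal means in practice.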
Limit estimation: How much money should I lend?

Goal: maximize the amount lent, minimize defaults, minimize the number of customers who leave
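These competing goals can be framed as one optimization, sketched here under assumptions: given hypothetical models of the default probability and of the probability that the customer takes the offer, both as functions of the amount, pick the candidate amount with the highest expected profit. The margin, loss rate and toy probability curves are made up for illustration.

```python
def expected_profit(amount, p_default, p_accept, margin=0.2, loss_rate=1.0):
    # The customer may walk away (probability 1 - p_accept); if the loan is taken,
    # it either repays (earning margin * amount) or defaults (losing loss_rate * amount)
    pd = p_default(amount)
    return p_accept(amount) * ((1 - pd) * margin * amount - pd * loss_rate * amount)


def best_limit(candidates, p_default, p_accept, **kwargs):
    # Grid search over the candidate limits
    return max(candidates,
               key=lambda a: expected_profit(a, p_default, p_accept, **kwargs))


# Toy curves: larger loans default more often and are accepted less often
limit = best_limit([50, 100, 200],
                   p_default=lambda a: a / 1000,
                   p_accept=lambda a: 1 - a / 1000)
print(limit)  # 100
```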
Main course B: Twin regression problems
Pricing: How much should I charge this customer?

Goal: maximize price, minimize the number of customers who leave
Censoring:

If customer A agreed to pay 10%, would he also have accepted 20%? 15%? 11%?

If customer B left because he found 20% too expensive, would he have taken the offer at 10%? 15%? 19%?
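Statistically, this is censored data: the highest rate a customer would accept is never observed, only bounded. Assuming each customer accepts any rate up to a personal maximum (an assumption, not something the talk states), each observed offer translates into an interval for that maximum. The `Offer` type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Offer:
    rate: float      # interest rate offered, in percent
    accepted: bool


def max_rate_interval(offer):
    """Bounds (low, high) on the highest rate the customer would have accepted."""
    if offer.accepted:
        # Right-censored: the customer's maximum is at least the offered rate
        return (offer.rate, math.inf)
    # Left-censored: the customer's maximum is below the offered rate
    return (0.0, offer.rate)


print(max_rate_interval(Offer(10.0, True)))   # customer A: (10.0, inf)
print(max_rate_interval(Offer(20.0, False)))  # customer B: (0.0, 20.0)
```

Interval-censored regression methods, such as those used in survival analysis, can be fit to bounds like these instead of to point labels.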

Ideas:

Squeeze the data (customer A almost surely would have accepted 8% -> an extra data point)

Active learning: choose the amounts or prices to offer so that the outcomes help the model learn
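Both ideas can be sketched in a few lines, assuming acceptance is monotone in price (a customer who accepts a rate would accept any lower one, and one who refuses a rate would refuse any higher one) and a hypothetical model `p_accept(rate)`. The first function generates the implied extra data points; the second uses uncertainty sampling, one common active-learning strategy, to pick the rate whose outcome the model is least sure about. Step size and rate bounds are illustrative.

```python
def squeeze(rate, accepted, step=1.0, floor=0.0, ceiling=30.0):
    """Expand one observed (rate, accepted) outcome into the points it implies."""
    points = [(rate, accepted)]
    if accepted:
        # Accepted at `rate` -> assume it would also be accepted at lower rates
        r = rate - step
        while r >= floor:
            points.append((r, True))
            r -= step
    else:
        # Refused at `rate` -> assume it would also be refused at higher rates
        r = rate + step
        while r <= ceiling:
            points.append((r, False))
            r += step
    return points


def most_informative_rate(candidates, p_accept):
    # Uncertainty sampling: offer the rate whose predicted acceptance is closest to 50%
    return min(candidates, key=lambda r: abs(p_accept(r) - 0.5))


print(squeeze(10.0, True, step=2.0, floor=5.0))  # [(10.0, True), (8.0, True), (6.0, True)]
print(most_informative_rate([5, 10, 15], lambda r: max(0.0, 1 - r / 20)))  # 10
```

Since the implied points are only "almost surely" right, one reasonable design choice is to give them a lower sample weight than the observed outcomes.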
Other applications (saving something for the next talk)
Loan limit model for recurring customers:

Event-based features
Extra information = extra data sources
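For returning customers, their earlier loans are themselves a data source. A minimal sketch of event-based features, assuming the history is available as dated events; the event and feature names are hypothetical. Only events up to the decision date are used, for the same production-availability reason as in the entrée.

```python
from collections import Counter
from datetime import date


def event_features(events, as_of):
    """events: list of (date, event_type) from the customer's earlier loans."""
    past = [(d, kind) for d, kind in events if d <= as_of]
    counts = Counter(kind for _, kind in past)
    last = max((d for d, _ in past), default=None)
    return {
        "n_repaid_on_time": counts["repaid_on_time"],
        "n_repaid_late": counts["repaid_late"],
        "days_since_last_event": (as_of - last).days if last is not None else None,
    }


history = [
    (date(2015, 1, 10), "repaid_on_time"),
    (date(2015, 3, 1), "repaid_late"),
    (date(2015, 9, 1), "repaid_on_time"),  # after the cutoff: ignored
]
print(event_features(history, as_of=date(2015, 6, 1)))
```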
Choose the right debt collection strategy:

Several options are available: pick one to apply to a customer, wait a few days, then pick another one
Multiple instance learning, active learning
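One possible reading of the multiple-instance framing, offered as an assumption rather than the talk's exact setup: each customer is a "bag" of collection actions taken over time, and the only label is at the bag level (did the customer eventually pay?). The sketch groups actions into bags and scores a bag under the standard multiple-instance assumption, where a bag is positive if at least one of its instances is. Strategy names and the data layout are hypothetical.

```python
from collections import defaultdict


def build_bags(actions, outcomes):
    """actions: (customer_id, strategy, day) tuples; outcomes: customer_id -> paid?
    Returns customer_id -> (time-ordered list of (day, strategy), bag label)."""
    bags = defaultdict(list)
    for customer_id, strategy, day in actions:
        bags[customer_id].append((day, strategy))
    return {cid: (sorted(instances), outcomes[cid]) for cid, instances in bags.items()}


def bag_score(instance_scores):
    # Standard multiple-instance assumption: a bag is positive if at least one
    # instance is, so the bag score is the highest instance score
    return max(instance_scores)


bags = build_bags(
    [("c1", "sms_reminder", 3), ("c1", "phone_call", 0), ("c2", "letter", 1)],
    {"c1": True, "c2": False},
)
print(bags["c1"])  # ([(0, 'phone_call'), (3, 'sms_reminder')], True)
```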

Customer Lifetime Value estimation:

Time series modeling, distance-based methods
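As one example of a distance-based method (k-nearest neighbours, chosen here for illustration), a new customer's lifetime value can be estimated as the average observed value of the most similar past customers. The feature vectors and values are made up; in practice the features would need scaling first. `math.dist` requires Python 3.8 or later.

```python
import math


def knn_clv(query, customers, k=3):
    """customers: list of (feature_vector, observed_lifetime_value)."""
    # Rank past customers by Euclidean distance to the query and average the k closest
    nearest = sorted(customers, key=lambda c: math.dist(query, c[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


past = [((0.0, 0.0), 100.0), ((1.0, 0.0), 200.0), ((10.0, 10.0), 1000.0)]
print(knn_clv((0.0, 0.0), past, k=2))  # 150.0
```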
In the beginning, there was only data.
The unsuspecting data scientist, freshly hired into a tech startup, built a model and saw that its performance was good.
He was satisfied, so he deployed the model...
Reject inference: extend acceptance beyond the optimal operating point to learn from additional cases
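A minimal sketch of this, assuming a score-and-threshold acceptance rule: a small, randomly chosen share of the applicants the model would reject is approved anyway, and those loans are flagged so their outcomes can be used (and their cost tracked) separately. The threshold and exploration rate are illustrative values, not Kreditech's.

```python
import random


def decide(p_default, threshold=0.15, explore_rate=0.05, rng=random):
    """Return (approved, is_exploration) for one applicant."""
    if p_default < threshold:
        return True, False
    # Approve a random fraction of would-be rejections so their outcome becomes observable
    if rng.random() < explore_rate:
        return True, True
    return False, False


rng = random.Random(42)
# Count how many of 1000 would-be rejections were approved for exploration
decisions = [decide(0.3, rng=rng) for _ in range(1000)]
print(sum(explored for _, explored in decisions))
```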




Feedback delay: Heuristically approximate future performance
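One such heuristic, sketched with made-up parameters: once enough time has passed since a loan's due date, treat "more than N days late" as an early proxy for default instead of waiting for the final outcome. The 30-day horizon and 15-day cutoff are assumptions for illustration.

```python
def early_label(days_past_due, days_since_due_date, horizon=30, dpd_cutoff=15):
    """Proxy label for a loan whose final outcome is not yet known.

    Returns None while the loan is too recent to judge, True if it looks like
    a default (more than `dpd_cutoff` days late), False otherwise."""
    if days_since_due_date < horizon:
        return None
    return days_past_due > dpd_cutoff


print(early_label(0, 10))    # None: too early to tell
print(early_label(40, 40))   # True: still unpaid 40 days after the due date
print(early_label(3, 45))    # False: repaid 3 days late
```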

Unstable conditions:
a) build models that are resilient to it (strong regularization)
b) try to detect shifts in the data distribution
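For (b), one widely used shift check in credit scoring, named here as an example rather than as the talk's method, is the Population Stability Index (PSI), which compares the binned distribution of a feature or score in recent data against a reference period. This stdlib-only sketch uses fixed bin edges; a small `eps` avoids taking the log of zero for empty bins.

```python
import bisect
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index of `actual` against `expected` over bins cut at `edges`."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


print(round(psi([1, 1, 4, 4], [1, 4, 4, 4], edges=[2.5]), 3))  # 0.275
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating.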