Third-party data

EBU Congress on Big Data - Geneva 22-23 March 2016
by Pierre-Nicolas Schwab
on 30 April 2016

Transcript of Third-party data

Problem & Definitions

Big Data tools enable the reconciliation of various data sources to better profile users. Profiling accuracy and behavior prediction depend on the data available, but the volume of data is not necessarily key for prediction. Internal data is sometimes insufficient: external data may be useful to improve the explained variance of your model.
3rd-party data put in perspective

Where can data come from? A high-level map of data sources. 3rd-party data is often understood as data purchased from commercial firms, but it can also be free (open data). It differs from 2nd-party data (e.g. obtained through a DMP, a Data Management Platform).
Third-party data
Pierre-Nicolas Schwab, RTBF (Belgium)
EBU Big Data Congress 22 March 2016

Example 2: Population statistics

The real-estate market is highly asymmetric in Belgium: sale prices are not public (a monopoly of notaries), and pricing mechanisms are opaque. Question: how can real-estate prices be predicted? This problem is addressed by Realo (a Belgian startup).
Conclusions and recommendations

Think out of the box: new data can help improve your understanding of users' behaviors.
Don't be afraid: new data is potentially everywhere! Try new things with POCs.
Be agile: learn from your failures and improve. A failed POC may be the first step on a successful path.
Don't underestimate the adoption process.
Prioritize: use the map to define your path to more and better data sources. Come see us after the conference (LinkedIn invitations also welcome) to get more information about our decision maps.
Different types of data: 1st party, 2nd party, 3rd party.

Example 1: Facebook data

Facebook Connect is a popular way to log in, and a broad range of information can be collected on the user.

Case 1: TAM Airlines
Example 3: From apps to IoT

Insurers: how to adapt the pricing of car insurance (good vs. bad drivers)?
Old method: questionnaire (declarative data).
New method: actual data on driving behavior (observed).
Prioritize your data collection process
Realo built an algorithm to predict real-estate asked prices. The precision of the model was greatly improved by including open data (population statistics) provided by the State (= 3rd party).
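As an illustration of how adding an open-data feature can lift explained variance, here is a minimal synthetic sketch. It is not Realo's actual model: the features, coefficients, and data are invented. It fits an ordinary-least-squares regression with internal (1st-party) features only, then again with an external open-data feature added:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Internal (1st-party) features: surface area and number of bedrooms.
surface = rng.uniform(50, 250, n)
bedrooms = rng.integers(1, 6, n)

# External open data (3rd party): median income of the municipality.
median_income = rng.uniform(15_000, 45_000, n)

# Synthetic asked price driven by all three factors, plus noise.
price = (1_500 * surface + 10_000 * bedrooms
         + 2.0 * median_income + rng.normal(0, 30_000, n))

def r2(X, y):
    """Fit OLS via least squares and return the explained variance (R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

internal = np.column_stack([surface, bedrooms])
combined = np.column_stack([surface, bedrooms, median_income])

print(f"R2 internal data only:   {r2(internal, price):.3f}")
print(f"R2 with open data added: {r2(combined, price):.3f}")
```

Because the combined model nests the internal one, its in-sample R² can only go up; the practical question is whether the gain also holds out-of-sample.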
The precision of apps (1st party) is lower than that of telematic units (sold and managed by a 3rd party); data quality is more granular with telematic units.
However, the data currently collected does NOT significantly improve predictions. The adoption process differs between countries (4M units in Italy vs. 30k in Belgium), and new variables need to be included in the model (e.g. health-related data from wearables).

Think first about the underlying theoretical model. Identify the variables that will maximize the explained variance. Public data is also 3rd-party data. Test different granularity levels (the most granular is not always the best).
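The point about granularity can be made concrete with a synthetic sketch (all numbers are invented): when the true signal lives at a coarser level, a noisy user-level feature can explain less variance than its municipality-level aggregate:

```python
import numpy as np

rng = np.random.default_rng(42)
groups, per_group = 20, 50
n = groups * per_group

# The true driver of price lives at the municipality level (group income).
income = rng.uniform(20_000, 50_000, groups)
group_id = np.repeat(np.arange(groups), per_group)
price = income[group_id] + rng.normal(0, 2_000, n)

# A user-level measurement of income is very noisy ...
x_user = income[group_id] + rng.normal(0, 30_000, n)
# ... while aggregating it back to municipality level averages the noise out.
x_agg = np.array([x_user[group_id == g].mean() for g in range(groups)])[group_id]

def r2(x, y):
    """Fit one-feature OLS with intercept; return the explained variance."""
    X1 = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

print(f"R2 user-level feature:  {r2(x_user, price):.3f}")
print(f"R2 aggregated feature:  {r2(x_agg, price):.3f}")
```

Here the most granular feature is the worst predictor, which is exactly why different granularity levels are worth testing rather than assumed.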
CONTACT

Pierre-Nicolas Schwab
psn@rtbf.be
+32 (0) 486 42 79 42
Mapping all data sources to support the decision process. Six dimensions are defined:
Relevance: how much the data helps improve user profiling, as a function of business objectives
Intrusiveness: how intrusive the data collection process is
Granularity
Costs
Longevity / obsolescence: how long the data will remain relevant
Richness: how much the data enriches your prior knowledge of users
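One way to turn the six dimensions into a priority list is a simple weighted score. The sketch below is purely illustrative: the weights, the 1-5 scores, and the three candidate sources are invented, not RTBF's actual decision map. Intrusiveness and costs are inverted so that a higher score is always better before summing:

```python
# Hypothetical 1-5 scores for candidate sources on the six dimensions.
DIMENSIONS = ["relevance", "intrusiveness", "granularity",
              "costs", "longevity", "richness"]

# Higher is better for every dimension except intrusiveness and costs,
# which are inverted so a single weighted sum can rank the sources.
INVERTED = {"intrusiveness", "costs"}
WEIGHTS = {"relevance": 3, "intrusiveness": 2, "granularity": 1,
           "costs": 2, "longevity": 1, "richness": 2}

sources = {
    "Facebook API":      {"relevance": 4, "intrusiveness": 4, "granularity": 5,
                          "costs": 2, "longevity": 2, "richness": 4},
    "Open data (state)": {"relevance": 3, "intrusiveness": 1, "granularity": 2,
                          "costs": 1, "longevity": 4, "richness": 3},
    "Telematic units":   {"relevance": 5, "intrusiveness": 5, "granularity": 5,
                          "costs": 5, "longevity": 3, "richness": 5},
}

def score(src):
    """Weighted sum over the six dimensions, with 'lower is better' ones inverted."""
    total = 0
    for dim, w in WEIGHTS.items():
        s = src[dim]
        if dim in INVERTED:
            s = 6 - s          # invert on the 1 (worst) .. 5 (best) scale
        total += w * s
    return total

for name, dims in sorted(sources.items(), key=lambda kv: -score(kv[1])):
    print(f"{name:18s} score={score(dims)}")
```

With these invented numbers the free, low-intrusiveness open-data source ranks first, which matches the talk's message that the most granular or richest source is not automatically the top priority.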
Potential new variables in the dataset; granular (user-level) data; easy to collect (don't forget the legal aspects, though).
But "manual" work is needed to aggregate the data into higher-level exploitable categories (the "centers of interest" variable was recently removed from the Facebook API), and the stability of the Facebook API is a concern.