Fraud detection at FINN.no

Getting Started with Fraud Detection and Cassandra
by mck semb wever on 2 September 2014

Transcript of Fraud detection at FINN.no

Fraud detection @ FINN.no

MACHINE SPECS
24 CPUs
50 GB RAM
5.5 TB disks (RAID 50)
100 GB SSD (commit logs + HDFS + ext4 journal)
NOOP I/O scheduler

cassandra-1.2.16
Datacenter: Oslo (suburbs)
Datacenter: Oslo (downtown)
Application writes: ~25 microseconds
Application reads: ~0.8 milliseconds
4 replicas (1 per virtual datacenter)
CL.ONE for reads and writes
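With four replicas and CL.ONE on both reads and writes, each request needs only one replica to answer. The replica set that acknowledged a write and the one that serves a later read need not overlap, so a read can miss the newest write until replication catches up, while requests keep succeeding as long as one replica is reachable. The usual rule of thumb behind that trade-off (a read is guaranteed to overlap the latest acknowledged write when R + W > RF) can be sketched as follows; the function names and the reduced set of levels are illustrative assumptions, not a Cassandra API.

```python
def replicas_for(level, rf):
    """Replicas that must respond at a consistency level (simplified subset)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def read_sees_latest_write(read_level, write_level, rf):
    """True when every read replica set must overlap every write replica set."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


def tolerated_failures(level, rf):
    """Replicas that may be down while requests at this level still succeed."""
    return rf - replicas_for(level, rf)


if __name__ == "__main__":
    rf = 4  # one replica in each of four virtual datacenters
    print("ONE/ONE overlaps:", read_sees_latest_write("ONE", "ONE", rf))
    print("ONE tolerates", tolerated_failures("ONE", rf), "down replicas")
    print("QUORUM/QUORUM overlaps:", read_sees_latest_write("QUORUM", "QUORUM", rf))
```

With RF = 4, CL.ONE survives three unavailable replicas but is only eventually consistent; QUORUM on both sides (3 + 3 > 4) would overlap, at the price of needing three live replicas.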
We use Cassandra when…
## data model
## operations
high/fast write throughput
schema-free design
time-to-live on data
large data size
scaling out total load
uptime
tunable consistency vs. availability
(get rid of JOINs!)
@mck_sw
mck@apache.org
Who's FINN.no
Cassandra @FINN.no
Fraud Control @FINN.no
Schema + Architecture
Rules
Machine Learning
Wrap up
## datastax cql driver
Application reads
SELECT adid, rules FROM ad_created WHERE day = ?;  -- ? = dayHour
SELECT status FROM ad_status WHERE adid = ? ORDER BY updated ASC;  -- ? = adId
Application writes
INSERT INTO ad_status (adId, updated, status)
VALUES (?, ?, ?)  -- adId, today, status
USING TTL 31536000;  -- 1 year = 365 days x 86,400 s
INSERT INTO ad_created (day, created, adId, rules)
VALUES (?, ?, ?, ?)  -- dayHour, now, adId, rulesMap
USING TTL 31536000;  -- 1 year
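These statements bucket ad_created by hour (dayHour), so one partition holds an hour of new ads, and key ad_status by adId, so an ad's status history is a single ordered partition read; neither query needs a JOIN. A toy in-memory model of that access pattern, with the one-year TTL expressed in seconds, is sketched below. It is not Cassandra: the YYYYMMDDHH bucket format, the clustering keys and the class and method names are assumptions, since the slides do not show the table definitions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# CQL's USING TTL takes seconds: 365 days * 86,400 s = 31,536,000 s.
ONE_YEAR_TTL = 365 * 24 * 60 * 60


def day_hour(ts: datetime) -> str:
    """Hypothetical partition key: one bucket per UTC hour, e.g. '2014090213'."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


class ToyAdStore:
    """In-memory stand-in for ad_created and ad_status; models the access pattern only."""

    def __init__(self) -> None:
        # ad_created: hour bucket -> {(created, ad_id): (rules, expires_at)}
        self.ad_created = defaultdict(dict)
        # ad_status: ad_id -> {updated: (status, expires_at)}
        self.ad_status = defaultdict(dict)

    def insert_created(self, created, ad_id, rules, ttl=ONE_YEAR_TTL):
        expires_at = created + timedelta(seconds=ttl)
        # Same clustering key overwrites the row, as a CQL INSERT (an upsert) would.
        self.ad_created[day_hour(created)][(created, ad_id)] = (rules, expires_at)

    def insert_status(self, ad_id, updated, status, ttl=ONE_YEAR_TTL):
        expires_at = updated + timedelta(seconds=ttl)
        self.ad_status[ad_id][updated] = (status, expires_at)

    def ads_created_in(self, bucket, now):
        """Like SELECT adid, rules FROM ad_created WHERE day = ? (expired rows hidden)."""
        rows = sorted(self.ad_created.get(bucket, {}).items(), key=lambda kv: kv[0])
        return [(ad_id, rules)
                for (_created, ad_id), (rules, expires_at) in rows
                if expires_at > now]

    def status_history(self, ad_id, now):
        """Like SELECT status FROM ad_status WHERE adid = ? ORDER BY updated ASC."""
        rows = sorted(self.ad_status.get(ad_id, {}).items(), key=lambda kv: kv[0])
        return [status for _updated, (status, expires_at) in rows if expires_at > now]
```

Bucketing by hour keeps any single partition to one hour of ads, at the cost of one read per bucket when a whole day is needed.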
how many new ads each day?
how many were manually checked?
how many are fraud?
what's the lifecycle of each ad?
how can we automate fraud detection?
how do we know the automation is optimal?
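The first three questions can be answered from the two tables alone: read the 24 hourly ad_created partitions for a day, then each ad's ad_status history. The sketch below assumes a hypothetical YYYYMMDDHH bucket format, takes the two CQL reads as plain callables, and counts an ad as manually checked if its history contains "FRAUD_CONTROL" and as fraud if it contains "REJECTED"; those status values are guesses based on the lifecycle slide, not FINN's actual codes.

```python
from datetime import date, datetime, timedelta, timezone


def hour_buckets(day: date) -> list:
    """The 24 hourly partition keys (hypothetical YYYYMMDDHH format) for one UTC day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return [(start + timedelta(hours=h)).strftime("%Y%m%d%H") for h in range(24)]


def daily_report(day, fetch_created, fetch_statuses,
                 checked_status="FRAUD_CONTROL", fraud_status="REJECTED"):
    """Count new, manually checked and fraudulent ads created on one day.

    fetch_created(bucket) -> ad ids created in that hour (stands in for the ad_created read)
    fetch_statuses(ad_id) -> statuses, oldest first (stands in for the ad_status read)
    """
    ad_ids = [ad_id for bucket in hour_buckets(day) for ad_id in fetch_created(bucket)]
    histories = [fetch_statuses(ad_id) for ad_id in ad_ids]
    return {
        "new_ads": len(ad_ids),
        "manually_checked": sum(checked_status in h for h in histories),
        "fraud": sum(fraud_status in h for h in histories),
    }
```

Passing the reads in as callables keeps the counting logic testable without a cluster; in practice each callable would wrap one prepared statement.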
Diagram labels: DC_FAST 1, DC_FAST 2, DC 1, DC 2 (virtual datacenters); ad_created, ad_status (tables)
Credits
Ricardo Alonso Martin
Christian Søhoel
Tomas Håheim Mortensen
Anders Aagaard
Arild Nebb Ervik
Talented with Cassandra and machine learning?
FINN is always hiring the best.
Get in contact!
Classifieds lifecycle
Started
(Edited)
Fraud Control
Active
Expired
Paused
Rejected
SOLD
Invalid
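The lifecycle slide names the states an ad can pass through, and the ad_status read (ORDER BY updated ASC) returns them oldest first. A minimal sketch of turning those rows into a lifecycle path follows; the enum's string values and the helper names are hypothetical, and the sketch deliberately does not encode which transitions are allowed, because the transcript does not say.

```python
from enum import Enum


class AdState(Enum):
    # States from the lifecycle slide; the string values are hypothetical.
    STARTED = "started"
    EDITED = "edited"
    FRAUD_CONTROL = "fraud_control"
    ACTIVE = "active"
    EXPIRED = "expired"
    PAUSED = "paused"
    REJECTED = "rejected"
    SOLD = "sold"
    INVALID = "invalid"


def lifecycle(rows):
    """Turn (updated, status) rows, oldest first, into a path of AdState values.

    Consecutive repeats (e.g. several edits in a row) collapse into one step.
    An unknown status string raises ValueError via AdState(...).
    """
    path = []
    for _updated, status in rows:
        state = AdState(status)
        if not path or path[-1] is not state:
            path.append(state)
    return path


def current_state(rows):
    """Latest state, or None for an ad with no status rows."""
    path = lifecycle(rows)
    return path[-1] if path else None
```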
Current ML algorithms
- logistic regression
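Logistic regression scores an ad as P(fraud | x) = sigmoid(w . x + b), where x could, for example, encode the rule results stored in rulesMap. The sketch below trains one with plain batch gradient descent on the mean log-loss, using 0/1 rule hits as features; the feature encoding, learning rate, epoch count and toy data are assumptions for illustration, not FINN's model.

```python
import math


def sigmoid(z):
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(fraud | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit w, b by batch gradient descent on the mean log-loss.

    X: feature vectors (here, hypothetical 0/1 rule hits); y: 1 = fraud, 0 = ok.
    """
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = predict_proba(w, b, xi) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Toy data: rule A fires on the fraudulent ads, rule B is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [1, 0]), 3), round(predict_proba(w, b, [0, 1]), 3))
```

A learned weight per rule also shows which rules carry signal, which is one reason logistic regression is a common first model.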
A/B testing
- is it enough?
- multivariate testing
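One common way to read an A/B test on fraud detection, for example comparing the share of ads flagged under two rule sets, is a two-proportion z-test. The sketch below uses the pooled normal approximation, which is reasonable for large samples; the counts are invented. Multivariate testing splits traffic across more variants, so each comparison gets fewer samples and the significance threshold should be corrected for the extra comparisons (e.g. Bonferroni).

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for H0: variants A and B have the same underlying rate."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both rates are 0 or both are 1: no evidence of any difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: ads flagged as fraud out of ads routed to each variant.
z, p = two_proportion_z_test(80, 1000, 50, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```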
Automated ML
- beyond files for input/output
Log everything
- we have no idea what the next attack will look like
In-house competitions (Kaggle)