
crunch 2015 logging@skyscanner

Speakers template

scott krueger

on 24 November 2015


Transcript of crunch 2015 logging@skyscanner


fellow data enthusiast
experienced @ scale
big picture
try -> fail -> succeed -> adapt -> repeat
automate everything
tech company

global travel business
founded 2003
700+ employees
made up of 50+ nationalities
50+ million
what's your best before?
why am I here?
Researching your next move?
Already implementing or operating a large scale distributed data platform?
Not sure - Budapest is amazing and I want to hear more about this stuff?
Technical Want
Business Need
Business need: the sensible pairing of technical ‘want’ with a legitimate business reason to do something.
what's your best before date?
"The single biggest problem in communication is the illusion that it has taken place." - George Bernard Shaw

Take aways:

monitor your systems AND all of the steps in them (data integrity and data quality)
capacity plan (monthly)
when cracks appear, bandage carefully (this is definitely your "do something" moment)
Identify your 'need' (business task), match it with your technical 'want' (engineering task)

What problem?
Existing Impact?
What are
others doing?
Build or Buy?
"One pipeline to rule them all" - Jay Kreps


21st century rules
1. By the time you begin
development + implementation,
something new has arrived

2. By the time your new data
platform is operational, something
new has arrived

Why is this?
Oh...yeah, Schemas.

level of industry adoption
cross-platform language support
efficiency (compaction + serialization speed)
your data has structure so maintain integrity
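To illustrate the last point, here is a minimal sketch of checking an event against a schema before it enters the pipeline. The talk does not name a schema format, so this uses plain Python rather than any particular serialization library; the schema, field names, and `validate_event` helper are all hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical schema for illustration: field name -> required Python type.
PAGE_VIEW_SCHEMA: Dict[str, type] = {
    "event_id": str,
    "timestamp_ms": int,
    "url": str,
}


def validate_event(event: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the event conforms."""
    problems: List[str] = []
    # Every schema field must be present and of the declared type.
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    # Fields the schema does not know about are reported rather than dropped.
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

Rejecting or quarantining events at the producer, rather than discovering bad records downstream, is one way to keep the integrity the slide asks for.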
Pitch It.
Stick to your guns, but
Build It.
Demo it.
Follow these two weird
tricks to keep your event
logging lights on!!!
1. You have to do this in parallel
2. Figure out how to piggyback on
your existing system at the source.

Example pattern: web server logs / client double log -> new platform -> new analysis
Voila! Now you have a reference point to compare old and new, allowing you to confidently turn off the old.
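The comparison between old and new can be sketched as a per-bucket count reconciliation. This is an assumption about how such a check might look, not the method used in the talk; the `compare_counts` helper, the bucket keys, and the 1% tolerance are hypothetical.

```python
from typing import Dict


def compare_counts(
    old: Dict[str, int], new: Dict[str, int], tolerance: float = 0.01
) -> Dict[str, float]:
    """Return the buckets whose event counts differ by more than `tolerance`.

    `old` and `new` map a time bucket (for example an hour) to the number of
    events each pipeline recorded for it. The value returned per bucket is the
    relative difference, measured against the larger of the two counts.
    """
    mismatches: Dict[str, float] = {}
    for bucket in sorted(set(old) | set(new)):
        old_n = old.get(bucket, 0)
        new_n = new.get(bucket, 0)
        baseline = max(old_n, new_n)
        if baseline == 0:
            continue  # neither pipeline saw events in this bucket
        diff = abs(old_n - new_n) / baseline
        if diff > tolerance:
            mismatches[bucket] = diff
    return mismatches
```

An empty result over a sustained period is the kind of evidence that supports switching the old system off.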
Demo it.

There’s a big difference between running distributed data technologies in development environments and running them out in the wild. Dev as code, ops as code from the start.

Deployable Units
Health Checks
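A health check for a deployable unit can be as small as a set of named probes rolled up into one status. The sketch below is a generic illustration under that assumption; `run_health_checks` and the probe names in the usage note are hypothetical and not taken from the talk.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[bool, Dict[str, str]]:
    """Run each named probe and report overall health plus per-probe status."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception as exc:
            # A probe that raises counts as unhealthy, with the reason kept.
            results[name] = f"error: {exc}"
    healthy = all(status == "ok" for status in results.values())
    return healthy, results
```

For example, `run_health_checks({"disk_space": lambda: True, "broker_reachable": lambda: False})` reports the unit as unhealthy and names the failing probe.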
Take it to Prod.
So that's it?
Consumer Adoption
On-board before prod.
In a legacy migration, not
everything has to migrate.
Someone knows!

On-going adoption:

Invest early!
Promote to prioritize
Hold a workshop!
Feedback: bad and
good always welcome
300 + api / white label
business need met.
room for growth.


People: 700 + staff (50% engineering)
10 global offices: latest opening London
150 new recruits
Traffic: 50% YOY
of which 77% mobile
40 million app downloads
"...really, it's not that complex, and the added value is considerable. We hope we managed to convey this idea and don't hesitate to ask questions on slack..."
Business Need.
data path
Data Products
You can only illustrate the value of a data integration platform by showing information derived from it.
Planning for cloud deployment vs doing cloud deployment
* old
* new
"Jobs may fail to deploy and it's sometimes hard to identify the reason: container logs for failed containers are deleted from Yarn after a few hours and then not available through the yarn console...."
what's next?
250 nodes across multiple data centres and cloud regions
3 billion events per day
3 engineers
10 TB data per day
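As a back-of-the-envelope aid for the monthly capacity planning mentioned in the take-aways, the figures above can be turned into daily averages. The sketch assumes decimal terabytes and an even spread of data across all nodes, neither of which the slides state; peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_averages(events_per_day: int, bytes_per_day: int, nodes: int) -> dict:
    """Derive average rates from daily totals (averages only, not peaks)."""
    return {
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "avg_event_bytes": bytes_per_day / events_per_day,
        # Assumes data is spread evenly over every node.
        "bytes_per_node_per_day": bytes_per_day / nodes,
    }


# Figures from the slides; 1 TB is taken as 10**12 bytes (an assumption).
stats = daily_averages(
    events_per_day=3_000_000_000,
    bytes_per_day=10 * 10**12,
    nodes=250,
)
```

With these inputs the averages come out at roughly 35 thousand events per second, about 3.3 KB per event, and 40 GB per node per day.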
Open Source
github: Skyscanner/Kafka-Topic-Enforcer
Kafka Offset Manager
Platform Evolution
Tooling and Services
Workload migrations
'Kappa' processing
Config Driven Data Processing and Routing
why are you here?
Note: not an official
Gartner chart!
Must read: "The Log: What every software engineer should know about real-time data’s unifying abstraction" by Jay Kreps.
A prototype with real value
(rather than a proof of concept!)