crunch 2015 logging@skyscanner
fellow data enthusiast
experienced @ scale
try -> fail -> succeed -> adapt -> repeat
global travel business
made up of 50+ nationalities
what's your best before?
why am I here?
Researching your next move?
Already implementing or operating a large scale distributed data platform?
Not sure - Budapest is amazing and I want to hear more about this stuff?
Business need: the sensible pairing of technical ‘want’ with a legitimate business reason to do something.
what's your best before date?
"The single biggest problem in communication is the illusion that it has taken place." - George Bernard Shaw
monitor your systems AND all of the steps in them (data integrity and data quality)
capacity plan (monthly)
when cracks appear, bandage carefully (this is definitely your "do something" moment)
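Monitoring "all of the steps" means checking the data as well as the services. Below is a minimal sketch of a per-step data-quality check. It assumes each event is a Python dict and that the expected fields and types for an event type are known. The schema, its field names, and the function names are all hypothetical. Running a report like this at each pipeline step and alerting on the `invalid` count catches integrity problems that a plain "is the service up?" check would miss.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Hypothetical schema for illustration: field name -> expected Python type.
SEARCH_EVENT_SCHEMA: Dict[str, type] = {
    "event_id": str,
    "timestamp": str,
    "origin": str,
    "destination": str,
}


def check_event(event: Dict[str, Any], schema: Dict[str, type]) -> Tuple[str, ...]:
    """Return a tuple of problem labels for one event (empty if it passes)."""
    problems = []
    for field, expected in schema.items():
        if event.get(field) is None:
            problems.append(f"missing:{field}")
        elif not isinstance(event[field], expected):
            problems.append(f"type:{field}")
    return tuple(problems)


def quality_report(events: Iterable[Dict[str, Any]], schema: Dict[str, type]) -> Counter:
    """Count total events, invalid events, and each kind of problem seen."""
    report: Counter = Counter()
    for event in events:
        report["total"] += 1
        problems = check_event(event, schema)
        if problems:
            report["invalid"] += 1
            report.update(problems)  # one count per problem label
    return report
```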
Identify your 'need' (business task), match it with your technical 'want' (engineering task)
Build or Buy?
"One pipeline to rule them all" - Jay Kreps
21st century rules
1. By the time you begin development + implementation, something new has arrived
2. By the time your new data platform is operational, something new has arrived
Why is this?
level of industry adoption
cross-platform language support
efficiency (compaction + serialization speed)
your data has structure so maintain integrity
Stick to your guns, but follow these two weird tricks to keep your event logging lights on!!!
1. You have to do this in parallel.
2. Figure out how to piggyback on your existing system at source.
Example pattern: web server logs / client double log -> new platform -> new analysis
Voilà! Now you have a reference point to compare old and new, allowing you to confidently turn off the old.
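One way to use that reference point is to reconcile event counts from the old and new paths. Below is a minimal sketch. It assumes both the legacy source (e.g. web server logs) and the new platform can export ISO-8601 timestamps for the same event type. The function names and the 0.5% relative tolerance are hypothetical choices for illustration, not the tooling described in the talk.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def hourly_counts(timestamps: Iterable[str]) -> Counter:
    """Bucket ISO-8601 event timestamps into per-hour counts."""
    counts: Counter = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        counts[dt.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def reconcile(old_counts: Counter, new_counts: Counter,
              tolerance: float = 0.005) -> Dict[datetime, Tuple[int, int]]:
    """Return hours where the new count drifts from the old by more than `tolerance` (relative)."""
    mismatches = {}
    for hour in sorted(set(old_counts) | set(new_counts)):
        old = old_counts.get(hour, 0)
        new = new_counts.get(hour, 0)
        if old == 0:
            drift = 0.0 if new == 0 else float("inf")
        else:
            drift = abs(new - old) / old
        if drift > tolerance:
            mismatches[hour] = (old, new)
    return mismatches
```

Once `reconcile` has stayed empty over a long enough window, that is evidence the new platform is capturing what the old one did.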
There's a big difference between running distributed data technologies in development environments and running them out in the wild. Dev as code, ops as code from the start.
Take it to Prod.
So that's it?
On-board before prod.
In a legacy migration, not
everything has to migrate.
Promote to prioritize
Hold a workshop!
Feedback: bad and good always welcome
300+ API / white-label
business need met.
room for growth.
People: 700+ staff (50% engineering)
10 global offices: latest opening London
150 new recruits
Traffic: 50% YOY
of which 77% mobile
40 million app downloads
"...really, it's not that complex, and the added value is considerable. We hope we managed to convey this idea and don't hesitate to ask questions on Slack..."
You can only illustrate the value of a data integration platform by showing information derived from it.
Planning for cloud deployment vs doing cloud deployment
"Jobs may fail to deploy and it's sometimes hard to identify the reason: container logs for failed containers are deleted from Yarn after a few hours and then not available through the yarn console...."
250 nodes across multiple data centres and cloud regions
3 billion events per day
10 TB data per day
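These headline figures support some quick monthly capacity-planning arithmetic. Below is a minimal sketch. It assumes the daily totals are spread evenly over 24 hours and that "TB" means 10^12 bytes. Real traffic is peaky, so the 3x peak factor here is a hypothetical headroom multiplier, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_load(events_per_day: float, bytes_per_day: float,
                 peak_factor: float = 3.0) -> dict:
    """Derive average throughput, mean event size, and an assumed peak rate from daily totals."""
    events_per_sec = events_per_day / SECONDS_PER_DAY
    return {
        "events_per_sec": events_per_sec,
        "bytes_per_event": bytes_per_day / events_per_day,
        "mb_per_sec": bytes_per_day / SECONDS_PER_DAY / 1e6,
        "peak_events_per_sec": events_per_sec * peak_factor,  # assumed headroom
    }


load = average_load(events_per_day=3e9, bytes_per_day=10e12)
for name, value in load.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions, 3 billion events and 10 TB per day average out to roughly 34,722 events/s, about 3.3 KB per event, and about 116 MB/s of sustained ingest.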
Kafka Offset Manager
Tooling and Services
Config Driven Data Processing and Routing
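"Config driven" routing means that deciding where an event goes is expressed as data rather than code. Below is a minimal sketch. It assumes events are dicts with an `event_type` field. The route table, field names, and sink names are hypothetical. In a real deployment the config would more likely be loaded from a file, and the sinks would be topics or storage targets.

```python
from typing import Any, Dict, List

# Hypothetical routing config: adding a route means editing data, not code.
ROUTES: List[Dict[str, Any]] = [
    {"match": {"event_type": "search"}, "sinks": ["search-analytics", "archive"]},
    {"match": {"event_type": "booking", "platform": "mobile"}, "sinks": ["mobile-bookings"]},
    {"match": {"event_type": "booking"}, "sinks": ["bookings", "archive"]},
]
DEFAULT_SINKS = ["dead-letter"]  # where unmatched events go


def route(event: Dict[str, Any], routes: List[Dict[str, Any]] = ROUTES) -> List[str]:
    """Fan an event out to the sinks of every rule whose match fields all equal the event's."""
    sinks: List[str] = []
    for rule in routes:
        if all(event.get(key) == value for key, value in rule["match"].items()):
            for sink in rule["sinks"]:
                if sink not in sinks:  # keep order, avoid duplicates
                    sinks.append(sink)
    return sinks or list(DEFAULT_SINKS)
```

This sketch fans out to every matching rule instead of stopping at the first match. That suits a logging pipeline, where one event often feeds several consumers. A first-match router would be an equally valid design choice.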
why are you here?
Note: not an official
Must read: "The Log - what every software engineer should know about real-time data’s unifying abstraction."
A prototype with real value
(rather than a proof of concept!)