crunch 2015 logging@skyscanner

Speakers template
by

scott krueger

on 24 November 2015

Transcript of crunch 2015 logging@skyscanner


logging@skyscanner

Intros
fellow data enthusiast
experienced @ scale
big picture
try -> fail -> succeed -> adapt -> repeat
automate everything
tech company

global travel business
founded 2003
700+ employees
made up of 50+ nationalities
50+ million (mostly) unique visitors
facts
what's your best before?
why am I here?
Researching your next move?
Already implementing or operating a large scale distributed data platform?
Not sure - Budapest is amazing and I want to hear more about this stuff?
Technical Want vs Business Need
Business need: the sensible pairing of technical ‘want’ with a legitimate business reason to do something.
what's your best before date?
"The single biggest problem in communication is the illusion that it has taken place." - George Bernard Shaw

Take aways:

monitor your systems AND all of the steps in them (data integrity and data quality)
capacity plan (monthly)
when cracks appear, bandage carefully (this is definitely your "do something" moment)
Identify your 'need' (business task), match it with your technical 'want' (engineering task)

What problem?
Existing Impact?
What are others doing?
Build or Buy?
Research.
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
"One pipeline to rule them all" - Jay Kreps
http://www.slideshare.net/JayKreps1/i-32858698


21st century rules
1. By the time you begin development + implementation, something new has arrived
2. By the time your new data platform is operational, something new has arrived


Why is this?
Oh...yeah, Schemas.
Considerations:

level of industry adoption
cross-platform language support
efficiency (compaction + serialization speed)
your data has structure so maintain integrity
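
A minimal sketch of the "your data has structure so maintain integrity" point, assuming a hand-rolled schema check in plain Python. The slides do not say which schema format was chosen, so the schema, its field names and the `validate` / `serialize` helpers below are all hypothetical; a real platform would more likely use an established cross-platform serialization format.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SEARCH_EVENT_SCHEMA = {
    "event_id": str,
    "timestamp_ms": int,
    "market": str,
    "is_mobile": bool,
}


def validate(event: dict, schema: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif type(event[field]) is not expected:
            # Exact type match, so a bool is not silently accepted as an int.
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def serialize(event: dict, schema: dict) -> bytes:
    """Refuse to emit an event that breaks the schema."""
    problems = validate(event, schema)
    if problems:
        raise ValueError("; ".join(problems))
    # Compact separators and sorted keys keep the payload small and stable.
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")
```

Rejecting bad events at the producer, rather than cleaning them downstream, is the design choice being illustrated here.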
Pitch It.
Research:
Stick to your guns, but remember...
Build It.
Demo it.
Follow these two weird tricks to keep your event logging lights on!!!
1. You have to do this in parallel
2. Figure out how to piggyback on your existing system at source.

Example pattern: web server logs / client double log -> new platform -> new analysis
Voila! Now you have a reference point to compare old and new, allowing you to confidently turn off the old.
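
One way to use that reference point, sketched under assumptions: while both pipelines run in parallel, compare event counts per time bucket and flag any bucket where the new platform drifts from the old one. The `compare_counts` function, the bucket keys and the 1% tolerance are hypothetical and not taken from the talk.

```python
def compare_counts(old: dict, new: dict, tolerance: float = 0.01) -> dict:
    """Return buckets whose new count differs from the old count
    by more than `tolerance` (relative to the old count)."""
    mismatches = {}
    for bucket in sorted(set(old) | set(new)):
        old_n = old.get(bucket, 0)
        new_n = new.get(bucket, 0)
        baseline = max(old_n, 1)  # avoid division by zero for empty buckets
        drift = abs(new_n - old_n) / baseline
        if drift > tolerance:
            mismatches[bucket] = (old_n, new_n, round(drift, 4))
    return mismatches


# Hypothetical hourly counts from the old and new pipelines.
old_counts = {"00": 1000, "01": 2000}
new_counts = {"00": 1005, "01": 1900, "02": 10}
report = compare_counts(old_counts, new_counts)
```

An empty report over a sustained period is the kind of evidence that supports switching the old system off.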
Demo it.

There’s a big difference between deploying distributed data technologies on development environments and deploying them out in the wild. Dev as code, ops as code from the start.


Deployable Units
Tests
Health Checks
Monitoring
Notifications
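
As a rough illustration of how health checks and notifications fit together, here is a small sketch. It assumes each check is a zero-argument callable returning True when healthy; `run_health_checks`, the check names and the `notify` callback are hypothetical and stand in for whatever monitoring and alerting tooling is actually in use.

```python
from typing import Callable, Dict, List


def run_health_checks(
    checks: Dict[str, Callable[[], bool]],
    notify: Callable[[str], None],
) -> List[str]:
    """Run every check, notify on each failure, and return the failed names."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
            detail = "reported unhealthy"
        except Exception as exc:
            # A check that crashes counts as a failure, not as a pass.
            ok = False
            detail = f"raised {exc!r}"
        if not ok:
            failed.append(name)
            notify(f"health check failed: {name} ({detail})")
    return failed
```

Usage could look like `run_health_checks({"broker": broker_is_up}, print)`, where `broker_is_up` is a function you supply.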
Take it to Prod.
So that's it?
Consumer Adoption
On-board before prod.
In a legacy migration, not everything has to migrate.
Someone knows!

On-going adoption:
feedback
documentation

Invest early!
Promote to prioritize
Hold a workshop!
Momentum.
Feedback: bad and good always welcome
300+ API / white label partners
business need met.
room for growth.

Internal:

People: 700+ staff (50% engineering)
10 global offices; latest opening: London
150 new recruits
Traffic: 50% YOY, of which 77% mobile
40 million app downloads
"...really, it's not that complex, and the added value is considerable. We hope we managed to convey this idea and don't hesitate to ask questions on slack..."
snowball.
http://www.bigdatalandscape.com/
Business Need.
normalization
data path
Plan.
Hosting
+
Data Products
You can only illustrate the value of a data integration platform by showing information derived from it.
Planning for cloud deployment vs doing cloud deployment
* old
* new
"Jobs may fail to deploy and it's sometimes hard to identify the reason: container logs for failed containers are deleted from Yarn after a few hours and then not available through the yarn console...."
what's next?
Platform:
250 nodes across multiple data centres and cloud regions
3 billion events per day
3 engineers
10 TB data per day
facts
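
For capacity planning, the platform figures above can be turned into averages with some back-of-envelope arithmetic. This sketch assumes decimal terabytes and a perfectly even load across the day (real traffic has peaks); the `projected_daily_bytes` helper and any growth rate passed to it are hypothetical and not from the talk.

```python
EVENTS_PER_DAY = 3_000_000_000   # "3 billion events per day"
BYTES_PER_DAY = 10 * 10**12      # "10 TB data per day", assuming decimal TB
SECONDS_PER_DAY = 24 * 60 * 60

# Averages only: real traffic is not evenly spread across the day.
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY       # about 34,722
avg_event_bytes = BYTES_PER_DAY / EVENTS_PER_DAY           # about 3,333
mb_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6    # about 115.7


def projected_daily_bytes(months: int, monthly_growth: float) -> float:
    """Compound the current daily volume forward by a (hypothetical)
    monthly growth rate, e.g. 0.04 for 4% per month."""
    return BYTES_PER_DAY * (1 + monthly_growth) ** months
```

Re-running this kind of calculation monthly with measured numbers is one simple way to act on the "capacity plan (monthly)" takeaway.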
Open Source
github: Skyscanner/Kafka-Topic-Enforcer
Kafka Offset Manager
Platform Evolution
Tooling and Services
Workload migrations
'Kappa' processing
Config Driven Data Processing and Routing
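
To make "config driven data processing and routing" concrete, here is a minimal sketch of the general idea: routing rules live in data, not code, so adding a destination means editing configuration. The rule format, topic names and `route` function are hypothetical; the slides do not describe the actual design.

```python
# Hypothetical routing configuration; in practice this could be loaded
# from a file rather than defined inline.
ROUTING_CONFIG = {
    "routes": [
        {"match": {"type": "search"}, "destinations": ["search-events"]},
        {
            "match": {"type": "booking", "platform": "app"},
            "destinations": ["app-bookings", "bookings"],
        },
    ],
    "default": ["unrouted"],
}


def route(event: dict, config: dict) -> list:
    """Return the destinations of every rule whose match fields all equal
    the event's fields, or the default destinations if no rule matches."""
    destinations = []
    for rule in config["routes"]:
        if all(event.get(key) == value for key, value in rule["match"].items()):
            for destination in rule["destinations"]:
                if destination not in destinations:  # keep order, drop repeats
                    destinations.append(destination)
    return destinations or list(config["default"])
```

For example, `route({"type": "search"}, ROUTING_CONFIG)` selects the search destination, while an event of an unknown type falls through to the default.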
why are you here?
http://humble-homes.com
Note: not an official Gartner chart!
Must read: "The Log - what every software engineer should know about real-time data’s unifying abstraction."
A prototype with real value (rather than a proof of concept!)