DevOps

by

Alejandro Bernal

on 30 March 2017

Transcript of DevOps

Continuous Deployment
Consistency in Pipeline
Deployment Pipeline
FEEDBACK
DevOps
Principles
Consistency
What is it ?
1st Way
Fast
Feedback

Service
Reliability Culture

Monitoring
Automated Testing
Deployment Strategies
LEARNING
Understanding
Complexity

3rd Way
Objective
Values
Automation
Feedback
Make Software with the Highest Quality Possible
Pillars
Development
Deployment
Operations
Span
FLOW
Enable and Sustain fast flow of work from Development into Operations

without causing chaos
Increase Flow and shorten lead Time
Automation
Global optimization over local optimization

Systems thinking
Objectives
Repeatable and reliable process

Automate

All in version control

If it hurts do it more often

Build quality in

Done means released

Everybody is responsible for the delivery process

Continuous improvement
Representation of the process through which the code goes from version control into the hands of users
Process that deploys deliverables into production automatically
Continuous Delivery
Process that prepares deliverables automatically
Continuous Integration
Process that keeps the code of the application in a working state
Practices

Standard deployment for every environment

Smoke test for deployments

If anything fails stop the line

Peer reviews (pull requests)

Code is committed to trunk
Anti Patterns
Incongruent testing and production environments

Testing takes too long

Manual regression and acceptance tests

Long lead times

High technical debt

Services are slow and hard to change
Practices
Pull code from version control
Run unit tests
Build code locally
Code and verify working state
Push code to version control
Trigger build
Trigger unit tests
Trigger package artifact or image creation
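
The steps above, combined with the "if anything fails stop the line" practice, can be sketched as a small runner that executes stages in order and halts at the first failure. This is only an illustration: `run_pipeline` and the stage names are hypothetical and do not correspond to any particular CI tool.

```python
from typing import Callable, List, Tuple

# A stage is a name plus an action that reports success (True) or failure (False).
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            # Stop the line: later stages never run after a failure.
            break
        completed.append(name)
    return completed


# Hypothetical stages standing in for real build, test and packaging commands.
stages = [
    ("build", lambda: True),
    ("unit tests", lambda: False),  # simulate a failing test run
    ("package artifact", lambda: True),
]
completed = run_pipeline(stages)
```

Because the unit test stage fails, the packaging stage is never reached and no artifact is produced from broken code.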



[Deployment pipeline diagram: numbered steps across the LOCAL, DEV, UAT, STAGING and PROD environments, with Unit Test, Smoke Test, Automated Acceptance Test, User Acceptance Test, Capacity Tests and Monitoring stages]
The least-cost way to ensure that the behavior of any two hosts will remain completely identical is always to implement the same changes in the same order on both hosts.
Create consistent environments

All elements of the pipeline should be disposable and reproducible

All environments should look like production

Decrease variability between elements in the pipeline

Repeatability speeds up rebuilding environments

Fewer errors caused by inconsistencies

Better security through fewer inconsistencies
Principles
Advantages

Keeps a history of all changes
Can easily check differences between versions
Can restore and rebuild all elements
Everything can be versioned and tagged
All changes are visible and audited for everyone
Changes can be tracked
Version Control
What is in version control

Application code
Configuration scripts
Configuration management (Domain Specific Language) code
Image build scripts (Virtual Machines, Containers)
Meta definitions (JSON, YAML, TOML)
Automated tests
Documentation, procedures and release notes
Templates (CloudFormation, Terraform, Heat)
Database schema abstractions, DNS, firewall definitions
Network definitions (Switch configurations)
Infrastructure as Code
Google's SysOps Death Squads
Why Order Matters
Modularity
Composability
Extensibility
Flexibility
Repeatability
Declaration
Abstraction
Idempotence
Convergence
Principles
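
Declaration, idempotence and convergence can be illustrated with a toy model in which a host's state is a dictionary and the desired state is declared as another dictionary. This is a minimal sketch under that assumption; `converge` and the state keys are hypothetical and are not the API of CFEngine, Puppet, Chef or Ansible.

```python
from typing import Dict


def converge(actual: Dict[str, str], desired: Dict[str, str]) -> Dict[str, str]:
    """Apply only the changes needed to bring `actual` to the desired state.

    Returns the changes that were made; an empty dict means the host
    was already in the desired state.
    """
    changes = {}
    for key, value in desired.items():
        if actual.get(key) != value:
            # Stand-in for a real action such as installing a package.
            actual[key] = value
            changes[key] = value
    return changes


# Hypothetical host state and declared desired state.
host = {"ntp": "absent", "sshd_port": "22"}
desired = {"ntp": "installed", "sshd_port": "22"}

first_run = converge(host, desired)   # changes only what differs
second_run = converge(host, desired)  # idempotent: nothing left to change
```

Running the same declaration twice is safe: the first run moves the host toward the desired state and the second run makes no changes.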
CFEngine
Puppet
Chef
Ansible
Technologies
Bare Metal
Virtual Machine
Cloud
Container
Types of Servers


Internal IT policy of supporting only 2 Linux versions at any given time

When a new version was introduced the oldest had to be rolled off

The "Death Squad" team would identify deprecated server versions and put them in a shared spreadsheet

The "Death Squad" team would help owners deprecate them and/or apply pressure until they were marked done

Concepts
Divergence
Convergence

Congruence
Least variation
Fastest provision model
Fits well with microservices architectures
No need for infrastructure as code
Binary consistency from desktop to production
Built into the CI process
Immutable Delivery
Principles
You give them a Name
You clean them often
You customize them - possibly dress them
You live with them for long periods of time
Pets VS Cattle
Pets
LXC
Docker
OpenVZ
Others
Technologies
You give them a number
You do not customize them - Not thinking about dressing them
You do not live with them, you live because of them
Cattle
Objectives

Build quality in as early as possible
Find bugs as early as possible
Everything gets tested
If it hurts, bring the pain forward
Create fast feedback
Version control tests
Continuous testing ensures continuous improvement
Principles

Change the adversarial mindset to a collaboration mindset
Change dev then test to test then dev
Instead of having a test team you have an everyone tests team
Test early, test often and shorten the feedback loop
Tests should have reasonable expectations
Fix bugs when you find them
Done means released
ATDD (Acceptance Test Driven Development)

Discuss with stakeholders
Distill and define Done
Develop
TDD
Demonstrate
Automated
Unit tests
Integration
System
Load
Acceptance
Manual
Showcases
Usability
Exploratory
Security
Objectives and Principles
TDD - BDD - ATDD
TDD - Test Driven Development

Prevent scope creep
Catches design issues early
Creates cleaner code
Builds trust with other service owners
Creates a consistent rhythm
BDD - Behavior Driven Development

Tests whether the software fulfills the business need
Instead of how it works it tests how it behaves
Based on Domain Driven Design
Tests are typically conversational
Rolling Upgrades
Servers are updated/upgraded one at a time
The process is completed when all the servers are updated
During the upgrade there is a temporary reduction in capacity
Server is first drained by removing it from the load balancer
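
The rolling upgrade steps can be sketched with an in-memory model of a server pool. This is an illustration only: `rolling_upgrade`, the server names and the version strings are hypothetical, and a real upgrade would call a load balancer and a deployment tool instead of updating dictionaries.

```python
from typing import Dict, List, Set


def rolling_upgrade(
    versions: Dict[str, str], in_rotation: Set[str], new_version: str
) -> List[int]:
    """Upgrade servers one at a time, draining each one first.

    Returns the number of servers still taking traffic during each upgrade.
    """
    capacity_log = []
    for server in sorted(versions):
        in_rotation.discard(server)      # drain: remove from the load balancer
        capacity_log.append(len(in_rotation))
        versions[server] = new_version   # stand-in for the real upgrade
        in_rotation.add(server)          # put the server back into rotation
    return capacity_log


# Hypothetical pool of three servers, all taking traffic.
versions = {"web1": "1.0", "web2": "1.0", "web3": "1.0"}
in_rotation = {"web1", "web2", "web3"}
capacity_log = rolling_upgrade(versions, in_rotation, "1.1")
```

The capacity log shows the temporary reduction in capacity: at every step one server is out of rotation while the others keep serving.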
Canary
Similar to rolling upgrade but to a subset of servers
Apply smoke, performance and load tests
If no problems arise add more subsets
Server is first drained by removing it from the load balancer
Phased Roll-Outs
Roll-outs partitioned by users or groups
Groups are categorized by risk tolerance
If no problems arise add more subsets
Sometimes internal employees can be first roll-out group
Power-user or meta-community roll-outs
Proportional Shedding
A new service is built on new machines
The load balancers shed small percentages of traffic to the new service
The percentage is increased if no issues are found
The old service is turned off when all traffic is moved to the new service with no errors
With bare metal this can be expensive - Cloud makes this more economical
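
One way to picture proportional shedding is a routing function that sends a configurable percentage of requests to the new service. The sketch below assumes each request carries an identifier and hashes it into one of 100 buckets; `route` and the identifiers are hypothetical, and real load balancers typically express this through weighted backends in their own configuration.

```python
import hashlib


def route(request_id: str, new_service_percent: int) -> str:
    """Return "new" for roughly `new_service_percent` percent of request ids."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the id to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "new" if bucket < new_service_percent else "old"


# Raise the percentage step by step while no issues are found.
for percent in (1, 10, 50, 100):
    target = route("request-42", percent)
```

Because the bucket for a given id never changes, a request that reached the new service at a low percentage keeps reaching it as the percentage grows.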
Blue - Green
Green is the live service
Blue is dormant and requires limited resources
When the release goes live the two are swapped
Rolling back is easier to do
Popular technique
Toggling Features
Flags are set in the code that can turn on or off a feature
Allows developers to continuously deploy new code decoupled from a release
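
A feature toggle can be as simple as a lookup that guards a code path. The sketch below is an assumption-laden illustration: `FeatureFlags`, `checkout` and the flag name `new_checkout` are hypothetical, and real systems usually load flags from configuration or a flag service.

```python
from typing import Dict


class FeatureFlags:
    """In-memory flag store; unknown flags default to off."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout(flags: FeatureFlags) -> str:
    # The new code is deployed but only runs once the flag is turned on.
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"


flags = FeatureFlags({"new_checkout": False})
before = checkout(flags)         # deployed, not yet released
flags.set("new_checkout", True)  # release without a new deployment
after = checkout(flags)
```

The deployment and the release are separate events: the code ships with the flag off, and flipping the flag releases the feature.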
2nd Way
Define Service Levels like

Service Level Agreements
Service Level Objectives
Service Level Indicators
100% reliability is a myth
All systems go down
Not all services are equal
Manage risk and failure by service
Managing reliability is about managing risk
Managing risk is about cost
Principles
Most of the software's lifetime is spent in use, not in development and design
Note
Based on

Availability
Latency
Performance
Change Management
Monitoring
Emergency Response
Capacity Planning
Understanding Risk and Failure
The impact within an organization that makes 1 million dollars per day


One (90%) - 36.5 days per year = 36.5M
Two (99%) - 3.65 days per year = 3.65M
Three (99.9%) - 8.76 hours per year = 365K
Four (99.99%) - 52.56 minutes per year = 36.5K
Five (99.999%) - 5.26 minutes per year = 3.65K
Six (99.9999%) - 31.5 seconds per year = 365
THE 9's
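
The figures in this list follow from simple arithmetic. The sketch below reproduces them under two assumptions that are not stated on the slide: a 365-day year, and revenue of 1 million dollars per day lost evenly across any downtime. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY
REVENUE_PER_DAY = 1_000_000  # dollars, as in the example above


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability with `nines` nines."""
    unavailability = 10 ** -nines  # 2 nines -> 0.01, i.e. 99% available
    return SECONDS_PER_YEAR * unavailability


def yearly_cost(nines: int) -> float:
    """Revenue lost per year if all of the allowed downtime is used."""
    return downtime_seconds_per_year(nines) / SECONDS_PER_DAY * REVENUE_PER_DAY


for nines in range(1, 7):
    hours = downtime_seconds_per_year(nines) / 3600
    print(f"{nines} nines: {hours:.4f} hours/year, ${yearly_cost(nines):,.0f}")
```

Each extra nine divides both the allowed downtime and the potential loss by ten, which is why reliability targets are ultimately a cost decision.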
Principles
Design for Failure

Adaptive Systems - Feedback loops

Developer Managed Service

Contingency

Peer reviews and pairing

Embedded Engineers
Design For Failure
Benefits
Cost
Easier to change
Faster to fix
Easier to experiment
Practices

Adaptive Systems
Fault Injection
The Netflix Simian Army
Chaos Monkey (Hosts)
Chaos Gorilla (Data Center)
Latency Monkey (Inject Latency)
Conformity Monkey (Best Practice)
Security Monkey (Security Violation)
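
Fault injection can be shown in miniature with a wrapper that makes a call fail some fraction of the time. This sketch is not how the Simian Army tools are implemented; `with_fault_injection`, `InjectedFault` and `fetch_profile` are hypothetical names used only for illustration.

```python
import random
from typing import Any, Callable, Optional


class InjectedFault(Exception):
    """Raised in place of the real call when a fault is injected."""


def with_fault_injection(
    func: Callable[..., Any],
    failure_rate: float,
    rng: Optional[random.Random] = None,
) -> Callable[..., Any]:
    """Wrap `func` so that it fails with probability `failure_rate`."""
    rng = rng or random.Random()

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # random() returns a value in [0.0, 1.0), so 0.0 never fails
        # and 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)

    return wrapper


def fetch_profile(user_id: int) -> dict:
    # Hypothetical dependency that normally succeeds.
    return {"id": user_id}


always_fails = with_fault_injection(fetch_profile, failure_rate=1.0)
never_fails = with_fault_injection(fetch_profile, failure_rate=0.0)
```

Calling code exercised against such a wrapper has to prove that it copes with the failure, for example by retrying or degrading gracefully.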
A/B Testing
Test A vs B versions of the software for performance
Dark Deploys
How Facebook tested the new chat app without the user being aware of it
This notion is about working in small batches so that the system can improve itself through constant feedback

History of the letters : mass vs single production
Developer Managed Service
You build it you own it : CTO of Amazon

Developers Wearing Pagers
First on call is the dev rotation team
Second call is the VP of Engineering
Third call is CTO

Developers shall know about the infrastructure
Contingency
Universal Agreement for launch

For major releases with high risk and impact

Ten minutes review to check

When will it be launched
Who is launching it
Has it been in production yet
Can it be dark
Is it new infrastructure (is it monitored)
Has an on and off switch
Peer Review and Pairing
Peer Review Guidelines

All changes are peer reviewed
Everyone monitors the commit logs
High risk changes should include an SME
Break up larger changes into smaller ones

Pairing Guidelines

Pair programming for everything
Is slower but decreases bugs by up to 70% to 80% [better trade-off]
Spreads knowledge
Great for training
Embedded Engineers
Operations in development


Development in operations
ChatOps
A collaboration model that connects people, tools and process in a single place

Some known tools that support it
Slack
HipChat
Overview
80 % of outages are caused by a change
80 % of restoration time is spent trying to figure out what changed
A good practice is to look at the last change first
Why Monitor
Alerting

Visualizing

Collecting

Trending

Anomalies

Learning
Google's 4 Golden Signals
Latency

Traffic

Errors

Saturation
Indicators
Business

Application

Infrastructure

User Based

Deployment
Key Factors on big Companies
Amazon

Order Rate

Facebook

Packet loss
Analysis
Real Time
Correlation
Historical
Anomaly
Machine Learning
Statistical

Mean
Median
Percentiles
Standard Deviation
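
These statistical measures are all available in Python's standard `statistics` module. The sample below uses made-up latency values with one outlier to show why the median and percentiles are often more informative than the mean; the numbers are illustrative, not measurements.

```python
import statistics

# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 12, 250, 13, 14, 12]

mean = statistics.mean(latencies_ms)      # pulled upward by the outlier
median = statistics.median(latencies_ms)  # barely affected by the outlier
stdev = statistics.stdev(latencies_ms)    # sample standard deviation

# quantiles(n=100) returns the 99 cut points between percentiles,
# so index 94 is the 95th percentile.
percentiles = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p95 = percentiles[94]

print(f"mean={mean} median={median} stdev={stdev:.1f} p95={p95}")
```

A single slow request moves the mean far away from what a typical user sees, while the median stays close to the typical value and the high percentile exposes the tail.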
Technologies
Graphite
Splunk
Nagios
Kudu - From Hadoop people
Which System is more complex ?
In Search of Certainty
Cybernetics
Desired state configuration management

Promise Theory

Physics and Biology as model realms to assert that uncertainty is an inescapable fact of technology

How to adapt systems for uncertainty
Circular Causality

Self Steering Approach

Listen, Calibrate, Change and Adapt

Systemic Approach
Cynefin
Designed to describe the evolutionary nature of complex systems

Draws on research from complex adaptive systems theory, cognitive science, anthropology and psychology
Circuit Breaker Patterns
Wrap a protected function call in a circuit breaker object

Monitors for failures

When a threshold is met trip a circuit breaker

Calls are then returned with an error
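
The four points above can be sketched as a small wrapper class. This is a simplified illustration: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the threshold counts consecutive failures, and the reset timeout and half-open state found in production implementations are left out.

```python
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    """Wrap a function and stop calling it after repeated failures."""

    def __init__(self, func: Callable[..., Any], failure_threshold: int = 3) -> None:
        self._func = func
        self._failure_threshold = failure_threshold
        self._failures = 0

    @property
    def is_open(self) -> bool:
        return self._failures >= self._failure_threshold

    def call(self, *args: Any, **kwargs: Any) -> Any:
        if self.is_open:
            # Tripped: fail fast without touching the protected function.
            raise CircuitOpenError("circuit is open")
        try:
            result = self._func(*args, **kwargs)
        except Exception:
            self._failures += 1  # monitor for failures
            raise
        self._failures = 0  # a success resets the consecutive-failure count
        return result


def unreliable_service() -> str:
    # Hypothetical remote call that is currently failing.
    raise ConnectionError("service unavailable")


breaker = CircuitBreaker(unreliable_service, failure_threshold=2)
```

Once the threshold is reached, callers get an immediate error instead of waiting on a dependency that is known to be failing.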
Promise Theory
A model of voluntary cooperation between individuals, autonomous actors or agents

Publish their intentions to one another in the form of promises
Uses

Economics and Game Theory
Organizational Behavior
Systems Management
Quantified Self Movement
Network Policy (SDN)
Obligation VS Cooperation
Failure
It must not fail
It will fail

Automation
It must look like this
It should look like this

Visualization
Show me what it looks like
Show me what you think it looks like
Sample

Promise quality
I will feed your dog
Promise quantity
On Monday and Friday mornings
Promisee
My neighbor
Promisor
me
Technologies already implementing this pattern

Spring Boot
Service implementations
Nginx Plus
Load balancer implementations
Analogies
A sports car is a complicated system
A rainforest is a complex system
Apollo 13 was a complex system

Learning Organizations
Communication
Blameless Culture
Note
You are either building a learning organization

OR

You will be losing to someone who is
Deming Cycle
System Thinking
94% of problems in business are systems driven and only 6% are people driven
Deming's 14 points
1. Create constancy of purpose :
Goal
2. Adopt the new philosophy
Lead not manage
replace top down management
replace command and control management
create cooperative leadership models
3. Cease Dependence on Inspection to Achieve Quality
Don't wait till it's done to test it
Build quality from start to end
Make sure the door fits before it's built
4. End the practice of awarding business on the basis of a price tag
It's how much money you make once it's built
5. Improve constantly the system of production and service
The Andon Cord
Dev wearing pagers
TDD - BDD
6. Institute training on the Job
Create team and systems thinking
7. Help People, machines and Gadgets do a Better Job
ChatOps
8. Drive out fear, so that everyone may work effectively for the company
Fail fast and fail often
Learn from failing
Mean Time To Recovery Focus
Inject Failure
9. Break down barriers between departments
High Trust
Cross functional collaboration
Shared responsibilities
10. Stop managing by slogans
Zero Defects/Never Fail
100% Availability myth
11. Supervisors must change from sheer numbers to Quality
Workers must feel pride in what they do
Leader-Leader not Leader-Follower
12. Abolishment of the Annual or Merit Rating and MBO's
Management must not be deterministic
They should be managed by intents
13. Institute a Vigorous Program of Education and Self Improvement
Kaizen (Self Improvement)
Kata (Vigorous Self Improvement)
14. Transformation is everybody's job
Back to the AIM/GOAL/WHY

5 Disciplines
Systems Thinking
Personal Mastery
learning
practice
Mental Models
Challenge Assumptions
Not taking things for granted
Shared Vision
Flat organizations - not command - and - control
Team Learning
Boundary crossing
Sharing practice
Building Blocks
Psychological Safety

Appreciation of Differences

Openness to New Ideas

Time for Reflection

Systematic Knowledge Sharing

Education and Experimentation

Reinforced Learning
Ladder of Inference
1. Observe

2. Select

3. Meaning

4. Assumptions

5. Conclusions

6. Beliefs

7. Actions
Ladder of Inference
Can create bad judgment
Our assumptions can lead us to bad conclusions
Question your assumptions and conclusions
Seek contrary data
Make your assumptions visible to others
Invite others to test your assumptions and conclusions
Inquire into other people's assumptions and conclusions
Move down the ladder instead of up
Netflix Way
Freedom & Responsibility
Nine Values

Judgement
Communication
Impact
Curiosity
Innovation
Courage
Passion
Honesty
Selflessness
Practices

Self motivating
Self aware
Self disciplined
Self improving
Acts like a leader
Doesn't wait to be told what to do
Picks up the trash lying on the floor
Netflix
Context vs Control
Context

Strategy
Metrics
Assumptions
Objectives
Defined roles
knowledge of the stakes
Transparency about decision making
Control

Top-down decision making
Management approval
Committees
Planning and process valued more than results
Note
"People fail to get along because they fear each other; they fear each other because they don't know each other; they don't know each other because they have not communicated with each other"

Martin Luther King Jr.
Communication Feedback
How to give negative - critical feedback

Importance of positive feedback

People are motivated more by progress than accomplishments
Feedback should be goal oriented

Goals should be for learning, improvement and trust

Should not be personal
Anti Patterns
I am right you are wrong

Assigning blame

Sarcasm

Sugarcoating

Beating around the bush

Passive aggression

One-way conversations
Positive Patterns
Giving Feedback

Honesty
Being straightforward
Timely

Receiving Feedback

Good listening
Ask questions to understand
Always reply with thank you
Avoid
Give Feedback by

Email
Phone
Chat
Social media
SBI
SBI - Situation - Behavior - Impact

Sample

Situation :
Yesterday in our daily meeting

Behavior :
You spoke over me several times

Impact :
I felt like I was not able to share my own opinions
3 Types of Feedback
Dan North

Porpoise Feedback
Only provide positive feedback
assume everything will self correct

Sandwich Feedback
Offer positive feedback
Give constructive feedback (critical/save)
Offer general positive summary

Atkins Feedback
Give constructive feedback

Microaggressions
A subtle but offensive comment or action directed at a minority or other non-dominant group that is often unintentional or unconsciously reinforces a stereotype

Tone policing
Othering (not one of us)
Old View
Human error is a cause of accidents

To explain failure, seek failure

You must find people's inaccurate assessments, wrong decisions and bad judgments
New View
Human error is a symptom of trouble deeper inside a system

To explain failure do not try to find where people went wrong

Instead find how people's assessments and actions made sense at the time, given the circumstances that surrounded them
Taylorism
as the root of all evil
Workers at maximum efficiency

Control efficiency by monitoring and supervision

Managers plan and workers work
Blameless Culture
A blameless culture believes that systems are NOT inherently safe and humans do the best they can to keep them running
Postmortems
The "Focus" is on the problem not on the people
Regret

An acknowledgement of the impact of the outage and an apology

Reason

Timeline of events, from initial incident detection to resolution

Remedy

A list of remediation items to ensure that this particular outage won't repeat
Etsy Practices
Encourage learning by having these blameless Post-Mortems on outages and accidents

Understand how accidents happen in order to better equip ourselves to prevent them from happening in the future

DO NOT PUNISH PEOPLE FOR MAKING MISTAKES

Enable and encourage people who do make mistakes to be experts on educating the rest of the organization how not to make them in the future


Objective
Culture
Set of
Knowledge
Beliefs
Behavior Patterns
And material stuff
Used by their members to resolve their needs and communicate among themselves
https://es.wikipedia.org/wiki/Cultura
Thanks To
https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS161x+2T2016/info
Introduction to DevOps
John Willis
Principles
Monitoring
Objective
Visualize the product as a "service" and establish goals to monitor and improve them

Continuously
Service Reliability Culture
Complexity
Fast Feedback
@botchagalupe
To make, improve or restructure the Assembly Line
TO OBSERVE
TO THINK