Architectural Resiliency

by Jeremy Deane on 2 December 2017


Transcript of Architectural Resiliency

High Availability (HA)
Architectural Resiliency
Inevitable Failure
Resiliency Principles
Resiliency Techniques
Final Word
@jtdeane
github.com/jtdeane
Jeremy Deane
JeremyDeane.net
High Availability refers to systems that are available for extended periods of time.

Availability = Uptime / Total Time
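
The formula above can be turned into a quick calculation. This is a minimal sketch, not part of the original talk: the function names and the one-year default period are assumptions chosen for illustration.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Availability = Uptime / Total Time, as a fraction between 0 and 1."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return uptime_hours / total_hours


def allowed_downtime_minutes(target: float, period_hours: float = 365 * 24) -> float:
    """Downtime budget, in minutes, implied by a target availability over a period."""
    return (1.0 - target) * period_hours * 60


if __name__ == "__main__":
    # Hypothetical month: 720 hours in total, 43 minutes of outage.
    print(f"{availability(720 - 43 / 60, 720):.4%}")
    for target in (0.99, 0.999, 0.9999):
        print(target, round(allowed_downtime_minutes(target), 1), "minutes per year")
```

Reading a target as a downtime budget makes the cost of each extra "nine" concrete when choosing among the techniques listed next.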
High Availability Techniques
Redundancy (n+1)
Load Balancing
Clustering
Replication
Virtualization (Elasticity)
Caching (Local vs. Distributed)
Fail-Over (Cold, Warm, Hot)
High Availability
Types of Failure
Architecture Single Points of Failure
Storage (SAN - NFS Mounts)
Database (RDBMS & NoSQL)
Networking Infrastructure (Routers and Switches)
Middleware (Enterprise Integration & Message Oriented)
External Services (SAAS)
Acts of God
Defective Things...
Incompetence
Negligence
...not like security
Happenstance
Mens Rea
HA Failures
Data Center Failures
Virtual Machine
Virtualization Host
Physical Machine
Environment Controls
Power (& Generator)
Other Covert Failures
Technical Debt
Innovation Debt
Unexpected Volume
Security
Resiliency
- "the power or ability to return to the original form"
What can we learn from Nature?
Systems should have interconnected network structures
Systems should have diversity and redundancy
Systems should be able to adapt and self-organize
Guidelines ("...like the Pirate Code")
Model threats and failure scenarios
OWASP Threat Modeling: http://bit.ly/1CX9e0k
Reference Book: http://bit.ly/1CX9HzA
Monitor and Measure
Eliminate or Reduce Chance of Failure
Stateless and Asynchronous over Stateful and Synchronous
Avoid Tight Coupling and Distributed Transactions
DO NOT USE DISTRIBUTED TRANSACTIONS
Adopt Polyglot Architectures and Technologies
"Don't name your farm animals"
Commit to Continuous Delivery
"Keep Jez Happy"
Promote Culture of Accountability
45 Minutes
$465m Trading Loss
Code Release Failure
Monitoring and Measurement
Mean time to failure (MTTF)
Mean time to repair (MTTR)
Mean time between failures (MTBF)
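
One way to derive the three measurements above from an outage log is sketched below. The `Incident` record and its hour-based fields are hypothetical, and the sketch assumes the common convention for repairable systems that MTBF = MTTF + MTTR; other definitions are in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One outage; both fields are hours since monitoring began (hypothetical format)."""
    failed_at: float
    restored_at: float


def reliability_metrics(incidents: list[Incident], observed_hours: float) -> dict[str, float]:
    """Compute MTTF, MTTR, MTBF and the availability they imply, all in hours."""
    if not incidents:
        raise ValueError("need at least one incident")
    downtime = sum(i.restored_at - i.failed_at for i in incidents)
    uptime = observed_hours - downtime
    mttr = downtime / len(incidents)   # average time spent repairing
    mttf = uptime / len(incidents)     # average time running before a failure
    return {
        "MTTF": mttf,
        "MTTR": mttr,
        "MTBF": mttf + mttr,           # assumed convention, see lead-in
        "availability": mttf / (mttf + mttr),
    }
```

Under this convention, availability improves either by failing less often (higher MTTF) or by recovering faster (lower MTTR).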
Monitoring and Management Techniques
Logging (Standard Format)
Correlating Activities (Tracking ID)
Process Monitoring (Synthetics)
Enterprise Dashboards
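
The first two techniques above, a standard log format and a tracking ID, can be combined using Python's standard `logging` and `contextvars` modules. This is a sketch under assumptions: the format string, the `tracking_id` field name, and `handle_request` are illustrative, not taken from the talk.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the tracking ID for the request currently being handled.
tracking_id = contextvars.ContextVar("tracking_id", default="-")


class TrackingIdFilter(logging.Filter):
    """Copies the current tracking ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.tracking_id = tracking_id.get()
        return True


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes one standard format to the given stream."""
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s tracking_id=%(tracking_id)s %(message)s"))
    handler.addFilter(TrackingIdFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's tracking ID if one was sent, otherwise mint a new one."""
    token = tracking_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("order received")
        logger.info("order persisted")
        return tracking_id.get()
    finally:
        tracking_id.reset(token)
```

Passing the same ID on to downstream calls lets a dashboard or log search stitch one activity together across services.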
Continuous Integration and Continuous Deployment (CICD)
Continuous Integration (CI)
Clean
Compile
Unit Test
Package (Versioned)
Publish
Continuous Delivery (CD)
Instantiation
Provisioning
Installation
Startup (Boot)
Functional Regression
"MTTD" - Mean Time to Deploy
Recoverability
Document System dependencies
Practice Disaster Recovery
Murphy's Law
Architecture
Hosting Facility Resiliency
Data centers should be multi-region (weather and fault lines)
Hosting components (N) have at least one independent backup component (+1), but N+2 is highly recommended
Security Resiliency
Defense in Depth
Least Privilege Principle
Data Resiliency
Replication and Sharding
Relaxed Consistency
Event Sourcing & CQRS
Atomicity, Consistency, Isolation, Durability (ACID)
Basic Availability, Soft-state, Eventual consistency (BASE)
Application Resiliency
Endpoint Resiliency Patterns
Throttler
Circuit Breaker
Tolerant Reader
Bulkheads
"Limits Failure Exposure"
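
Of the endpoint patterns above, the Circuit Breaker is the easiest to show in a few lines. The following is a minimal single-threaded sketch; the class name, thresholds, and injectable clock are assumptions for illustration, and a production breaker would also need locking and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open keeps callers from piling up on a dependency that is already struggling, which is the same "limit failure exposure" goal as the bulkhead.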
Process Isolation
Microservices
Microcontainers
Microdevelopers?
Application Security
"Validate Input and Sanitize Output"
OWASP Cheat Sheets: http://bit.ly/17ojtlE
Validation
Negative Testing
Penetration Testing
Chaos Testing: http://bit.ly/17okcmO
"Get Written Permission First"
Culture
End-to-End Ownership
Blameless Postmortem (e.g. Etsy)
Continuous Improvement
Opportunity Co$t
"Manage Risk!"
Example
Case Study
Deployment Dilemma
Conventional:
Locations?
Multi-tenancy?

Containers
Granularity?
Operations?
Postel's Law
- "Be conservative in what you send, be liberal in what you accept"
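
Postel's Law is the idea behind the Tolerant Reader pattern. The sketch below shows one possible reading of it for a JSON message; the `Order` type and its field names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    quantity: int


def read_order(payload: str) -> Order:
    """Liberal in what we accept: ignore unknown fields and coerce simple types."""
    data = json.loads(payload)
    # Accept either spelling of the ID (hypothetical old and new producers).
    order_id = data.get("orderId", data.get("order_id"))
    if order_id is None:
        raise ValueError("order ID is required")
    return Order(order_id=str(order_id), quantity=int(data.get("quantity", 1)))


def write_order(order: Order) -> str:
    """Conservative in what we send: one canonical shape with stable key order."""
    return json.dumps({"orderId": order.order_id, "quantity": order.quantity},
                      sort_keys=True)
```

The reader takes only the fields it needs, so a producer can add new fields without breaking this consumer.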
Fallacies of distributed computing (by Peter Deutsch)
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
"No Orphans"
Tiered Storage
12-Factor App
http://12factor.net/

I. One codebase tracked in revision control, many deploys
II. Explicitly declare and isolate dependencies
III. Store config in the environment
IV. Treat backing services as attached resources
V. Strictly separate build and run stages
VI. Execute the app as one or more stateless processes
VII. Export services via port binding
VIII. Scale out via the process model
IX. Maximize robustness with fast startup and graceful shutdown
X. Keep development, staging, and production as similar as possible
XI. Treat logs as event streams
XII. Run admin/management tasks as one-off processes
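
Factor III (store config in the environment) might look like the following in practice. The variable names `DATABASE_URL` and `REQUEST_TIMEOUT_SECONDS` and the `Settings` type are assumptions made for this sketch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    request_timeout_seconds: float


def load_settings(env=os.environ) -> Settings:
    """Read config from the environment and fail fast at startup if it is incomplete."""
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        # Optional setting with a default (hypothetical value of 5 seconds).
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", "5")),
    )
```

Because nothing is hard-coded, the same build can be promoted unchanged from development to staging to production, which also supports factors V and X.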
Other Key Measurements
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
https://gist.github.com/jtdeane/f2ed821f99fa214acdf9
https://gist.github.com/jtdeane/7a1b1c3a9653eb644e02
Platform as a Service (PaaS)
Measurement
- A quantitatively expressed reduction of uncertainty based on one or more observations.
- Douglas W. Hubbard

Measurement Scales - Stanley Smith Stevens:

Nominal - category labels with no order (e.g. likes)

Ordinal - ranked values whose differences are not meaningful (e.g. ratings)

Interval - evenly spaced values with no true zero (e.g. temperature)

Ratio - values with a true zero, so ratios can be compared (e.g. $)
"Anything can be measured"

1. If it matters then it is detectable

2. If it is detectable then it can be detected as a range of amounts

3. If it is detectable as a range of amounts then it can be measured
A less than perfect measurement, provided it reduces uncertainty, is better than no measurement at all.
Idea Flow - Measuring the pain of software engineering

http://www.openmastery.org/


Secure by Default