Transcript of Architectural Resiliency
High availability refers to systems that are available for long periods of time.
Availability = Uptime / Total Time
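The ratio above can be turned into a quick calculation. This is a minimal sketch, assuming uptime and total time are measured in the same unit; the function names and the sample figures are illustrative and do not come from the presentation.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Availability = Uptime / Total Time, as a fraction between 0 and 1."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= uptime_hours <= total_hours:
        raise ValueError("uptime_hours must be between 0 and total_hours")
    return uptime_hours / total_hours


def allowed_downtime_hours(target: float, total_hours: float) -> float:
    """Downtime budget implied by an availability target over a period."""
    return (1.0 - target) * total_hours


HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# Hypothetical example: 8 hours of downtime in one year.
yearly = availability(HOURS_PER_YEAR - 8, HOURS_PER_YEAR)
```

Read the other way round, a 99.9% target over a year leaves a downtime budget of under nine hours.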
High Availability Techniques
Caching (Local vs. Distributed)
Fail-Over (Cold, Warm, Hot)
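As a rough illustration of hot fail-over, the sketch below tries a primary backend first and falls through to a standby that is already running. The class name, the (name, handler) pairs, and the choice of ConnectionError as the failure signal are all assumptions made for this example.

```python
class FailoverClient:
    """Tries backends in order (primary first) and returns the first success."""

    def __init__(self, backends):
        # backends: ordered list of (name, handler) pairs; handler is a callable.
        self.backends = list(backends)

    def call(self, request):
        failed = []
        for name, handler in self.backends:
            try:
                return name, handler(request)
            except ConnectionError:
                failed.append(name)  # remember the failure and try the next one
        raise ConnectionError(f"all backends failed: {failed}")


def broken_primary(request):
    # Hypothetical primary that is currently down.
    raise ConnectionError("primary unreachable")


def hot_standby(request):
    # Hypothetical standby that is already running and can answer at once.
    return request.upper()


client = FailoverClient([("primary", broken_primary), ("standby", hot_standby)])
served_by, response = client.call("ping")
```

Cold and warm fail-over differ mainly in how much start-up work the standby needs before it can take the call.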
Architecture Single Points of Failure
Storage (SAN - NFS Mounts)
Database (RDBMS & NoSQL)
Networking Infrastructure (Routers and Switches)
Middleware (Enterprise Integration & Message Oriented)
External Services (SAAS)
Acts of God
...not like security
Data Center Failures
Power (& Generator)
Other Covert Failures
Resiliency - "the power or ability to return to the original form"
What can we learn from Nature?
Systems should have
Systems should have
Systems should be able to
Guidelines ("...like the Pirate Code")
Model threats and failure scenarios
OWASP Threat Modeling: http://bit.ly/1CX9e0k
Reference Book: http://bit.ly/1CX9HzA
Monitor and Measure
Eliminate or Reduce Chance of Failure
Prefer Stateless and Asynchronous...
...over Stateful and Synchronous
Avoid Tight Coupling and Distributed Transactions
USE DISTRIBUTED TRANSACTIONS
Adopt Polyglot Architectures and Technologies
"Don't name your farm animals"
Commit to Continuous Delivery
"Keep Jez Happy"
Promote Culture of Accountability
$465m Trading Loss
Code Release Failure
Monitoring and Measurement
Mean time to failure (MTTF)
Mean time to repair (MTTR)
Mean time between failures (MTBF)
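The three measures above relate to each other and to availability. The sketch below assumes one common convention for repairable systems, where MTBF = MTTF + MTTR and availability = MTTF / MTBF; other texts define MTBF differently. The incident log is made-up sample data.

```python
from statistics import mean

# Hypothetical incident log: (hours running before failure, hours to repair).
incidents = [(700.0, 2.0), (500.0, 4.0), (900.0, 3.0)]


def reliability_metrics(log):
    """Derive MTTF, MTTR, MTBF and availability from (uptime, repair) pairs."""
    mttf = mean(up for up, _ in log)       # average running time before a failure
    mttr = mean(down for _, down in log)   # average time spent repairing
    mtbf = mttf + mttr                     # assumed convention: one full failure cycle
    return {
        "mttf": mttf,
        "mttr": mttr,
        "mtbf": mtbf,
        "availability": mttf / mtbf,
    }


metrics = reliability_metrics(incidents)
```

Under this convention, shortening repairs (MTTR) raises availability just as surely as making failures rarer (MTTF).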
Monitoring and Management Techniques
Correlating Activities (
Process Monitoring (
Continuous Integration and Continuous Deployment (CI/CD)
Continuous Integration (CI)
Continuous Deployment (CD)
Mean Time to Deploy
Document System dependencies
Practice Disaster Recovery
Hosting Facility Resiliency
Data centers should be multi-region, to guard against weather and fault lines
N+1: every set of hosting components (N) has at least one independent backup component (+1)...
...but N+2 is highly recommended
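To see why the extra spare matters, the sketch below estimates the chance that at least N components out of N plus spares are up at once. It assumes every component fails independently with the same availability, which real facilities rarely satisfy exactly; the 99% figure is illustrative only.

```python
from math import comb


def pool_availability(required: int, spares: int, component_availability: float) -> float:
    """Probability that at least `required` of `required + spares` components are up.

    Assumes independent, identical components (a simplifying assumption).
    """
    total = required + spares
    a = component_availability
    return sum(
        comb(total, up) * a**up * (1 - a) ** (total - up)
        for up in range(required, total + 1)
    )


# Hypothetical components that are each up 99% of the time.
n_only = pool_availability(required=1, spares=0, component_availability=0.99)
n_plus_1 = pool_availability(required=1, spares=1, component_availability=0.99)
n_plus_2 = pool_availability(required=1, spares=2, component_availability=0.99)
```

N+2 also keeps a spare in hand while one component is out for planned maintenance, which this simple probability does not capture.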
Defense in Depth
Least Privilege Principle
Replication and Sharding
Event Sourcing & CQRS
Eventual consistency (
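A small sketch can show how event sourcing, CQRS, and eventual consistency fit together: writes append events to a log, while reads are served from a separate projection that catches up later. The account domain and every class and field name here are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    account: str
    kind: str    # "deposited" or "withdrawn"
    amount: int


@dataclass
class EventStore:
    """Write side: an append-only log of events (the source of truth)."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


class BalanceProjection:
    """Read side: a view rebuilt from events, updated only when it catches up."""

    def __init__(self):
        self.balances = {}
        self.position = 0  # how many events have been applied so far

    def catch_up(self, store: EventStore) -> None:
        for event in store.events[self.position:]:
            sign = 1 if event.kind == "deposited" else -1
            current = self.balances.get(event.account, 0)
            self.balances[event.account] = current + sign * event.amount
        self.position = len(store.events)


store = EventStore()
view = BalanceProjection()
store.append(Event("acct-1", "deposited", 100))
store.append(Event("acct-1", "withdrawn", 30))

stale = dict(view.balances)   # read model has not seen the events yet
view.catch_up(store)
fresh = dict(view.balances)   # read model is now consistent with the log
```

The gap between `stale` and `fresh` is the eventual-consistency window: the read side is correct, just not immediately.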
Endpoint Resiliency Patterns
"Limits Failure Exposure"
"Validate Input and Sanitize Output"
OWASP Cheat Sheets: http://bit.ly/17ojtlE
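"Validate Input and Sanitize Output" can be sketched with the Python standard library: an allow-list check on the way in, and HTML escaping on the way out. The username rule and the function names are assumptions for this example, not a rule taken from the OWASP material.

```python
import html
import re

# Allow-list: assumed rule of 3 to 20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(raw: str) -> str:
    """Reject anything that does not match the allow-list exactly."""
    if not USERNAME_PATTERN.fullmatch(raw):
        raise ValueError("username must be 3-20 letters, digits, or underscores")
    return raw


def render_comment(comment: str) -> str:
    """Escape user-supplied text before placing it inside HTML."""
    return "<p>" + html.escape(comment) + "</p>"


safe_name = validate_username("alice_01")
safe_html = render_comment("<b>hi</b>")
```

Validation rejects what the system does not expect; output encoding makes sure that whatever was accepted cannot be interpreted as markup.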
Chaos Testing: http://bit.ly/17okcmO
"Get Written Permission First"
Blameless Postmortem (e.g. Etsy)
Robustness principle (Postel's law) - "Be conservative in what you send, be liberal in what you accept"
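One way to read the quote above in code: accept several spellings of a value on input, but always emit a single canonical form. The accepted word lists and the function names below are assumptions made for the sketch.

```python
TRUE_WORDS = {"true", "t", "yes", "y", "1", "on"}
FALSE_WORDS = {"false", "f", "no", "n", "0", "off"}


def parse_flag(raw: str) -> bool:
    """Liberal in what we accept: ignore case and surrounding whitespace."""
    word = raw.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"unrecognized flag value: {raw!r}")


def format_flag(value: bool) -> str:
    """Conservative in what we send: exactly one canonical spelling."""
    return "true" if value else "false"


accepted = parse_flag("  YES ")
sent = format_flag(parse_flag("0"))
```

Being liberal still has limits: input that cannot be interpreted safely is rejected rather than guessed at.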
Fallacies of distributed computing
by Peter Deutsch
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
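Code that takes the first two fallacies seriously usually bounds its retries and waits between them. The sketch below is one possible approach, retrying with exponential backoff and random jitter; the function name, defaults, and the choice of exceptions treated as transient are assumptions.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Run operation(), retrying transient network errors with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


waits = []
result = call_with_retries(flaky, sleep=waits.append)  # record waits instead of sleeping
```

Passing `sleep` as a parameter keeps the example testable without real delays; in production the default `time.sleep` applies.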
The Twelve-Factor App
I. One codebase tracked in revision control, many deploys
II. Explicitly declare and isolate dependencies
III. Store config in the environment
IV. Treat backing services as attached resources
V. Strictly separate build and run stages
VI. Execute the app as one or more stateless processes
VII. Export services via port binding
VIII. Scale out via the process model
IX. Maximize robustness with fast startup and graceful shutdown
X. Keep development, staging, and production as similar as possible
XI. Treat logs as event streams
XII. Run admin/management tasks as one-off processes
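Factor III (store config in the environment) is easy to illustrate. The sketch below reads settings from environment variables instead of hard-coding them; the variable names DATABASE_URL, PORT, and DEBUG, and their defaults, are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables (names are assumptions)."""
    return Settings(
        database_url=env["DATABASE_URL"],                   # required: KeyError if absent
        port=int(env.get("PORT", "8000")),                  # optional, with a default
        debug=env.get("DEBUG", "false").lower() == "true",  # optional flag
    )


# A plain dict stands in for the real environment in this example.
settings = load_settings({"DATABASE_URL": "postgres://localhost/app", "PORT": "5000"})
```

Because nothing deploy-specific lives in the code, the same build can run unchanged in development, staging, and production.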
Other Key Measurements
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
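The recovery time objective bounds how long an outage may last, and the recovery point objective bounds how much recent data may be lost. A minimal sketch of checking one incident against both targets follows; the timeline, the targets, and the function names are invented for illustration.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """Data written since the last backup is lost; it must fit within the RPO."""
    return failure - last_backup <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """Time from failure to restored service must fit within the RTO."""
    return restored - failure <= rto


# Hypothetical incident timeline.
last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 5, 30)
restored = datetime(2024, 1, 1, 6, 15)

# 3h30m of data at risk against a 4-hour RPO target.
rpo_ok = meets_rpo(last_backup, failure, rpo=timedelta(hours=4))
# 45-minute outage against a 30-minute RTO target.
rto_ok = meets_rto(failure, restored, rto=timedelta(minutes=30))
```

Practising disaster recovery is what turns these targets into measured numbers rather than hopes.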
Platform as a Service (PaaS)
Measurement - A quantitatively expressed reduction of uncertainty based on one or more observations.
Douglas W. Hubbard
Measurement Scales - Stanley Smith Stevens:
Nominal - categories with no order (e.g. a boolean "like")
Ordinal - ranked values whose spacing is not meaningful (e.g. ratings)
Interval - values with meaningful differences but no true zero (e.g. temperature in °C)
Ratio - values with a true zero, so ratios are meaningful (e.g. $)
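Each scale supports a different summary statistic, which the following sketch pairs up using the standard library. The sample data is invented, and the pairing (mode, median, mean, ratio) follows the usual reading of Stevens's scales rather than anything stated on the slides.

```python
from statistics import mean, median, mode

likes = [True, False, True, True]   # nominal: only counting categories makes sense
ratings = [1, 3, 4, 4, 5]           # ordinal: order matters, spacing does not
temps_c = [18.0, 21.0, 24.0]        # interval: differences matter, zero is arbitrary
costs = [50.0, 100.0]               # ratio: a true zero makes "twice as much" valid

most_common_like = mode(likes)      # most frequent category
middle_rating = median(ratings)     # middle of the ranking
average_temp = mean(temps_c)        # arithmetic mean of interval data
cost_ratio = costs[1] / costs[0]    # ratio comparison
```

Averaging ratings or saying that 20 °C is "twice as warm" as 10 °C are the classic mistakes this distinction is meant to prevent.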
"Anything can be measured"
1. If it matters, then it is detectable
2. If it is detectable, then it can be detected as a range of amounts
3. If it can be detected as a range of amounts, then it can be measured
A less than perfect measurement, provided it reduces uncertainty, is better than no measurement at all.
Flow - Measuring the pain of software engineering
Secure by Default