Transcript of DS Amazon
Highly Available Key-value Store
Load Balancing & Heterogeneity
Partitioning & Replication
Membership & Failure Detection
- Amazon eCommerce platform
> 10M customers at peak times
> 10K servers (distributed around the world)
> 3M checkout operations per day
Problem: Reliability at massive scale:
The slightest outage has significant financial consequences and impacts customer trust.
- DYNAMO's REQUIREMENTS
Read and write operations that are defined by a key
Atomicity, Consistency, Isolation, Durability (ACID)
Systems must meet their latency and throughput requirements
Relax the “C” (consistency) to increase availability
- DYNAMO's ASSUMPTIONS
> Only used by internal Amazon systems
> No security considerations
> Unlimited scalability
- DYNAMO's SLA
Service Level Agreement:
A contract between a client and a service that defines their relationship
At Amazon, a typical client request involves over 100 services, which may have further dependencies of their own
SLAs are expressed in terms of the 99.9th percentile of the latency distribution
Must be able to add nodes on-demand with minimal impact
Load Balancing & Heterogeneity
Every node should have the same set of responsibilities as its peers.
Focus on peer-to-peer techniques to avoid single points of failure
Work must be distributed according to the capabilities of the individual nodes
- System Interface
- Data Versioning
- Handling Failures
- Membership & Failure Detection
put (key, context, object)
key: primary key associated with the data object
context: vector clocks and history
object: data to store
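The interface above can be sketched as a minimal in-memory stand-in. The class name and internals are illustrative assumptions, not Amazon's implementation; the point is the shape of put(key, context, object) and get(key):

```python
# Hypothetical in-memory sketch of a Dynamo-style interface; the class
# name and internal layout are illustrative, not Amazon's implementation.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> (context, object)

    def put(self, key, context, obj):
        # context carries the vector-clock history of the object
        self._data[key] = (context, obj)

    def get(self, key):
        # a real store may return several conflicting versions for the
        # caller to reconcile; this sketch only ever holds one
        context, obj = self._data.get(key, (None, None))
        return context, obj


store = KeyValueStore()
store.put("cart:42", [("nodeA", 1)], ["book"])
ctx, obj = store.get("cart:42")
```

The caller passes the context back unmodified on subsequent puts, which is how the store tracks version history across updates.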
- Data is replicated on N hosts (N is determined by user)
- Coordinator nodes replicate the data items they are responsible for to the next N-1 nodes on the ring
- Preference list: The list of nodes that is responsible for storing a particular key.
- Use 'Consistent Hashing'
" the output range of a hash function is treated as a fixed circular space (ring). "
Each node gets an ID from the space of keys.
Nodes are arranged in a ring.
Data is stored on the first node clockwise of the key's position on the ring.
Use “virtual nodes”
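The ring lookup described above can be sketched as follows; the hash choice (MD5) matches the paper's, while the node names are made up for illustration:

```python
import hashlib
from bisect import bisect_right

def ring_position(s):
    # the hash output is treated as a point on a fixed circular space
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # each node gets an ID from the space of keys
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [p for p, _ in self.ring]

    def owner(self, key):
        # data goes to the first node clockwise of the key's position;
        # the modulo wraps around the end of the ring
        idx = bisect_right(self.positions, ring_position(key))
        return self.ring[idx % len(self.ring)][1]


ring = Ring(["nodeA", "nodeB", "nodeC"])
owner = ring.owner("user:17")
```

Because each node owns only the arc between its predecessor and itself, adding or removing a node moves just that arc's keys instead of rehashing everything.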
Dynamo provides eventual consistency:
> A put() call may return to its caller before the update has been applied at all the replicas
> A get() call may return many versions of the same object.
A failure can leave an object with distinct version sub-histories, which the system will need to reconcile in the future.
Vector clocks are used in order to capture causality between different versions of the same object.
They track the version evolution of an object over time.
Each version of each object has one associated vector clock:
‘‘ a list of (node, counter) pairs ’’
If every counter on the first object's clock is less than or equal to the corresponding counter in the second clock, then the first is a direct ancestor of the second (and can be ignored).
Otherwise, the two versions are in conflict and require 'application-level' reconciliation.
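The ancestor test above can be sketched directly on (node, counter) lists; this is a minimal illustration, not Dynamo's internal code:

```python
def descends(clock_a, clock_b):
    """True if clock_b has seen every event in clock_a, i.e. clock_a is
    an ancestor of (or equal to) clock_b and can be discarded."""
    b = dict(clock_b)
    # a node absent from clock_b counts as counter 0
    return all(counter <= b.get(node, 0) for node, counter in clock_a)

def concurrent(clock_a, clock_b):
    # neither clock dominates the other: distinct version sub-histories,
    # so the application must reconcile them
    return not descends(clock_a, clock_b) and not descends(clock_b, clock_a)
```

For example, [("A", 1)] is an ancestor of [("A", 2), ("B", 1)], while [("A", 2)] and [("B", 1)] are concurrent and must be merged by the application.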
Temporary failures: Hinted Handoff
Permanent failures: Replica Synchronization
A replica that would have been stored on the failed node is sent to another node, with a hint as to the original destination
Replica synchronization ensures that no information is lost
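The hinted-handoff mechanism above can be sketched as follows; the node names, dict-based "stores", and hint format are assumptions made for illustration:

```python
def route_write(key, value, preferred, alive, stores, hints):
    # write to every reachable node on the preference list
    for node in preferred:
        if node in alive:
            stores[node][key] = value
        else:
            # the target is down: send the replica to a substitute node,
            # with a hint recording the intended original destination
            sub = next(n for n in alive if n not in preferred)
            stores[sub][key] = value
            hints[sub].append((node, key))

def deliver_hints(holder, stores, hints, alive):
    # when the original node recovers, hand its replicas back
    remaining = []
    for target, key in hints[holder]:
        if target in alive:
            stores[target][key] = stores[holder].pop(key)
        else:
            remaining.append((target, key))
    hints[holder] = remaining
```

A background task on each node periodically scans its hints and delivers any whose intended owner has come back, so the temporary copies eventually drain away.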
Use background Gossip
Use standard Gossip, Heartbeats, and Timeouts to implement failure detection
Use seed nodes to prevent logical partitions
Local persistence component allows for different storage engines to be plugged in:
- Berkeley Database (BDB) :
objects of tens of kilobytes
Service Oriented Architecture
Strategy 1: Partition by token, with T random tokens per node
Node ranges vary because of the random selection of tokens
Tokens are used to map values in the hash space to nodes
Strategy 2: Partition into 'equal slices', with T random tokens per node
Strategy 3: Partition into 'equal slices', with Q/S tokens per node
Each node in the system must always have Q/S tokens assigned to it (Q partitions, S nodes)
Strategy 3 is the best in terms of load balancing efficiency
(to deal with non-uniform data and load distribution)
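The Q/S scheme can be sketched with assumed values; the partition count, node names, and round-robin assignment below are illustrative, not the paper's exact placement algorithm:

```python
# Sketch of equal-slice partitioning: the hash space is split into Q
# equal partitions and each of S nodes always holds exactly Q/S of them.
def assign_partitions(q, nodes):
    s = len(nodes)
    assert q % s == 0, "Q must be divisible by S so Q/S is a whole number"
    # round-robin keeps every node at exactly Q/S tokens
    return {partition: nodes[partition % s] for partition in range(q)}


owners = assign_partitions(12, ["A", "B", "C"])
```

Fixing the partition boundaries in advance decouples partitioning from placement: adding or removing a node only reassigns whole partitions, never re-splits them.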
Advantages of using virtual nodes
The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
A real node’s load can be distributed across the ring, thus ensuring a hot spot is not targeted to a single node.
If a node becomes unavailable the load handled by this node is evenly dispersed across the remaining available nodes.
When a node becomes available again, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.
- List of nodes responsible for storing a particular key.
- Due to failures, preference list contains more than N nodes.
- Due to virtual nodes, preference list skips positions to ensure distinct physical nodes.
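Building a preference list over a ring of virtual nodes can be sketched as follows; the "A#0"-style virtual-node naming, vnode count, and replica count are assumptions for illustration:

```python
import hashlib
from bisect import bisect_right

def pos(s):
    # hash output treated as a position on the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(physical, vnodes=8):
    # each physical node owns several virtual positions on the ring
    return sorted((pos(f"{n}#{i}"), n) for n in physical for i in range(vnodes))

def preference_list(ring, key, n_replicas):
    positions = [p for p, _ in ring]
    idx = bisect_right(positions, pos(key))
    chosen = []
    # walk clockwise, skipping positions whose physical node was already
    # chosen, so the list names n_replicas distinct physical nodes
    while len(chosen) < n_replicas:
        node = ring[idx % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        idx += 1
    return chosen


ring = build_ring(["A", "B", "C", "D"])
plist = preference_list(ring, "cart:1", 3)
```

The skip step is what the last bullet above refers to: consecutive ring positions may belong to the same physical machine, so the walk continues until enough distinct machines are found.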
One of the important design considerations is deciding:
> When to perform the process of resolving update conflicts, i.e. during reads or during writes.
>> Dynamo resolves them at read time (providing an “always writeable” data store).
> Who performs the process of conflict resolution.
>> It can be done by the data store or by the application.
- Average and 99.9th percentiles of latencies for read and write requests during the peak request season of December 2006.
- Comparison of 99.9th-percentile latencies for buffered vs. non-buffered writes.
> Operations are identified by a key
> Operations don't require multiple data items
> Data to be stored is relatively small
>> Services with this simple query model do not need any relational schema
Balancing Performance and Durability