DS Amazon


Aya Hawari

on 22 March 2018

Transcript of DS Amazon

Amazon's Dynamo
Highly Available Key-value Store
Design Considerations
Abdulraqeb Al-Sarori
Mohammed Hammouda
Huthaifa Yusef
Aya Hawari
Design Considerations

Incremental Scalability
Load Balancing & Heterogeneity
Dynamo's Architecture
System Interface
Partitioning & Replication
Handling Failures
Membership & Failure Detection
- Amazon eCommerce platform:
> 10M customers at peak times
> 10K servers (distributed around the world)
> 3M checkout operations per day

Problem: reliability at massive scale.
The slightest outage has significant financial consequences and impacts customer trust.
Query Model
Read and write operations are identified by a key.

ACID Properties
Atomicity, Consistency, Isolation, Durability.
Dynamo relaxes the "C" (consistency) to increase availability.

Efficiency
Systems must meet their latency and throughput requirements.

> Only used by internal Amazon systems

> No security considerations

> Unlimited scalability
Service Level Agreement (SLA):
A contract between a client and a service about their relationship (e.g. expected request rate and response time).

At Amazon, a typical client request involves over 100 services, which may themselves have dependencies.

SLAs are governed by the 99.9th percentile of the latency distribution.
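As an illustration of the 99.9th-percentile target, here is a minimal sketch. The nearest-rank method, the sample latencies, and the 300 ms bound are all assumptions for illustration, not Amazon's actual figures:

```python
# Sketch: checking a 99.9th-percentile latency SLA (nearest-rank method).
def percentile(samples, p):
    # Value below which roughly p% of the samples fall.
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

# Assumed sample: mostly ~12 ms responses with occasional 300 ms outliers.
latencies_ms = [12, 15, 11, 300, 14, 13, 16, 12, 11, 13] * 100
sla_ms = 300  # hypothetical SLA bound
meets_sla = percentile(latencies_ms, 99.9) <= sla_ms
```

Note how the average of this sample would look healthy while the 99.9th percentile is dominated by the outliers; that is why the SLA targets the tail.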
Incremental Scalability

Must be able to add nodes on-demand with minimal impact
Load Balancing & Heterogeneity
Every node should have the same set of responsibilities as its peers.
Focus on peer-to-peer techniques to avoid single points of failure.
Work must be distributed according to the capabilities of the individual nodes.
Dynamo's Architecture
- System Interface
- Partitioning
- Replication
- Data Versioning
- Handling Failures
- Membership & Failure Detection
Two operations:

put(key, context, object)
- key: primary key associated with the data object
- context: vector clocks and version history
- object: the data to store

get(key)
- returns the object(s) stored under the key, together with their context
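The two-operation interface can be sketched as follows. This is a hypothetical in-memory store (the name `KeyValueStore` is illustrative); real Dynamo replicates each write across N hosts and may return several conflicting versions:

```python
# Hypothetical in-memory sketch of the Dynamo-style interface.
class KeyValueStore:
    def __init__(self):
        # key -> list of (context, object) versions; a real store may
        # hold several conflicting versions per key.
        self._data = {}

    def put(self, key, context, obj):
        # context carries the vector clock / version history obtained
        # from an earlier get(); here it is simply stored alongside.
        self._data[key] = [(context, obj)]

    def get(self, key):
        # May return multiple versions of the same object in a real system.
        return self._data.get(key, [])

store = KeyValueStore()
store.put("cart:42", context={"clock": {"A": 1}}, obj=["book"])
versions = store.get("cart:42")
```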
- Data is replicated on N hosts (N is a parameter configured per Dynamo instance).

- A coordinator node replicates each key it is responsible for to the next N-1 nodes on the ring.

- Preference list: the list of nodes responsible for storing a particular key.
- Uses "consistent hashing":
"The output range of a hash function is treated as a fixed circular space (ring)."
Each node is assigned an ID from the space of keys.
Nodes are arranged in a ring.
Data is stored on the first node encountered moving clockwise from the position of the data's key.
Uses "virtual nodes."
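A minimal sketch of consistent hashing with virtual nodes, assuming MD5 as the ring hash and 8 virtual nodes per physical node (both arbitrary choices for illustration):

```python
import bisect
import hashlib

def ring_hash(s):
    # Map a string to a point in a fixed circular hash space (MD5 is
    # an illustrative choice, not necessarily Dynamo's).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several "virtual node" positions on the ring.
        self.ring = sorted(
            (ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def owner(self, key):
        # Walk clockwise: the first virtual node at or after the key's hash.
        pos = bisect.bisect(self.ring, (ring_hash(key),))
        return self.ring[pos % len(self.ring)][1]

ring = Ring(["A", "B", "C"])
owner = ring.owner("user:1001")
```

Because each physical node appears at several ring positions, adding or removing a node only moves the keys adjacent to its virtual nodes rather than rehashing everything.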
Dynamo provides eventual consistency.

> A put() call may return to its caller before the update has been applied at all replicas.

> A get() call may return several versions of the same object.

This can result in an object having distinct version sub-histories, which the system must reconcile in the future.

Dynamo uses vector clocks to capture causality between different versions of the same object, tracking the version evolution of an object over time.

Each version of each object has one associated vector clock: a list of (node, counter) pairs.

If every counter in the first object's clock is less than or equal to the corresponding counter in the second clock, then the first version is a direct ancestor of the second (and can be discarded).

Otherwise, the two versions are in conflict and require application-level reconciliation.
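The ancestor test described above can be sketched as follows (the `descends` helper and the example clocks are illustrative, not Dynamo's actual code):

```python
def descends(a, b):
    # True if clock a descends from clock b: every (node, counter) in b
    # is <= the matching counter in a (missing counters count as 0).
    return all(a.get(node, 0) >= counter for node, counter in b.items())

# Illustrative version evolution (node names are hypothetical):
v1 = {"Sx": 2}            # two writes handled by node Sx
v2 = {"Sx": 2, "Sy": 1}   # a later write coordinated by Sy
v3 = {"Sx": 2, "Sz": 1}   # a concurrent write coordinated by Sz

ancestor = descends(v2, v1)                            # v1 can be discarded
conflict = not descends(v2, v3) and not descends(v3, v2)
```

Here v2 and v3 are causally concurrent, so neither can be dropped; both are returned by get() and reconciled by the application.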
Temporary failures: Hinted Handoff
A replica that would have been stored on the failed node is sent to another node, along with a hint naming the original destination.

Permanent failures: Replica Synchronization
Replicas are synchronized in the background to ensure no information is lost.
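A toy sketch of hinted handoff (the node names and the dictionary-based store are hypothetical):

```python
# Toy sketch of hinted handoff (node names are hypothetical).
hinted = {}  # stand-in node -> list of (intended_owner, key, value)

def write_with_hint(stand_in, intended_owner, key, value):
    # The stand-in keeps the replica plus a hint naming the real owner.
    hinted.setdefault(stand_in, []).append((intended_owner, key, value))

def handoff(stand_in, recovered_node):
    # When the owner recovers, the stand-in forwards its hinted replicas.
    entries = hinted.get(stand_in, [])
    delivered = [(k, v) for owner, k, v in entries if owner == recovered_node]
    hinted[stand_in] = [e for e in entries if e[0] != recovered_node]
    return delivered

# Node B is down, so the write meant for B lands on stand-in D with a hint.
write_with_hint("D", "B", "cart:42", ["book"])
delivered = handoff("D", "B")  # B is back: D hands the replica off
```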
Ring Membership
Membership changes are propagated by a background gossip protocol.
External discovery: seed nodes are used to prevent logical partitioning of the ring.

Failure Detection
Implemented with standard gossip, heartbeats, and timeouts.

Replica Synchronization
Merkle trees allow replicas to detect inconsistencies and minimize the amount of data transferred.
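The Merkle-tree comparison used for replica synchronization can be sketched like this (SHA-256 and the string-encoded key/value leaves are illustrative choices):

```python
import hashlib

def _h(data):
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves):
    # Build the tree bottom-up; the root summarizes a whole key range,
    # so two replicas can compare roots and recurse only where they differ.
    level = [_h(leaf.encode()) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [
            _h((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]

in_sync = merkle_root(["k1=v1", "k2=v2"]) == merkle_root(["k1=v1", "k2=v2"])
diverged = merkle_root(["k1=v1", "k2=v2"]) != merkle_root(["k1=v1", "k2=CHANGED"])
```

Matching roots mean the replicas agree on the whole range with a single comparison; differing roots narrow the search to the diverging subtree.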
Local persistence component allows different storage engines to be plugged in:
- Berkeley Database (BDB): objects of tens of kilobytes
- MySQL: larger objects
Service Oriented Architecture
Ensuring Uniform Load Distribution
(to deal with non-uniform data and load distribution)

Strategy 1: T random tokens per node; partition by token value.
- Token ranges vary because of the random selection of tokens.

Strategy 2: T random tokens per node; partition into equal slices.
- Tokens are used only to map values in the hash space to nodes.

Strategy 3: Q/S tokens per node; partition into equal slices.
- Each node in the system always has exactly Q/S tokens assigned to it (Q partitions, S nodes).
- This strategy is the best in terms of load balancing.
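The equal-slice scheme of strategy 3 can be sketched as follows (Q = 8 partitions and S = 4 nodes are assumed values, and round-robin is just one simple way to give each node exactly Q/S partitions):

```python
# Sketch of strategy 3: divide the hash space into Q equal slices and
# give each of S nodes exactly Q/S of them (Q = 8, S = 4 are assumed).
Q, S = 8, 4
nodes = [f"node{i}" for i in range(S)]

# Round-robin assignment: every node ends up with exactly Q // S slices.
assignment = {p: nodes[p % S] for p in range(Q)}

def partition_of(hash_value, space=2**32):
    # With equal slices, the partition index depends only on the hash value.
    return hash_value * Q // space

owner = assignment[partition_of(123456789)]
```

Because partition boundaries are fixed, moving load between nodes means reassigning whole slices, which is what makes this strategy easy to balance.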
Advantages of using virtual nodes

The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.

A real node’s load can be distributed across the ring, thus ensuring a hot spot is not targeted to a single node.

If a node becomes unavailable the load handled by this node is evenly dispersed across the remaining available nodes.

When a node becomes available again, the newly available node accepts a roughly equivalent amount of load from each of the other available nodes.
- The list of nodes responsible for storing a particular key.

- To account for failures, the preference list contains more than N nodes.

- Because of virtual nodes, the preference list skips ring positions to ensure it contains only distinct physical nodes.
Preference Lists
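Building a preference list of distinct physical nodes can be sketched like this (the ring layout and helper name are hypothetical):

```python
# Hypothetical sketch: walk the ring clockwise from the key's position,
# skipping virtual nodes whose physical node was already chosen, until
# n distinct physical nodes are collected.
def preference_list(ring, key_pos, n=3):
    # ring: sorted (position, physical_node) virtual-node entries
    start = next((j for j, (pos, _) in enumerate(ring) if pos >= key_pos), 0)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:          # keep only distinct physical nodes
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

ring = [(10, "A"), (20, "B"), (30, "A"), (40, "C"), (50, "B")]
plist = preference_list(ring, key_pos=15)
```

Note how the second virtual node of "A" (position 30) is skipped, so the list names three distinct physical machines.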
One important design consideration is deciding:

> when to resolve conflicts, i.e. during reads or writes.
>> Dynamo resolves conflicts at read time, providing an "always writeable" data store.

> who performs the conflict resolution.
>> It can be done by either the data store or the application.
Figure: Average and 99.9th percentiles of latencies for read and write requests during the peak request season of December 2006.

Figure: Comparison of 99.9th percentile latencies for buffered vs. non-buffered writes over a period of 24 hours.
> Operations are identified by a key.
> Operations do not span multiple data items.
> Objects to be stored are relatively small.
>> This simple query model does not need any relational schema.
Query Model
Balancing Performance and Durability