Transcript of DS Amazon

Amazon's Dynamo
Highly Available Key-value Store

Presented by:
Abdulraqeb Al-Sarori
Mohammed Hammouda
Huthaifa Yusef
Aya Hawari
Outline
- Overview
- Design Considerations
  - Incremental Scalability
  - Load Balancing & Heterogeneity
  - Symmetry
  - Decentralization
- Dynamo's Architecture
  - System Interface
  - Partitioning & Replication
  - Versioning
  - Handling Failures
  - Membership & Failure Detection
- Implementation
Overview
- Amazon's e-commerce platform.
- Scale:
  - Over 10M customers at peak times
  - Over 10K servers, distributed around the world
  - 3M checkout operations per day
- Problem: reliability at massive scale. The slightest outage has significant financial consequences and impacts customer trust.
DYNAMO's REQUIREMENTS
- Query Model: read and write operations identified by a key.
- ACID Properties: Atomicity, Consistency, Isolation, Durability.
  - Dynamo relaxes the "C" (consistency) to increase availability.
- Efficiency: the system must meet strict latency and throughput requirements.
DYNAMO's ASSUMPTIONS
> Used only by internal Amazon systems.
> No security requirements: the operating environment is assumed to be non-hostile.
> Effectively unlimited scalability.
DYNAMO's SLA
- Service Level Agreement: a contract in which a client and a service agree on expected request rates and service latencies.
- At Amazon, a typical page request involves over 100 services, which may themselves have further dependencies.
- SLAs are expressed and measured at the 99.9th percentile of latency, not the average.
Incremental Scalability
- Nodes can be added on demand, one at a time, with minimal impact on the system.
Load Balancing & Heterogeneity
- Work must be distributed in proportion to the capabilities of the individual nodes.
Symmetry
- Every node has the same set of responsibilities as its peers.
Decentralization
- Favor peer-to-peer techniques over centralized control, avoiding single points of failure.
Dynamo's Architecture
- System Interface
- Partitioning
- Replication
- Data Versioning
- Handling Failures
- Membership & Failure Detection
System Interface
Two operations:

put (key, context, object)
- key: primary key associated with the data object
- context: versioning metadata (vector clocks and history)
- object: data to store
get (key)
- returns the object (or all causally unresolved versions of it) along with its context
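
A minimal sketch of this interface in Java (the implementation language named later in the talk); the names DynamoStore, Context, and GetResult are illustrative assumptions, since the paper specifies only the two operations:

    import java.util.List;

    /** Sketch of Dynamo's client-facing interface; names are illustrative. */
    public interface DynamoStore {

        /** Opaque versioning metadata (vector clocks and history). */
        interface Context { }

        /** One or more causally unresolved versions of an object, plus the
         *  context that must be passed back into put(). */
        final class GetResult {
            public final List<byte[]> versions;
            public final Context context;

            public GetResult(List<byte[]> versions, Context context) {
                this.versions = versions;
                this.context = context;
            }
        }

        /** Locate the replicas for the key and return all conflicting
         *  versions along with their context. */
        GetResult get(byte[] key);

        /** Store the object under the key; the context from a preceding
         *  get() lets the store advance the version's vector clock. */
        void put(byte[] key, Context context, byte[] object);
    }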
Replication
- Data is replicated on N hosts (N is a parameter configured per instance).

- The coordinator node for a key stores it locally and replicates it at the N-1 clockwise successor nodes on the ring.

- Preference list: the list of nodes responsible for storing a particular key.
Partitioning
- Uses 'Consistent Hashing':
"The output range of a hash function is treated as a fixed circular space (ring)."
- Each node is assigned a random position (ID) in this space, so the nodes are arranged in a ring.
- Each data key is stored on the first node encountered when walking the ring clockwise from the key's hashed position.
- Dynamo uses "virtual nodes": each physical node is assigned multiple positions (tokens) on the ring.
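
As a rough illustration of this partitioning scheme, here is a minimal consistent-hashing ring with virtual nodes in Java. The class and method names are made up for the example; only the MD5 hash and the clockwise-walk rule follow the paper.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    public class ConsistentHashRing {
        // token position on the ring -> physical node that owns it
        private final SortedMap<BigInteger, String> ring = new TreeMap<>();
        private final int tokensPerNode; // virtual nodes per physical node

        public ConsistentHashRing(int tokensPerNode) {
            this.tokensPerNode = tokensPerNode;
        }

        /** MD5 maps keys and node IDs into the fixed circular hash space. */
        private static BigInteger hash(String s) {
            try {
                byte[] d = MessageDigest.getInstance("MD5")
                        .digest(s.getBytes(StandardCharsets.UTF_8));
                return new BigInteger(1, d);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        /** Place one token on the ring for each virtual node. */
        public void addNode(String node) {
            for (int i = 0; i < tokensPerNode; i++) {
                ring.put(hash(node + "#vnode" + i), node);
            }
        }

        /** The first token clockwise from the key's position wins. */
        public String nodeFor(String key) {
            BigInteger h = hash(key);
            SortedMap<BigInteger, String> tail = ring.tailMap(h);
            BigInteger token = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
            return ring.get(token);
        }
    }

Because each physical node appears at many ring positions, adding or removing a node shifts only a small fraction of the keys, and the shifted load spreads across many peers.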
Data Versioning
Dynamo provides eventual consistency:
> A put() call may return to its caller before the update has been applied at all replicas.
> A get() call may therefore return several versions of the same object.

Challenge:
An object can have distinct version sub-histories, which the system must reconcile later.

Solution:
Use 'Vector Clocks' to capture causality between different versions of the same object.
Vector Clocks
- Each version of each object has one associated vector clock: "a list of (node, counter) pairs".
- Reconciliation: if every counter in the first object's clock is less than or equal to the corresponding counter in the second object's clock, the first is an ancestor of the second and can be forgotten.
- Otherwise the two versions are concurrent, and 'application-level' reconciliation is required.
(Figure: version evolution of an object over time.)
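
A sketch of the comparison rule in Java; the class and method names are illustrative, and only the less-than-or-equal rule comes from the paper.

    import java.util.HashMap;
    import java.util.Map;

    public class VectorClock {
        // (node, counter) pairs
        private final Map<String, Long> counters = new HashMap<>();

        /** Record that the given node coordinated a write of this version. */
        public void increment(String node) {
            counters.merge(node, 1L, Long::sum);
        }

        /** True if every counter in 'other' is <= the matching counter
         *  here, i.e. 'other' is an ancestor of this version and its
         *  clock can be forgotten. */
        public boolean descendsFrom(VectorClock other) {
            for (Map.Entry<String, Long> e : other.counters.entrySet()) {
                if (counters.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                    return false;
                }
            }
            return true;
        }

        /** Neither clock descends from the other: the versions are
         *  concurrent and the application must reconcile them. */
        public boolean concurrentWith(VectorClock other) {
            return !this.descendsFrom(other) && !other.descendsFrom(this);
        }
    }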
Handling Failures
- Temporary failures: 'Hinted Handoff'. A replica that would have lived on the failed node is sent to another node, along with a hint naming the original destination; it is handed back when that node recovers.
- Permanent failures: 'Replica Synchronization'. Replicas are synchronized in the background using Merkle trees, to ensure no information is lost.

Membership & Failure Detection
- Ring Membership: propagated via a background gossip protocol.
- Failure Detection: standard gossip, heartbeats, and timeouts.
- External Discovery: seed nodes, known to all members, prevent logical partitions of the ring.
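
For replica synchronization, each node keeps one Merkle tree per key range; two replicas compare hashes from the root down and transfer only the branches that differ. A minimal sketch in Java (the paper describes the technique, not this code; pairing an odd leaf with itself is one common convention):

    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;

    public class MerkleNode {
        final byte[] hash;
        final MerkleNode left, right;

        private MerkleNode(byte[] hash, MerkleNode left, MerkleNode right) {
            this.hash = hash;
            this.left = left;
            this.right = right;
        }

        private static byte[] sha1(byte[]... parts) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                for (byte[] p : parts) md.update(p);
                return md.digest();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        /** Build a tree over the values in one key range: leaves hash
         *  individual values, parents hash their children's hashes. */
        public static MerkleNode build(List<byte[]> values) {
            if (values.isEmpty()) throw new IllegalArgumentException("empty range");
            List<MerkleNode> level = new ArrayList<>();
            for (byte[] v : values) level.add(new MerkleNode(sha1(v), null, null));
            while (level.size() > 1) {
                List<MerkleNode> next = new ArrayList<>();
                for (int i = 0; i < level.size(); i += 2) {
                    MerkleNode l = level.get(i);
                    MerkleNode r = (i + 1 < level.size()) ? level.get(i + 1) : l;
                    next.add(new MerkleNode(sha1(l.hash, r.hash), l, r));
                }
                level = next;
            }
            return level.get(0);
        }
    }

If two replicas' root hashes match, the whole key range is already in sync and no data needs to move at all.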
Implementation
- Written in Java.
- The local persistence component allows different storage engines to be plugged in:
  - Berkeley Database (BDB): for objects of tens of kilobytes
  - MySQL: for larger objects
Evaluation
Ensuring Uniform Load Distribution
(to deal with non-uniform data and load distribution)

- Strategy 1: T random tokens per node; partitions are defined by the token values. Partition ranges vary in size because the tokens are chosen at random.
- Strategy 2: T random tokens per node; the hash space is divided into equal-sized partitions. Tokens are used only to map values in the hash space to nodes.
- Strategy 3: the hash space is divided into Q equal-sized partitions, and each node holds Q/S tokens (S = number of nodes). Every node in the system always has exactly Q/S tokens assigned to it.
- The 3rd strategy is the best in terms of load balancing.
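
A toy illustration of strategy 3 in Java, assuming a static set of nodes and Q divisible by S; the class and the round-robin assignment are made up for the example, and the invariant (each node holds exactly Q/S partitions) is the point:

    import java.util.ArrayList;
    import java.util.List;

    public class EqualPartitioner {
        /** Assign Q equal-sized partitions so that each of the S nodes
         *  always holds exactly Q/S of them. */
        public static List<List<Integer>> assign(int q, int s) {
            if (q % s != 0) throw new IllegalArgumentException("Q must be divisible by S");
            List<List<Integer>> perNode = new ArrayList<>();
            for (int i = 0; i < s; i++) perNode.add(new ArrayList<>());
            for (int partition = 0; partition < q; partition++) {
                perNode.get(partition % s).add(partition); // round-robin
            }
            return perNode;
        }
    }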
Advantages of using virtual nodes

- The number of virtual nodes a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.

- A physical node's load is spread across the ring, so a hot spot is not concentrated on a single node.

- If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes.

- When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes.
Preference Lists
- The list of nodes responsible for storing a particular key.

- To account for failures, the preference list contains more than N nodes.

- Because of virtual nodes, the list skips ring positions as needed to ensure it holds N distinct physical nodes.
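
Continuing the ring sketch above, a preference list can be built by walking the ring clockwise from the key's position and skipping tokens whose physical node has already been collected; the names are again illustrative:

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.SortedMap;

    public class PreferenceListBuilder {
        /** Collect the first n distinct physical nodes clockwise from the
         *  key's hash; duplicate physical nodes (reached again through
         *  their other virtual-node tokens) are skipped. */
        public static List<String> forKey(SortedMap<BigInteger, String> ring,
                                          BigInteger keyHash, int n) {
            Set<String> nodes = new LinkedHashSet<>();
            for (BigInteger token : ring.tailMap(keyHash).keySet()) {
                nodes.add(ring.get(token));
                if (nodes.size() == n) return new ArrayList<>(nodes);
            }
            for (BigInteger token : ring.headMap(keyHash).keySet()) { // wrap around
                nodes.add(ring.get(token));
                if (nodes.size() == n) return new ArrayList<>(nodes);
            }
            return new ArrayList<>(nodes); // fewer than n physical nodes exist
        }
    }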
An important design consideration is deciding:

> When to resolve update conflicts, i.e., during reads or during writes.
>> Dynamo resolves them at read time, providing an "always writeable" data store.

> Who performs the conflict resolution.
>> It can be done by the data store (e.g., "last write wins") or by the application.
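
The paper's shopping-cart service is the canonical example of application-level reconciliation: concurrent cart versions are merged by unioning their items, so an "add to cart" is never lost (at the cost that a deleted item can occasionally resurface). A sketch, with illustrative names:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class CartReconciler {
        /** Merge all concurrent cart versions returned by get()
         *  into a single cart by unioning their items. */
        public static Set<String> merge(List<Set<String>> versions) {
            Set<String> merged = new HashSet<>();
            for (Set<String> cart : versions) {
                merged.addAll(cart); // union keeps every added item
            }
            return merged;
        }
    }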
(Figure: average and 99.9th percentile latencies for read and write requests during Amazon's peak request season of December 2006.)
(Figure: comparison of 99.9th percentile latencies for buffered vs. non-buffered writes over a period of 24 hours.)
Query Model
> Operations are identified by a key.
> Operations do not require multiple data items (no operation spans multiple keys).
> Objects to be stored are relatively small.
>> This simple query model needs no relational schema: Dynamo is a 'NoSQL' database.
Balancing Performance and Durability
- Dynamo can trade durability for performance: writes can be buffered in memory and persisted to disk periodically, lowering 99.9th percentile latency (see the buffered vs. non-buffered figure above).