
Small Data

One Location

Low Request Rates

Simple Queries

Low Availability

Consistent

Low Security

Ephemeral

Automation and Proof Points

PaaS at scale?

Big benchmarks?

Scaling All Dimensions Together

The Common Themes...

Map-Reduce - DIY Hadoop

  • HortonWorks
  • Cloudera
  • MapR
  • Datastax Cassandra Enterprise / Brisk

Big Data

AWS Elastic Map-Reduce - Hadoop

Google BigQuery

Riak

AWS 10Gbit Network instances

AWS DynamoDB

Solid State Disks

Availability Zones

Network Attached Storage

Zadara Storage

AWS Elastic Block Store

Bigger Data

Bigger Instance Types

  • More disks - EC2 to 1.6TB
  • More RAM - EC2 to 68GB

AWS Availability Zones

  • Double or triple writes
  • Cross zone latency
  • Inter-zone bandwidth cost
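The three bullets above can be tied together with a rough estimate. This is a minimal sketch, not a pricing model: it assumes one replica per zone (so each write is copied to every other zone), and the write rate, payload size, and price per GB are hypothetical placeholders, not figures from the talk or from AWS.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month, an assumption


def cross_zone_bytes_per_second(writes_per_second, bytes_per_write, zones):
    """Bytes leaving the originating zone each second.

    Assumes one replica per zone, so a "triple write" across three
    zones sends two copies over inter-zone links.
    """
    copies_leaving_zone = zones - 1
    return writes_per_second * bytes_per_write * copies_leaving_zone


def monthly_transfer_cost(bytes_per_second, price_per_gb):
    """Monthly inter-zone transfer cost for a steady byte rate."""
    gigabytes = bytes_per_second * SECONDS_PER_MONTH / 1e9
    return gigabytes * price_per_gb


# Hypothetical workload: 10,000 writes/s of 1,000 bytes across 3 zones.
rate = cross_zone_bytes_per_second(10_000, 1_000, zones=3)

# Hypothetical price; check the provider's current rates before relying on it.
cost = monthly_transfer_cost(rate, price_per_gb=0.01)
print(f"{rate} bytes/s across zones, about ${cost:.2f} per month")
```

The estimate ignores read traffic, repair traffic, and compression, all of which change the real number.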

AWS DynamoDB

Access Keys

High Availability

Multiple datacenters a few milliseconds apart

High Security

Increased complexity

  • Installation issues
  • Monitoring and diagnostics

Fine grain AWS security groups

Cassandra Secure Inter-node Traffic

AWS DynamoDB

Scalability Dimensions

  • Data Size
  • Query Complexity
  • Read & Mutate Request Rates
  • Geographical Distribution

AWS Availability Zones

AWS DynamoDB

Scalability Constraints

  • Availability
  • Security
  • Consistency
  • Durability

Azure Storage Geo Replication

AWS RDS MySQL Master/Slave

http://blogs.msdn.com/b/windowsazurestorage/archive/2011/09/15/introducing-geo-replication-for-windows-azure-storage.aspx

AWS 10Gbit Network instances

http://buyafuckingssd.com/

AWS DynamoDB

Scaling Data Architectures in the Cloud

AWS SimpleDB

@adrianco

Netflix Cloud Architect

for #ccevent Performance Summit

Feb 13th, 2012

Personal Opinions and Incomplete Experiences...

Cassandra Replication with Local Quorum

AWS S3

http://buyafuckingssd.com/

Azure Storage

MySQL Master/Slave with Zadara Storage

Google Cloud Storage

Google Cloud Storage Cross US or Cross Europe

Highest Request Rates

Local Replication

Higher Request Rates

Global Replication

Multiple datacenters a few milliseconds apart

Start Here

Multiple datacenters up to hundreds of milliseconds apart

Citrusleaf

Cassandra

Citrusleaf

Cassandra Replication with Quorum

Riak

RethinkDB

MySQL on local disk

Riak

RethinkDB

Netflix uses Cassandra to do this between us-east (Virginia) and eu-west (Ireland), and it works...

MongoDB Master/Slave Replicas

MongoDB with Zadara Storage

Riak Replication

AWS Availability Zones

AWS RDS MySQL Master/Slave

AWS DynamoDB

AWS SimpleDB

AWS S3

Cassandra with Read/Write One

  • Fastest option
  • Eventually consistent
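Why Read/Write One is only eventually consistent, while the Quorum slides are not, follows from the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write only when the read and write replica counts together exceed the replication factor. A minimal sketch of that rule, assuming a replication factor of 3 (an illustrative value, not stated on the slides); the function names are hypothetical and not part of any Cassandra client API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when any r replicas read must overlap any w replicas written."""
    return r + w > n


N = 3               # assumed replication factor
ONE = 1
QUORUM = quorum(N)  # 2 when N is 3

# Read/Write One: 1 + 1 is not greater than 3, so a read can miss a write.
one_one = reads_see_latest_write(N, r=ONE, w=ONE)

# Read/Write Quorum: 2 + 2 is greater than 3, so the replica sets overlap.
quorum_quorum = reads_see_latest_write(N, r=QUORUM, w=QUORUM)

print(one_one, quorum_quorum)
```

Read/Write One is the fastest option because each request waits for a single replica; the cost is that a read may land on a replica the write has not reached yet.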

Azure Storage

Google Cloud Storage

Durable

DynamoDB Eventually Consistent Requests

Multiple independent copies

Eventually Consistent

Queries

Cassandra Replication with Quorum

MySQL with Zadara Storage

MongoDB Master/Slave Replicas

HBase for Range Queries

MongoDB Eventual Consistency

Riak Replication

Cassandra with Acunu Storage

Riak Eventual Consistency

More Complex Queries

EMR / Hadoop with Pig, Hive or Cascading

Google BigQuery

DynamoDB?

Global Regions

EMR / Hadoop with Pig, Hive or Cascading

Custom Map-Reduce jobs

Most Complex Queries

Google BigQuery

Cassandra

Riak

Scalability Benchmark

  • Over a Million writes/s
  • Setup 288 nodes in an hour
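For a sense of scale, the two benchmark figures can be turned into per-node numbers. This is back-of-the-envelope arithmetic only: it treats "over a million" as a lower bound of exactly 1,000,000 writes/s and assumes load was spread evenly across the 288 nodes, neither of which the slide states.

```python
TOTAL_WRITES_PER_SECOND = 1_000_000  # lower bound taken from "over a million"
NODES = 288
SETUP_MINUTES = 60                   # "in an hour"

# Assumes an even spread of writes across nodes.
writes_per_node = TOTAL_WRITES_PER_SECOND / NODES

# Average provisioning rate over the setup hour.
nodes_per_minute = NODES / SETUP_MINUTES

print(f"at least {writes_per_node:.0f} writes/s per node")
print(f"{nodes_per_minute:.1f} nodes brought up per minute on average")
```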

Netflix Open Source PaaS

  • github.com/netflix
  • Cassandra Client - Astyanax
  • Automation - Priam

Link to Slides/Discuss @adrianco
