Introducing

Prezi AI.

Your new presentation assistant.

Refine, enhance, and tailor your content, source relevant images, and edit visuals quicker than ever before.

ApacheCon – Cassandra (and Hadoop) – FINN.NO

mck semb wever

Updated June 24, 2015

Transcript

C* rocks on…

( get rid of JOINs! )

high/fast write throughput
schema CQL design
time-to-live on data
size of data
total load scale out
uptime
tunable consistency vs availability

}

a few uses…

Cassandra (and Hadoop) case study

#1 Users Search History

#2 Fraud detection

#3 IP-to-Geography

#4 Message Inbox

#5 Microservices metrics

#6 Event Statistics

C* is moving way quick!

lessons learnt

MACHINE SPECS

24 CPU
50 Gb RAM
5.5 Tb disks RAID50
100Gb SSD (commit logs)
100Gb SSD (HDFS)
NOOP IO kernel scheduler

casssandra-2.0.11

Avoid OrderPreservingPartitioner
Avoid custom serialised data (transparency is gold)
Avoid skinny rows on non-commodity machines

CQL3 rocks (always!)
Use json over maps (until you need maps)

Don't run JobTracker+NameNode on a C node
Upgrading to vnodes tedious (~2 months)

During C ops
plan carefully, test strategies, test clients,
easy to bump CL.ONE -> CL.QUORUM,
stop Hadoop

Heed disk latency + utilisation
>20% asking for trouble
SSD for commit-log, separate SSD for HDFS,

Comprehensive monitoring - detect problems quick
C is robust and will hide problems
monitor gc, and pending actions growing,
look out for spikes in 95th percentile

Stay under capacity!
easy backup, repair, streaming, etc

(Cassandra writes sequential)

Y

Time-Series Event Tracking and Aggregation

Time-Series

Event Tracking and Aggregation

a la

"Event Statistics"

Time-Series Event Tracking and Aggregation

Kafka

Scribe

the real world…

active development
clustered (sync msgs)
stream processing!

a lot more servers
zookeeper

async msgs
decentralised
simple ops

archaic options
lost buffers

Time-Series Event Tracking and Aggregation

Still using super columns?

Time-Series Event Tracking and Aggregation

SizeTiered

Leveled

DateTiered

SizeTiered

Leveled

DateTiered

Time-Series Event Tracking and Aggregation

milliseconds

DateTiered

Leveled

SizeTiered

Time-Series Event Tracking and Aggregation

DateTiered

Leveled

SizeTiered

Splits

RecordReader

Time-Series Event Tracking and Aggregation

Each day:

5000+ minute jobs
7 daily jobs
+ ad-hoc jobs
~ 1 billion records read from C
~ 150M records written to C

CL.ALL

CL.ONE

Data Locality !

Time-Series Event Tracking and Aggregation

Simple Approach:

favour localhost location

TokenAwarePolicy(DcAwareRoundRobinPolicy(local-dc))

and use consistency level LOCAL_ONE

Proper approach:

hadoop-2.2 & Capacity Scheduler

capacity-scheduler.xml "yarn.scheduler.capacity.node-locality-delay = 3"

core-site.xml "net.topology.script.file.name = <your-topology-script>"

User Centric Statistics

Simple…

Time-Series Event Tracking and Aggregation

Advanced…

graphing: user->ad, ad->user, etc
Mahout ("Taste") --> Myrrix
Spark ALS

http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

http://www.acunu.com/2/post/2011/08/scaling-up-cassandra-and-mahout-with-hadoop.html

datacenter

nydalen

datacenter

postgirobygget

DC 1

DC 2

DC 1 _FAST

DC 2 _FAST

(+MapReduce +Spark)

Secondary Indexes
Compression
CQL3
CQL tracing
Automatic pagination
async and unlogged batch statements
fluent api to cql java driver
Counters-2 !!

(Cassandra-2.1)

DC 1

DC 2

DC 1 _FAST

DC 2 _FAST

Choose a template

Whiteboard (AI Assisted)

Unleash creativity and collaboration with our Whiteboard Prezi AI-assisted presentation template, seamlessly combining the simplicity of a traditional whiteboard with the power of digital innovation for dynamic and interactive visual storytelling.

Constellations (AI Assisted)

Illuminate your ideas with our captivating Constellations Prezi AI-assisted presentation template, merging celestial elegance with professional design to elevate your content and guide your audience through a stellar visual experience.

Hiking Journey (AI Assisted)

Elevate your presentations with our immersive Hiking Journey Prezi AI-assisted presentation template, meticulously crafted to showcase the beauty of your adventures, from scenic trails to breathtaking landscapes, providing a visually compelling experience for every outdoor enthusiast.

See more templates →

Presentations from around the world

PROY. PED. CON TIC

Yanina Chwoewsky

Karter-wonder

Mrs. Wood

Natalie

Fifth Grade

See staff picks →

Learn more about creating dynamic, engaging presentations with Prezi

Why Prezi is better