Big Data in the Cloud

by

Ali Hodroj

on 17 March 2014

Transcript of Big Data in the Cloud

AWS User Group - NYC
March, 2013

Big Data in the Cloud
Context: Big Data
Motivation: Cloud
Motivation: Common Problems
Goals,
and potential solutions
Ali Hodroj
How we solved it
Thank You!
Questions?
Availability
Orchestrate this...
Scalability
and Cost
http://bit.ly/ZFeao6
OpenLogic: Real-time search of Big Data sets
GigaSpaces customer
Real-time Big Data on Amazon EC2
about:me
@ahodroj
solutions architect @
Smallest practical MongoDB cluster...
Programs that are...
Big Data
Cloud
A Push/Pull semantic
Platforms that are...
Horizontally Scalable
Parallel (Map/Reduce)
Data-intensive
Exploratory analytics
Distributed
Redundant/Fault-tolerant
Elastic
On-Demand
Unlimited Scale
Distributed
Horizontally Scalable
Data-intensive
Cheap Storage
Or lack thereof...
EC2, Cache, RDS,
Elastic Map/Reduce
October
2012
EBS and EC2 outage
Dec
2012
Feb
2013
June
2012
Windows Azure
Accidental Complexity
"Accidental (or incidental) complexity is complexity that arises in computer programs or their development process which is non-essential to the problem to be solved. While essential complexity is inherent and unavoidable, accidental complexity is caused by the approach chosen to solve the problem."
Cloud Bursting Case Study
Zynga moved ~80% of their
workload from Amazon to
their private zCloud
"own the base, rent the spike"
http://code.zynga.com/2012/02/the-evolution-of-zcloud/
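The "own the base, rent the spike" idea can be sketched as a simple capacity split. This is only an illustration under stated assumptions: the function name `split_workload` and its unit-less capacity numbers are hypothetical and are not taken from Zynga's zCloud or from any real tool.

```python
def split_workload(demand, base_capacity):
    """Split demand between owned (private) and rented (public) capacity.

    Hypothetical helper: `demand` and `base_capacity` are in the same
    arbitrary unit (e.g. number of instances).
    """
    if demand < 0 or base_capacity < 0:
        raise ValueError("demand and base_capacity must be non-negative")
    # Fill the owned base first; it is already paid for.
    private = min(demand, base_capacity)
    # Whatever exceeds the base is the spike, rented from a public cloud.
    public = demand - private
    return private, public
```

With a base of 100 units, a demand of 80 stays entirely private, while a demand of 130 sends 30 units to the public cloud.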
http://www.cloudifysource.org
Open source (Apache 2) framework for deploying, managing, healing, and scaling any app on any cloud
Any cloud
(Specify generics, automate specifics)
Cloud
Deployment Model:
Recipes
Cloud
Deployment model
Demo
Application
Service tier (Cassandra)
Lifecycle Management
Auto-Scaling Across the stack
Multi-Zone MongoDB
Cluster
mong-ha recipe
http://bit.ly/15VVnZG
Ops-centric
Automation
Cross-Cloud
Redundancy
App-centric
Automation
Cloud Portability
Bursting/Workload Migration
~100 TB hybrid cloud
http://nflx.it/tK0wiB
288-node Cassandra cluster
1 million writes/sec
30-60 minutes setup
It's not just OS-level metrics (CloudWatch)

Expanding/Shrinking based on?
...Mapper throughput
...Reducer throughput
...Data edge cases
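Scaling on application-level metrics rather than OS-level ones could look roughly like the sketch below. It is not Cloudify's or CloudWatch's API: the function `desired_nodes`, the per-node throughput target, and the node bounds are all hypothetical values chosen for illustration.

```python
import math


def desired_nodes(records_per_sec, target_per_node, min_nodes=1, max_nodes=20):
    """Return the node count needed to sustain an observed throughput.

    Hypothetical parameters: `records_per_sec` is an application-level
    metric (for example mapper or reducer throughput), and
    `target_per_node` is the rate a single node is assumed to handle.
    """
    if target_per_node <= 0:
        raise ValueError("target_per_node must be positive")
    # Round up so the cluster is never under-provisioned for the observed load.
    needed = math.ceil(records_per_sec / target_per_node)
    # Clamp to the configured bounds so the cluster neither vanishes nor runs away.
    return max(min_nodes, min(max_nodes, needed))
```

A controller would call this periodically with the latest throughput reading and expand or shrink the tier toward the returned count.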
MongoDB
Cloud Bursting...
Automatically
add new shards?
Amazon EC2
Private Cloud
Not all clouds are created equal...
Onboarding - Monitoring -
Autoscaling
Somewhere
between
DevOps and PaaS
Managing Big
Data Apps
Managing Big
Data
>
Get it today, for free
http://www.cloudifysource.org
http://www.github.com/CloudifySource/cloudify-recipes
Recipes
Why am I here?
To share how GigaSpaces manages Big Data app
deployment in the cloud
Offline local
development
(JVMs as mock instances)
Public cloud
Private clouds
Traditional
IT data center
Cloud Driver
@CloudifySource
Lessons Learned
Cloud portability
matters
Bursting
!= OLAP, BI, Cubes
Terabyte/Petabyte scale