Powering Cloudstack with Ceph RBD

Presentation materials for my talk at Apachecon NA 2013. http://na.apachecon.com/schedule/presentation/146/

by Patrick McGarry, 27 February 2013

Transcript of Powering Cloudstack with Ceph RBD

Powering Cloudstack with Ceph RBD

What is Ceph?

A unified, distributed storage platform with block, file, and object interfaces: native and RESTful APIs, thin provisioning, cloning, snapshots, strong consistency.

Why should you care? Time and cost.
  • ease of administration
  • no manual data migration / load balancing
  • painless scaling: a linear function of size/performance, incremental expansion
  • no vendor lock-in: open

Requirements
  • diverse storage needs: block (primarily VMs), a shared file system w/ POSIX and coherent caches, structured data
  • terabytes, petabytes, exabytes
  • heterogeneous hardware
  • reliability and fault tolerance

Object, Block, File: why start with objects?
  • more useful than (disk) blocks: names in a single flat namespace, variable size, simple API with rich semantics
  • more scalable than files: no hard-to-distribute hierarchy, update semantics do not span objects, the workload is trivially parallel

The data model
  • pools: 1s to 100s per cluster, independent namespaces or object collections, each with its own replication level and placement policy
  • objects: blobs of data (bytes to gigabytes), attributes (e.g. "version=12"; bytes to kilobytes), and a key/value bundle (bytes to gigabytes) (see the librados sketch below)
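That model maps onto the librados API. Here is a minimal sketch using the python-rados bindings (the pool name 'data', the object name, and the attribute value are made-up examples; it assumes a reachable cluster and credentials referenced by /etc/ceph/ceph.conf):

    import rados

    # Connect to the cluster described by ceph.conf (monitors, keyring).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # An I/O context is bound to one pool, which carries the
    # replication level and placement policy.
    ioctx = cluster.open_ioctx('data')
    try:
        # An object is a named blob of bytes...
        ioctx.write_full('my-object', b'hello ceph')
        # ...with attributes attached to the same name.
        ioctx.set_xattr('my-object', 'version', b'12')
        print(ioctx.read('my-object'), ioctx.get_xattr('my-object', 'version'))
    finally:
        ioctx.close()
        cluster.shutdown()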
Usually, it looks like this... [Human] -> [Computer] -> [Disk] [Disk] [Disk] [Disk]

Well, more like this... [Human] [Human] [Human] [Human] [Human] [Human] [Human] [Human]

Ceph makes it look just a little different...
Object Storage Daemons (OSDs)
  • 10s to 10,000s per cluster
  • 1 per disk, SSD, RAID group, etc.
  • hardware agnostic
  • serve stored objects to clients
  • intelligently peer to perform recovery

Monitors
  • a small, odd number per cluster
  • maintain cluster membership && state
  • provide consensus for distributed decision making
  • do not serve stored objects to clients

Which is assembled like this...

Yes, but what makes Ceph cool?

CRUSH
  • pseudo-random placement algorithm
  • fast calculation, no lookup
  • statistically uniform distribution
  • stable mapping: limited data migration on change
  • rule-based configuration: infrastructure topology aware, adjustable replication, allows weighting

But how does that work? In two steps (sketched in code below):

  hash(object name) % num_pg  ->  a placement group (PG)
  CRUSH(PG, cluster state, policy)  ->  the set of OSDs that store it
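A rough, self-contained Python sketch of those two steps. This is not Ceph's actual CRUSH code: the hash and the fake_crush placement function are stand-ins chosen only to show the shape of the calculation (the real algorithm also honors topology, weights, and rules):

    import hashlib

    def object_to_pg(object_name, num_pgs):
        # hash(object name) % num_pg; any stable hash works for the sketch.
        h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
        return h % num_pgs

    def fake_crush(pg, osd_ids, replicas):
        # Stand-in for CRUSH(pg, cluster state, policy): purely a calculation,
        # so any client can find the right OSDs with no central lookup table.
        chosen, attempt = [], 0
        while len(chosen) < replicas:
            h = int(hashlib.md5(('%d.%d' % (pg, attempt)).encode()).hexdigest(), 16)
            osd = osd_ids[h % len(osd_ids)]
            if osd not in chosen:        # replicas must land on distinct OSDs
                chosen.append(osd)
            attempt += 1
        return chosen

    pg = object_to_pg('my-object', num_pgs=128)
    print(pg, fake_crush(pg, osd_ids=list(range(12)), replicas=3))

Because the mapping is a deterministic function of the PG, the cluster map, and the policy, it stays stable when nothing changes and moves only a limited amount of data when something does.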
Or, at a higher level... [Client]

So, what happens when something breaks? then...

Now, let's revisit how we talk to the cluster...
  • atomic single-action transactions: update data and attributes together
  • efficient key/value storage inside an object (example below)
  • object-granularity snapshot primitives
  • embed code in the ceph-osd daemon via a plugin API
  • inter-client communication via an object
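For example, the key/value storage inside an object is exposed through the "omap" calls in the python-rados bindings. A rough sketch; the WriteOpCtx/set_omap names follow recent versions of the bindings and may differ in older releases, and the pool, object, and keys are made up:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')
    try:
        # Attach a small key/value bundle to 'my-object' in one write operation.
        with rados.WriteOpCtx() as write_op:
            ioctx.set_omap(write_op, ('owner', 'schema'), (b'patrick', b'v1'))
            ioctx.operate_write_op(write_op, 'my-object')
    finally:
        ioctx.close()
        cluster.shutdown()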
(Finally, what we came for!) So what is RBD, really?

RADOS Block Device
  • storage of disk images in RADOS
  • decouples the VM from the host
  • images are striped across an entire cluster (pool)
  • copy-on-write clones
  • support in: Qemu / KVM, the mainline Linux kernel (2.6.39+), Cloudstack, OpenStack, Xen (see the librbd sketch below)
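To make that concrete, a small librbd sketch in Python (the pool 'rbd', image name, and size are arbitrary examples; it assumes python-rbd/python-rados and a working cluster):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')
    try:
        # Create a 10 GiB image; it is stored as many RADOS objects
        # striped across the pool, not tied to any one host.
        rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024 ** 3)

        image = rbd.Image(ioctx, 'vm-disk-1')
        try:
            image.write(b'\x00' * 512, 0)   # write the first 512-byte sector
            print(image.read(0, 16))
        finally:
            image.close()
    finally:
        ioctx.close()
        cluster.shutdown()

This is the same librbd that the Qemu/KVM rbd driver links against when an RBD volume is attached to a VM.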
So what does this look like?

The virtual disk gets striped across the cluster. RBD is most commonly used for VMs, like those running in Cloudstack. The idea is that:

  objects -> librados -> librbd -> virtualization (KVM) -> VM

which gives you something like Amazon's EBS. Because it's a shared environment, you can do fun things like migrate running instances between hosts. The driver in the mainline Linux kernel allows you to map an image as a block device node (/dev/rbd0).

What is copy-on-write cloning?

A clone starts from a snapshot of a parent image and initially shares all of its blocks; only the blocks written afterwards are stored separately, so new images appear almost instantly (a sketch follows below).
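A hedged sketch of what that looks like through the Python rbd bindings (image and snapshot names are made up; clones require a format-2 image with layering enabled, and, as the roadmap below notes, Cloudstack itself did not use cloning yet as of 4.0):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')
    try:
        # A format-2 "golden" image with layering enabled can act as a parent.
        rbd.RBD().create(ioctx, 'golden-image', 10 * 1024 ** 3,
                         old_format=False, features=rbd.RBD_FEATURE_LAYERING)

        parent = rbd.Image(ioctx, 'golden-image')
        try:
            parent.create_snap('base')    # freeze the installed template
            parent.protect_snap('base')   # clones may only hang off protected snaps
        finally:
            parent.close()

        # The clone shares every unmodified block with golden-image@base;
        # only blocks written later are stored separately (copy-on-write).
        rbd.RBD().clone(ioctx, 'golden-image', 'base', ioctx, 'vm-instance-1',
                        features=rbd.RBD_FEATURE_LAYERING)
    finally:
        ioctx.close()
        cluster.shutdown()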
So, what about Cloudstack?
  • 4.0 got RBD support for Primary Storage via KVM
  • no support for VMware / Xen (patches welcome!)
  • live migration supported
  • no snapshots (yet)
  • NFS still required for system VMs
Setup is easy!

Add a hypervisor
  • per the Cloudstack docs, nothing special

Add primary storage
  • in the CloudStack UI: Infrastructure -> Primary Storage -> "Add Primary Storage" -> select protocol "RBD" -> fill in the cluster info (cephx) -> optionally tag it as 'rbd'
Done!

What's next?
  • implement snapshot && backup support: probably 4.2, with the new storage code
  • cloning (aka layering) support: one base/golden image for many instances
  • Ceph support for Secondary/Backup storage: backup storage is new in 4.2, a great use for the Ceph S3-compatible gateway

Who is to blame?
  • @widodh
  • wrote the CloudStack integration
  • partner, not an Inktank employee (we like it this way)
  • does CloudStack / Ceph support

Questions?

Patrick McGarry


