Jesse Keating
jesse.keating@rackspace.com
@iamjkeating
prezi.com/j2sol
Upstream development happens
Internal merge branch and package is created daily
CI environment is (nearly) fully rebuilt and deployed
Automated testing validates the package
This is done daily -- or faster if we iterate on a sub-task
CI-validated package deployed as an upgrade to preproduction
More automated and human-driven tests validate integration and the upgrade process (migrations)
Package iterations with bug fixes are redeployed to preproduction and the tests are re-run
OpenStack with some of our own software on top.
Package iterates in preproduction for two weeks (or longer) before it is considered production ready
Outage windows used per-region to deploy tested package
Post-deploy smoke tests validate continued operation and successful deployment
Hot-patch of code/config post-deploy to address critical issues
All production regions are deployed in a given week -- sometimes longer if issues are encountered.
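
The post-deploy smoke tests mentioned above could be driven by something as small as the sketch below. It is only an illustration: `run_smoke_tests`, `deploy_is_healthy`, and the check names are hypothetical, and the slides do not say what the real smoke tests probe.

```python
from typing import Callable, Dict, List


def run_smoke_tests(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of the ones that failed.

    A check fails if it returns a falsy value or raises an exception.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A crashing check counts as a failed check, not a crashed run.
            ok = False
        if not ok:
            failed.append(name)
    return failed


def deploy_is_healthy(checks: Dict[str, Callable[[], bool]]) -> bool:
    """True only when every smoke test passed."""
    return not run_smoke_tests(checks)
```

A region would count as successfully deployed only when `deploy_is_healthy` returns True; any returned failure names point at what to hot-patch.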
Multiple deploys of new Public Cloud code a day -- with sub-10-second customer-perceived impact
Public cloud has Regions -- a physical collection of systems that act as part of the public cloud
Venvs built from git tags applied to merged repos
Bundled together with Puppet manifests based on the same git tag to form a package
Package uploaded to torrent server
All driven by Jenkins and custom software
Can happen at any time leading up to outage window
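
The bundling step above can be sketched roughly as follows. This is a sketch under stated assumptions, not the actual tooling: the function name `bundle_package`, the `cloud-<tag>.tar.gz` file name, and the directory layout inside the archive are all hypothetical. Building the venv and uploading to the torrent server are left out.

```python
import tarfile
from pathlib import Path


def bundle_package(tag: str, venv_dir: Path, manifests_dir: Path, out_dir: Path) -> Path:
    """Bundle a venv and the Puppet manifests for one git tag into a tarball.

    Both inputs are assumed to have been built or checked out from the
    same git tag; this function only packs them together.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    package = out_dir / f"cloud-{tag}.tar.gz"  # hypothetical naming scheme
    with tarfile.open(str(package), "w:gz") as tar:
        # Prefix everything with the tag so unpacked versions never collide.
        tar.add(str(venv_dir), arcname=f"{tag}/venv")
        tar.add(str(manifests_dir), arcname=f"{tag}/puppet")
    return package
```

Keeping code and configuration under one tag in one archive means a node never ends up with a venv from one release and manifests from another.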
Shell-driven Ansible playbooks
Tens of thousands of nodes to touch!
Shell-driven Ansible
60~70 cloud instances for control
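
One way to picture tens of thousands of nodes spread over 60~70 control instances is a simple round-robin split, sketched below. This is an assumption for illustration only: the slides do not say how nodes are assigned to control instances, and `shard_nodes` and the host names are hypothetical.

```python
from typing import Dict, List


def shard_nodes(nodes: List[str], controllers: List[str]) -> Dict[str, List[str]]:
    """Assign nodes to control instances round-robin.

    Shard sizes differ by at most one node, so no control instance
    ends up with a much longer run than the others.
    """
    if not controllers:
        raise ValueError("at least one controller is required")
    shards: Dict[str, List[str]] = {c: [] for c in controllers}
    for i, node in enumerate(nodes):
        shards[controllers[i % len(controllers)]].append(node)
    return shards
```

Each shard could then be written out as an inventory for the Ansible run on that control instance.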
The majority of the work done during a migration can happen on a live DB. Only a small part should lock the DB.
Reduces time when the DB is locked and services are out
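
The idea of splitting a migration into live work and a short locked part can be sketched as below. The `Step` type, the step names, and the durations are hypothetical, and the sketch assumes the live steps can all run before the locked ones.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    seconds: float
    needs_lock: bool  # True if the step must run with the DB locked


def split_migration(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Separate steps that can run on a live DB from those that need a lock."""
    online = [s for s in steps if not s.needs_lock]
    locked = [s for s in steps if s.needs_lock]
    return online, locked


def outage_seconds(steps: List[Step]) -> float:
    """Only the locked steps count against the outage window."""
    _, locked = split_migration(steps)
    return sum(s.seconds for s in locked)
```

With, say, a 600-second live step and a 5-second locked step, the outage is 5 seconds rather than 605.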
Graceful shutdowns and cell-first ordering mean minimal customer impact
Reduces time when API or compute resources are unavailable. Allows for more frequent deploys.
Potential for zero perceived customer impact
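
A graceful shutdown, in the generic sense, stops accepting new work and lets in-flight work finish before the process exits. The sketch below shows that pattern with a single worker thread; it is not how the OpenStack services implement it, and `GracefulWorker` is a hypothetical name.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class GracefulWorker:
    """Processes submitted items; on shutdown, drains the queue and then stops."""

    def __init__(self, handler):
        self._handler = handler
        self._queue = queue.Queue()
        self._accepting = True
        self._lock = threading.Lock()
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item) -> bool:
        """Queue an item; returns False once shutdown has begun."""
        with self._lock:
            if not self._accepting:
                return False
            self._queue.put(item)
            return True

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            self.results.append(self._handler(item))

    def shutdown(self, timeout=None):
        """Refuse new work, finish everything already queued, then stop."""
        with self._lock:
            self._accepting = False
            # Queued under the lock, so the sentinel lands after all accepted items.
            self._queue.put(_STOP)
        self._thread.join(timeout)
```

Work accepted before `shutdown()` still completes, which is what keeps a restart from being visible as failed requests.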