[openstack-dev] Continuous deployment - significant process change
robertc at robertcollins.net
Mon Apr 29 21:04:23 UTC 2013
We had a process track session about bringing in upstream continuous
deployment for openstack.
I suspect that while the session was good with both deployer and
distributor attendees, we need to do more to make it happen, as it
impinges on review / testing / backwards compat requirements for every
Note that CD doesn't require no-downtime deployments; CD is about
being able to adopt *any arbitrary revision of trunk* at *any point in
time*. The engineering required to do deployments without disruption
is beneficial to both CD and per-release deployments.
Here are the key takeaways we came up with:
* No more big landings [except the purely mechanical]. Set a hard
limit - maybe 500 lines of diff. Big landings are more risky per line
of diff than small ones due to reviewer cognitive overhead - reviewers
get non-linearly less effective the larger the review.
* CD can be done many ways; we need to gate the *specific* ways that
upstream adopts, as soon as possible. That's a -infra thing, and there
are already discussions on it. We don't need to support *every
possible config for CD*. Organisations interested in particular
configurations will need to contribute resources to permit
gate-quality checks of those configurations.
* No more cramming: when a freeze is happening, anything that is
'land this for the release' has to be pushed back on -really hard-. If
it's not ready, it's not ready.
* -never- choose to break something that is neither experimental nor
deprecated since the last release. If an accident happens, correct it
as quickly as possible.
* Land features disabled by default. Such disabled features are
experimental and we don't need to be so careful - we can in fact
remove them entirely if they turn out to be a bad idea. Once they
become supported (individual teams can define this, we think) they
can't be broken again though: they are now part of the product.
* 'To break' means just that - it could be an exception, it could be
a massive jump in DB utilisation, or latency. Whatever our criteria
are for 'fit for use', breaking something stops it being fit for use
in *existing deployed environments*.
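As a rough illustration of the first point, and not something agreed in
the session, a gate check for the diff-size limit could be sketched as
below. The 500-line threshold is the "maybe 500" figure suggested above;
the function names and the `mechanical` flag are hypothetical, and how a
change would actually be marked as purely mechanical is left open.

```python
MAX_DIFF_LINES = 500  # the "maybe 500 lines of diff" figure; an assumption


def count_changed_lines(unified_diff: str) -> int:
    """Count added and removed lines in a unified diff.

    Simplification: any line starting with '+++' or '---' is treated as
    a file header, so a removed line whose content starts with '--'
    would be missed.
    """
    changed = 0
    for line in unified_diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def landing_allowed(unified_diff: str, mechanical: bool = False) -> bool:
    """Hypothetical gate: reject big landings unless purely mechanical."""
    return mechanical or count_changed_lines(unified_diff) <= MAX_DIFF_LINES
```

A check like this only enforces the size cap; it says nothing about
whether a change breaks existing deployments, which still needs the
gate-quality testing described above.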
How do we get all this into place? I can update wiki pages etc, but
is any more agreement needed? Should the TC eyeball it?
Robert Collins <rbtcollins at hp.com>
HP Cloud Services