[openstack-dev] [tripleo] Idempotence of the deployment process

James Slagle james.slagle at gmail.com
Mon Apr 3 13:42:11 UTC 2017


On Sat, Apr 1, 2017 at 5:00 PM, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:
> At our site, we've seen bugs in idempotence break our system too.
>
> In one case, it was an edge case of the master server becoming unreachable for a few seconds at just the wrong time, causing the code to (wrongly) believe that keys didn't exist and needed to be recreated. Then network connectivity was re-established and it went on doing its destructive deed.
>
> Similar things have happened on more than one occasion.
>
> So, I've become less enthralled with the idea that you should be doing everything all the time, even though it should be idempotent. The more code you run, the more likely there will be a bug in it somewhere. It's extremely hard to test for all occurrences of these sorts of bugs.

+1 to what you've both said.

I think we should take a pragmatic approach here. Our code needs to be
idempotent and we should definitely develop and fix bugs with that in
mind.
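
To make the failure mode Kevin describes concrete, here is a minimal
sketch of an "ensure key exists" step that only recreates a key when
the store positively confirms it is absent. Everything in it
(FlakyKeyStore, BackendUnreachable, ensure_key) is a hypothetical
stand-in for illustration, not code from TripleO or puppet:

```python
class BackendUnreachable(Exception):
    """Hypothetical error: the key store could not be contacted."""


class FlakyKeyStore:
    """Hypothetical in-memory stand-in for a remote key server."""

    def __init__(self):
        self.keys = {}
        self.reachable = True

    def get(self, name):
        if not self.reachable:
            raise BackendUnreachable(name)
        return self.keys.get(name)  # None means "confirmed absent"

    def put(self, name, value):
        if not self.reachable:
            raise BackendUnreachable(name)
        self.keys[name] = value


def ensure_key(store, name, make_value):
    """Create the key only when the store confirms it is absent.

    A failed lookup is propagated instead of being treated as
    "missing", so a brief outage cannot trigger a re-creation.
    Returns (value, changed).
    """
    existing = store.get(name)  # may raise BackendUnreachable
    if existing is not None:
        return existing, False  # already present, nothing changed
    value = make_value()
    store.put(name, value)
    return value, True
```

Running ensure_key twice against a reachable store creates the key
once and then reports no change; running it while the store is
unreachable raises instead of overwriting the existing key.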

At the same time, we don't need to run code unnecessarily. Do we
really need or want to run puppet 15 times across an HA cluster of 3
controller nodes just when we are scaling out compute nodes? It would
be very difficult to find every potential bug caused by non-idempotent
code, and I don't know how we'd ever be sure we had fixed them all.
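
One way to picture "not running code unnecessarily" is to select only
the nodes whose desired configuration differs from what was last
applied. The sketch below is an illustration under assumed inputs (a
mapping of node name to a configuration digest); the function name and
data shapes are hypothetical and do not describe how the templates or
the patches below actually work:

```python
def nodes_to_configure(previous, desired):
    """Return the nodes that need a configuration run.

    previous and desired map node name -> config digest (hypothetical).
    A node is selected if it is new or its digest has changed.
    """
    return sorted(
        name
        for name, digest in desired.items()
        if previous.get(name) != digest
    )


# Scaling out by one compute node leaves the controllers untouched.
previous = {
    "controller-0": "aaa",
    "controller-1": "aaa",
    "controller-2": "aaa",
    "compute-0": "bbb",
}
desired = dict(previous)
desired["compute-1"] = "bbb"
to_run = nodes_to_configure(previous, desired)
```

With these inputs only compute-1 is selected, so the three controllers
are not reconfigured during the scale-out.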

I've been thinking about how to address some of these issues, and I
started on a spec for Pike to address one aspect:

https://review.openstack.org/431745

That is still WIP as we discuss how to address this in a general way
in the templates. However, as a first step I've proposed a patch that
will help in the case of scaling out nodes:

https://review.openstack.org/452223

I think one of the themes of this type of work is "let operators who
know what they're doing, actually be able to do it". There are risks
with exposing more knobs to enable/disable functionality. However, if
you know what you're doing, and it's documented sufficiently, then
there are huge benefits as well (e.g., huge reduction in time to scale
out compute nodes).

-- 
-- James Slagle
--


