[openstack-dev] [tripleo] Idempotence of the deployment process
Fox, Kevin M
Kevin.Fox at pnnl.gov
Sat Apr 1 21:00:50 UTC 2017
At our site, we've seen bugs in idempotence break our system too.
In one case, it was an edge case: the master server became unreachable at just the wrong time for a few seconds, causing the code to (wrongly) believe that keys didn't exist and needed to be recreated. Then network connectivity was re-established and it went on doing its destructive deed.
Similar things have happened on more than one occasion.
So, I've become less enthralled with the idea that you should be doing everything all the time, even though it should be idempotent. The more code you run, the more likely there will be a bug in it somewhere. It's extremely hard to test for all occurrences of these sorts of bugs.
You should carefully weigh the risks/rewards of self healing on each part of the system. If an action is only ever done once, like bootstrapping credentials, and "self healing" would likely break the system anyway, it's probably better never to do it repeatedly.
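To make the failure mode above concrete, here is a minimal Python sketch of a run-once bootstrap guard. The names `ensure_keys`, `keys_exist`, `create_keys`, `bootstrap_done` and `MasterUnreachable` are all hypothetical, not part of TripleO or puppet. The assumption is that `keys_exist()` raises `ConnectionError` when the master cannot be contacted, so "could not ask" is never confused with "the keys are missing", and that once bootstrapping has happened the step is simply skipped.

```python
from typing import Callable


class MasterUnreachable(RuntimeError):
    """Raised instead of guessing when the master server cannot be reached."""


def ensure_keys(keys_exist: Callable[[], bool],
                create_keys: Callable[[], None],
                bootstrap_done: bool) -> str:
    """Hypothetical guard around a run-once, destructive bootstrap step.

    keys_exist() returns True/False, or raises ConnectionError when the
    master cannot be contacted. Only a definite False leads to creation.
    """
    if bootstrap_done:
        # Bootstrapping is a one-time action: never "self heal" it later.
        return "skipped"
    try:
        exists = keys_exist()
    except ConnectionError as exc:
        # "Could not ask" must not be treated as "keys are missing".
        raise MasterUnreachable("cannot tell whether keys exist; "
                                "refusing to recreate them") from exc
    if exists:
        return "unchanged"
    create_keys()
    return "created"
```

The design choice is to fail loudly on an ambiguous answer: a failed run can be retried, while recreated credentials cannot be undone.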
Thanks,
Kevin
________________________________________
From: Alex Schultz [aschultz at redhat.com]
Sent: Friday, March 31, 2017 4:21 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [tripleo] Idempotence of the deployment process
Hey folks,
I wanted to raise awareness of the concept of idempotence[0] and how
it affects deployment(s). In the puppet world, we consider this very
important because puppet is all about ensuring a desired state
(i.e. a system with config files + services). That being said, I feel
that it is important for any deployment tool to be aware of this.
When the same code is applied to the system repeatedly (as would be
the case in a puppet master deployment), the subsequent runs should
result in no changes if there is no need. If you take a configured
system and rerun the same deployment code, you don't want your services
restarting when the end state is supposed to be the same. In the case
of TripleO, we should be able to deploy an overcloud and rerun the
deployment process, and the rerun should result in no configuration
changes and 0 services being restarted during the process. The second
run should essentially be a noop.
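The "second run is a noop" property can be illustrated with a minimal Python sketch (not puppet's implementation): the config file is only written when its content differs, and the service is only restarted when that write actually happened. `ensure_file`, `Service`, `deploy` and the `example-api` name are hypothetical stand-ins for a managed file and service.

```python
import tempfile
from pathlib import Path
from typing import Mapping


def ensure_file(path: Path, content: str) -> bool:
    """Write content to path only if it differs; return True if it changed."""
    if path.exists() and path.read_text() == content:
        return False  # already in the desired state: nothing to do
    path.write_text(content)
    return True


class Service:
    """Stand-in for a managed service that counts restarts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.restarts = 0

    def restart(self) -> None:
        self.restarts += 1


def deploy(config_dir: Path, service: Service, settings: Mapping[str, str]) -> bool:
    """One deployment run: converge the config file, restart only on change."""
    # Sorting keeps the rendered file byte-identical between runs.
    body = "".join(f"{k} = {v}\n" for k, v in sorted(settings.items()))
    changed = ensure_file(config_dir / f"{service.name}.conf", body)
    if changed:
        service.restart()  # analogous to a puppet notify/subscribe refresh
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        svc = Service("example-api")
        settings = {"workers": "4"}
        print(deploy(Path(d), svc, settings), svc.restarts)  # True 1
        print(deploy(Path(d), svc, settings), svc.restarts)  # False 1
```

Note the sorted rendering: if the same settings could be emitted in a different order on each run, the file would "change" every time and trigger a restart even though the end state is the same.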
We have recently uncovered various bugs[1][2][3][4] that have
introduced service disruption due to a lack of idempotency causing
service restarts. So when reviewing or developing new code, it is
important to think about what happens to the deployment if this bit
of code is run twice. There are a few common items that come up
around idempotency. Things like execs in puppet-tripleo should be
refreshonly or use unless/onlyif to prevent running again if
unnecessary. Additionally in the TripleO configuration it's important
to understand in which step a service is configured and if it possibly
would get deconfigured in another step. For example, we configure
apache and some wsgi services in step 3, but we currently configure
some additional wsgi OpenStack services in step 4, which results
in excessive httpd restarts and possible service unavailability[5]
when updates are applied.
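The exec guards mentioned above can be modelled roughly in Python. This is only an analogy for how puppet decides whether an `exec` runs, not puppet's code: `unless` and `onlyif` are represented as checks returning True for "command succeeded" (exit code 0 in puppet), and `refreshonly` means the command only runs when the resource has been notified of a change. `GuardedExec` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardedExec:
    """Rough model of a puppet `exec` with its idempotency guards."""
    command: Callable[[], None]
    unless: Optional[Callable[[], bool]] = None   # skip if this check succeeds
    onlyif: Optional[Callable[[], bool]] = None   # run only if this check succeeds
    refreshonly: bool = False                     # run only when notified

    def should_run(self, refreshed: bool) -> bool:
        if self.refreshonly and not refreshed:
            return False  # nothing changed upstream, so nothing to do
        if self.unless is not None and self.unless():
            return False  # desired state already reached
        if self.onlyif is not None and not self.onlyif():
            return False  # precondition not met
        return True

    def apply(self, refreshed: bool = False) -> bool:
        """Run the command if the guards allow it; return True if it ran."""
        if not self.should_run(refreshed):
            return False
        self.command()
        return True
```

An unguarded exec (no `unless`, `onlyif` or `refreshonly`) always returns True from `should_run`, which is exactly the "runs on every deployment" behaviour to avoid.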
Another important place to understand this concept is in upgrades
where we currently allow for ansible tasks to be used. These should
result in an idempotent action when puppet is subsequently run, which
means that the two bits of code essentially need to result in the same
configuration. For example, in the nova-api upgrade from Newton to
Ocata we needed to run the same commands[6] that would later be run by
puppet to prevent clashing configurations and possible idempotency
problems.
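Why the upgrade task and puppet must agree can be shown with a small Python sketch. The config is simplified to a `(section, option) -> value` mapping and `example_option` with its values is purely hypothetical (it is not what [6] changed). If the upgrade task writes exactly what puppet manages, the following puppet run changes nothing; if they disagree, puppet flips the value back, which in a real deployment would mean another config change and likely a service restart.

```python
from typing import Dict, List, Tuple

# (section, option) -> value, a simplified view of an INI-style config file.
Config = Dict[Tuple[str, str], str]


def apply_settings(current: Config, desired: Config) -> List[Tuple[str, str]]:
    """Converge `current` towards `desired`; return the keys actually changed."""
    changed = []
    for key, value in desired.items():
        if current.get(key) != value:
            current[key] = value
            changed.append(key)
    return changed


# Hypothetical values: what puppet manages vs. what an upgrade task writes.
PUPPET_DESIRED: Config = {("DEFAULT", "example_option"): "new-value"}
UPGRADE_TASK_MATCHING: Config = {("DEFAULT", "example_option"): "new-value"}
UPGRADE_TASK_CLASHING: Config = {("DEFAULT", "example_option"): "other-value"}
```

Running `apply_settings` with the upgrade values and then with `PUPPET_DESIRED` returns an empty list on the second call only in the matching case.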
Idempotency issues can cause service disruptions, longer deployment
times for end users, or even possible misconfigurations. I think it
might be beneficial to add an idempotency periodic job that is
basically a double run of the deployment process to ensure no service
or configuration changes on the second run. Thoughts? Ideally we would
have one in the gate, but I think it would take too long to be
feasible with all the other jobs we currently run.
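The core check of such a double-run job could look like the Python sketch below. `RunReport`, `check_idempotent` and the fake deployments are hypothetical; in a real job the report would have to be gathered from the deployment itself (for example, puppet's `--detailed-exitcodes` option distinguishes runs that changed something from runs that did not). Changes on the first run are expected; anything changed or restarted on the second run is reported as a problem.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunReport:
    """Hypothetical summary of one deployment run."""
    changed_resources: List[str] = field(default_factory=list)
    restarted_services: List[str] = field(default_factory=list)


def check_idempotent(run: Callable[[], RunReport]) -> List[str]:
    """Run the deployment twice; return the problems found on the second run."""
    run()  # first run: changes are expected on a fresh system
    second = run()
    problems = [f"changed on second run: {r}" for r in second.changed_resources]
    problems += [f"restarted on second run: {s}" for s in second.restarted_services]
    return problems


def make_fake_deploy() -> Callable[[], RunReport]:
    """Fake, well-behaved deployment: changes things only the first time."""
    done = set()

    def run() -> RunReport:
        report = RunReport()
        if "example.conf" not in done:
            done.add("example.conf")
            report.changed_resources.append("example.conf")
            report.restarted_services.append("example-api")
        return report

    return run
```

A deployment that, say, restarts httpd on every run would show up as a non-empty list here, which is the kind of regression the bugs above introduced.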
Thanks,
-Alex
[0] http://binford2k.com/content/2015/10/idempotence-not-just-big-scary-word
[1] https://bugs.launchpad.net/tripleo/+bug/1664650
[2] https://bugs.launchpad.net/puppet-nova/+bug/1665443
[3] https://bugs.launchpad.net/tripleo/+bug/1665405
[4] https://bugs.launchpad.net/tripleo/+bug/1665426
[5] https://review.openstack.org/#/c/434016/
[6] https://review.openstack.org/#/c/405241/
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev