[openstack-dev] [tripleo] Idempotence of the deployment process

Alex Schultz aschultz at redhat.com
Mon Apr 3 20:57:38 UTC 2017


On Sun, Apr 2, 2017 at 6:01 AM, Dan Prince <dprince at redhat.com> wrote:
> On Fri, 2017-03-31 at 17:21 -0600, Alex Schultz wrote:
>> Hey folks,
>>
>> I wanted to raise awareness of the concept of idempotence[0] and how
>> it affects deployment(s).  In the puppet world, we consider this very
>> important because puppet is all about ensuring a desired state
>> (i.e. a system with config files + services). That being said, I feel
>> that it is important for any deployment tool to be aware of this.
>> When the same code is applied to the system repeatedly (as would be
>> the case in a puppet master deployment) the subsequent runs should
>> result in no changes if there is no need.  If you take a configured
>> system and rerun the same deployment code, you don't want your
>> services restarting when the end state is supposed to be the same.
>> In the case of TripleO, we should be able to deploy an overcloud,
>> rerun the deployment process, and see no configuration changes and 0
>> services being restarted during the process. The second run should
>> essentially be a noop.
>>
>> We have recently uncovered various bugs[1][2][3][4] that have
>> introduced service disruption due to a lack of idempotency causing
>> service restarts. So when reviewing or developing new code, the
>> important question to ask about the deployment is: what happens if I
>> run this bit of code twice?  There are a few common items that come up
>> around idempotency. Things like execs in puppet-tripleo should be
>> refreshonly or use unless/onlyif to prevent running again if
>> unnecessary.  Additionally, in the TripleO configuration it's
>> important to understand in which step a service is configured and if
>> it possibly would get deconfigured in another step.  For example, we
>> configure apache and some wsgi services in step 3. But we currently
>> configure some additional wsgi openstack services in step 4, which is
>> resulting in excessive httpd restarts and possible service
>> unavailability[5] when updates are applied.
>>
>> Another important place to understand this concept is in upgrades,
>> where we currently allow for ansible tasks to be used. These should
>> result in an idempotent action when puppet is subsequently run, which
>> means that the two bits of code essentially need to result in the
>> same configuration. For example, in the nova-api upgrades for Newton
>> to Ocata we needed to run the same commands[6] that would later be
>> run by puppet to prevent clashing configurations and possible
>> idempotency problems.
>>
>> Idempotency issues can cause service disruptions, longer deployment
>> times for end users, or even possible misconfigurations.  I think it
>> might be beneficial to add an idempotency periodic job that is
>> basically a double run of the deployment process to ensure no service
>> or configuration changes on the second run. Thoughts?  Ideally one in
>> the gate would be awesome but I think it would take too long to be
>> feasible with all the other jobs we currently run.
>
> How would we verify that services aren't getting changed/restarted
> even? Checking process runtimes perhaps or something?
>

From a deployment standpoint, we can run a deployment twice and check
that the output from the puppet steps shows no changes.  So at a
minimum we could deploy, run an update, and analyze the logs from the
update to ensure that no changes were reported.
In the past I've done this[0] by capturing the last run summary from
puppet and checking to make sure nothing was changed.
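
As a rough illustration of that kind of check, here is a minimal Python
sketch. It is not the code from [0]. It assumes the summary has already
been parsed into a nested dict and that it uses the "resources" and
"changes" sections that puppet writes to last_run_summary.yaml. The
helper name and the list of counters are my own choices for the
example.

```python
# Sketch only: the section/key names follow the layout of puppet's
# last_run_summary.yaml; the helper name and the set of counters
# checked here are illustrative assumptions.

# Counters that should all be zero on a second, no-op run.
NOOP_COUNTERS = (
    ("resources", "changed"),
    ("resources", "restarted"),
    ("resources", "out_of_sync"),
    ("resources", "failed"),
    ("changes", "total"),
)


def find_idempotency_violations(summary):
    """Return 'section.key=value' strings for every non-zero counter.

    `summary` is the already-parsed run summary (a nested dict).
    A summary without a "resources" section is reported as a problem,
    since that usually means the run did not complete.
    """
    if "resources" not in summary:
        return ["resources section missing (run may have failed)"]

    violations = []
    for section, key in NOOP_COUNTERS:
        # A missing counter is treated as zero.
        value = summary.get(section, {}).get(key, 0)
        if value:
            violations.append("%s.%s=%s" % (section, key, value))
    return violations


if __name__ == "__main__":
    # Made-up numbers for a second run that was not a no-op.
    second_run = {
        "resources": {"changed": 2, "restarted": 1, "out_of_sync": 2,
                      "failed": 0, "total": 150},
        "changes": {"total": 2},
    }
    for problem in find_idempotency_violations(second_run):
        print(problem)
```

A periodic job could load the summary file from each node after the
second run (for example with a YAML parser such as PyYAML's
yaml.safe_load, which is an assumption about tooling on my part) and
fail if the returned list is not empty.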


> If you used the multinode jobs or perhaps the new undercloud_deploy
> installer (single node) it might be feasible to add this into the gate.
> I would avoid adding this to the OVB queue as it is already too full
> and we can probably gain the coverage we need without that type of
> testing.
>

I wouldn't necessarily start with this as a gating action, as I think
a basic periodic job might be sufficient.

Thanks,
-Alex

[0] https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212

> Dan
>
>>
>> Thanks,
>> -Alex
>>
>> [0] http://binford2k.com/content/2015/10/idempotence-not-just-big-scary-word
>> [1] https://bugs.launchpad.net/tripleo/+bug/1664650
>> [2] https://bugs.launchpad.net/puppet-nova/+bug/1665443
>> [3] https://bugs.launchpad.net/tripleo/+bug/1665405
>> [4] https://bugs.launchpad.net/tripleo/+bug/1665426
>> [5] https://review.openstack.org/#/c/434016/
>> [6] https://review.openstack.org/#/c/405241/
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


