[openstack-dev] [tripleo] [ci] Adding idempotency job on overcloud deployment.

Ben Nemec openstack at nemebean.com
Thu Jun 8 17:02:24 UTC 2017



On 06/08/2017 10:16 AM, Emilien Macchi wrote:
> On Thu, Jun 8, 2017 at 1:47 PM, Sofer Athlan-Guyot <sathlang at redhat.com> wrote:
>> Hi,
>>
>> Alex Schultz <aschultz at redhat.com> writes:
>>
>>> On Wed, Jun 7, 2017 at 5:20 AM, Sofer Athlan-Guyot <sathlang at redhat.com> wrote:
>>>> Hi,
>>>>
>>>> Emilien Macchi <emilien at redhat.com> writes:
>>>>
>>>>> On Wed, Jun 7, 2017 at 12:45 PM, Sofer Athlan-Guyot <sathlang at redhat.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I don't think we have such a job in place.  Basically that would check
>>>>>> that re-running the "openstack deploy ..." command won't do anything.
>>
>> I've had a look at openstack-infra/tripleo-ci.  Should I test it with
>> ovb/quickstart or tripleo.sh?  Either way is fine by me, but I may be
>> lacking context about which one is more relevant.
>>
>>>>>> We had such an error in the past[1], but I'm not sure this has been
>>>>>> captured by an associated job.
>>>>>>
>>>>>> WDYT ?
>>>>>
>>>>> It would be interesting to measure how much time it takes to run
>>>>> it again.
>>>>
>>>> Could you point out how such an experiment could be done ?
>>>>
>>>>> If it's short, we could add it to all our scenarios + ovb
>>>>> jobs.  If it's long, maybe we need an additional job, but it would
>>>>> take more resources, so maybe we could run it in periodic pipeline
>>>>> (note that periodic jobs are not optimal since we could break
>>>>> something quite easily).
>>>>
>>>> Just adding as context that the issue was already raised[1].  Besides
>>>> the time constraint, it was pointed out that we would also need to
>>>> parse the log to find out if anything was restarted.  But that could
>>>> be a second step.  For parsing, this code was pointed out[2].
>>>>
>>>
>>> There are a few things that would need to be enabled in order to reuse
>>> some of this work.  We'll need to add the ability to generate a report
>>> on the puppet run[0]. And then we'll need to be able to capture it[1]
>>> somewhere that we could then use that parsing code on.  From there,
>>> just rerunning the installation would be a simple start to the
>>> idempotency check.  In fuel, we had hacked in a special flag[2] that
>>> we used in testing to actually rerun the task immediately to find when
>>> a specific task was not idempotent in addition to also rerunning the
>>> entire deployment. For tripleo a similar concept would be to rerun the
>>> steps twice but that's usually not where the issues crop up for us. So
>>> rerunning the entire installation deployment would be better as we
>>> tend to have issues with configuration items between steps
>>> conflicting.
>>
>> Maybe we could go with something equivalent to:
>>
>>   ts="$(date '+%F %T')"
>>   ... re-run deploy command ...
>>
>>   sudo journalctl --since="${ts}" | grep -E 'Stopping|Starting' | grep -v 'user.*slice' > restarted.log
>>   wc -l restarted.log
>>
>> This should be 0 on every overcloud node.
>>
>> This is simpler to implement and should catch any unwanted service
>> restart.
>>
>> WDYT ?
>
> It's smart, for services. It doesn't cover configuration file changes
> and other resources managed by Puppet, like Keystone resources, etc.
> But it's an excellent start to me.

I just want to point out that the updates job is already doing this when
it runs in every repo except tripleo-heat-templates (that's the only
package we actually update in the updates job; every other project is a
noop).  I can also tell you how long it takes to redo a deployment with
no changes: just under 2000 seconds, or around 33 minutes.  At least
that's the current average in tripleo-ci right now (although I see we
just added around 100 seconds to the update time in the last day or two.
*sigh*).
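
As a rough illustration only, the timing question and the journal check
quoted above could be combined along these lines.  This is a sketch, not
something that exists in tripleo-ci: the helper names `timed` and
`count_restarts` are made up for the example, and the deploy command
itself is left as a placeholder because the thread does not spell it out.
The filter is the same one as in the quoted snippet, written with
`grep -E`.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many seconds it took.
# The command's own exit status is preserved.
timed() {
    start="$(date +%s)"
    "$@"
    rc=$?
    end="$(date +%s)"
    echo "elapsed: $((end - start)) seconds"
    return "$rc"
}

# Hypothetical helper: count service start/stop lines in journal text
# read from stdin, ignoring user slice noise.
count_restarts() {
    grep -E 'Stopping|Starting' | grep -v 'user.*slice' | wc -l | tr -d ' '
}

# Intended use (assumption: the deploy is re-run from the undercloud and
# the journal is then checked on each overcloud node):
#
#   ts="$(date '+%F %T')"
#   timed ... re-run deploy command ...
#   sudo journalctl --since="${ts}" | count_restarts    # expect 0
```

Like the one-liner it is based on, this only catches service restarts;
it says nothing about configuration files or other Puppet-managed
resources.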

>
>>>
>>> Thanks,
>>> -Alex
>>>
>>> [0] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@204
>>> [1] https://review.openstack.org/#/c/273740/4/mcagents/puppetd.rb@102
>>> [2] https://review.openstack.org/#/c/273737/
>>>
>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/114836.html
>>>> [2] https://review.openstack.org/#/c/279271/9/fuelweb_test/helpers/astute_log_parser.py@212
>>>>
>>>>>
>>>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1664650


