[openstack-dev] [tripleo] Validations before upgrades and updates

Florian Fuchs flfuchs at redhat.com
Tue May 16 13:28:56 UTC 2017

On Mon, May 15, 2017 at 6:27 PM, Steven Hardy <shardy at redhat.com> wrote:
> On Mon, May 08, 2017 at 02:45:08PM +0300, Marios Andreou wrote:
>>    Hi folks, after some discussion locally with colleagues about improving
>>    the upgrades experience, one of the items that came up was pre-upgrade and
>>    update validations. I took an AI to look at the current status of
>>    tripleo-validations [0] and posted a simple WIP [1] intended to be run
>>    before an undercloud update/upgrade and which just checks service status.
>>    It was pointed out by shardy that for such checks it is better to instead
>>    continue to use the per-service manifests where possible like [2] for
>>    example where we check status before N..O major upgrade. There may still
>>    be some undercloud specific validations that we can land into the
>>    tripleo-validations repo (thinking about things like the neutron
>>    networks/ports, validating the current nova nodes state etc?).
>>    So do folks have any thoughts about this subject - for example the kinds
>>    of things we should be checking - Steve said he had some reviews in
>>    progress for collecting the overcloud ansible puppet/docker config into an
>>    ansible playbook that the operator can invoke for upgrade of the 'manual'
>>    nodes (for example compute in the N..O workflow) - the point being that we
>>    can add more per-service ansible validation tasks into the service
>>    manifests for execution when the play is run by the operator - but I'll
>>    let Steve point at and talk about those.
> Thanks for starting this thread Marios, sorry for the slow reply due to
> Summit etc.
> As we discussed, I think adding validations is great, but I'd prefer we
> kept any overcloud validations specific to services in t-h-t instead of
> trying to manage service specific things over multiple repos.
> This would also help with the idea of per-step validations I think, where
> e.g. you could have a "is service active" test and run it after the step
> where we expect the service to start, a blueprint was raised a while back
> asking for exactly that:
> https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation
> One way we could achieve this is to add ansible tasks that perform some
> validation after each step, where we combine the tasks for all services,
> similar to how we already do upgrade_tasks and host_prep_tasks:
> https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/redis.yaml#L92
> With the benefit of hindsight using ansible tags for upgrade_tasks wasn't
> the best approach, because you can't change the tags via SoftwareDeployment
> (e.g. you need a SoftwareConfig per step), it's better if we either generate
> the list of tasks by merging maps, e.g.:
>   validation_tasks:
>     step3:
>       - sometask
> Or via ansible conditionals where we pass a step value in to each run of
> the tasks:
>   validation_tasks:
>     - sometask
>       when: step == 3
> The latter approach is probably my preference, because it'll require less
> complex merging in the heat layer.
> As you mentioned, I've been working on ways to make the deployment steps
> more ansible driven, so having these tasks integrated with the t-h-t model
> would be well aligned with that I think:
> https://review.openstack.org/#/c/454816/
> https://review.openstack.org/#/c/462211/
> Happy to discuss further when you're ready to start integrating some
> overcloud validations.
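
To make the conditional approach concrete, here is a minimal sketch of
how a service template might expose such tasks next to its other
role_data outputs. It assumes a new validation_tasks key (not an
existing t-h-t output), a hypothetical example-service systemd unit,
and that the combined tasks of all services are run once per
deployment step with the step passed in as an extra variable:

```yaml
# Hypothetical fragment of a t-h-t service template.
# "validation_tasks" is the proposed key; it does not exist in t-h-t today.
outputs:
  role_data:
    description: Role data for the example service
    value:
      service_name: example_service
      # config_settings, step_config, upgrade_tasks, etc. omitted
      validation_tasks:
        # Only runs in the pass where step == 3, i.e. after the step in
        # which the service is expected to have been started.
        - name: Check that example-service is active
          command: systemctl is-active example-service
          changed_when: false
          when: step|int == 3
```

Values passed as "ansible-playbook ... -e step=3" arrive as strings, so
the int filter keeps the comparison from silently evaluating to false.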

Maybe these are two kinds of pre-upgrade validations that serve
different purposes.

The more general validations (like checking connectivity, making sure
the stack is in good shape, repos are available, etc.) should give
operators a fair amount of confidence that all basic prerequisites for
an update or upgrade are met *before* it is started. They could be
run from the UI or CLI and would fit well into the tripleo-validations
repo. As with the existing tripleo-validations, a failure would be
advisory: it would not prevent operators from going ahead.

The service-specific validations, on the other hand, are closely tied
to the upgrade process and will stop further progress when they fail.
They are fundamentally different from the tripleo-validations and
could therefore live in t-h-t.

I personally don't see why we shouldn't have pre-upgrade validations
both in tripleo-validations and in t-h-t, as long as we know which
ones go where. If everything that's tied to a specific overcloud
service or upgrade step goes into t-h-t, I could see these two groups
(using the validations suggested earlier in this thread), the first
for tripleo-validations and the second for t-h-t:

- Undercloud service check
- Verify that the stack is in a *_COMPLETE state
- Verify undercloud disk space. For node replacement we recommended a
minimum of 10 GB free.
- Network/repo availability check (undercloud and overcloud)
- Verify we're at the latest version of the current release
- ...
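
For the first group, a check could roughly follow the shape of the
existing tripleo-validations playbooks: an Ansible play with a metadata
block naming the validation and the groups it belongs to. A minimal
sketch covering the stack state and disk space items, assuming a
"pre-upgrade" group, an "undercloud" inventory host, a stack named
"overcloud", credentials in ~/stackrc, and that the space that matters
is on the root filesystem:

```yaml
---
# Hypothetical pre-upgrade validation in the style of tripleo-validations.
- hosts: undercloud
  vars:
    metadata:
      name: Verify overcloud stack state and undercloud disk space
      description: >
        Fail if the overcloud stack is not in a *_COMPLETE state or if the
        undercloud root filesystem has less than 10 GB free.
      groups:
        - pre-upgrade   # assumed group name
    # 10 GB, taken here as 10 GiB
    min_free_bytes: "{{ 10 * 1024 * 1024 * 1024 }}"
  tasks:
    - name: Get the overcloud stack status
      shell: source ~/stackrc && openstack stack show overcloud -f value -c stack_status
      args:
        executable: /bin/bash
      register: stack_status
      changed_when: false

    - name: Verify the stack is in a *_COMPLETE state
      assert:
        that:
          - stack_status.stdout is search('_COMPLETE$')
        msg: "Stack status is {{ stack_status.stdout }}"

    - name: Verify free space on the undercloud root filesystem
      assert:
        that:
          - root_free | int >= min_free_bytes | int
        msg: "Only {{ root_free }} bytes free on /"
      vars:
        # size_available comes from the gathered ansible_mounts facts
        root_free: "{{ (ansible_mounts | selectattr('mount', 'equalto', '/') | first).size_available }}"
```

A failure here only reports a problem; nothing in the upgrade itself
is gated on it.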

- Pacemaker cluster health
- Ceph health
- APIs healthcheck (per overcloud service)
- Check Galera and Rabbit clusters and verify all nodes are up.
- Disabling stonith.
- ...
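
Checks in the second group would sit next to the service they belong
to, and a failing task is meant to stop the upgrade. A minimal sketch
of the Galera and Ceph items as upgrade_tasks entries, shown together
for brevity (in practice each would live in its own service template).
The step0,validation tags are an assumption modelled on existing
pre-upgrade checks, expected_galera_nodes is a hypothetical variable,
and the mysql and ceph clients are assumed to reach their clusters
with local credentials:

```yaml
# Hypothetical upgrade_tasks entries; any failing task aborts the upgrade.
upgrade_tasks:
  # Galera: every controller should be part of the cluster.
  - name: Get the Galera cluster size
    command: mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
    register: galera_size
    changed_when: false
    tags: step0,validation
  - name: Verify all Galera nodes are up
    assert:
      that:
        # output looks like "wsrep_cluster_size<TAB>3"
        - galera_size.stdout.split()[1] | int == expected_galera_nodes | int
      msg: "Galera cluster size is {{ galera_size.stdout.split()[1] }}"
    tags: step0,validation
  # Ceph: refuse to continue unless the cluster reports HEALTH_OK.
  - name: Get Ceph health
    command: ceph health
    register: ceph_health
    changed_when: false
    tags: step0,validation
  - name: Verify Ceph is healthy
    assert:
      that:
        - ceph_health.stdout is match('HEALTH_OK')
      msg: "Ceph reports {{ ceph_health.stdout }}"
    tags: step0,validation
```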

In theory I could imagine another variety of pre-upgrade validations:
Ones that are general in nature (not tied to an overcloud service),
but are specific to a particular version jump (so they would be run
before an N..O upgrade, but wouldn't make sense for an O..P jump).
These could still live in the tripleo-validations repo, but would only
exist as backports to the relevant "from"-version. But lacking a good
example, this is probably a bit academic for now. :-)

Any thoughts?
