[openstack-dev] [tripleo] Validations before upgrades and updates

Marios Andreou mandreou at redhat.com
Wed May 17 06:42:34 UTC 2017


On Tue, May 16, 2017 at 4:28 PM, Florian Fuchs <flfuchs at redhat.com> wrote:

> On Mon, May 15, 2017 at 6:27 PM, Steven Hardy <shardy at redhat.com> wrote:
> > On Mon, May 08, 2017 at 02:45:08PM +0300, Marios Andreou wrote:
> >>    Hi folks, after some discussion locally with colleagues about
> >>    improving the upgrades experience, one of the items that came up
> >>    was pre-upgrade and update validations. I took an action item to
> >>    look at the current status of tripleo-validations [0] and posted a
> >>    simple WIP [1] intended to be run before an undercloud
> >>    update/upgrade, which just checks service status.
> >>    It was pointed out by shardy that for such checks it is better to
> >>    instead continue to use the per-service manifests where possible,
> >>    like [2] for example, where we check status before the N..O major
> >>    upgrade. There may still be some undercloud-specific validations
> >>    that we can land in the tripleo-validations repo (thinking about
> >>    things like the neutron networks/ports, validating the current
> >>    nova nodes state, etc.?).
> >>    So do folks have any thoughts about this subject, for example the
> >>    kinds of things we should be checking? Steve said he had some
> >>    reviews in progress for collecting the overcloud ansible
> >>    puppet/docker config into an ansible playbook that the operator
> >>    can invoke for upgrade of the 'manual' nodes (for example compute
> >>    in the N..O workflow). The point being that we can add more
> >>    per-service ansible validation tasks into the service manifests
> >>    for execution when the play is run by the operator, but I'll let
> >>    Steve point at and talk about those.
> >
> > Thanks for starting this thread Marios, sorry for the slow reply due to
> > Summit etc.
> >
> > As we discussed, I think adding validations is great, but I'd prefer we
> > kept any overcloud validations specific to services in t-h-t instead of
> > trying to manage service specific things over multiple repos.
> >
> > This would also help with the idea of per-step validations I think, where
> > e.g. you could have an "is service active" test and run it after the step
> > where we expect the service to start. A blueprint was raised a while back
> > asking for exactly that:
> >
> > https://blueprints.launchpad.net/tripleo/+spec/step-by-step-validation
> >
> > One way we could achieve this is to add ansible tasks that perform some
> > validation after each step, where we combine the tasks for all services,
> > similar to how we already do upgrade_tasks and host_prep_tasks:
> >
> > https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/redis.yaml#L92
> >
> > With the benefit of hindsight, using ansible tags for upgrade_tasks wasn't
> > the best approach, because you can't change the tags via SoftwareDeployment
> > (e.g. you need a SoftwareConfig per step). It's better if we either
> > generate the list of tasks by merging maps, e.g.
> >
> >   validation_tasks:
> >     step3:
> >       - sometask
> >
> > Or via ansible conditionals where we pass a step value in to each run of
> > the tasks:
> >
> >   validation_tasks:
> >     - sometask
> >       when: step == 3
> >
> > The latter approach is probably my preference, because it'll require less
> > complex merging in the heat layer.
> >
> > As you mentioned, I've been working on ways to make the deployment steps
> > more ansible driven, so having these tasks integrated with the t-h-t model
> > would be well aligned with that I think:
> >
> > https://review.openstack.org/#/c/454816/
> >
> > https://review.openstack.org/#/c/462211/
> >
> > Happy to discuss further when you're ready to start integrating some
> > overcloud validations.
>
> Maybe these are two kinds of pre-upgrade validations that serve
> different purposes.
>
> The more general validations (like checking connectivity, making sure
> the stack is in good shape, repos are available, etc.) should give
> operators a fair amount of confidence that all basic prerequisites to
> start an update are met *before* the upgrade is started. They could be
> run from the UI or CLI and would fit well into the tripleo-validations
> repo. Similar to the existing tripleo-validations, failures don't
> prevent operators from proceeding.
>
> The service-specific validations otoh are closely tied to the upgrade
> process and will stop further progress when failing.


yeah, you could also argue that the current overcloud service upgrade
validations (which just check 'is this service running OK' at step0 of the
upgrade) are also *pre* upgrade, since we haven't done anything yet; it is
literally step0. Note that as of the upgrade to stable/ocata you can disable
these, for example if you need to re-run the upgrade step, so that it doesn't
fail on the service checks.
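A rough sketch of that step0 behaviour, assuming the checks are simple "is this service running" probes and that disabling them is one operator-controlled switch. `run_step0` and `skip_validations` are hypothetical names for illustration only, not the actual t-h-t parameter or implementation.

```python
from typing import Callable, Dict


def run_step0(checks: Dict[str, Callable[[], bool]], skip_validations: bool = False) -> None:
    """Hypothetical step0 of an upgrade: every service check must pass
    before anything is changed, unless validations were disabled."""
    if skip_validations:
        # e.g. re-running the upgrade step after services were already stopped
        return
    failed = [name for name, is_running in checks.items() if not is_running()]
    if failed:
        raise RuntimeError("step0 service checks failed: " + ", ".join(failed))


if __name__ == "__main__":
    # Stubbed probes; real ones would query systemd, pacemaker, etc.
    checks = {"httpd": lambda: True, "openstack-nova-api": lambda: False}
    run_step0(checks, skip_validations=True)  # re-run: checks are skipped
    try:
        run_step0(checks)
    except RuntimeError as err:
        print(err)  # step0 service checks failed: openstack-nova-api
```

Since nothing has been changed at step0, failing hard there is cheap; the skip switch only matters once a partial upgrade has already stopped some services.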


> They are
> fundamentally different to the tripleo-validations and could therefore
> live in t-h-t.
>

ACK, yeah, this seems to be the general consensus forming here:
tripleo-validations for checking things, especially on the undercloud,
and continue to use t-h-t for the overcloud service validations. That is
for many reasons, but especially because we get the benefit of the 'auto'
generated list of services currently deployed per node, so a per-service
validation runs only if that service is deployed.
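A minimal Python sketch of how per-service, per-step validation tasks could be combined for the services deployed on a node, contrasting the two layouts Steve described: merging per-step maps versus one flat list filtered by a `when: step == N` style condition. Everything here is an assumption for illustration: the service names, task names and dict shapes are hypothetical, and in t-h-t the merge would happen in Heat and the condition would be evaluated by Ansible, not Python.

```python
from typing import Dict, List

# Hypothetical layout 1: each service maps step -> task names,
# like "validation_tasks: {step3: [...]}" merged in the Heat layer.
MAP_STYLE: Dict[str, Dict[int, List[str]]] = {
    "redis": {3: ["redis-ping"]},
    "mysql": {2: ["galera-synced"], 3: ["mysql-accepts-connections"]},
    "nova-api": {4: ["nova-api-healthcheck"]},
}

# Hypothetical layout 2: each service contributes a flat task list,
# each task carrying the step it applies to (like "when: step == 3").
FLAT_STYLE: Dict[str, List[dict]] = {
    "redis": [{"name": "redis-ping", "step": 3}],
    "mysql": [
        {"name": "galera-synced", "step": 2},
        {"name": "mysql-accepts-connections", "step": 3},
    ],
    "nova-api": [{"name": "nova-api-healthcheck", "step": 4}],
}


def tasks_by_merging(deployed: List[str], step: int) -> List[str]:
    """Merge the per-step maps of the deployed services only."""
    merged: List[str] = []
    for service in deployed:
        merged.extend(MAP_STYLE.get(service, {}).get(step, []))
    return merged


def tasks_by_condition(deployed: List[str], step: int) -> List[str]:
    """Concatenate all tasks of deployed services, then keep those whose
    step condition matches the step value passed in for this run."""
    all_tasks = [t for service in deployed for t in FLAT_STYLE.get(service, [])]
    return [t["name"] for t in all_tasks if t["step"] == step]


if __name__ == "__main__":
    # Only services actually deployed on this node contribute validations.
    deployed_on_node = ["mysql", "redis"]
    for step in range(1, 6):
        print(step, tasks_by_condition(deployed_on_node, step))
```

Both layouts select the same tasks; the conditional one keeps the combining side to a plain list concatenation, which is the reason Steve gave for preferring it.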

>
> I personally don't see why we shouldn't have pre-upgrade validations
> both in tripleo-validations and in t-h-t, as long as we know which
> ones go where. If everything that's tied to a specific overcloud
> service or upgrade step goes into t-h-t, I could see these two groups
> (using the validations suggested earlier in this thread):
>
> tripleo-validations:
> - Undercloud service check
> - Verify that the stack is in a *_COMPLETE state
> - Verify undercloud disk space. For node replacement we recommended a
> minimum of 10 GB free.
> - Network/repo availability check (undercloud and overcloud)
> - Verify we're at the latest version of the current release
> - ...
>
> tripleo-heat-templates:
> - Pacemaker cluster health
> - Ceph health
> - APIs healthcheck (per overcloud service)
> - Check Galera and Rabbit clusters and verify all nodes are up.
> - Disabling stonith.
> - ...
>

thanks, these all seem like good things to be checking and the split seems
reasonable to me.
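A small sketch of the two failure semantics Florian describes: general (tripleo-validations style) checks that are only reported, and service-specific (t-h-t style) checks that stop further progress. The `Validation`/`Report` types, the ordering rule (stop at the first blocking failure) and the stubbed results are assumptions for illustration; the disk check just uses Python's `shutil.disk_usage` against the 10 GB figure mentioned above, treated as GiB here.

```python
import shutil
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Validation:
    name: str
    check: Callable[[], bool]  # returns True when the validation passes
    blocking: bool  # True: t-h-t style, a failure stops the upgrade


@dataclass
class Report:
    advisory_failures: List[str] = field(default_factory=list)
    blocked_by: Optional[str] = None

    @property
    def can_proceed(self) -> bool:
        return self.blocked_by is None


def run_pre_upgrade(validations: List[Validation]) -> Report:
    """Run validations in order: advisory failures are collected and
    reported, the first blocking failure stops further progress."""
    report = Report()
    for v in validations:
        if v.check():
            continue
        if v.blocking:
            report.blocked_by = v.name
            return report
        report.advisory_failures.append(v.name)
    return report


def has_free_space(path: str = "/", min_gib: int = 10) -> bool:
    """Example advisory check: at least `min_gib` GiB free at `path`."""
    return shutil.disk_usage(path).free >= min_gib * 1024**3


if __name__ == "__main__":
    checks = [
        Validation("undercloud-disk-space", has_free_space, blocking=False),
        Validation("stack-complete", lambda: False, blocking=False),  # stubbed
        Validation("galera-all-nodes-up", lambda: True, blocking=True),  # stubbed
    ]
    report = run_pre_upgrade(checks)
    print(report.can_proceed, report.advisory_failures, report.blocked_by)
```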



> In theory I could imagine another variety of pre-upgrade validations:
> ones that are general in nature (not tied to an overcloud service),
> but are specific to a particular version jump (so they would be run
> before a N..O upgrade, but wouldn't make sense for an O..P jump).
>

for sure, we have things like migrations: for example, service foo-api is
deprecated and the foo service is instead served by apache, and that
will happen only in a specific upgrade version.
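One way such version-specific checks could be selected, sketched under the assumption of a simple registry keyed by the (from, to) release pair. The registry, the `validations_for` helper and the `foo-api-apache-migration-ready` check name are hypothetical, based only on the foo-api example above.

```python
from typing import Dict, List, Tuple

# Hypothetical registry of validations that only make sense for one
# release jump, keyed by (from_release, to_release).
VERSION_SPECIFIC: Dict[Tuple[str, str], List[str]] = {
    # Illustrative: before N..O, check the deprecated foo-api service is
    # ready to be replaced by foo served from apache.
    ("newton", "ocata"): ["foo-api-apache-migration-ready"],
}


def validations_for(from_release: str, to_release: str, general: List[str]) -> List[str]:
    """General validations always run; version-specific ones only for the matching jump."""
    return list(general) + VERSION_SPECIFIC.get((from_release, to_release), [])


if __name__ == "__main__":
    general = ["stack-complete", "undercloud-disk-space"]
    print(validations_for("newton", "ocata", general))
    print(validations_for("ocata", "pike", general))
```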



> These could still live in the tripleo-validations repo, but would only
> exist as backports to the relevant "from"-version. But lacking a good
> example, this is probably a bit academic for now. :-)
>
> Any thoughts?
>
> Thanks
> Florian
>