[openstack-dev] [ptl][tc] Accessible upgrade support
Kevin Benton
kevin at benton.pub
Fri Oct 6 23:23:33 UTC 2017
> The neutron story is mixed on accessible upgrade, because at least in some
> cases, like ovs, upgrade might trigger a network tear down / rebuild that
> generates an outage (though typically a pretty small one).
This shouldn't happen. If it does, it should be reported as a bug. All
existing OVS flows are left in place during agent initialization, and we
don't remove the old ones until the agent has finished setting up the new
ones.
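
To make that ordering concrete, here is a minimal sketch of the idea
(install the new flows first, remove the leftovers last). It is a toy
model, not the neutron agent code: FlowTable, restart_agent and the
per-run cookie tag are hypothetical names used only for illustration.

```python
"""Toy model of make-before-break flow replacement (not neutron code)."""
import uuid


class FlowTable:
    """Hypothetical stand-in for a switch flow table: match -> (cookie, action)."""

    def __init__(self):
        self.flows = {}

    def install(self, match, action, cookie):
        # Re-installing an existing match just re-stamps it with the new cookie,
        # so traffic matching it is never left without a flow.
        self.flows[match] = (cookie, action)

    def delete_stale(self, current_cookie):
        # Remove only the flows that the current agent run did not (re)install.
        stale = [m for m, (c, _) in self.flows.items() if c != current_cookie]
        for match in stale:
            del self.flows[match]
        return stale


def restart_agent(table, desired_flows):
    """Reprogram flows without ever emptying the table."""
    cookie = uuid.uuid4().hex  # fresh tag for this agent run (assumption)
    for match, action in desired_flows.items():
        table.install(match, action, cookie)
    # Old flows are dropped only after all new ones are in place.
    return table.delete_stale(cookie)
```

With a table that already holds flows from a previous run, calling
restart_agent() leaves every still-wanted flow in place throughout and
returns only the matches that were no longer needed.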
On Thu, Oct 5, 2017 at 4:42 AM, Sean Dague <sean at dague.net> wrote:
> On 10/05/2017 07:08 AM, Graham Hayes wrote:
> >
> >
> > On Thu, 5 Oct 2017, at 09:50, Thierry Carrez wrote:
> >> Matt Riedemann wrote:
> >>> What's the difference between this tag and the zero-impact-upgrades
> >>> tag?
> >>> I guess the accessible one is, can a user still ssh into their VM while
> >>> the nova compute service is being upgraded. The zero-impact-upgrade one
> >>> is more to do with performance degradation during an upgrade. I'm not
> >>> entirely sure what that might look like, probably need operator input.
> >>> For example, while upgrading, you're live migrating VMs all over the
> >>> place which is putting extra strain on the network.
> >>
> >> The zero-impact-upgrade tag means no API downtime and no measurable
> >> impact on performance, while the accessible-upgrade means that while
> >> there can be API downtime, the resources provisioned are still
> >> accessible (you can use the VM even if nova-api is down).
> >>
> >> I still think we have too many of those upgrade tags, and the amount of
> >> information they provide does not compensate for the confusion they create.
> >> If you're not clear on what they mean, imagine a new user looking at the
> >> Software Navigator...
> >>
> >> In particular, we created two paths in the graph:
> >> * upgrade < accessible-upgrade
> >> * upgrade < rolling-upgrade < zero-downtime < zero-impact
> >>
> >> I personally would get rid of zero-impact (not sure there is that much
> >> additional information it conveys beyond zero-downtime).
> >>
> >> If we could make the requirements of accessible-upgrade a part of
> >> rolling-upgrade, that would also help (single path in the graph, only 3
> >> "levels"). Are there any of the current rolling-upgrade projects (cinder,
> >> neutron, nova, swift) that would not qualify for accessible-upgrade as
> >> well?
> >
> > Well, there are projects (like designate) that qualify for accessible
> > upgrade, but not rolling upgrade.
>
> The neutron story is mixed on accessible upgrade, because at least in
> some cases, like ovs, upgrade might trigger a network tear down /
> rebuild that generates an outage (though typically a pretty small one).
>
> I still think it's hard to describe to folks what is going on without
> pictures. And the tag structure might just be the wrong way to describe
> the world, because they are a set of positive assertions, and upgrade
> expectations are really about: "how terrible will this be".
>
> If I were an operator, the questions I might have are:
>
> 1) Really basic, will my db roll forward?
>
> 2) When my db rolls forward, is it going to take a giant table lock that
> is effectively an outage?
>
> 3) Is whatever data I created, computes, networks going to stay up when
> I do all this? (i.e. no customer workload interruption)
>
> 4) If the service is more than 1 process, can they arbitrarily work with
> N-1 so I won't have a closet outage when services restart?
>
> 5) If the service runs on more than 1 host, can I mix host levels, or
> will there be an outage as I upgrade nodes?
>
> 6) If the service talks to other openstack services, is there a strict
> version lock-in, which means I've got to coordinate with those for
> upgrade? If so, what order is that and is it clear?
>
> 7) Can I seamlessly hide my API upgrade behind HAProxy / Istio (or
> similar) so that there is no API service interruption?
>
> 8) Is there any substantial degradation in running "mixed mode" even if
> it's supported, so that I know whether I can do this over a longer
> window of time when time permits?
>
> 9) What level of validation exists to ensure that any of these "should
> work" do work?
>
>
> The tags were really built around grouping a few of these, but even with
> folks that are near the problem, they got confusing quickly. I really
> think that some more pictorial upgrade safety cards or something
> explaining the things you need to consider, and what parts projects
> handle for you would be really useful. And then revisit whatever the
> tagging structure is going to be later.
>
>
> -Sean
>
> --
> Sean Dague
> http://dague.net