<div dir="ltr">><span style="font-size:12.8px">The neutron story is mixed on accessible upgrade, because at least in </span><span style="font-size:12.8px">some cases, like ovs, upgrade might trigger a network tear down / </span><span style="font-size:12.8px">rebuild that generates an outage (though typically a pretty small one).</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">This shouldn't happen. If it does, it should be reported as a bug. All existing OVS flows are left in place during agent initialization, and we don't get rid of the old ones until the agent finishes setting up the new ones.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 5, 2017 at 4:42 AM, Sean Dague <span dir="ltr"><<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 10/05/2017 07:08 AM, Graham Hayes wrote:<br>
><br>
><br>
> On Thu, 5 Oct 2017, at 09:50, Thierry Carrez wrote:<br>
>> Matt Riedemann wrote:<br>
>>> What's the difference between this tag and the zero-impact-upgrades tag?<br>
>>> I guess the accessible one is, can a user still ssh into their VM while<br>
>>> the nova compute service is being upgraded. The zero-impact-upgrade one<br>
>>> is more to do with performance degradation during an upgrade. I'm not<br>
>>> entirely sure what that might look like, probably need operator input.<br>
>>> For example, while upgrading, you're live migrating VMs all over the<br>
>>> place which is putting extra strain on the network.<br>
>><br>
>> The zero-impact-upgrade tag means no API downtime and no measurable<br>
>> impact on performance, while the accessible-upgrade means that while<br>
>> there can be API downtime, the resources provisioned are still<br>
>> accessible (you can use the VM even if nova-api is down).<br>
>><br>
>> I still think we have too many of those upgrade tags, and the amount of<br>
>> information they provide does not compensate for the confusion they create.<br>
>> If you're not clear on what they mean, imagine a new user looking at the<br>
>> Software Navigator...<br>
>><br>
>> In particular, we created two paths in the graph:<br>
>> * upgrade < accessible-upgrade<br>
>> * upgrade < rolling-upgrade < zero-downtime < zero-impact<br>
>><br>
>> I personally would get rid of zero-impact (not sure there is that much<br>
>> additional information it conveys beyond zero-downtime).<br>
>><br>
>> If we could make the requirements of accessible-upgrade a part of<br>
>> rolling-upgrade, that would also help (single path in the graph, only 3<br>
>> "levels"). Are there any of the current rolling-upgrade things (cinder,<br>
>> neutron, nova, swift) that would not qualify for accessible-upgrade as<br>
>> well?<br>
><br>
> Well, there are projects (like designate) that qualify for accessible<br>
> upgrade, but not rolling upgrade.<br>
<br>
</div></div>The neutron story is mixed on accessible upgrade, because at least in<br>
some cases, like ovs, upgrade might trigger a network tear down /<br>
rebuild that generates an outage (though typically a pretty small one).<br>
<br>
I still think it's hard to describe to folks what is going on without<br>
pictures. And the tag structure might just be the wrong way to describe<br>
the world, because they are a set of positive assertions, and upgrade<br>
expectations are really about: "how terrible will this be".<br>
<br>
If I were an operator, the questions I might have are:<br>
<br>
1) Really basic, will my db roll forward?<br>
<br>
2) When my db rolls forward, is it going to take a giant table lock that<br>
is effectively an outage?<br>
<br>
3) Is whatever data I created, computes, networks going to stay up when<br>
I do all this? (i.e. no customer workload interruption)<br>
<br>
4) If the service is more than 1 process, can they arbitrarily work with<br>
N-1 so I won't have a closet outage when services restart?<br>
<br>
5) If the service runs on more than 1 host, can I mix host levels, or<br>
will there be an outage as I upgrade nodes?<br>
<br>
6) If the service talks to other OpenStack services, is there a strict<br>
version lock-in which means I've got to coordinate with those for<br>
upgrade? If so, what order is that and is it clear?<br>
<br>
7) Can I seamlessly hide my API upgrade behind HA-Proxy / Istio / (or<br>
similar) so that there is no API service interruption?<br>
<br>
8) Is there any substantial degradation in running "mixed mode" even if<br>
it's supported, so that I know whether I can do this over a longer<br>
window of time when time permits?<br>
<br>
9) What level of validation exists to ensure that any of these "should<br>
work" do work?<br>
<br>
<br>
The tags were really built around grouping a few of these, but even with<br>
folks that are near the problem, they got confusing quickly. I really<br>
think that some more pictorial upgrade safety cards or something<br>
explaining the things you need to consider, and what parts projects<br>
handle for you would be really useful. And then revisit whatever the<br>
tagging structure is going to be later.<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
-Sean<br>
<br>
--<br>
Sean Dague<br>
<a href="http://dague.net" rel="noreferrer" target="_blank">http://dague.net</a><br>
</font></span><div class="HOEnZb"><div class="h5"><br>
______________________________<wbr>______________________________<wbr>______________<br>
OpenStack Development Mailing List (not for usage questions)<br>
</div></div></blockquote></div><br></div>