[openstack-dev] [grenade] future direction on partial upgrade support

Joe Gordon joe.gordon0 at gmail.com
Fri Jun 26 22:54:48 UTC 2015


No

On Fri, Jun 26, 2015 at 10:15 AM, Joe Gordon <joe.gordon0 at gmail.com> wrote:

>
>
> On Wed, Jun 24, 2015 at 11:44 AM, Joe Gordon <joe.gordon0 at gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague <sean at dague.net> wrote:
>>
>>> On 06/24/2015 01:41 PM, Russell Bryant wrote:
>>> > On 06/24/2015 01:31 PM, Joe Gordon wrote:
>>> >>
>>> >>
>>> >> On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague <sean at dague.net> wrote:
>>> >>
>>> >>     Back when Nova first wanted to test partial upgrade, we did a
>>> >>     bunch of slightly odd conditionals inside of grenade and devstack
>>> >>     to make it so that if you were very careful, you could just not
>>> >>     stop some of the old services on a single node, upgrade everything
>>> >>     else, and as long as the old services didn't stop, they'd be
>>> >>     running cached code in memory, and it would look a bit like a
>>> >>     2 node worker not upgraded model. It worked, but it was weird.
>>> >>
>>> >>     There has been some interest by the Nova team to expand what's not
>>> >>     being touched, as well as the Neutron team to add partial upgrade
>>> >>     testing support. Both are great initiatives, but I think going
>>> >>     about it the old way is going to add a lot of complexity in weird
>>> >>     places, and not be as good of a test as we really want.
>>> >>
>>> >>     Nodepool now supports allocating multiple nodes. We have a
>>> >>     multinode job in Nova regularly testing live migration using this.
>>> >>
>>> >>     If we slice this problem differently, I think we get a better
>>> >>     architecture, a much easier way to add new configs, and a much
>>> >>     more realistic end test.
>>> >>
>>> >>     Conceptually, use devstack-gate multinode support to set up 2
>>> >>     nodes, an all in one, and a worker. Let grenade upgrade the all in
>>> >>     one, leave the worker alone.
>>> >>
>>> >>     I think the only complexity here is the fact that grenade.sh
>>> >>     implicitly drives stack.sh. Which means one of:
>>> >>
>>> >>     1) devstack-gate could build the worker first, then run grenade.sh
>>> >>
>>> >>     2) we make it so grenade.sh can execute in parts more easily, so
>>> >>     it can hand something else running stack.sh for it.
>>> >>
>>> >>     3) we make grenade understand the subnode for partial upgrade, so
>>> >>     it will run the stack phase on the subnode itself (given
>>> >>     credentials).
>>> >>
>>> >>     This kind of approach means deciding which services you don't want
>>> >>     to upgrade doesn't require devstack changes, it's just a change of
>>> >>     the services on the worker.
>>> >>
>>> >>     We need a volunteer for taking this on, but I think all the follow
>>> >>     on partial upgrade support will be much much easier to do after we
>>> >>     have this kind of mechanism in place.
>>> >>
>>> >>
>>> >> I think this is a great approach for the future of partial upgrade
>>> >> support in grenade. I would like to point out that step 0 here is to
>>> >> get tempest passing consistently in multinode.
>>> >>
>>> >> Currently the neutron job is failing consistently, and nova-network
>>> >> fails roughly 10% of the time due
>>> >> to https://bugs.launchpad.net/nova/+bug/1462305
>>> >> and https://bugs.launchpad.net/nova/+bug/1445569
>>> >
>>> > If multi-node isn't reliable more generally yet, do you think the
>>> > simpler implementation of partial-upgrade testing could proceed?  I've
>>> > already done all of the patches to do it for Neutron.  That way we
>>> > could quickly get something in place to help block regressions and
>>> > work on the longer-term multinode refactoring without as much time
>>> > pressure.
>>>
>>> The thing is, these partial service bits are sneakier than one realizes
>>> over time. There have been all kinds of edge conditions that crept up on
>>> the n-cpu one that are really subtle because code is running in memory
>>> on stale versions of dependencies which are no longer on disk. And the
>>> number of people that have this model in their head is basically down to
>>> a SPOF.
>>>
>>
>> I agree. As the author of the current multinode job, it is definitely an
>> ugly hack (but one that has worked surprisingly well until now).
>>
>>
>>>
>>> The fact that neutron-grenade is at a 40% fail rate right now (and has
>>> been for over a week) is not preventing anyone from just rechecking to
>>> get past it. So I think assuming additional failing grenade tests are
>>> going to keep folks from landing bugs is probably not a good assumption.
>>> Making the whole path more complicated for other people to debug is an
>>> explosion waiting to happen.
>>>
>>> So I do want to take a hard line on doing this right, because the debt
>>> here is higher than you might think. The partial code was always very
>>> conceptually fragile, and fails in really funny ways sometimes, because
>>> of the fact that old is not isolated from new in a way that would be
>>> expected.
>>>
>>
>> Assuming the smoke jobs work, I don't think making grenade do multinode
>> should take very long. In which case we get a much more realistic upgrade
>> situation.
>>
>>
>
> Good news, it looks like both smoke jobs are working (ignoring failures
> from https://review.openstack.org/#/c/195748/).
>

So the next step is to teach grenade to do multinode.
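
For anyone trying to picture the end state, here is a rough sketch in
Python. Everything in it is illustrative: the Node class, the stack() and
upgrade() helpers and the service lists are made up for the example and are
not grenade or devstack-gate code. It only models option 1 from Sean's list
(build the worker first, then let grenade stack and upgrade the all in one).

```python
from dataclasses import dataclass

OLD, NEW = "old", "new"


@dataclass
class Node:
    """Hypothetical model of one test node and the services it runs."""
    name: str
    services: list
    release: str = OLD


def stack(node):
    """Stand-in for running stack.sh: the node comes up on the old release."""
    node.release = OLD
    return node


def upgrade(node):
    """Stand-in for the grenade upgrade phase: the node moves to the new release."""
    node.release = NEW
    return node


def partial_upgrade(all_in_one, worker):
    """Build both nodes on the old release, then upgrade only the all in one.

    Returns a mapping of (node name, service) -> release.
    """
    stack(worker)        # option 1: the worker is built first
    stack(all_in_one)    # grenade drives stack.sh for the all in one
    upgrade(all_in_one)  # the worker is deliberately left alone
    return {
        (node.name, service): node.release
        for node in (all_in_one, worker)
        for service in node.services
    }


if __name__ == "__main__":
    # Service names are illustrative; only n-cpu stays behind on the worker.
    aio = Node("all-in-one", ["n-api", "n-cond", "n-sch", "n-cpu"])
    worker = Node("worker", ["n-cpu"])
    for (node_name, service), release in partial_upgrade(aio, worker).items():
        print(f"{node_name:12} {service:8} {release}")
```

The point of the model is that choosing which services stay on the old
release becomes a matter of what is listed on the worker, with no special
cases in the upgrade step itself.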


>
>
>>
>>> I -1ed the n-net partial upgrade changes for the same reason.
>>>
>>>         -Sean
>>>
>>> --
>>> Sean Dague
>>> http://dague.net
>>>
>>>
>>>
>>
>>
>