[openstack-dev] [grenade] future direction on partial upgrade support

Armando M. armamig at gmail.com
Mon Jul 6 18:54:45 UTC 2015


Hi,

Not sure whether we reached any conclusion on this thread, and I would like to
resume it so that we don't derail the initial plan set forth by Russell and
agreed on during the Liberty summit, among other things.

Looking at the thread, I think it can be summarized as follows. Please
correct me if I am wrong:

   1. There is a desire to make Grenade more modular by relying on
   multi-node support. This is beneficial for all the projects that aim to
   test partial upgrades.
   2. There are a number of steps required to achieve 1. The work required
   is not overly complicated, but it requires some discipline and a good
   understanding of the overall OpenStack machinery to get it to completion.
   3. Should this effort be given priority, it can impact work that is
   currently in flight, like the patches from Russell on Neutron partial
   upgrade, and from Dan on improvements for nova-net upgrades.
   4. With minor tweaks, single-node Grenade can be useful in the interim,
   while everything gets ported over to a more robust multi-node Grenade job
   configuration.
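To make the topology in point 1 concrete (it follows Sean's original proposal quoted further down: an all-in-one node that Grenade upgrades, plus a worker that is left on the old release), here is a minimal Python model. It is only an illustration under stated assumptions, not Grenade or devstack-gate code: the `Node` type, the `partial_upgrade` function, the node names and the release names are hypothetical; the service names are devstack-style short names chosen for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """Hypothetical model of one devstack node: name, enabled services, release."""
    name: str
    services: tuple[str, ...]
    release: str


def partial_upgrade(primary: Node, worker: Node, new_release: str) -> dict[str, set[str]]:
    """Upgrade only the primary (all-in-one) node and leave the worker alone.

    Returns a mapping from release name to the "node:service" pairs that
    run that release after the upgrade. Neither input node is modified.
    """
    upgraded = replace(primary, release=new_release)  # the worker is never touched
    running: dict[str, set[str]] = {}
    for node in (upgraded, worker):
        for svc in node.services:
            running.setdefault(node.release, set()).add(f"{node.name}:{svc}")
    return running


# Illustrative layout (hypothetical names): which services stay on the old
# release is decided purely by which services are enabled on the worker.
primary = Node("primary", ("key", "g-api", "n-api", "n-cond", "n-sch", "q-svc"), "kilo")
worker = Node("subnode", ("n-cpu", "q-agt"), "kilo")
result = partial_upgrade(primary, worker, "liberty")
print(sorted(result["kilo"]))  # ['subnode:n-cpu', 'subnode:q-agt']
```

In this model, testing a different partial-upgrade combination only means changing the worker's service tuple, which mirrors the argument in the thread that no special conditionals are needed in the upgrade path itself.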

Have we identified a volunteer for activity 1? As far as I can tell, Joe was
kind enough to set up the infra to start gathering data on the reliability of
the multi-node jobs, but they are clearly flaky [1], and currently broken. I
have seen nothing else. If I am mistaken, please fill me in.

Now, in terms of a resolution for this, would it be fair to say that until
we get 1 bootstrapped, Russell and Dan's efforts are low-hanging fruit
worth taking? I would personally think so: after all, patches [2,3,4] seem
trivial enough:

   - they don't add much complexity
   - they are fairly self-contained, and
   - they can easily be swept away along with the other grenade 'odd
   conditionals' in the context of 1.

Thoughts?

Thanks,
Armando

[1] http://goo.gl/NPkeZh
[2] https://review.openstack.org/#/q/topic:partial-neutron-upgrade,n,z
[3] https://review.openstack.org/#/q/topic:neutron-agent-control,n,z
[4] https://review.openstack.org/#/c/189478/

On 26 June 2015 at 15:54, Joe Gordon <joe.gordon0 at gmail.com> wrote:

> No
>
> On Fri, Jun 26, 2015 at 10:15 AM, Joe Gordon <joe.gordon0 at gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 24, 2015 at 11:44 AM, Joe Gordon <joe.gordon0 at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 24, 2015 at 11:03 AM, Sean Dague <sean at dague.net> wrote:
>>>
>>>> On 06/24/2015 01:41 PM, Russell Bryant wrote:
>>>> > On 06/24/2015 01:31 PM, Joe Gordon wrote:
>>>> >>
>>>> >>
>>>> >> On Tue, Jun 16, 2015 at 9:58 AM, Sean Dague <sean at dague.net> wrote:
>>>> >>
>>>> >>     Back when Nova first wanted to test partial upgrade, we did a bunch
>>>> >>     of slightly odd conditionals inside grenade and devstack so that,
>>>> >>     if you were very careful, you could just not stop some of the old
>>>> >>     services on a single node and upgrade everything else. As long as
>>>> >>     the old services didn't stop, they'd keep running cached code in
>>>> >>     memory, and it would look a bit like a 2-node model with a worker
>>>> >>     that was not upgraded. It worked, but it was weird.
>>>> >>
>>>> >>     There has been some interest from the Nova team in expanding what's
>>>> >>     not being touched, and from the Neutron team in adding partial
>>>> >>     upgrade testing support. Both are great initiatives, but I think
>>>> >>     going about it the old way is going to add a lot of complexity in
>>>> >>     weird places, and not be as good a test as we really want.
>>>> >>
>>>> >>     Nodepool now supports allocating multiple nodes. We have a
>>>> >>     multinode job in Nova regularly testing live migration using this.
>>>> >>
>>>> >>     If we slice this problem differently, I think we get a better
>>>> >>     architecture, a much easier way to add new configs, and a much
>>>> >>     more realistic end test.
>>>> >>
>>>> >>     Conceptually, use devstack-gate multinode support to set up 2
>>>> >>     nodes: an all-in-one and a worker. Let grenade upgrade the
>>>> >>     all-in-one and leave the worker alone.
>>>> >>
>>>> >>     I think the only complexity here is the fact that grenade.sh
>>>> >>     implicitly drives stack.sh, which means one of:
>>>> >>
>>>> >>     1) devstack-gate could build the worker first, then run grenade.sh
>>>> >>
>>>> >>     2) we make it so grenade.sh can execute in parts more easily, so it
>>>> >>     can hand off running stack.sh to something else.
>>>> >>
>>>> >>     3) we make grenade understand the subnode for partial upgrade, so
>>>> >>     it will run the stack phase on the subnode itself (given
>>>> >>     credentials).
>>>> >>
>>>> >>     With this kind of approach, deciding which services you don't want
>>>> >>     to upgrade doesn't require devstack changes; it's just a change to
>>>> >>     the services on the worker.
>>>> >>
>>>> >>     We need a volunteer to take this on, but I think all the follow-on
>>>> >>     partial upgrade support will be much, much easier to do once we
>>>> >>     have this kind of mechanism in place.
>>>> >>
>>>> >>
>>>> >> I think this is a great approach for the future of partial upgrade
>>>> >> support in grenade. I would like to point out that step 0 here is to
>>>> >> get tempest passing consistently in multinode.
>>>> >>
>>>> >> Currently the neutron job is failing consistently, and nova-network
>>>> >> fails roughly 10% of the time due
>>>> >> to https://bugs.launchpad.net/nova/+bug/1462305
>>>> >> and https://bugs.launchpad.net/nova/+bug/1445569
>>>> >
>>>> > If multi-node isn't reliable more generally yet, do you think the
>>>> > simpler implementation of partial-upgrade testing could proceed?  I've
>>>> > already done all of the patches to do it for Neutron.  That way we
>>>> > could quickly get something in place to help block regressions and
>>>> > work on the longer-term multinode refactoring without as much time
>>>> > pressure.
>>>>
>>>> The thing is, these partial service bits are sneakier than one realizes
>>>> over time. There have been all kinds of edge conditions that crept up on
>>>> the n-cpu one that are really subtle, because code is running in memory
>>>> on stale versions of dependencies which are no longer on disk. And the
>>>> number of people that have this model in their head is basically down to
>>>> a SPOF.
>>>>
>>>
>>> I agree. Speaking as the author of the current multinode job, it is
>>> definitely an ugly hack (but one that has worked surprisingly well until
>>> now).
>>>
>>>
>>>>
>>>> The fact that neutron-grenade is at a 40% fail rate right now (and has
>>>> been for over a week) is not preventing anyone from just rechecking to
>>>> get past it. So I don't think we can assume that additional failing
>>>> grenade tests are going to keep folks from landing bugs. Making the
>>>> whole path more complicated for other people to debug is an explosion
>>>> waiting to happen.
>>>>
>>>> So I do want to take a hard line on doing this right, because the debt
>>>> here is higher than you might think. The partial code was always very
>>>> conceptually fragile, and fails in really funny ways sometimes, because
>>>> of the fact that old is not isolated from new in a way that would be
>>>> expected.
>>>>
>>>
>>> Assuming the smoke jobs work, I don't think making grenade do multinode
>>> should take very long, in which case we get a much more realistic upgrade
>>> situation.
>>>
>>>
>>
>> Good news, it looks like both smoke jobs are working (ignoring failures
>> from https://review.openstack.org/#/c/195748/).
>>
>
> So next step is to teach grenade to do multinode.
>
>
>>
>>
>>>
>>>> I -1ed the n-net partial upgrade changes for the same reason.
>>>>
>>>>         -Sean
>>>>
>>>> --
>>>> Sean Dague
>>>> http://dague.net
>>>>
>>>>
>>>> __________________________________________________________________________
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe:
>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>
>>>
>>
>