[openstack-dev] [oslo.messaging] State wrapping in the MessageHandlingServer

Joshua Harlow harlowja at fastmail.com
Wed Nov 11 19:05:21 UTC 2015


Matthew Booth wrote:
> On Tue, Nov 10, 2015 at 6:46 PM, Joshua Harlow <harlowja at fastmail.com
> <mailto:harlowja at fastmail.com>> wrote:
>
>     Matthew Booth wrote:
>
>         My patch to MessageHandlingServer is currently being reverted
>         because it
>         broke Nova tests:
>
>         https://review.openstack.org/#/c/235347/
>
>         Specifically it causes a number of tests to take a very long time to
>         execute, which ultimately results in the total build time limit
>         being
>         exceeded. This is very easy to replicate. The
>         test
>         nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity
>         is an example test which will always hit this issue. The problem is
>         that ServerGroupTest.setUp() does:
>
>                   self.compute2 = self.start_service('compute',
>         host='host2')
>                   self.addCleanup(self.compute2.kill)
>
>         The problem with this is that start_service() adds a fixture
>         which also
>         adds kill as a cleanup method. kill does stop(), wait(). This
>         means that
>         the resulting call order is: start, stop, wait, stop, wait. The
>         redundant call to kill is obviously a wart, but I feel we should
>         have
>         handled it anyway.
>
>         The problem is that we decided it should be possible to restart a
>         server. There are some unit tests in oslo.messaging that do
>         this. It's
>         not clear to me that there are any projects which do this, but after
>         this experience I feel like it would be good to check before
>         changing it :)
>
>         The implication of that is that after wait() the state wraps,
>         and we're
>         now waiting on start() again. Consequently, when the second
>         cleanup call
>         hangs.
>
>         We could fix Nova (at least the usage we have seen) by removing the
>         wrapping. After wait() if you want to start a server again you
>         need to
>         create a new one.
>
>         So, to be specific, lets consider the following 2 call sequences:
>
>         1. start stop wait stop wait
>         2. start stop wait start stop wait
>
>         What should they do? The behaviours with and without wrapping are:
>
>         1. start stop wait stop wait
>         WRAP: start stop wait HANG HANG
>         NO WRAP: start stop wait NO-OP NO-OP
>
>         2. start stop wait start stop wait
>         WRAP: start stop wait start stop wait
>         NO WRAP: start stop wait NO-OP NO-OP NO-OP
>
>         I'll refresh my memory on what they did before my change in the
>         morning.
>         Perhaps it might be simpler to codify the current behaviour, but
>         iirc I
>         proposed this because it was previously undefined due to races.
>
>
>     I personally prefer not allowing restarting, its needless code
>     complexity imho and a feature that people imho probably aren't using
>     anyway (just create a new server object if u are doing this), so I'd
>     be fine with doing the above NO WRAP and turning those into NO-OPs
>     (and for example raising a runtime error in the case of start stop
>     wait start ... to denote that restarting isn't
>     recommended/possible). If we have a strong enough reason to really
>     to start stop wait start ...
>
>     I might be convinced the code complexity is worth it but for now I'm
>     not convinced...
>
>
> I agree, and in the hopefully unlikely event that we did break anybody,
> at least they would get an obvious exception rather than a hang. A
> lesson from breaking nova was that the log messages were generated and
> were available in the failed test runs, but nobody noticed them.
>
> Incidentally, I think I'd also merge my second patch into the first
> before resubmitting, which adds timeouts and the option not to log.
>

+1

Makes sense to me.

IMHO we can remove the log output later if we determine it's to noisy 
for folks (and/or not helping)...

> Matt
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



More information about the OpenStack-dev mailing list