[openstack-dev] [oslo.messaging] State wrapping in the MessageHandlingServer
Joshua Harlow
harlowja at fastmail.com
Wed Nov 11 19:05:21 UTC 2015
Matthew Booth wrote:
> On Tue, Nov 10, 2015 at 6:46 PM, Joshua Harlow <harlowja at fastmail.com
> <mailto:harlowja at fastmail.com>> wrote:
>
> Matthew Booth wrote:
>
> My patch to MessageHandlingServer is currently being reverted
> because it
> broke Nova tests:
>
> https://review.openstack.org/#/c/235347/
>
> Specifically it causes a number of tests to take a very long time to
> execute, which ultimately results in the total build time limit
> being
> exceeded. This is very easy to replicate. The
> test
> nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity
> is an example test which will always hit this issue. The problem is
> that ServerGroupTest.setUp() does:
>
> self.compute2 = self.start_service('compute',
> host='host2')
> self.addCleanup(self.compute2.kill)
>
> The problem with this is that start_service() adds a fixture
> which also
> adds kill as a cleanup method. kill does stop(), wait(). This
> means that
> the resulting call order is: start, stop, wait, stop, wait. The
> redundant call to kill is obviously a wart, but I feel we should
> have
> handled it anyway.
>
> The problem is that we decided it should be possible to restart a
> server. There are some unit tests in oslo.messaging that do
> this. It's
> not clear to me that there are any projects which do this, but after
> this experience I feel like it would be good to check before
> changing it :)
>
> The implication of that is that after wait() the state wraps,
> and we're
> now waiting on start() again. Consequently, when the second
> cleanup call
> hangs.
>
> We could fix Nova (at least the usage we have seen) by removing the
> wrapping. After wait() if you want to start a server again you
> need to
> create a new one.
>
> So, to be specific, lets consider the following 2 call sequences:
>
> 1. start stop wait stop wait
> 2. start stop wait start stop wait
>
> What should they do? The behaviours with and without wrapping are:
>
> 1. start stop wait stop wait
> WRAP: start stop wait HANG HANG
> NO WRAP: start stop wait NO-OP NO-OP
>
> 2. start stop wait start stop wait
> WRAP: start stop wait start stop wait
> NO WRAP: start stop wait NO-OP NO-OP NO-OP
>
> I'll refresh my memory on what they did before my change in the
> morning.
> Perhaps it might be simpler to codify the current behaviour, but
> iirc I
> proposed this because it was previously undefined due to races.
>
>
> I personally prefer not allowing restarting, its needless code
> complexity imho and a feature that people imho probably aren't using
> anyway (just create a new server object if u are doing this), so I'd
> be fine with doing the above NO WRAP and turning those into NO-OPs
> (and for example raising a runtime error in the case of start stop
> wait start ... to denote that restarting isn't
> recommended/possible). If we have a strong enough reason to really
> to start stop wait start ...
>
> I might be convinced the code complexity is worth it but for now I'm
> not convinced...
>
>
> I agree, and in the hopefully unlikely event that we did break anybody,
> at least they would get an obvious exception rather than a hang. A
> lesson from breaking nova was that the log messages were generated and
> were available in the failed test runs, but nobody noticed them.
>
> Incidentally, I think I'd also merge my second patch into the first
> before resubmitting, which adds timeouts and the option not to log.
>
+1
Makes sense to me.
IMHO we can remove the log output later if we determine it's to noisy
for folks (and/or not helping)...
> Matt
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
More information about the OpenStack-dev
mailing list