[openstack-dev] [oslo.messaging] State wrapping in the MessageHandlingServer

Matthew Booth mbooth at redhat.com
Tue Nov 10 18:31:27 UTC 2015


My patch to MessageHandlingServer is currently being reverted because it
broke Nova tests:

https://review.openstack.org/#/c/235347/

Specifically it causes a number of tests to take a very long time to
execute, which ultimately results in the total build time limit being
exceeded. This is very easy to replicate. The
test nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity
is an example test which will always hit this issue. The problem is
that ServerGroupTest.setUp() does:

        self.compute2 = self.start_service('compute', host='host2')
        self.addCleanup(self.compute2.kill)

The problem with this is that start_service() adds a fixture which also
adds kill as a cleanup method. kill does stop(), wait(). This means that
the resulting call order is: start, stop, wait, stop, wait. The redundant
call to kill is obviously a wart, but I feel we should have handled it
anyway.

The problem is that we decided it should be possible to restart a server.
There are some unit tests in oslo.messaging that do this. It's not clear to
me that there are any projects which do this, but after this experience I
feel like it would be good to check before changing it :)

The implication of that is that after wait() the state wraps, and we're now
waiting on start() again. Consequently, when the second cleanup call hangs.

We could fix Nova (at least the usage we have seen) by removing the
wrapping. After wait() if you want to start a server again you need to
create a new one.

So, to be specific, lets consider the following 2 call sequences:

1. start stop wait stop wait
2. start stop wait start stop wait

What should they do? The behaviours with and without wrapping are:

1. start stop wait stop wait
WRAP: start stop wait HANG HANG
NO WRAP: start stop wait NO-OP NO-OP

2. start stop wait start stop wait
WRAP: start stop wait start stop wait
NO WRAP: start stop wait NO-OP NO-OP NO-OP

I'll refresh my memory on what they did before my change in the morning.
Perhaps it might be simpler to codify the current behaviour, but iirc I
proposed this because it was previously undefined due to races.

Matt
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20151110/522e7db5/attachment.html>


More information about the OpenStack-dev mailing list