<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Nov 10, 2015 at 6:46 PM, Joshua Harlow <span dir="ltr"><<a href="mailto:harlowja@fastmail.com" target="_blank">harlowja@fastmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Matthew Booth wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My patch to MessageHandlingServer is currently being reverted because it<br>

broke Nova tests:<br>

<br>

<a href="https://review.openstack.org/#/c/235347/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/235347/</a><br>

<br>

Specifically it causes a number of tests to take a very long time to<br>

execute, which ultimately results in the total build time limit being<br>

exceeded. This is very easy to replicate. The<br>

test nova.tests.functional.test_server_group.ServerGroupTest.test_boot_servers_with_affinity<br>

is an example test which will always hit this issue. The problem is<br>

that ServerGroupTest.setUp() does:<br>

<br>

         self.compute2 = self.start_service('compute', host='host2')<br>

         self.addCleanup(self.compute2.kill)<br>

<br>

The problem with this is that start_service() adds a fixture which also<br>

adds kill as a cleanup method. kill does stop(), wait(). This means that<br>

the resulting call order is: start, stop, wait, stop, wait. The<br>

redundant call to kill is obviously a wart, but I feel we should have<br>

handled it anyway.<br>

<br>

The problem is that we decided it should be possible to restart a<br>

server. There are some unit tests in oslo.messaging that do this. It's<br>

not clear to me that there are any projects which do this, but after<br>

this experience I feel like it would be good to check before changing it :)<br>

<br>

The implication of that is that after wait() the state wraps, and we're<br>

now waiting on start() again. Consequently, when the second cleanup call<br>

hangs.<br>

<br>

We could fix Nova (at least the usage we have seen) by removing the<br>

wrapping. After wait() if you want to start a server again you need to<br>

create a new one.<br>

<br>

So, to be specific, lets consider the following 2 call sequences:<br>

<br>

1. start stop wait stop wait<br>

2. start stop wait start stop wait<br>

<br>

What should they do? The behaviours with and without wrapping are:<br>

<br>

1. start stop wait stop wait<br>

WRAP: start stop wait HANG HANG<br>

NO WRAP: start stop wait NO-OP NO-OP<br>

<br>

2. start stop wait start stop wait<br>

WRAP: start stop wait start stop wait<br>

NO WRAP: start stop wait NO-OP NO-OP NO-OP<br>

<br>

I'll refresh my memory on what they did before my change in the morning.<br>

Perhaps it might be simpler to codify the current behaviour, but iirc I<br>

proposed this because it was previously undefined due to races.<br>

</blockquote>

<br></div></div>

I personally prefer not allowing restarting, its needless code complexity imho and a feature that people imho probably aren't using anyway (just create a new server object if u are doing this), so I'd be fine with doing the above NO WRAP and turning those into NO-OPs (and for example raising a runtime error in the case of start stop wait start ... to denote that restarting isn't recommended/possible). If we have a strong enough reason to really to start stop wait start ...<br>

<br>

I might be convinced the code complexity is worth it but for now I'm not convinced...<br></blockquote><div><br></div><div>I agree, and in the hopefully unlikely event that we did break anybody, at least they would get an obvious exception rather than a hang. A lesson from breaking nova was that the log messages were generated and were available in the failed test runs, but nobody noticed them.</div><div><br></div><div>Incidentally, I think I'd also merge my second patch into the first before resubmitting, which adds timeouts and the option not to log.</div><div><br></div><div>Matt</div></div></div></div>