[openstack-dev] Adding a clean shutdown for stop/delete breaks Jenkins

Sean Dague sean at dague.net
Mon Jul 8 12:54:36 UTC 2013


On 07/03/2013 01:08 PM, David Kranz wrote:
> On 07/03/2013 12:30 PM, Day, Phil wrote:
>>
>> Hi Folks,
>>
>> I have a change submitted which adds the same clean shutdown logic to
>> stop and delete that exists for soft reboot – the rationale being that
>> it's always better to give a VM a chance to shut down cleanly if
>> possible even if you’re about to delete it, as sometimes other parts of
>> the application expect this, and if it's booted from a volume you want
>> to leave the guest file system in a tidy state.
>>
>> https://review.openstack.org/#/c/35303/
>>
>> However, setting the default value to 120 seconds (as per soft reboot)
>> causes the Jenkins gate jobs to blow the 3 hour limit. This seems to
>> be just a gradual accumulation of extra time rather than any one test
>> running much longer.
>>
>> So options would seem to be:
>>
>> i) Make the default wait time much shorter so that Jenkins runs OK
>> (tried this with 10 seconds and it works fine), and assume that users
>> will configure it to a more realistic value.
>>
>> ii) Keep the default at 120 seconds, but make the Jenkins jobs use a
>> specific configuration setting (is this possible, and if so can
>> someone point me at where to make the change)?
>>
>> iii) Increase the time allowed for Jenkins
>>
>> iv) The ever popular something else …
>>
>> Thoughts please.
>>
>> Cheers,
>>
>> Phil
>>
> The fact that changing the timeout changes gate time means the code is
> actually hitting the timeout. Is that expected?
> Shutdown is now relying on the guest responding to ACPI. Is that what we
> want? Tempest uses a specialized image and I'm not sure how it is set up
> in this regard. In any event I don't think we want to add any more time
> to server delete when running in the gate.
>
> I'm also a little concerned that this seems to be a significant behavior
> change when using VMs that behave like the ones in the gate. In reboot
> this is handled by having soft/hard options of course.

I think that's a good question: do we know that cirros actually responds
to an ACPI shutdown?
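
To put rough numbers on why this matters: if the guest never answers the
ACPI request, every stop or delete waits out the full timeout before the
hard kill happens. A back-of-the-envelope sketch follows; the figure of 60
deletes per run is made up for illustration, not measured from the gate,
and it assumes the waits happen one after another.

```python
# Worst-case wall-clock time added to a test run when the guest ignores the
# ACPI shutdown request, so each stop/delete blocks for the whole timeout.
# ASSUMPTION: deletes_per_run is a hypothetical figure, not a measurement,
# and the waits are assumed to be serial.

def added_seconds(deletes_per_run, shutdown_timeout):
    """Each delete waits the full timeout before falling back to a kill."""
    return deletes_per_run * shutdown_timeout


deletes_per_run = 60  # hypothetical

for timeout in (120, 10):  # the two default values discussed in this thread
    extra = added_seconds(deletes_per_run, timeout)
    print("timeout=%ds: up to %d extra minutes" % (timeout, extra // 60))
```

Under those assumptions the difference between the two defaults is a factor
of twelve, which would fit the "gradual accumulation" Phil describes.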

I'm also a bit more OK with this on the soft_reboot path (which makes
total sense to me) than on the power_off path (which today is a hard
kill), and putting this in destroy just seems wrong to me. It does seem
to change the semantics quite a bit for a stable API.

For HA fencing it's really important to have a way that we can still 
immediately kill a guest, dead, right now, so that if it has access to 
shared resources it can't damage them when we want to give them to a 
different guest.
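
A minimal sketch of that distinction, assuming the graceful wait is
something the caller opts into and a zero timeout goes straight to the hard
kill. FakeGuest, request_acpi_shutdown, hard_kill and stop_guest are all
made-up names for illustration; none of them are nova or libvirt APIs.

```python
import time


class FakeGuest(object):
    """Stand-in for a VM; hypothetical, not a nova or libvirt object."""

    def __init__(self, honors_acpi):
        self.honors_acpi = honors_acpi
        self.running = True
        self.killed_hard = False

    def request_acpi_shutdown(self):
        # A cooperative guest powers itself off; others ignore the request.
        if self.honors_acpi:
            self.running = False

    def hard_kill(self):
        self.running = False
        self.killed_hard = True


def stop_guest(guest, graceful_timeout=0, poll_interval=0.01):
    """Stop a guest; graceful_timeout=0 kills immediately (fencing case).

    Returns True if the guest had to be killed hard.
    """
    if graceful_timeout > 0:
        guest.request_acpi_shutdown()
        deadline = time.monotonic() + graceful_timeout
        while guest.running and time.monotonic() < deadline:
            time.sleep(poll_interval)
    if guest.running:
        # Either no grace period was requested or the guest ignored ACPI.
        guest.hard_kill()
    return guest.killed_hard
```

With this shape a cooperative guest given a grace period shuts down cleanly,
a guest that ignores ACPI is killed once the timeout expires, and a caller
that passes no timeout gets the immediate kill regardless of the guest.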

	-Sean

-- 
Sean Dague
http://dague.net


