[openstack-dev] Adding a clean shutdown for stop/delete breaks Jenkins

Vishvananda Ishaya vishvananda at gmail.com
Wed Jul 10 19:22:12 UTC 2013


IIRC acpi support was added to cirros quite a while ago. Are we using an up-to-date image?

Vish

On Jul 8, 2013, at 5:54 AM, Sean Dague <sean at dague.net> wrote:

> On 07/03/2013 01:08 PM, David Kranz wrote:
>> On 07/03/2013 12:30 PM, Day, Phil wrote:
>>> 
>>> Hi Folks,
>>> 
>>> I have a change submitted which adds the same clean shutdown logic to
>>> stop and delete that exists for soft reboot – the rationale being that
>>> it’s always better to give a VM a chance to shut down cleanly if
>>> possible even if you’re about to delete it, as sometimes other parts of
>>> the application expect this, and if it’s booted from a volume you want
>>> to leave the guest file system in a tidy state.
>>> 
>>> https://review.openstack.org/#/c/35303/
>>> 
>>> However setting the default value to 120 seconds (as per soft reboot)
>>> causes the Jenkins gate jobs to blow the 3 hour limit. This seems to
>>> be just a gradual accumulation of extra time rather than any one test
>>> running much longer.
>>> 
>>> So options would seem to be:
>>> 
>>> i) Make the default wait time much shorter so that Jenkins runs OK
>>> (tried this with 10 seconds and it works fine), and assume that users
>>> will configure it to a more realistic value.
>>> 
>>> ii) Keep the default at 120 seconds, but make the Jenkins jobs use a
>>> specific configuration setting (is this possible, and if so can
>>> someone point me at where to make the change)?
>>> 
>>> iii) Increase the time allowed for Jenkins
>>> 
>>> iv) The ever popular something else …
>>> 
>>> Thoughts please.
>>> 
>>> Cheers,
>>> 
>>> Phil
>>> 
>> The fact that changing the timeout changes gate time means the code is
>> actually hitting the timeout. Is that expected?
>> Shutdown is now relying on the guest responding to acpi. Is that what we
>> want? Tempest uses a specialized image and I'm not sure how it is set up
>> in this regard. In any event I don't think we want to add any more time
>> to server delete when running in the gate.
>> 
>> I'm also a little concerned that this seems to be a significant behavior
>> change when using vms that behave like the ones in the gate. In reboot
>> this is handled by having soft/hard options of course.
> 
> I think that's a good question: do we know that cirros actually responds to acpi shutdown?
> 
> I'm also a bit more ok with this on the soft_reboot path (which makes total sense to me) than the power_off path (which today is a hard kill), and putting this in destroy just seems wrong to me. It does seem to change the semantics quite a bit for a stable API.
> 
> For HA fencing it's really important to have a way that we can still immediately kill a guest, dead, right now, so that if it has access to shared resources it can't damage them when we want to give them to a different guest.
> 
> 	-Sean
> 
> -- 
> Sean Dague
> http://dague.net
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



