[openstack-dev] [nova] periodic task

Gary Kotton gkotton at vmware.com
Tue Aug 25 15:03:32 UTC 2015



On 8/25/15, 7:04 AM, "Matt Riedemann" <mriedem at linux.vnet.ibm.com> wrote:

>
>
>On 8/24/2015 9:32 PM, Gary Kotton wrote:
>> In item #2 below the reboot is done via the guest and not the nova
>> APIs :)
>>
>> From: Gary Kotton <gkotton at vmware.com <mailto:gkotton at vmware.com>>
>> Reply-To: OpenStack List <openstack-dev at lists.openstack.org
>> <mailto:openstack-dev at lists.openstack.org>>
>> Date: Monday, August 24, 2015 at 7:18 PM
>> To: OpenStack List <openstack-dev at lists.openstack.org
>> <mailto:openstack-dev at lists.openstack.org>>
>> Subject: [openstack-dev] [nova] periodic task
>>
>> Hi,
>> A couple of months ago I posted a patch for bug
>> https://launchpad.net/bugs/1463688. The issue is as follows: the
>> periodic task detects that the instance state does not match the state
>> on the hypervisor and it shuts down the running VM. There are a number
>> of ways that this may happen and I will try and explain:
>>
>>  1. VMware driver example: a host where the instances are running goes
>>     down. This could be a power outage, host failure, etc. The first
>>     iteration of the periodic task will determine that the actual
>>     instance is down. This will update the state of the instance to
>>     DOWN. The VC has the ability to do HA and it will start the instance
>>     up and running again. The next iteration of the periodic task will
>>     determine that the instance is up and the compute manager will stop
>>     the instance.
>>  2. All drivers. The tenant decides to do a reboot of the instance and
>>     that coincides with the periodic task state validation. At this
>>     point in time the instance will not be up and the compute node will
>>     update the state of the instance as DOWN. Next iteration the states
>>     will differ and the instance will be shut down.
>>
>> Basically the issue hit us with our CI and there was no CI running for a
>> couple of hours because the compute node decided to shut down the
>> running instances. The hypervisor should be the source of truth and it
>> should not be the compute node that decides to shut down instances. I
>> posted a patch to deal with this, https://review.openstack.org/#/c/190047/,
>> which is the reason for this mail. The patch is backwards compatible:
>> existing deployments keep today's behaviour (including the random
>> shutdowns), and the admin now has the option to just log when there is
>> an inconsistency.
>>
>> We do not want to disable the periodic task, as knowing the current state
>> of the instance is very important and has a ton of value. We just do not
>> want the periodic task to shut down a running instance.
>>
>> Thanks
>> Gary
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>In #2 the guest shouldn't be rebooted by the user (tenant) outside of
>the nova-api.  I'm not sure if it's actually formally documented in the
>nova documentation, but from what I've always heard/known, nova is the
>control plane and you should be doing everything with your instances via
>the nova-api.  If the user rebooted via nova-api, the task_state would
>be set and the periodic task would ignore the instance.

Matt, this is one case that I showed where the problem occurs. There are
others and I can invest time in tracking them down. The fact that the
periodic task is there is important. What I don't understand is why an
option that just logs the mismatch for the admin is not considered useful,
and why instead we stay with the compute node shutting down instances when
this should not happen. Our infrastructure is behaving like cattle. That
should not be the case and the hypervisor should be the source of truth.

This is a serious issue and instances in production can and will go down.
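
To make the sequence concrete, here is a rough Python sketch of the two
iterations described in the thread. It is a toy model and not the nova
code: the Instance class, the sync_power_state function, the state
constants and the log_only flag are all hypothetical names made up for
illustration, and the real option name in the patch may differ.

```python
import logging
from dataclasses import dataclass
from typing import Optional

LOG = logging.getLogger("sync_power_state_sketch")

# Hypothetical power states for this toy model.
RUNNING = "running"
SHUTDOWN = "shutdown"


@dataclass
class Instance:
    name: str
    db_power_state: str               # what the database believes
    task_state: Optional[str] = None  # set while an API operation is in flight


def sync_power_state(instance, hypervisor_power_state, log_only=False):
    """One pass of the periodic check for a single instance.

    Returns the action taken: 'skipped', 'in_sync', 'updated_db',
    'logged' or 'stopped'.
    """
    # An operation started through the API sets task_state, so the
    # periodic task leaves the instance alone.
    if instance.task_state is not None:
        return "skipped"

    if instance.db_power_state == hypervisor_power_state:
        return "in_sync"

    if hypervisor_power_state != RUNNING:
        # First iteration in both scenarios: the guest is down on the
        # hypervisor, so the database is updated to match.
        instance.db_power_state = hypervisor_power_state
        return "updated_db"

    # Second iteration: the database says down, the hypervisor says running.
    if log_only:
        # Assumed opt-in behaviour: report the mismatch, touch nothing.
        LOG.warning("Instance %s is %s in the database but %s on the hypervisor",
                    instance.name, instance.db_power_state,
                    hypervisor_power_state)
        return "logged"

    # Default behaviour today: the compute node stops the running guest.
    return "stopped"
```

With the default, a guest that is restarted by HA or rebooted from inside
goes through "updated_db" on one pass and "stopped" on the next. With the
assumed log_only flag set, the second pass only writes a warning.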

>
>-- 
>
>Thanks,
>
>Matt Riedemann
>
>



