[openstack-dev] [nova] periodic task

Matt Riedemann mriedem at linux.vnet.ibm.com
Tue Aug 25 14:04:14 UTC 2015



On 8/24/2015 9:32 PM, Gary Kotton wrote:
> In item #2 below the reboot is done via the guest and not the nova APIs :)
>
> From: Gary Kotton <gkotton at vmware.com>
> Reply-To: OpenStack List <openstack-dev at lists.openstack.org>
> Date: Monday, August 24, 2015 at 7:18 PM
> To: OpenStack List <openstack-dev at lists.openstack.org>
> Subject: [openstack-dev] [nova] periodic task
>
> Hi,
> A couple of months ago I posted a patch for bug
> https://launchpad.net/bugs/1463688. The issue is as follows: the
> periodic task detects that the instance state does not match the state
> on the hypervisor and it shuts down the running VM. There are a number
> of ways that this may happen and I will try and explain:
>
>  1. VMware driver example: a host where the instances are running goes
>     down. This could be a power outage, host failure, etc. The first
>     iteration of the periodic task will determine that the actual
>     instance is down. This will update the state of the instance to
>     DOWN. The VC has the ability to do HA and it will start the instance
>     up and running again. The next iteration of the periodic task will
>     determine that the instance is up and the compute manager will stop
>     the instance.
>  2. All drivers. The tenant decides to do a reboot of the instance and
>     that coincides with the periodic task state validation. At this
>     point in time the instance will not be up and the compute node will
>     update the state of the instance as DOWN. On the next iteration the
>     states will differ and the instance will be shut down.
>
> Basically the issue hit us with our CI and there was no CI running for a
> couple of hours because the compute node decided to shut down the
> running instances. The hypervisor should be the source of truth and it
> should not be the compute node that decides to shut down instances. I
> posted a patch to deal with this, https://review.openstack.org/#/c/190047/,
> which is the reason for this mail. The patch is backwards compatible:
> existing deployments keep today's behavior, random shutdowns included,
> and the admin now has the option to just log when there is an
> inconsistency.
>
> We do not want to disable the periodic task, as knowing the current state
> of the instance is very important and has a ton of value. We just do not
> want the periodic task to shut down a running instance.
>
> Thanks
> Gary
>

In #2 the guest shouldn't be rebooted by the user (tenant) outside of 
the nova-api.  I'm not sure if it's actually formally documented in the 
nova documentation, but from what I've always heard/known, nova is the 
control plane and you should be doing everything with your instances via 
the nova-api.  If the user rebooted via nova-api, the task_state would 
be set and the periodic task would ignore the instance.
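
To make the distinction concrete, here is a minimal sketch of the kind of
check described above. It is not nova's actual code: the Instance class, the
state strings, the sync_power_state function, and the stop_on_mismatch flag
(standing in for the log-only option Gary's patch proposes) are all
hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical state names, for illustration only.
ACTIVE = "active"
STOPPED = "stopped"
RUNNING = "running"
SHUTDOWN = "shutdown"


@dataclass
class Instance:
    uuid: str
    vm_state: str                      # state recorded by the control plane
    task_state: Optional[str] = None   # set while an API operation is in flight


def sync_power_state(instance: Instance, hypervisor_state: str,
                     stop_on_mismatch: bool = True) -> str:
    """Return the action one pass of a periodic sync would take."""
    # An operation started through the API (e.g. a reboot) sets task_state,
    # so the periodic task leaves the instance alone.
    if instance.task_state is not None:
        return "skip"

    # Recorded as active but the hypervisor reports it down: record it
    # as stopped.
    if instance.vm_state == ACTIVE and hypervisor_state == SHUTDOWN:
        instance.vm_state = STOPPED
        return "mark_stopped"

    # Recorded as stopped but the hypervisor reports it running: this is
    # the case where the running VM gets shut down, unless the (assumed)
    # flag asks for a log entry only.
    if instance.vm_state == STOPPED and hypervisor_state == RUNNING:
        return "stop" if stop_on_mismatch else "log"

    return "noop"
```

With a reboot issued through the API, task_state is set and the first branch
returns "skip". With a reboot issued inside the guest, task_state is None, so
the first pass can mark the instance stopped and the next pass can stop it
once it is running again.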

-- 

Thanks,

Matt Riedemann



