[openstack-dev] [nova] periodic task
Matt Riedemann
mriedem at linux.vnet.ibm.com
Tue Aug 25 14:04:14 UTC 2015
On 8/24/2015 9:32 PM, Gary Kotton wrote:
> In item #2 below the reboot is done via the guest and not the nova APIs :)
>
> From: Gary Kotton <gkotton at vmware.com>
> Reply-To: OpenStack List <openstack-dev at lists.openstack.org>
> Date: Monday, August 24, 2015 at 7:18 PM
> To: OpenStack List <openstack-dev at lists.openstack.org>
> Subject: [openstack-dev] [nova] periodic task
>
> Hi,
> A couple of months ago I posted a patch for bug
> https://launchpad.net/bugs/1463688. The issue is as follows: the
> periodic task detects that the instance state does not match the state
> on the hypervisor and it shuts down the running VM. There are a number
> of ways that this may happen and I will try and explain:
>
> 1. VMware driver example: a host where the instances are running goes
> down. This could be a power outage, host failure, etc. The first
> iteration of the periodic task will determine that the actual
> instance is down. This will update the state of the instance to
> DOWN. The VC has the ability to do HA and it will start the instance
> up and running again. The next iteration of the periodic task will
> determine that the instance is up and the compute manager will stop
> the instance.
> 2. All drivers. The tenant decides to do a reboot of the instance and
> that coincides with the periodic task state validation. At this
> point in time the instance will not be up and the compute node will
> update the state of the instance as DOWN. On the next iteration the
> states will differ and the instance will be shut down.
>
> Basically the issue hit us with our CI: no CI ran for a couple of hours
> because the compute node decided to shut down the running instances.
> The hypervisor should be the source of truth and it should not be the
> compute node that decides to shut down instances. I posted a patch to
> deal with this, https://review.openstack.org/#/c/190047/, which is the
> reason for this mail. The patch is backwards compatible: by default,
> existing deployments keep today's behavior, random shutdowns included,
> and the admin now has the option to only log when there is an
> inconsistency.
>
> We do not want to disable the periodic task, as knowing the current state
> of the instance is very important and has a ton of value. We just do not
> want the periodic task to shut down a running instance.
>
> Thanks
> Gary
>
>
In #2 the guest shouldn't be rebooted by the user (tenant) outside of
the nova-api. I'm not sure if it's actually formally documented in the
nova documentation, but from what I've always heard/known, nova is the
control plane and you should be doing everything with your instances via
the nova-api. If the user rebooted via nova-api, the task_state would
be set and the periodic task would ignore the instance.
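
To make the difference concrete, here is a rough sketch of the behavior
discussed in this thread. It is not nova's actual code: the Instance
class, the sync_power_state function, the state strings, and the
stop_on_mismatch flag are all made up for illustration. The flag only
stands in for the "just log" option described in the patch above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical power-state names, used only in this sketch.
RUNNING = "running"
SHUTDOWN = "shutdown"


@dataclass
class Instance:
    uuid: str
    power_state: str                  # last state recorded in the database
    task_state: Optional[str] = None  # set while an API-initiated operation is in flight


def sync_power_state(instance, hypervisor_state, stop_on_mismatch=True):
    """Return the action one periodic-task iteration would take."""
    # An operation started through the API is in progress: leave the instance alone.
    if instance.task_state is not None:
        return "skip"
    if instance.power_state == hypervisor_state:
        return "in_sync"
    # Database says the instance is down but the hypervisor says it is running.
    if instance.power_state == SHUTDOWN and hypervisor_state == RUNNING:
        return "stop" if stop_on_mismatch else "log"
    # Any other mismatch: record what the hypervisor reports.
    instance.power_state = hypervisor_state
    return "update_db"


# Reboot from inside the guest: no task_state, so two iterations end in a stop.
guest = Instance("guest-reboot", RUNNING)
first = sync_power_state(guest, SHUTDOWN)   # instance is mid-reboot
second = sync_power_state(guest, RUNNING)   # instance has come back up

# Reboot through the API: task_state is set, so the iteration is skipped.
api = Instance("api-reboot", RUNNING, task_state="rebooting")
third = sync_power_state(api, SHUTDOWN)
```

In this sketch the guest-initiated reboot first records the instance as
down and then stops it on the next pass, while the API-initiated reboot
is never touched because task_state is set.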
--
Thanks,
Matt Riedemann