[Openstack-operators] Maintenance

Jay Pipes jaypipes at gmail.com
Fri Apr 22 19:47:11 UTC 2016


On 04/14/2016 05:14 AM, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
<snip>
> As admin I want to know when host is ready to actions to be done by admin
> during the maintenance. Meaning physical resources are emptied.

You are equating "host maintenance mode" with the end result of a call 
to `nova host-evacuate-live`. The two are not the same.

"host maintenance mode" typically just refers to taking a Nova compute 
node out of consideration for placing new workloads on that compute 
node. Putting a Nova compute node into host maintenance mode is as 
simple as calling `nova service-disable $hostname nova-compute`.

Depending on what you need to perform on the compute node that is in 
host maintenance mode, you *may* want to migrate the workloads from that 
compute node to some other compute node that isn't in host maintenance 
mode. The `nova host-evacuate $hostname` and
`nova host-evacuate-live $hostname` commands in the Nova CLI [1] can be
used to evacuate or live-migrate all workloads off the target compute
node.
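
As a rough sketch of the sequence described above (disable the compute service first, then optionally move workloads off), the snippet below builds the two CLI invocations quoted in this message and prints them by default instead of running them. The function names, the `live_evacuate` flag, and the dry-run behaviour are assumptions made for illustration; only the `nova` command lines themselves come from the text.

```python
import shlex
import subprocess


def maintenance_commands(hostname, live_evacuate=True):
    """Build the nova CLI invocations for taking a host out of service.

    Only commands quoted in this thread are used. The function name and
    the live_evacuate flag are hypothetical.
    """
    # Step 1: stop the scheduler from placing new workloads on the host.
    commands = [["nova", "service-disable", hostname, "nova-compute"]]
    # Step 2 (optional): live-migrate existing workloads elsewhere.
    if live_evacuate:
        commands.append(["nova", "host-evacuate-live", hostname])
    return commands


def run_maintenance(hostname, live_evacuate=True, dry_run=True):
    """Print the commands (dry run) or execute them in order.

    With dry_run=False, a non-zero exit status raises
    subprocess.CalledProcessError and later commands are not run.
    """
    for argv in maintenance_commands(hostname, live_evacuate):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `run_maintenance("compute-01")` only prints the commands. Keeping this logic in an external script rather than in Nova matches the point made in footnote [1].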

Live migration will reduce the disruption that tenant workloads (data 
plane) experience during the workload migration. However, research at 
Mirantis has shown that libvirt/KVM/QEMU live migration performed 
against workloads with even a medium rate of memory page dirtying can 
easily fail to ever complete. Mitigations like auto-converge and XBZRLE 
compression have minimal effect on this, unfortunately. Manually pausing 
the workload is what is typically done to force the live migration to 
complete.
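
To see why a high page-dirtying rate can keep a live migration from finishing, here is a toy model of iterative pre-copy. It assumes each round re-sends whatever memory was dirtied while the previous round was being transferred, at a fixed bandwidth. The function name, parameters, and numbers are hypothetical; this is not how QEMU actually schedules its rounds and not data from the research mentioned above.

```python
def precopy_rounds(ram_mb, dirty_mb_s, bandwidth_mb_s, downtime_budget_mb,
                   max_rounds=30):
    """Toy pre-copy model (all parameters are illustrative assumptions).

    Returns the number of copy rounds after which the remaining dirty
    memory fits in downtime_budget_mb, or None if that never happens
    within max_rounds.
    """
    remaining = float(ram_mb)  # first round copies all guest memory
    for round_no in range(1, max_rounds + 1):
        seconds = remaining / bandwidth_mb_s
        # Memory dirtied during this round must be sent in the next one.
        # It can never exceed the guest's total memory.
        remaining = min(float(ram_mb), dirty_mb_s * seconds)
        if remaining <= downtime_budget_mb:
            return round_no
    return None
```

In this model the remaining data shrinks by a factor of `dirty_mb_s / bandwidth_mb_s` each round, so it converges only when the workload dirties memory more slowly than the link can move it. Pausing the workload drives the dirty rate to zero, which is why it forces completion.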

[1] Note that these are commands in the Nova CLI tool 
(python-novaclient). Neither a host-evacuate nor a host-evacuate-live 
REST API call exists in the Compute API. This fact alone should suggest 
to folks that the appropriate place for logic associated with 
performing host maintenance tasks is *outside* of Nova entirely...

> As owner of a server I want to prepare for maintenance to minimize downtime,
> keep capacity on needed level and switch HA service to server not
> affected by maintenance.

This isn't an appropriate use case, IMHO. HA control planes should, by 
their very nature, be established across various failure domains. The 
whole *point* of having an HA service is so that you don't need to 
"prepare" for some maintenance event (planned or unplanned).

All HA control planes worth their salt will be able to notify some 
external listener of a partition in the cluster. This HA control plane 
is the responsibility of the tenant, not the infrastructure (i.e. Nova). 
I really do not want to add coupling between infrastructure control 
plane services and tenant control plane services.

> As owner of a server I want to know when my servers will be down because of
> host maintenance as it might be servers are not moved to another host.

See above. As an owner of a server involved in an HA cluster, it is *the 
server owner's* responsibility to set things up so that the cluster 
rebalances, handles redirected load, or does the custom thing that they 
want. This isn't, IMHO, the domain of the NFVi but rather of a much 
higher-level NFVO orchestration layer.

> As owner of a server I want to know if host is to be totally removed, so
> instead of keeping my servers on host during maintenance, I want to move
> them to somewhere else.

This isn't something the owner of a server even knows about in a cloud 
environment. Owners of a server don't (and shouldn't) know which compute 
node they are on, nor should they know that a host is having a planned or 
unplanned maintenance event.

The infrastructure owner (cloud deployer/operator) is responsible for 
doing the needful and performing a [live] migration of workloads off of 
a failing host or a host that is undergoing a cold upgrade. The tenant 
doesn't know anything about these things, and shouldn't.

> As owner of a server I want to send acknowledgement to be ready for host
> maintenance and I want to state if servers are to be moved or kept on host.

This is describing some virtual inventory management or CMDB 
functionality that isn't in scope for infrastructure services like Nova. 
Perhaps it's worth looking into how something like Remedy can manage 
your virtual inventory in this manner, but I don't see this being in the 
OpenStack realm really...

FWIW, this is the same objection I had to Tacker joining the OpenStack 
Big Tent. It is essentially a monolithic, purpose-built-for-Telco 
application that orchestrates VNFs at layers way above the OpenStack 
deployment.

Best,
-jay

> Removal and creating of server is in owner's control already. Optionally
> server configuration data could hold information about automatic actions
> to be done when host is going down unexpectedly or in controlled manner.
> Also actions at the same if down permanently or only temporarily. Still
> this needs acknowledgement from server owner as he needs time for
> application level controlled HA service switchover.
> Br,
> Tomi
