[openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

Joe Cropper cropper.joe at gmail.com
Sat Oct 18 04:21:05 UTC 2014


I’m glad to see this topic getting some focus once again.  :-)

From several of the administrators I talk with, when they think of putting a host into maintenance mode, the common requests I hear are:

1. Don’t schedule more VMs to the host
2. Provide an optional way to automatically migrate all (usually active) VMs off the host so that users’ workloads remain “unaffected” by the maintenance operation

#1 is easy to achieve, as has been mentioned several times, by simply disabling the compute service.  #2 takes a little more work, but it is certainly possible using the operations nova provides today (e.g., live migration).  I believe these types of discussions have come up repeatedly over the past several OpenStack releases, certainly since Grizzly (i.e., when I started watching this space).

It seems that the general direction is to have the type of workflow needed for #2 outside of nova (which is certainly a valid stance).  To that end, it would be fairly straightforward to build some code that logically sits on top of nova, that when entering maintenance:

1. Prevents VMs from being scheduled to the host;
2. Maintains state about the maintenance operation (e.g., not in maintenance, migrations in progress, in maintenance, or error);
3. Provides mechanisms that, upon entering maintenance, dictate which VMs (active, all, none) to migrate, along with some throttling capabilities to prevent hundreds of parallel migrations on densely packed hosts (all done via a REST API).
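
To make the three steps above concrete, here is a minimal Python sketch of such a layer. It is only an illustration under stated assumptions, not an existing nova API: the class name HostMaintenance and the three injected callables (disable_scheduling, list_vms, migrate_vm) are hypothetical stand-ins for whatever calls a real tool would make against nova, and the REST API front end is left out. Throttling is approximated by capping the size of a worker pool.

```python
import enum
from concurrent.futures import ThreadPoolExecutor


class MaintenanceState(enum.Enum):
    # The four states listed in step 2.
    NOT_IN_MAINTENANCE = "not in maintenance"
    MIGRATING = "migrations in progress"
    IN_MAINTENANCE = "in maintenance"
    ERROR = "error"


class HostMaintenance:
    """Tracks maintenance state for one host (hypothetical helper)."""

    def __init__(self, host, disable_scheduling, list_vms, migrate_vm,
                 max_parallel=2):
        self.host = host
        self.state = MaintenanceState.NOT_IN_MAINTENANCE
        # All three callables are assumptions supplied by the caller:
        self._disable_scheduling = disable_scheduling  # takes a host name
        self._list_vms = list_vms      # returns [(vm_id, is_active), ...]
        self._migrate_vm = migrate_vm  # blocks until one VM has moved
        self._max_parallel = max_parallel

    def enter(self, policy="active"):
        """Enter maintenance; policy is 'active', 'all', or 'none'."""
        if policy not in ("active", "all", "none"):
            raise ValueError("unknown policy: %r" % (policy,))
        try:
            # Step 1: keep new VMs from being scheduled to the host.
            self._disable_scheduling(self.host)
            # Step 3: pick the VMs to move according to the policy.
            vms = self._list_vms(self.host)
            if policy == "none":
                targets = []
            elif policy == "active":
                targets = [vm_id for vm_id, active in vms if active]
            else:
                targets = [vm_id for vm_id, _ in vms]
            self.state = MaintenanceState.MIGRATING
            # The pool size caps how many migrations run at once.
            with ThreadPoolExecutor(max_workers=self._max_parallel) as pool:
                # list() forces iteration so worker exceptions surface here.
                list(pool.map(self._migrate_vm, targets))
        except Exception:
            self.state = MaintenanceState.ERROR
            raise
        self.state = MaintenanceState.IN_MAINTENANCE
        return targets
```

A caller would construct one HostMaintenance per host, call enter("active"), and poll the state attribute to report progress. A real implementation would also need to persist that state and handle leaving maintenance, which this sketch omits.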

If anyone has additional questions, comments, or would like to discuss some options, please let me know.  If interested, upon request, I could even share a video of how such cases might work.  :-)  My colleagues and I have given these use cases a lot of thought and consideration and I’d love to talk more about them (perhaps a small session in Paris would be possible).

- Joe

On Oct 17, 2014, at 4:18 AM, John Garbutt <john at johngarbutt.com> wrote:

> On 17 October 2014 02:28, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:
>> 
>> 
>> On 10/16/2014 7:26 PM, Christopher Aedo wrote:
>>> 
>>> On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov
>>> <mscherbakov at mirantis.com> wrote:
>>>>> 
>>>>> On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum <clint at fewbar.com> wrote:
>>>> 
>>>> The idea is not to simply deny or hang requests from clients, but to
>>>> tell them "we are in maintenance mode, retry in X seconds"
>>>> 
>>>>> You probably would want 'nova host-servers-migrate <host>'
>>>> 
>>>> yeah for migrations - but as far as I understand, it doesn't help with
>>>> disabling this host in the scheduler - there is still a chance that some
>>>> workloads will be scheduled to the host.
>>> 
>>> 
>>> Regarding putting a compute host in maintenance mode using "nova
>>> host-update --maintenance enable", it looks like the blueprint and
>>> associated commits were abandoned a year and a half ago:
>>> https://blueprints.launchpad.net/nova/+spec/host-maintenance
>>> 
>>> It seems that "nova service-disable <host> nova-compute" effectively
>>> prevents the scheduler from trying to send new work there.  Is this
>>> the best approach to use right now if you want to pull a compute host
>>> out of an environment before migrating VMs off?
>>> 
>>> I agree with Tim and Mike that having something respond "down for
>>> maintenance" rather than ignore or hang would be really valuable.  But
>>> it also looks like that hasn't gotten much traction in the past -
>>> anyone feel like they'd be in support of reviving the notion of
>>> "maintenance mode"?
>>> 
>>> -Christopher
>>> 
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> 
>> 
>> host-maintenance-mode is definitely a thing in nova compute via the os-hosts
>> API extension and the --maintenance parameter; the compute manager code is
>> here [1].  The thing is the only in-tree virt driver that implements it is
>> xenapi, and I believe when you put the host in maintenance mode it's
>> supposed to automatically evacuate the instances to some other host, but you
>> can't target the other host or tell the driver, from the API, which
>> instances you want to evacuate, e.g. all, none, running only, etc.
>> 
>> [1]
>> http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2014.2#n3990
> 
> We should certainly make that more generic. It doesn't update the VM
> state, so it's really only admin focused in its current form.
> 
> The XenAPI logic only works when using XenServer pools with shared NFS
> storage, if my memory serves me correctly. Honestly, it's a bit of code
> I have planned on removing, along with the rest of the pool support.
> 
> In terms of requiring DB downtime in Nova, the current efforts are
> focusing on avoiding downtime altogether, via expand/contract style
> migrations, with a little help from objects to avoid data migrations.
> 
> That doesn't mean maintenance mode is not useful for other things,
> like an emergency patching of the hypervisor.
> 
> John



