[openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades
Tim Bell
Tim.Bell at cern.ch
Tue Oct 21 06:29:16 UTC 2014
> -----Original Message-----
> From: Christopher Aedo [mailto:doc at aedo.net]
> Sent: 21 October 2014 04:45
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [All] Maintenance mode in OpenStack during
> patching/upgrades
>
...
>
> Also, I would like to see "maintenance mode" for Nova be limited just to
> stopping any further VMs being sent there, and the node reporting that it's in
> maintenance mode. I think proactive workload migration should be handled
> independently, as I can imagine scenarios where maintenance mode might be
> desired without coupling migration to it.
>
A typical scenario we have is a non-fatal hardware repair. If a node is reporting ECC memory errors, you want to schedule a repair which
will be disruptive for any VMs running on that host. The users get annoyed when you give them their new VM and then immediately tell them the
hardware is going to be repaired.
For me, putting a host into maintenance should mean no new work. I assume that stopping the service outright would have a negative impact on other functions like Telemetry.
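As a rough illustration of "no new work", combined with the states and throttled draining Joe lists further down, here is a minimal standalone sketch. It is a toy model only: Host, MaintState, schedule and enter_maintenance are made-up names, not nova's API, and the "throttling" is simply draining in batches of max_parallel rather than real asynchronous live migrations.

```python
from dataclasses import dataclass, field
from enum import Enum


class MaintState(Enum):
    # The four states from Joe's list (hypothetical names).
    NOT_IN_MAINTENANCE = "not in maintenance"
    MIGRATING = "migrations in progress"
    IN_MAINTENANCE = "in maintenance"
    ERROR = "error"


@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)
    state: MaintState = MaintState.NOT_IN_MAINTENANCE

    @property
    def accepts_new_vms(self):
        # "No new work": any state other than NOT_IN_MAINTENANCE blocks placement.
        return self.state is MaintState.NOT_IN_MAINTENANCE


def schedule(vm, hosts):
    """Place vm on the least-loaded host that still accepts new work."""
    candidates = [h for h in hosts if h.accepts_new_vms]
    if not candidates:
        raise RuntimeError("no host available for %s" % vm)
    target = min(candidates, key=lambda h: len(h.vms))
    target.vms.append(vm)
    return target


def enter_maintenance(host, hosts, migrate=False, max_parallel=2):
    """Stop new placements; optionally drain VMs in batches of max_parallel.

    Returns the batches that were moved (empty list if migrate is False).
    """
    batches = []
    if not migrate:
        # Existing VMs stay where they are; only new placements stop.
        host.state = MaintState.IN_MAINTENANCE
        return batches

    host.state = MaintState.MIGRATING
    try:
        while host.vms:
            batch = []
            batches.append(batch)
            # Iterate over a copy of the first max_parallel VMs.
            for vm in host.vms[:max_parallel]:
                schedule(vm, hosts)  # this host is skipped: it no longer accepts VMs
                host.vms.remove(vm)
                batch.append(vm)
    except RuntimeError:
        # Nowhere to put a VM: record the failure, leave remaining VMs in place.
        host.state = MaintState.ERROR
        return batches
    host.state = MaintState.IN_MAINTENANCE
    return batches


if __name__ == "__main__":
    hosts = [Host("c1", ["vm1", "vm2", "vm3"]), Host("c2"), Host("c3")]
    enter_maintenance(hosts[0], hosts)          # no migration, just no new work
    print(schedule("vm4", hosts).name)          # never c1
    print(enter_maintenance(hosts[0], hosts, migrate=True))
```

The point of keeping migrate as a separate flag is the one Christopher makes: stopping new placements and draining the host are independent decisions.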
Tim
> I would love to keep discussing this further - a small session in Paris would be
> great. But it seems like there's never enough time at the summits, so I don't
> have high hopes for making much progress on this specific topic there. Just the
> same, if anything gets pulled together, I'll be keeping an eye out for it.
>
> -Christopher
>
> On Fri, Oct 17, 2014 at 9:21 PM, Joe Cropper <cropper.joe at gmail.com> wrote:
> > I’m glad to see this topic getting some focus once again. :-)
> >
> > From several of the administrators I talk with, when they think of putting a
> host into maintenance mode, the common requests I hear are:
> >
> > 1. Don’t schedule more VMs to the host
> > 2. Provide an optional way to automatically migrate all (usually active)
> >    VMs off the host so that users’ workloads remain “unaffected” by the
> >    maintenance operation
> >
> > #1 can easily be achieved, as has been mentioned several times, by simply
> disabling the compute service. However, #2 involves a little more work,
> although certainly possible using all the operations provided by nova today (e.g.,
> live migration, etc.). I believe these types of discussions have come up several
> times over the past several OpenStack releases—certainly since Grizzly (i.e.,
> when I started watching this space).
> >
> > It seems that the general direction is to have the type of workflow needed for
> #2 outside of nova (which is certainly a valid stance). To that end, it would be
> fairly straightforward to build some code that logically sits on top of nova, that
> when entering maintenance:
> >
> > 1. Prevents VMs from being scheduled to the host;
> > 2. Maintains state about the maintenance operation (e.g., not in
> >    maintenance, migrations in progress, in maintenance, or error);
> > 3. Provides mechanisms that, upon entering maintenance, dictate which VMs
> >    (active, all, none) to migrate, with some throttling capabilities to
> >    prevent hundreds of parallel migrations on densely packed hosts (all
> >    done via a REST API).
> >
> > If anyone has additional questions, comments, or would like to discuss some
> options, please let me know. If interested, upon request, I could even share a
> video of how such cases might work. :-) My colleagues and I have given these
> use cases a lot of thought and consideration and I’d love to talk more about
> them (perhaps a small session in Paris would be possible).
> >
> > - Joe
> >
> > On Oct 17, 2014, at 4:18 AM, John Garbutt <john at johngarbutt.com> wrote:
> >
> >> On 17 October 2014 02:28, Matt Riedemann <mriedem at linux.vnet.ibm.com>
> wrote:
> >>>
> >>>
> >>> On 10/16/2014 7:26 PM, Christopher Aedo wrote:
> >>>>
> >>>> On Tue, Sep 9, 2014 at 2:19 PM, Mike Scherbakov
> >>>> <mscherbakov at mirantis.com> wrote:
> >>>>>>
> >>>>>> On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum <clint at fewbar.com>
> wrote:
> >>>>>
> >>>>> The idea is not simply deny or hang requests from clients, but
> >>>>> provide them "we are in maintenance mode, retry in X seconds"
> >>>>>
> >>>>>> You probably would want 'nova host-servers-migrate <host>'
> >>>>>
> >>>>> yeah for migrations - but as far as I understand, it doesn't help
> >>>>> with disabling this host in scheduler - there can be a chance
> >>>>> that some workloads will be scheduled to the host.
> >>>>
> >>>>
> >>>> Regarding putting a compute host in maintenance mode using "nova
> >>>> host-update --maintenance enable", it looks like the blueprint and
> >>>> associated commits were abandoned a year and a half ago:
> >>>> https://blueprints.launchpad.net/nova/+spec/host-maintenance
> >>>>
> >>>> It seems that "nova service-disable <host> nova-compute"
> >>>> effectively prevents the scheduler from trying to send new work
> >>>> there. Is this the best approach to use right now if you want to
> >>>> pull a compute host out of an environment before migrating VMs off?
> >>>>
> >>>> I agree with Tim and Mike that having something respond "down for
> >>>> maintenance" rather than ignore or hang would be really valuable.
> >>>> But it also looks like that hasn't gotten much traction in the past
> >>>> - anyone feel like they'd be in support of reviving the notion of
> >>>> "maintenance mode"?
> >>>>
> >>>> -Christopher
> >>>>
> >>>> _______________________________________________
> >>>> OpenStack-dev mailing list
> >>>> OpenStack-dev at lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>>
> >>>
> >>> host-maintenance-mode is definitely a thing in nova compute via the
> >>> os-hosts API extension and the --maintenance parameter; the compute
> >>> manager code is here [1]. The thing is, the only in-tree virt driver
> >>> that implements it is xenapi. I believe that when you put the host in
> >>> maintenance mode it's supposed to automatically evacuate the
> >>> instances to some other host, but you can't target the other host or
> >>> tell the driver, from the API, which instances you want to evacuate
> >>> (e.g. all, none, running only).
> >>>
> >>> [1]
> >>> http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2014.2#n3990
> >>
> >> We should certainly make that more generic. It doesn't update the VM
> >> state, so it's really only admin focused in its current form.
> >>
> >> The XenAPI logic only works when using XenServer pools with shared
> >> NFS storage, if my memory serves me correctly. Honestly, it's a bit of
> >> code I have planned on removing, along with the rest of the pool support.
> >>
> >> In terms of requiring DB downtime in Nova, the current efforts are
> >> focusing on avoiding downtime altogether, via expand/contract style
> >> migrations, with a little help from objects to avoid data migrations.
> >>
> >> That doesn't mean maintenance mode is not useful for other things,
> >> like an emergency patching of the hypervisor.
> >>
> >> John