[openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

Tim Bell Tim.Bell at cern.ch
Wed Sep 10 06:49:44 UTC 2014


It would be great if each OpenStack component could provide a maintenance mode like this. Some work was being considered on Cells (https://blueprints.launchpad.net/nova/+spec/disable-child-cell-support) which would have allowed parts of Nova to indicate they were in maintenance.

Something generic would be very useful. Some operators have also asked for ‘read-only’ modes, where queries are OK but updates are not permitted.
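
To make the ‘read-only’ idea concrete, here is a minimal sketch as a generic WSGI middleware. It is not taken from any OpenStack service: the class name ReadOnlyMiddleware, the read_only flag, the SAFE_METHODS set, and the choice of 503 as the rejection status are all assumptions for illustration. Classifying requests by HTTP method is also crude, since some APIs use POST for what is logically a query.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: these methods are treated as "queries"; everything else is an "update".
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """Hypothetical middleware: lets queries through, rejects updates."""

    def __init__(self, app, read_only=False):
        self.app = app
        self.read_only = read_only

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if self.read_only and method not in SAFE_METHODS:
            body = b"Service is in read-only mode; updates are not permitted.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Queries (or any request when read_only is off) reach the real app.
        return self.app(environ, start_response)


def ok_app(environ, start_response):
    """Stand-in for a real API service; always answers 200."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(app, method):
    """Call a WSGI app in-process and return the status line it produced."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    list(app(environ, start_response))
    return seen[0]
```

With read_only=True, status_for(ReadOnlyMiddleware(ok_app, read_only=True), "GET") yields "200 OK" while the same call with "POST" yields "503 Service Unavailable".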

Tim

From: Mike Scherbakov [mailto:mscherbakov at mirantis.com]
Sent: 09 September 2014 23:20
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [All] Maintenance mode in OpenStack during patching/upgrades

Sergii, Clint,
to rephrase what you are saying: there might be situations when our OpenStack API will not respond, simply because services are down for an upgrade.
Do we want to support that somehow? For example, if we know that Nova is going to be down, can we respond with HTTP 503 and an appropriate Retry-After time in the header?

The idea is not to simply deny or hang requests from clients, but to tell them "we are in maintenance mode, retry in X seconds".
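
A rough sketch of what that could look like, written as a generic WSGI middleware rather than anything that exists in Nova today. The names MaintenanceMiddleware, maintenance_until, demo_app and call are hypothetical; the only standard parts are the 503 status and the Retry-After header, which HTTP allows to carry either a number of seconds or an HTTP date (seconds are used here).

```python
import math
import time
from wsgiref.util import setup_testing_defaults


class MaintenanceMiddleware:
    """Hypothetical middleware: answers 503 + Retry-After during maintenance."""

    def __init__(self, app, maintenance_until=None):
        self.app = app
        # Unix timestamp at which maintenance is expected to end;
        # None means the service is not in maintenance.
        self.maintenance_until = maintenance_until

    def __call__(self, environ, start_response):
        now = time.time()
        if self.maintenance_until is not None and now < self.maintenance_until:
            # Round up so clients never retry before the window closes.
            retry_after = max(1, math.ceil(self.maintenance_until - now))
            body = b"We are in maintenance mode, please retry later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after)),
            ])
            return [body]
        # Outside the maintenance window, pass the request through untouched.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real API service; always answers 200."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(app, method="GET", path="/"):
    """Invoke a WSGI app in-process; return (status, headers dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Wrapping demo_app with maintenance_until set two minutes ahead makes call() return a 503 with a Retry-After of at most 120; how the flag would actually be set and shared across API workers is left open here.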

> Turbo Hipster was added to the gate
great idea, I think we should use it in Fuel too

> You probably would want 'nova host-servers-migrate <host>'
yeah, for migrations - but as far as I understand, it doesn't help with disabling this host in the scheduler - there can still be a chance that some workloads will be scheduled to the host.
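
To illustrate the concern with a toy model (this is not Nova's scheduler; Host, schedule, migrate_all and demo are made-up names, and "fewest instances wins" is an assumed placement rule): if a host is only emptied and not disabled, it becomes the most attractive target for the next request.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    enabled: bool = True  # toy stand-in for the compute service being enabled
    instances: List[str] = field(default_factory=list)


def schedule(hosts, instance):
    """Place the instance on the enabled host with the fewest instances."""
    candidates = [h for h in hosts if h.enabled]
    if not candidates:
        raise RuntimeError("no enabled hosts available")
    target = min(candidates, key=lambda h: len(h.instances))
    target.instances.append(instance)
    return target


def migrate_all(hosts, source):
    """Move every instance off `source` onto the other enabled hosts."""
    others = [h for h in hosts if h is not source]
    for instance in list(source.instances):
        source.instances.remove(instance)
        schedule(others, instance)


def demo(disable_first):
    """Empty node-a, then schedule one new instance; return where it landed."""
    a = Host("node-a", instances=["vm1", "vm2"])
    b = Host("node-b", instances=["vm3"])
    hosts = [a, b]
    if disable_first:
        a.enabled = False
    migrate_all(hosts, a)
    return schedule(hosts, "new-vm").name
```

In this model demo(False) returns "node-a", the host we just emptied for maintenance, while demo(True) returns "node-b". So migration and disabling the host for scheduling would have to go together.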


On Tue, Sep 9, 2014 at 6:02 PM, Clint Byrum <clint at fewbar.com> wrote:
Excerpts from Mike Scherbakov's message of 2014-09-09 00:35:09 -0700:
> Hi all,
> please see the original email from Dmitry below. I've modified the
> subject to bring a larger audience to the issue.
>
> I'd like to split the issue into two parts:
>
>    1. Maintenance mode for OpenStack controllers in HA mode (HA-ed
>    Keystone, Glance, etc.)
>    2. Maintenance mode for OpenStack computes/storage nodes (no HA)
>
> For the first category, we might not need a maintenance mode at all. For
> example, if we patch/upgrade a 3-node HA cluster one node at a time,
> 2 nodes will serve requests normally. Is that possible for our HA solutions
> in Fuel, TripleO, other frameworks?

You may have a broken cloud if you are pushing out an update that
requires a new schema. Some services are better than others about
handling old schemas, and can be upgraded before doing schema upgrades.
But most of the time you have to do at least a brief downtime:

 * turn off DB accessing services
 * update code
 * run db migration
 * turn on DB accessing services

It is for this very reason, I believe, that Turbo Hipster was added to
the gate, so that deployers running against the upstream master branches
can have a chance at performing these upgrades in a reasonable amount of
time.

>
> For the second category, can't we simply do "nova-manage service disable...",
> so the scheduler will simply stop scheduling new workloads on the particular
> host which we want to do maintenance on?
>

You probably would want 'nova host-servers-migrate <host>' at that
point, assuming you have migration set up.

http://docs.openstack.org/user-guide/content/novaclient_commands.html

> On Thu, Aug 28, 2014 at 6:44 PM, Dmitry Pyzhov <dpyzhov at mirantis.com> wrote:
>
> > All,
> >
> > I'm not sure if it deserves to be mentioned in our documentation, this
> > seems to be a common practice. If an administrator wants to patch his
> > environment, he should be prepared for a temporary downtime of OpenStack
> > services. And he should plan to perform patching in advance: choose a time
> > with minimal load and warn users about possible interruptions of service
> > availability.
> >
> > Our current implementation of patching does not protect from downtime
> > during the patching procedure. HA deployments seem to be more or less
> > stable. But it looks like it is possible to schedule an action on a compute
> > node and get an error because of service restart. Deployments with one
> > controller... well, you won’t be able to use your cluster until the
> > patching is finished. There is no way to get rid of downtime here.
> >
> > As I understand, we can get rid of possible issues with computes in HA.
> > But it will require migration of instances and stopping of nova-compute
> > service before patching. And it will make the overall patching procedure
> > much longer. Do we want to investigate this process?
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
>




--
Mike Scherbakov
#mihgen