[openstack-dev] [Nova] RFC Host Maintenance

Juvonen, Tomi (Nokia - FI/Espoo) tomi.juvonen at nokia.com
Wed Apr 13 07:02:11 UTC 2016


> -----Original Message-----
> From: EXT Jim Rollenhagen [mailto:jim at jimrollenhagen.com]
> Sent: Tuesday, April 12, 2016 4:46 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [Nova] RFC Host Maintenance
> 
> On Thu, Apr 07, 2016 at 06:36:20AM -0400, Sean Dague wrote:
> > On 04/07/2016 03:26 AM, Juvonen, Tomi (Nokia - FI/Espoo) wrote:
> > > Hi Nova, Ops, stackers,
> > >
> > > I am trying to figure out what different use cases and requirements there
> > > would be for host maintenance. I would like to get feedback, then turn all
> > > of this into a spec and a discussion of what could and should land in
> > > Nova or elsewhere.
> > >
> > > As I work in the OPNFV Doctor project, which has the Telco perspective on
> > > the related requirements, I started to draft a spec based on something
> > > smaller that would be nice to have in Nova and less complicated to land
> > > in a single cycle. However, the feedback from the Nova API team was to look
> > > at this as a whole and gather more. This is why I am asking here and not
> > > just through the spec: to get input on requirements and use cases from a
> > > wider audience. Here is the draft spec, which at first proposes only a
> > > maintenance window to be added:
> > > _https://review.openstack.org/296995/_
> > >
> > > Here is link to OPNFV Doctor requirements:
> > > _http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance_
> > > _http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance_
> > > _http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance_
> > >
> > > Here is what I could transfer as use cases, but would ask feedback to
> > > get more:
> > >
> > > As an admin, I want to set a maintenance period for a certain host.
> > >
> > > As an admin, I want to know when the host is ready for the actions I need
> > > to perform during the maintenance, meaning its physical resources are
> > > emptied.
> > >
> > > As the owner of a server, I want to prepare for maintenance to minimize
> > > downtime, keep capacity at the needed level, and switch the HA service to
> > > a server not affected by the maintenance.
> > >
> > > As the owner of a server, I want to know when my servers will be down
> > > because of host maintenance, as the servers might not be moved to
> > > another host.
> > >
> > > As the owner of a server, I want to know if the host is to be removed
> > > entirely, so that instead of keeping my servers on the host during
> > > maintenance, I can move them somewhere else.
> > >
> > > As the owner of a server, I want to send an acknowledgement that I am
> > > ready for host maintenance, and I want to state whether the servers are to
> > > be moved or kept on the host. Removal and creation of servers is already in
> > > the owner's control. Optionally, the server configuration data could hold
> > > information about automatic actions to be taken when the host goes down
> > > unexpectedly or in a controlled manner, and likewise for the cases where it
> > > is down permanently or only temporarily. This still needs an
> > > acknowledgement from the server owner, who needs time for an
> > > application-level controlled HA service switchover.
> >
> > While I definitely understand the value of these in a deployment, I'm a
> > bit concerned about baking all this structured data into Nova itself, as it
> > effectively means putting some degree of a ticket management system in
> > Nova that's specific to a workflow you've decided on here. Baked in
> > workflow is hard to change when the needs of an industry do.
> >
> > My counter proposal on your spec was to provide a free form field
> > associated with maintenance mode which could contain a url linking to
> > the details. This could be a jira ticket, or a REST url for some other
> > service. This would actually be much like how we handle images in Nova,
> > with a url to glance to find more info.
> 
> FWIW, this is what we do in ironic. A maintenance boolean, and a
> maintenance_reason text field that operators can dump text/links/etc in.
> 
> As an example:
> $ ironic node-set-maintenance $uuid on --reason "Dead fiber // ticket 123 // jroll 2016/04/12"
> 
> It's worked well for Rackspace's deployment, at least, and I seem to
> remember others being happy with it as well.

Thanks Jim. I can see that this is enough for basic needs. However, the Telco requirements (see the linked OPNFV requirements) are a lot more complicated. Looking at the full range of user and operator needs, I think there could be a configurable "maintenance engine" that supports different kinds of scenarios, from simpler IT cases to complicated Telco cases. It could also help with all kinds of upgrades, if you also know which hosts are already at a certain level...

If it turns out that Nova already has all the APIs needed and we do not want to add more to Nova, then the configurable maintenance would simply be owned by something new (or by some other existing project) running in OpenStack.
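
For comparison, here is a minimal sketch of the simple model discussed above: a maintenance boolean plus a free-form reason field that can carry text or a link to an external system. The names Host and set_maintenance are hypothetical and chosen only for illustration; this is not the Nova or ironic API or data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical host record; not the Nova or ironic data model."""
    name: str
    maintenance: bool = False
    # Free-form text: a note, a ticket number, or a URL to an external service.
    maintenance_reason: Optional[str] = None


def set_maintenance(host: Host, on: bool, reason: Optional[str] = None) -> Host:
    """Toggle maintenance; the reason is kept only while maintenance is on."""
    host.maintenance = on
    host.maintenance_reason = reason if on else None
    return host


host = Host(name="compute-1")
set_maintenance(host, True, reason="Dead fiber // ticket 123 // jroll 2016/04/12")
print(host.maintenance, host.maintenance_reason)
```

In this sketch, anything richer than the flag and the reason (maintenance windows, owner acknowledgements, per-server actions) would live in whatever external service the reason field points to, which is where a configurable maintenance engine could sit.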

-Tomi

> 
> // jim
> 
> >
> > 	-Sean
> >
> > --
> > Sean Dague
> > http://dague.net
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
