Open Stack

Fri Nov 11 08:25:36 UTC 2016

Over the past two OpenStack summits I have been looking into the changes
needed to fulfill the OPNFV Doctor use case for planned host maintenance, and
at the same time trying to find other Ops requirements to satisfy different
needs. I was just about to start a new project (Fenix), but looking at Craton,
it seems a good alternative, and it was proposed to me at the Barcelona
meetup. Here are some ideas; I would like comments on whether Craton could be
used here.

OPNFV Doctor / NFV requirements are described here:
http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance
http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance
http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance

My rough thoughts about what would be initially needed (as short as I can):

- There should be a database of all hosts, matching what is known by Nova.
- There should be an API for the Cloud Admin to set a planned maintenance
  window for a host (maybe also an aggregate or group of hosts), to mark when
  it is in maintenance, and to unset it when finished. There might be some
  optional parameters, like a target host to which things currently running
  on the affected host are moved. This could also be used for retirement of
  a host.
- There should be project (tenant) and host specific notifications that could:
    - Trigger an alarm in Aodh, so the application would be aware of
      maintenance state changes affecting its servers and zero downtime of
      the application could be guaranteed.
    - Be consumed by a workflow engine like Mistral, where application server
      specific action flows and admin action flows could be performed (to
      move servers away, disable the host, ...).
    - Be consumed by host monitoring like Vitrage, to disable alarms for the
      host, as planned maintenance is ongoing and the host is not down
      because of a fault.
- There should be admin and project level APIs to query maintenance session
  status.
- Workflow status should be queried or read from notifications, to keep
  internal state and send further notifications.
- Some more discussion also in "BCN-ops-informal-meetup" that goes beyond this:
  https://etherpad.openstack.org/p/BCN-ops-informal-meetup
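
To make the first points above a bit more concrete (host database,
maintenance window set/unset, host and project specific notifications), here
is a minimal in-memory sketch. Everything in it is an assumption for
illustration only: the names MaintenanceSession and MaintenanceRegistry, the
state strings and the payload keys are hypothetical and are not Craton, Nova
or Aodh APIs. Notifications are only collected in a list instead of being
sent anywhere.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class MaintenanceSession:
    """Hypothetical record of one planned maintenance window for a host."""
    host: str
    start: datetime
    end: datetime
    target_host: Optional[str] = None  # optional host to move servers to
    state: str = "planned"             # hypothetical state names


class MaintenanceRegistry:
    """Toy stand-in for the host database plus the admin API."""

    def __init__(self, servers_by_host: Dict[str, List[Tuple[str, str]]]):
        # host name -> list of (project_id, server_id), as Nova would know it
        self.servers_by_host = servers_by_host
        self.sessions: Dict[str, MaintenanceSession] = {}
        self.sent: List[dict] = []  # notifications that would be emitted

    def set_window(self, host, start, end, target_host=None):
        if host not in self.servers_by_host:
            raise KeyError(f"unknown host: {host}")
        if end <= start:
            raise ValueError("maintenance window must end after it starts")
        session = MaintenanceSession(host, start, end, target_host)
        self.sessions[host] = session
        self._notify(session)
        return session

    def set_state(self, host, state):
        session = self.sessions[host]
        session.state = state
        self._notify(session)
        return session

    def unset(self, host):
        session = self.sessions.pop(host)
        session.state = "finished"
        self._notify(session)

    def _notify(self, session):
        base = {
            "state": session.state,
            "start": session.start.isoformat(),
            "end": session.end.isoformat(),
        }
        # One host specific notification (for admin tools, monitoring).
        self.sent.append({"scope": "host", "host": session.host, **base})
        # One notification per project, listing only that project's servers
        # and not revealing the host name.
        projects: Dict[str, List[str]] = {}
        for project_id, server_id in self.servers_by_host[session.host]:
            projects.setdefault(project_id, []).append(server_id)
        for project_id, server_ids in sorted(projects.items()):
            self.sent.append({"scope": "project", "project_id": project_id,
                              "server_ids": server_ids, **base})
```

The per-project payload leaves out the host name on purpose, so a project
only learns which of its own servers are affected.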

What else, details, problems:

There is a problem with the flow engine actions. Depending on how long the
maintenance would take or what type of server is running, the application
wants the flows to behave differently. Application specific flows could surely
be done, but the problem is that they would need to perform admin actions. It
needs to be solved how the application can decide the action flows while only
the admin can run them. Should the admin make the flows and give the
application the power to choose between them by a hint in Nova metadata or in
the notification going to the flow engine?
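
The last option could be pictured roughly like this: the admin owns a fixed
catalogue of flows, and the application can only pick one of them through
a hint. This is just a sketch under assumptions. The metadata key
"maintenance_action", the flow names and the step names are all hypothetical
and do not come from Nova or Mistral.

```python
# Hypothetical catalogue of flows, defined and run by the admin only.
ADMIN_FLOWS = {
    "migrate": ["disable_host", "live_migrate_server"],
    "own_action": ["disable_host", "notify_project", "wait_for_project"],
}
DEFAULT_FLOW = "migrate"
HINT_KEY = "maintenance_action"  # hypothetical server metadata key


def choose_flow(server_metadata):
    """Return (flow name, steps) for a server, based on the project's hint.

    An absent or unknown hint falls back to the admin's default, so the
    application can select a flow but never define one.
    """
    hint = server_metadata.get(HINT_KEY, DEFAULT_FLOW)
    if hint not in ADMIN_FLOWS:
        hint = DEFAULT_FLOW
    return hint, ADMIN_FLOWS[hint]
```

The same lookup would work if the hint arrived in the notification going to
the flow engine instead of in the server metadata.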

I started a discussion at the Austin summit about extending planned host
maintenance in Nova, but it was agreed that there could just be a link to an
external tool. If this tool now existed in OpenStack, I would suggest linking
it like this, but surely this is to be revisited after the external tool
implementation exists:
- Nova Services API could have a way for the admin to set and unset a
  "base URL" pointing to the external tool for planned maintenance affecting
  a host.
- Admin should see a link to the external tool when querying services via the
  services API. This might be formed like: {base URL}/{host_name}
- Project should have a project specific link to the external tool when
  querying via the Nova servers API. This might be:
  {base URL}/project/{hostId}.
  hostId is exposed to the project because it does not reveal the exact host,
  but still works as a unique identifier for the host:
  hashlib.sha224(projectid + host_name).hexdigest()
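
As a small sketch of how the two links could be formed from the same base
URL: the helper names below are hypothetical, the URL layout is only the
suggestion from the list above, and the string is encoded as UTF-8 because
hashlib in Python 3 needs bytes.

```python
import hashlib


def host_id(project_id: str, host_name: str) -> str:
    """Project specific host identifier that does not reveal the host name."""
    data = (project_id + host_name).encode("utf-8")
    return hashlib.sha224(data).hexdigest()


def admin_link(base_url: str, host_name: str) -> str:
    """Link an admin would see when querying the services API."""
    return f"{base_url.rstrip('/')}/{host_name}"


def project_link(base_url: str, project_id: str, host_name: str) -> str:
    """Link a project would see when querying the servers API."""
    return f"{base_url.rstrip('/')}/project/{host_id(project_id, host_name)}"
```

Two projects with servers on the same host get different hostId values, so
they cannot use the link to tell that they share a host.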

Br,
Tomi Juvonen
Senior SW Architect, Nokia


[openstack-dev] [Craton] NFV planned host maintenance
