[openstack-dev] [Craton] NFV planned host maintenance

Ian Cordasco sigmavirus24 at gmail.com
Wed Nov 16 14:46:08 UTC 2016


-----Original Message-----
From: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvonen at nokia.com>
Reply: OpenStack Development Mailing List (not for usage questions)
<openstack-dev at lists.openstack.org>
Date: November 11, 2016 at 02:27:19
To: OpenStack Development Mailing List (not for usage questions)
<openstack-dev at lists.openstack.org>
Subject:  [openstack-dev] [Craton] NFV planned host maintenance

> For the past two OpenStack summits I have been looking into the changes needed
> to fulfill the OPNFV Doctor use case for planned host maintenance, while also
> trying to find other Ops requirements so that different needs are satisfied. I
> was just about to start a new project (Fenix), but Craton seems a good
> alternative and was proposed to me at the Barcelona meetup. Here are some
> ideas; I would like comments on whether Craton could be used here.

Hi Tomi,

Thanks for your interest in Craton! I'm replying in-line, but please
come and join us in #craton on Freenode as well!

> OPNFV Doctor / NFV requirements are described here:
> http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance
> http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance
> http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance
>
> My rough thoughts about what would be initially needed (as short as I can):
>
> - There should be a database of all hosts matching what is known to Nova.

So I think this might be the first problem that you'll run into with Craton.

Craton is designed specifically to manage the physical devices in a
data centre. At the moment, it only considers the hosts that you'd run
Nova on, not the virtual machines that Nova manages on those compute
hosts.

It's plausible that we could add the ability to track virtual
machines, but Craton is meant to primarily work underneath the cloud.
I think this might be changing since Craton is looking forward to
helping manage a multi-cloud environment, so it's possible this won't
be an issue for long.

> - There should be an API for the Cloud Admin to set a planned maintenance
> window for a host (maybe an aggregate or group of hosts), to flag when it is
> in maintenance, and to unset this when finished. There might be some
> optional parameters, such as a target host to which things currently
> running on the affected host are moved. This could also be used for
> retirement of a host.

This sounds like it's part of the next phase of Craton development:
the remediation workflows. I think Jim and Sulo are better suited
to speak to that, though.
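
To make the request above a bit more concrete, here is a minimal sketch
of what setting, unsetting, and querying a maintenance window could look
like. Everything in it is an assumption for illustration only: the names
MaintenanceWindow, MaintenanceRegistry, set_window, unset_window and
in_maintenance are hypothetical, it keeps state in memory, and it is not
Craton's API or anything Craton currently implements.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class MaintenanceWindow:
    """Hypothetical record of a planned maintenance window for one host."""
    host_name: str
    start: datetime
    end: datetime
    target_host: Optional[str] = None  # optional host to move workloads to
    retire: bool = False               # True if the host is being retired


class MaintenanceRegistry:
    """In-memory stand-in for the admin API described in the thread."""

    def __init__(self) -> None:
        self._windows: Dict[str, MaintenanceWindow] = {}

    def set_window(self, window: MaintenanceWindow) -> None:
        if window.end <= window.start:
            raise ValueError("maintenance window must end after it starts")
        self._windows[window.host_name] = window

    def unset_window(self, host_name: str) -> None:
        # Unsetting a host that has no window is treated as a no-op.
        self._windows.pop(host_name, None)

    def in_maintenance(self, host_name: str, now: datetime) -> bool:
        window = self._windows.get(host_name)
        return window is not None and window.start <= now < window.end


if __name__ == "__main__":
    registry = MaintenanceRegistry()
    registry.set_window(MaintenanceWindow(
        host_name="compute-01",  # hypothetical host name
        start=datetime(2016, 11, 20, 1, 0, tzinfo=timezone.utc),
        end=datetime(2016, 11, 20, 3, 0, tzinfo=timezone.utc),
        target_host="compute-02",
    ))
    now = datetime(2016, 11, 20, 2, 0, tzinfo=timezone.utc)
    print(registry.in_maintenance("compute-01", now))
```

A real service would also need persistence, access control, and support
for aggregates or groups of hosts, none of which is covered here.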

> - There should be project(tenant) and host specific notifications that could:

We are currently discussing an events/notifications system.

> - Trigger an alarm in Aodh so the application would be aware of maintenance
> state changes affecting its servers, so that zero downtime of the
> application could be guaranteed.

I'm not sure it should be Craton's responsibility to do this, but I
expect the administrator could set alarm criteria based on
Craton's events stream.

> - Notification could be consumed by a workflow engine like Mistral, where
> application-server-specific action flows and admin action flows could
> be performed (to move servers away, disable a host, ...).
> - Host monitoring like Vitrage could consume the notification to disable
> alarms for a host, as planned maintenance is ongoing and the host is not
> down because of a fault.
> - There should be admin- and project-level APIs to query maintenance session
> status.
> - Workflow status should be queried or read as a notification to keep internal
> state and send further notifications.
> - Some more discussion also in "BCN-ops-informal-meetup" that goes beyond this:
> https://etherpad.openstack.org/p/BCN-ops-informal-meetup

These are all interesting ideas. Thank you!

> What else, details, problems:
>
> There is a problem with flow engine actions. Depending on how long maintenance
> would take or what type of server is running, the application wants flows to
> behave differently. Application-specific flows could surely be written, but
> the problem is that they need to perform admin actions. It needs to be solved
> how the application can decide on action flows while only the admin can run
> them. Should the admin create the flows and give the application the power to
> choose among them by a hint in Nova metadata or in the notification going to
> the flow engine?
>
> I started a discussion at the Austin summit about extending planned host
> maintenance in Nova, but it was agreed there could just be a link to an
> external tool. If this tool existed in OpenStack, I would suggest linking it
> like this, but surely this is to be decided after the external tool
> implementation exists:
> - Nova Services API could have a way for the admin to set and unset a "base
> URL" pointing to an external tool about planned maintenance affecting a host.
> - Admin should see a link to the external tool when querying services via the
> services API. This might be formed like: {base URL}/{host_name}
> - Project should have a project-specific link to the external tool when
> querying via the Nova servers API. This might be: {base URL}/project/{hostId}.
> hostId is exposed to the project as it does not reveal the exact host, but
> otherwise serves as a unique identifier for the host:
> hashlib.sha224(projectid + host_name).hexdigest()
> hashlib.sha224(projectid + host_name).hexdigest()

I'm a little confused by these problems, so I'll have to re-read them
a few times and reply later.
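
For reference while re-reading, the link scheme and the hostId expression
quoted above could be sketched roughly as follows. The function names
(host_id, admin_link, project_link) and the example base URL are
assumptions made up for illustration; only the sha224 expression and the
two URL shapes come from the proposal. On Python 3 the concatenated
string has to be encoded to bytes before hashing, which the quoted
one-liner leaves out.

```python
import hashlib


def host_id(project_id: str, host_name: str) -> str:
    """Opaque per-project host identifier, following the quoted expression."""
    # hashlib operates on bytes, so the concatenated string is encoded first.
    digest = hashlib.sha224((project_id + host_name).encode("utf-8"))
    return digest.hexdigest()


def admin_link(base_url: str, host_name: str) -> str:
    """Admin view: {base URL}/{host_name}."""
    return f"{base_url.rstrip('/')}/{host_name}"


def project_link(base_url: str, project_id: str, host_name: str) -> str:
    """Project view: {base URL}/project/{hostId}, hiding the real host name."""
    return f"{base_url.rstrip('/')}/project/{host_id(project_id, host_name)}"


if __name__ == "__main__":
    base = "https://maintenance.example.org"  # hypothetical external tool
    print(admin_link(base, "compute-01"))
    print(project_link(base, "project-a", "compute-01"))
```

Because the project ID is part of the hash input, two projects with
servers on the same host would get different hostId values, so a project
cannot use the link to learn the real host name or to correlate hosts
with another project.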

Thanks again Tomi!

--
Ian Cordasco


