<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 16, 2016 at 10:54 AM, Sulochan Acharya <span dir="ltr"><<a href="mailto:sulo.foss@gmail.com" target="_blank">sulo.foss@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="gmail-h5">On Wed, Nov 16, 2016 at 2:46 PM, Ian Cordasco <span dir="ltr"><<a href="mailto:sigmavirus24@gmail.com" target="_blank">sigmavirus24@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>-----Original Message-----<br>

From: Juvonen, Tomi (Nokia - FI/Espoo) <<a href="mailto:tomi.juvonen@nokia.com" target="_blank">tomi.juvonen@nokia.com</a>><br>

Reply: OpenStack Development Mailing List (not for usage questions)<br>

<<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack<wbr>.org</a>><br>

Date: November 11, 2016 at 02:27:19<br>

To: OpenStack Development Mailing List (not for usage questions)<br>

<<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack<wbr>.org</a>><br>

Subject:  [openstack-dev] [Craton] NFV planned host maintenance<br>

<br>

> Over the past two OpenStack summits I have been looking into the changes needed<br>

> to fulfill the OPNFV Doctor use case for planned host maintenance, while at the<br>

> same time trying to find other Ops requirements to satisfy different needs. I was<br>

> just about to start a new project (Fenix), but looking at Craton, it seems<br>

> a good alternative, and it was proposed to me at the Barcelona meetup. Here are some<br>

> ideas; I would like comments on whether Craton could be used here.<br>

<br>

</span>Hi Tomi,<br>

<br>

Thanks for your interest in craton! I'm replying in-line, but please<br>

come and join us in #craton on Freenode as well!<br>

<span><br>

> OPNFV Doctor / NFV requirements are described here:<br>

> <a href="http://artifacts.opnfv.org/doctor/docs/requirements/02-use_cases.html#nvfi-maintenance" rel="noreferrer" target="_blank">http://artifacts.opnfv.org/doc<wbr>tor/docs/requirements/02-use_<wbr>cases.html#nvfi-maintenance</a><br>

> <a href="http://artifacts.opnfv.org/doctor/docs/requirements/03-architecture.html#nfvi-maintenance" rel="noreferrer" target="_blank">http://artifacts.opnfv.org/doc<wbr>tor/docs/requirements/03-archi<wbr>tecture.html#nfvi-maintenance</a><br>

> <a href="http://artifacts.opnfv.org/doctor/docs/requirements/05-implementation.html#nfvi-maintenance" rel="noreferrer" target="_blank">http://artifacts.opnfv.org/doc<wbr>tor/docs/requirements/05-imple<wbr>mentation.html#nfvi-maintenanc<wbr>e</a><br>

><br>

> My rough thoughts about what would initially be needed (as briefly as I can):<br>

><br>

> - There should be a database of all hosts, matching what is known by Nova.<br>

<br>

</span>So I think this might be the first problem that you'll run into with Craton.<br>

<br>

Craton is designed to specifically manage the physical devices in a<br>

data centre. At the moment, it only considers the hosts that you'd run<br>

Nova on, not the Virtual Machines that Nova is managing on the Compute<br>

hosts.<br></blockquote></div></div></div></div></div></blockquote><div><br></div><div>Craton's inventory supports the following modeling:</div><div><ol><li>devices, which may have a parent (so they form a strict tree); we map these to entities such as top-of-rack switches, hosts, and containers<br></li><li>logical relationships for these devices, including project, region, and (optionally) cell; and arbitrary labels (tags)<br></li><li>key/value variables on most entities, including devices. Variables support <b>resolution</b>: an override mechanism where values are looked up along a chain (for a device, that is the device tree, then the cell, then the region). Values are typed JSON in the underlying (and default) SQLAlchemy model we use.</li></ol><div>Craton users synchronize the device inventory from other source-of-truth systems, such as an asset database, or perhaps maintain it manually. Meanwhile, variables can reflect desired-state configuration (much as in Ansible) as well as captured information.</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

It's plausible that we could add the ability to track virtual<br>

machines, but Craton is meant to primarily work underneath the cloud.<br>

I think this might be changing since Craton is looking forward to<br>

helping manage a multi-cloud environment, so it's possible this won't<br>

be an issue for long.<br></blockquote></div></div></div></div></div></blockquote><div><br></div><div>Craton's device-focused model, although oriented to hardware, is fairly generic. Recently we have also been looking at what is needed to support a multi-tenant, multi-cloud inventory, and it seems quite feasible to manage a subset of the resources provided by AWS or Azure in Craton's inventory.</div><div><br></div><div>Does this mean VMs and similar resources? Maybe. However, our thinking has been that for numerous, relatively fast-changing resources, we should link to the source of truth, in this case Nova. In particular, we have a model for variables that could readily be extended to support what we call virtualized variables: dictionary mappings implemented by lookups against a remote service. See <a href="https://bugs.launchpad.net/craton/+bug/1606882">https://bugs.launchpad.net/craton/+bug/1606882</a>; so long as such a mapping implements collections.abc.Mapping, it can plug into how variables are resolved.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"></blockquote><div><br></div></div></div><div>So I think there are two parts to this. 1. As Ian mentioned, Craton currently does not keep an inventory of the VMs running inside an OpenStack deployment; Nova (and Horizon, for the UI) already does this for users. However, we do run jobs or workloads against VMs, such as live-migrating VMs off a host that is about to undergo maintenance or that host monitoring has flagged as bad. This is done using `plugins` that talk to Nova and other services. So perhaps some of what you are looking for falls into that?
This can be triggered by a notification that the Craton engine receives from another application (a monitoring system, for example).</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<span><br>

> - There should be an API for the Cloud Admin to set a planned maintenance window<br>

> for a host (maybe an aggregate or group of hosts) when in maintenance, and unset it<br>

> when finished. There might be some optional parameters, like a target host<br>

> to move things currently running on the affected host to. It could also be<br>

> used for retirement of a host.<br>

<br>

</span>This sounds like it's part of the next phase of Craton development:<br>

the remediation workflows. I think Jim and Sulo are better placed<br>

to talk about that, though.<br>

<span><br></span></blockquote><div><br></div></span><div>So we will be able to trigger a job (maintenance) based on 1. a user-defined schedule, or 2. a notification that we receive from another application. Both are user defined. As Ian suggested, this is something we plan to do in the next phase.</div></div></div></div></blockquote><div><br></div><div>To describe this further: Craton will support an API to run such workflows against some device inventory, usable either from the REST API directly or from the Craton client that wraps it. A very reasonable integration would be to call this API from a webhook.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

> - There should be project (tenant) and host-specific notifications that could:<br>

<br>

</span>We are talking about an events/notifications system.<br>

<span><br></span></blockquote><div> </div></span><div>+1. We are working on providing notification messages for all actions within the application.</div><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

> - Trigger an alarm in Aodh so the application would be aware of maintenance state<br>

> changes affecting its servers, so that zero application downtime could<br>

> be guaranteed.<br>

<br>

</span>I'm not sure it should be Craton's responsibility to do this, but I<br>

expect the administrator could set alarm criteria based off of<br>

Craton's events stream.<br></blockquote><div><br></div></span><div>+1 We need to make sure that we don't try to be a monitoring solution. But, as Ian said, we can always use the notification system for downstream processing.</div></div></div></div></blockquote><div><br></div><div>Agreed. Note that the notifications we are planning to publish via oslo.messaging cover changes to database elements directly controlled by Craton, not linked inventory such as VMs in Nova. But these notifications can work in concert with Nova notifications.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<span><br>

> - The notification could be consumed by a workflow engine like Mistral, where<br>

> application-server-specific action flows and admin action flows could<br>

> be performed (to move servers away, disable the host, ...).<br>

> - Host monitoring like Vitrage could consume the notification to disable<br>

> alarms for the host, since it is in planned maintenance and not down due to a fault.<br></span></blockquote><div><br></div></span><div>I think it's both ways: </div><div>some alarm triggered -> Craton -> disable the monitoring. But also,</div><div>Craton notification -> some application consumes it -> does something else.</div><div><br></div><div>So the way I think of this is: admin sets/schedules some work on a host -> Craton workflow disables your monitoring (given the monitoring solution allows such an action) -> start the maintenance work; once finished -> start the monitoring again.</div><span class="gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

> - There should be admin- and project-level APIs to query maintenance session<br>

> status.<br></span></blockquote><div><br></div></span><div>+1 We have an API for all actions, and for all inventory management as well.</div><span class="gmail-"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

> - Workflow status should be queried, or read from notifications, to keep internal<br>

> state and send further notifications.<br>

> - There was some more discussion in "BCN-ops-informal-meetup" that goes beyond this:<br>

> <a href="https://etherpad.openstack.org/p/BCN-ops-informal-meetup" rel="noreferrer" target="_blank">https://etherpad.openstack.org<wbr>/p/BCN-ops-informal-meetup</a><br>

<br>

</span>These are all interesting ideas. Thank you!<br>

<span><br>

> What else, details, problems:<br>

><br>

> There is a problem with flow engine actions. Depending on how long maintenance<br>

> would take or what type of server is running, the application wants flows to behave<br>

> differently. Application-specific flows could surely be done, but the problem is<br>

> that they need to perform admin actions. We need to solve how the application can<br>

> decide on action flows while only the admin can run them. Should the admin create<br>

> the flows and give the application the power to choose via a hint in Nova metadata or<br>

> in the notification going to the flow engine?<br></span></blockquote><div><br></div></span><div>So the way a Craton flow/workflow/job works is that each job is a "plugin". Who can run a job is decided through RBAC (work in progress). We expect the plugin to have enough logic to decide what it wants to do. We can definitely set a TTL for a running job. Provided it was triggered by an "admin" who has the "admin actions" privilege in Nova, you would be allowed to do so. </div><span class="gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

><br>

> I started a discussion at the Austin summit about extending planned host<br>

> maintenance in Nova, but it was agreed there could just be a link to an external<br>

> tool. If such a tool existed in OpenStack, I would suggest linking it<br>

> like this, but surely this is to be seen once the external tool<br>

> implementation exists:<br></span></blockquote><div><br></div></span><div>We are far from it right now, but I think Craton would be able to do some of it. As I said before, scheduled jobs will be part of the workflow engine, so this will be possible. We should discuss in more detail what exactly this looks like. For example, suppose admins would like to upgrade a host. In this case we might trigger a workflow to live-migrate all the VMs off that host, disable monitoring, disable the service, upgrade the host, enable monitoring, and put the host back in production (re-enabled in Nova services).</div><span class="gmail-"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>

> - The Nova Services API could have a way for the admin to set and unset a "base URL"<br>

> pointing to an external tool about planned maintenance affecting a host.<br>

> - The admin should see a link to the external tool when querying services via the services<br>

> API. This might be formed like: {base URL}/{host_name}<br>

> - The project should have a project-specific link to the external tool when querying<br>

> via the Nova servers API. This might be: {base URL}/project/{hostId}.<br>

> hostId is exposed to the project because it does not reveal the exact host, but still<br>

> serves as a unique identifier for the host:<br>

> hashlib.sha224(projectid + host_name).hexdigest()<br></span></blockquote><div><br></div></span><div>I am not sure about this, but interaction with Nova through its API is assumed to be part of a workflow. Most of it might be encapsulated in the running job, in the sense that the job you are running is what actually does it. Craton might provide the scaffolding to make that easy. I am not sure what the possibilities are on the Nova side, but perhaps you could do this simply by scheduling work on a set of hosts with Craton alone? We can discuss more on #craton.</div></div></div></div></blockquote><div><br></div><div>Please do join us on #craton to work out these details! Thanks for a great set of questions.</div><div><br></div><div>- Jim</div></div></div></div>
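<div><br></div><div>A minimal Python sketch of variable <b>resolution</b> and "virtualized variables" as described in the thread; it is an illustration under stated assumptions, not Craton's actual implementation. It assumes stored variables are plain dicts, models the override chain (device, then its parent devices, then cell, then region) with the standard library's collections.ChainMap, and uses a hypothetical RemoteVariables class whose fetch callable stands in for a lookup against a remote source of truth such as Nova. Because RemoteVariables implements collections.abc.Mapping, it can sit anywhere in the chain alongside ordinary dicts.</div>

```python
from collections import ChainMap
from collections.abc import Mapping


class RemoteVariables(Mapping):
    """Hypothetical "virtualized variables": a read-only mapping whose values
    are looked up on demand from a remote service rather than stored locally.

    `fetch` is any zero-argument callable returning a dict; in practice it
    would wrap a call to the remote source of truth (e.g. Nova).
    """

    def __init__(self, fetch):
        self._fetch = fetch

    def __getitem__(self, key):
        # A missing key raises KeyError, which is what Mapping requires.
        return self._fetch()[key]

    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())


def resolve(device_chain, cell, region):
    """Sketch of variable resolution: the first mapping defining a key wins.

    device_chain lists the device's own variables first, then its parent,
    grandparent, and so on; the cell and then the region are consulted last.
    """
    return ChainMap(*device_chain, cell, region)


if __name__ == "__main__":
    region = {"ntp_server": "ntp.example.org", "in_maintenance": False}
    cell = {"in_maintenance": True}                      # overrides the region value
    tor_switch = {"rack": "r12"}                         # parent device of the host
    host = RemoteVariables(lambda: {"running_vms": 7})   # stubbed remote lookup
    variables = resolve([host, tor_switch], cell, region)
    print(variables["in_maintenance"])  # True
    print(variables["running_vms"])     # 7
```

<div>In this sketch every lookup on a RemoteVariables calls fetch again; a real remote-backed mapping would likely cache results or batch requests.</div>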
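<div><br></div><div>A small Python sketch of the per-project hostId from Tomi's proposal, hashlib.sha224(projectid + host_name).hexdigest(), together with the two proposed link forms ({base URL}/{host_name} for admins and {base URL}/project/{hostId} for projects). On Python 3, hashlib hashes bytes, so the sketch UTF-8 encodes the concatenated string; that encoding choice, the function names, and the example base URL are assumptions for illustration, not an existing Nova or Craton API.</div>

```python
import hashlib


def project_host_id(project_id: str, host_name: str) -> str:
    """Opaque per-project host identifier: sha224(project_id + host_name).

    hashlib operates on bytes in Python 3, so the string is UTF-8 encoded
    first (the encoding is an assumption of this sketch).
    """
    return hashlib.sha224((project_id + host_name).encode("utf-8")).hexdigest()


def maintenance_links(base_url: str, project_id: str, host_name: str) -> dict:
    """Hypothetical helper building the two proposed link forms:
    {base URL}/{host_name} for admins, {base URL}/project/{hostId} for projects.
    """
    base = base_url.rstrip("/")
    host_id = project_host_id(project_id, host_name)
    return {
        "admin": "{}/{}".format(base, host_name),
        "project": "{}/project/{}".format(base, host_id),
    }


if __name__ == "__main__":
    links = maintenance_links("https://maint.example.org/", "demo-project", "compute-01")
    print(links["admin"])    # https://maint.example.org/compute-01
    print(links["project"])  # https://maint.example.org/project/ followed by 56 hex chars
```

<div>Because the project ID is part of the hashed input, two projects with servers on the same host see different hostId values, which is what keeps the exact host hidden from tenants while still giving each project a stable identifier.</div>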
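<div><br></div><div>A minimal Python sketch of the host-upgrade workflow from Sulo's example (live-migrate all VMs, disable monitoring, disable the service, upgrade, enable monitoring, put the host back in production). The HostOps interface and all of its method names are hypothetical stand-ins for plugin calls to Nova or a monitoring system; they are not Craton APIs. The try/finally is a design assumption added here: monitoring is restored even if the upgrade fails, while the host is only returned to production on success.</div>

```python
from typing import Protocol


class HostOps(Protocol):
    """Hypothetical interface; each method would wrap a plugin call
    (e.g. to Nova or to a monitoring system)."""

    def live_migrate_all(self, host: str) -> None: ...
    def disable_monitoring(self, host: str) -> None: ...
    def disable_service(self, host: str) -> None: ...
    def upgrade(self, host: str) -> None: ...
    def enable_monitoring(self, host: str) -> None: ...
    def enable_service(self, host: str) -> None: ...


def upgrade_host(host: str, ops: HostOps) -> None:
    """Run the host-upgrade steps in the order given in the example."""
    ops.live_migrate_all(host)      # move workloads off the host first
    ops.disable_monitoring(host)    # planned work, so suppress fault alarms
    try:
        ops.disable_service(host)   # e.g. disable the compute service
        ops.upgrade(host)
    finally:
        # Assumption: always restore monitoring, even if a step above fails,
        # so a broken host is not left silently unmonitored.
        ops.enable_monitoring(host)
    # Only reached if every step succeeded: return the host to production.
    ops.enable_service(host)
```

<div>If the upgrade raises, the exception propagates after monitoring is re-enabled, and the service stays disabled so the scheduler does not place new VMs on a half-upgraded host.</div>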