<div dir="ltr">Comments inline.<br><div class="gmail_extra"><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 16, 2016 at 10:13 AM, Matt Riedemann <span dir="ltr">&lt;<a href="mailto:mriedem@linux.vnet.ibm.com" target="_blank">mriedem@linux.vnet.ibm.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 6/16/2016 6:12 AM, Preston L. Bannister wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I am hoping support for instance quiesce in the Nova API makes it into<br>

OpenStack. To my understanding, this is existing functionality in Nova, just<br>

not yet exposed in the public API. (I believe Cinder uses this via a<br>

private Nova API.)<br>

</blockquote>

<br></span>

I'm assuming you're thinking of the os-assisted-volume-snapshots admin API in Nova that is called from the Cinder RemoteFSSnapDrivers (glusterfs, scality, virtuozzo and quobyte). I started a separate thread about that yesterday, mainly around the lack of CI testing / status so we even have an idea if this is working consistently and we don't regress it.</blockquote><div><br></div><div>Yes, I believe we are talking about the same thing. Also, I saw your other message. :)</div><div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Much of the discussion is around disaster recovery (DR) and NFV - which<br>

is not wrong, but might be muddling the discussion? Forget DR and NFV,<br>

for the moment.<br>

<br>

My interest is simply in collecting high quality backups of applications<br>

(instances) running in OpenStack. (Yes, customers are deploying<br>

applications into OpenStack that need backup - and at large scale. They<br>

told us, *very* clearly.) Ideally, I would like to give the application<br>

a chance to properly quiesce, so the on-disk state is most-consistent,<br>

before collecting the backup.<br>

</blockquote>

<br></span>

We already attempt to quiesce an active volume-backed instance before doing a volume snapshot:<br>

<br>

<a href="https://github.com/openstack/nova/blob/11bd0052bdd660b63ecca53c5b6fe68f81bdf9c3/nova/compute/api.py#L2266" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/11bd0052bdd660b63ecca53c5b6fe68f81bdf9c3/nova/compute/api.py#L2266</a><span class=""><br></span></blockquote><div><br></div><div>The problem, from my point of view, is that if the instance has more than one volume (and many do), then quiescing the instance more than once is not very nice.</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The existing functionality in Nova should be at least a good start; it just<br>

needs to be exposed in the public Nova API. (At least, this is my<br>

understanding.)<br>

<br>

Of course, good backups (however collected) allow you to build DR<br>

solutions. My immediate interest is simply to collect high-quality backups.<br>

<br>

The part in the blueprint about an atomic operation on a list of<br>

instances ... this might be over-doing things. First, if you have a set<br>

of related instances, very likely there is a logical order in which they<br>

should be quiesced. Some could be quiesced concurrently. Others might<br>

need to be sequential.<br>

<br>

Assuming the quiesce API *starts* the operation, and there is some means<br>

to check for completion, then a single-instance quiesce API should be<br>

sufficient. An API that is synchronous (waits for completion before<br>

returning) would also be usable. (I am not picky - just want to collect<br>

better backups for customers.)<br>

</blockquote>

<br></span>

As noted above, we already attempt to quiesce when doing a volume-backed instance snapshot.<br>

<br>

The problem comes in with the chaining and orchestration around a list of instances. That requires additional state management and overhead within Nova, and while we're actively trying to redo parts of the code base to make things less terrible, adding more complexity on top at the same time doesn't help.<br></blockquote><div><br></div><div>I agree with your concern. To be clear, what I am hoping for is the simplest possible version - an API to quiesce/unquiesce a single instance, similar to the existing pause/unpause APIs.</div><div><br></div><div>Handling of lists of instances (and response to state changes) is something I would expect to implement on the caller side. The right ordering depends on application-specific semantics, so a single-instance API has merit from my perspective.</div><div> </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I'm also not sure what something like multiattach volumes will throw into the mix with this, but that's another DR/HA requirement.<br>

<br>

So I get that lots of people want lots of things that aren't in Nova right now. We have that coming from several different projects (cinder for multiattach volumes, neutron for vlan-aware-vms and routed networks), and several different groups (NFV, ops).<br>

<br>

We also have a lot of people that just want the basic IaaS layer to work for the compute service in an OpenStack cloud, like being able to scale that out better and track resource usage for accurate scheduling.<br>

<br>

And we have a lot of developers that want to be able to actually understand what it is the code is doing, and a much smaller number of core maintainers / reviewers that don't want to have to keep piling technical debt into the project while we're trying to fix some of what's already built up over the years - and actually have this stuff backed with integration testing.<br>

<br>

So, I get it. We all have requirements and we all have resource limitations, which is why we as a team prioritize our work items for the release. This one didn't make it for Newton.<br></blockquote><div><br></div><div>Ah. I did not quite get that from what I read online. Unfortunate. It also sounds like the Nova folks are overloaded, and we need to come up with resources to contribute to Nova if we want this to appear sooner.</div><div><br></div><div><br></div><div> </div></div></div></div>