[openstack-dev] [nova] consistency and exposing quiesce in the Nova API
Preston L. Bannister
preston at bannister.us
Thu Jun 16 20:30:30 UTC 2016
On Thu, Jun 16, 2016 at 10:13 AM, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:
> On 6/16/2016 6:12 AM, Preston L. Bannister wrote:
>> I am hoping support for instance quiesce in the Nova API makes it into
>> OpenStack. To my understanding, this is existing function in Nova, just
>> not-yet exposed in the public API. (I believe Cinder uses this via a
>> private Nova API.)
> I'm assuming you're thinking of the os-assisted-volume-snapshots admin API
> in Nova that is called from the Cinder RemoteFSSnapDrivers (glusterfs,
> scality, virtuozzo and quobyte). I started a separate thread about that
> yesterday, mainly around the lack of CI testing / status so we even have an
> idea if this is working consistently and we don't regress it.
Yes, I believe we are talking about the same thing. Also, I saw your other
>> Much of the discussion is around disaster recovery (DR) and NFV - which
>> is not wrong, but might be muddling the discussion? Forget DR and NFV,
>> for the moment.
>> My interest is simply in collecting high quality backups of applications
>> (instances) running in OpenStack. (Yes, customers are deploying
>> applications into OpenStack that need backup - and at large scale. They
>> told us, *very* clearly.) Ideally, I would like to give the application
>> a chance to properly quiesce, so the on-disk state is most-consistent,
>> before collecting the backup.
> We already attempt to quiesce an active volume-backed instance before
> doing a volume snapshot:
The problem is, from my point of view, if the instance has more than one
volume (and many do), then quiescing the instance for more than once is not
>> The existing function in Nova should be at least a good start, it just
>> needs to be exposed in the public Nova API. (At least, this is my
>> Of course, good backups (however collected) allow you to build DR
>> solutions. My immediate interest is simply to collect high-quality
>> The part in the blueprint about an atomic operation on a list of
>> instances ... this might be over-doing things. First, if you have a set
>> of related instances, very likely there is a logical order in which they
>> should be quiesced. Some could be quiesced concurrently. Others might
>> need to be sequential.
>> Assuming the quiesce API *starts* the operation, and there is some means
>> to check for completion, then a single-instance quiesce API should be
>> sufficient. An API that is synchronous (waits for completion before
>> returning) would also be usable. (I am not picky - just want to collect
>> better backups for customers.)
> As noted above, we already attempt to quiesce when doing a volume-backed
> instance snapshot.
> The problem comes in with the chaining and orchestration around a list of
> instances. That requires additional state management and overhead within
> Nova and while we're actively trying to redo parts of the code base to make
> things less terrible, adding more complexity on top at the same time
> doesn't help.
I agree with your concern. To be clear, what I am hoping for is the
simplest possible version - an API to quiesce/unquiesce a single instance,
similar to the existing pause/unpause APIs.
Handling of lists of instances (and response to state changes) I would
expect to implement on the caller side. There are application-specific
semantics, so a single-instance API has merit from my perspective.
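
To make the caller-side idea concrete, here is a rough sketch of how a
backup tool might sequence things on top of a single-instance API. None of
this exists in Nova today: the client methods (quiesce, is_quiesced,
unquiesce), the "stages" structure and the snapshot callback are all
hypothetical names I made up for illustration. It assumes quiesce only
*starts* the operation and that completion can be polled.

```python
import time
from typing import Callable, List, Protocol, Sequence


class QuiesceClient(Protocol):
    """Hypothetical single-instance API; not part of the Nova API today."""

    def quiesce(self, instance_id: str) -> None: ...      # starts the operation
    def is_quiesced(self, instance_id: str) -> bool: ...  # completion check
    def unquiesce(self, instance_id: str) -> None: ...


def wait_quiesced(client: QuiesceClient, instance_ids: Sequence[str],
                  timeout: float = 60.0, poll: float = 1.0) -> None:
    """Poll until every instance reports quiesced, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    pending = set(instance_ids)
    while pending:
        pending = {i for i in pending if not client.is_quiesced(i)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not quiesced in time: {sorted(pending)}")
        time.sleep(poll)


def backup_group(client: QuiesceClient, stages: Sequence[Sequence[str]],
                 snapshot: Callable[[str], None],
                 timeout: float = 60.0, poll: float = 1.0) -> None:
    """Quiesce stage by stage, snapshot everything, then always unquiesce.

    Instances within one stage are started together; the next stage only
    begins once the previous one has completed. The ordering is decided by
    the caller, since it is application-specific.
    """
    started: List[str] = []
    try:
        for stage in stages:
            for instance_id in stage:
                client.quiesce(instance_id)
                started.append(instance_id)
            wait_quiesced(client, stage, timeout, poll)
        # Each instance is quiesced once; the callback can cover all of
        # its volumes in a single pass.
        for instance_id in started:
            snapshot(instance_id)
    finally:
        # Release in reverse order, even if a quiesce or snapshot failed.
        for instance_id in reversed(started):
            client.unquiesce(instance_id)
```

A caller would pass something like stages=[["db"], ["web1", "web2"]] to
quiesce the database first and the two web instances together afterwards.
The point is only that the ordering and error handling can live in the
caller, so Nova would not need to track state for a list of instances.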
> I'm also not sure what something like multiattach volumes will throw into
> the mix with this, but that's another DR/HA requirement.
> So I get that lots of people want lots of things that aren't in Nova right
> now. We have that coming from several different projects (cinder for
> multiattach volumes, neutron for vlan-aware-vms and routed networks), and
> several different groups (NFV, ops).
> We also have a lot of people that just want the basic IaaS layer to work
> for the compute service in an OpenStack cloud, like being able to scale
> that out better and track resource usage for accurate scheduling.
> And we have a lot of developers that want to be able to actually
> understand what it is the code is doing, and a much smaller number of core
> maintainers / reviewers that don't want to have to keep piling technical
> debt into the project while we're trying to fix some of what's already
> built up over the years - and actually have this stuff backed with
> integration testing.
> So, I get it. We all have requirements and we all have resource
> limitations, which is why we as a team prioritize our work items for the
> release. This one didn't make it for Newton.
Ah. I did not quite get that from what I read online. Unfortunate. Also
sounds like the Nova-folk are overloaded, and we need to come up with
resources to contribute to Nova, if we want this to appear in better time.