[openstack-dev] [tempest][qa][ironic][nova] When Nova should mark instance as successfully deleted?

Matthew Treinish mtreinish at kortar.org
Fri May 27 15:25:00 UTC 2016


On Fri, May 27, 2016 at 05:52:51PM +0300, Vasyl Saienko wrote:
> Lucas, Andrew
> 
> Thanks for fast response.
> 
> On Fri, May 27, 2016 at 4:53 PM, Andrew Laski <andrew at lascii.com> wrote:
> 
> >
> >
> > On Fri, May 27, 2016, at 09:25 AM, Lucas Alvares Gomes wrote:
> > > Hi,
> > >
> > > Thanks for bringing this up Vasyl!
> > >
> > > > At the moment Nova with the ironic virt_driver considers an instance
> > > > deleted while on the Ironic side the server goes to cleaning, which
> > > > can take a while. As a result, the current implementation of the Nova
> > > > tempest tests doesn't work for the case when Ironic is enabled.
> >
> > What is the actual failure? Is it a capacity issue because nodes do not
> > become available again quickly enough?
> >
> >
> The actual failure is that the Tempest community doesn't want to accept
> option 1: https://review.openstack.org/315422/
> And I'm not sure that it is the right way.

No, Andrew is right: this is a resource limitation in the gate. The failures
you're hitting come from not having enough available nodes to run all the
tests, because deleted nodes are still cleaning (or doing another operation)
and aren't available to Nova for booting another guest.

I -2'd that patch because it's a workaround for the fundamental issue here and
not actually an appropriate change for Tempest. What you've implemented in that
patch is the equivalent of talking to libvirt or some other hypervisor directly
to find out whether something is actually deleted. It's a layering violation;
there is never a reason that should be necessary, especially in a test of the
Nova API.

> 
> > >
> > > > There are two possible options for how to fix it:
> > > >
> > > > 1. Update the Nova tempest test scenarios for the Ironic case to wait
> > > > until cleaning is finished and the Ironic node goes to the 'available'
> > > > state.
> > > >
> > > > 2. Mark the instance as deleted in Nova only after cleaning is
> > > > finished on the Ironic side.
> > > >
> > > > I'm personally inclined to option 2. From the user's side, successful
> > > > instance termination means that no instance data is available any more
> > > > and nobody can access/restore that data. The current implementation
> > > > breaks this rule: the instance is marked as successfully deleted while
> > > > in fact it may not be cleaned, it may fail to clean, and the user will
> > > > not know anything about it.
> > > >
> 
> >
> > > I don't really like option #2; cleaning can take several hours
> > > depending on the configuration of the node. I think it would be a
> > > really bad experience if the user of the cloud had to wait a really
> > > long time before their resources are available again once they delete
> > > an instance. Marking the instance as deleted in Nova quickly is aligned
> > > with our idea of making bare metal deployments look and feel like VMs
> > > for the end user. It's also (one of) the reason(s) why we have separate
> > > states in Ironic for DELETING and CLEANING.
> >
> 
> The resources will be available only if there are other available baremetal
> nodes in the cloud.
> The user doesn't have the ability to track the status of available resources
> without admin access.
> 
> 
> > I agree. From a user perspective, once they've issued a delete, their
> > instance should be gone. Any delay in that actually happening is purely
> > an internal implementation detail that they should not care about.
> >

Delete is an async operation in Nova. There is never any immediacy here; it
always takes an indeterminate amount of time between the delete being issued by
the user and the server actually going away. The disconnect here is that when
running with the ironic driver the server disappears from Nova, but the
resources aren't freed back to the pool until the cleaning is done. I'm pretty
sure this is different from all the other Nova drivers.

I don't really have a horse in this race, so whatever ends up being decided for
the behavior here is fine. But I think we need to be clear about what the
behavior is now and what we actually want. Personally, I don't see an issue
with the node being in the deleting task_state for a long time, because that's
what is really happening while it's deleting. To me a delete is only finished
when the resource is actually gone and its consumed resources return to the
pool.
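
Concretely, with the ironic driver "resources return to the pool" means the
node's provision_state is back to 'available' once cleaning completes. A rough,
standalone sketch of what checking that looks like against the Ironic REST API
(this is not the Tempest patch under review; the endpoint, token, and timeout
values are placeholders):

import time

import requests

IRONIC_ENDPOINT = "http://127.0.0.1:6385"   # placeholder endpoint
TOKEN = "<keystone token>"                  # placeholder credential

def wait_for_node_available(node_uuid, timeout=1800, interval=15):
    # Poll the node until cleaning finishes and it is schedulable again.
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = requests.get("%s/v1/nodes/%s" % (IRONIC_ENDPOINT, node_uuid),
                            headers={"X-Auth-Token": TOKEN})
        resp.raise_for_status()
        state = resp.json()["provision_state"]
        if state == "available":
            return
        if state == "clean failed":
            raise RuntimeError("cleaning failed for node %s" % node_uuid)
        time.sleep(interval)
    raise RuntimeError("node %s still not available after %ss"
                       % (node_uuid, timeout))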

> > >
> > > I think we should go with #1, but instead of erasing the whole disk
> > > for real maybe we should have a "fake" clean step that runs quickly,
> > > for test purposes only?
> > >

Disabling the cleaning step (or having a fake one that does nothing) for the
gate would at least get around the failures. It would make things work again
because the nodes would be available right after Nova deletes them.
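
For reference, if that's the route taken, automated cleaning can be switched
off in ironic.conf with something like the excerpt below (a sketch; how the
gate jobs would actually wire this up is a separate question):

[conductor]
# Skip automated cleaning between instances so deleted nodes go straight
# back to 'available'. This also means cleaning itself isn't exercised.
automated_clean = False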

-Matt Treinish

> >
> 
> At the gates we just wait for the bootstrap and the callback from the node
> when cleaning starts. All heavy operations are postponed. We can disable
> automated_clean, but that means it is not tested.
> 