[openstack-dev] [tempest][qa][ironic][nova] When Nova should mark instance as successfully deleted?

Andrew Laski andrew at lascii.com
Fri May 27 18:44:31 UTC 2016



On Fri, May 27, 2016, at 11:25 AM, Matthew Treinish wrote:
> On Fri, May 27, 2016 at 05:52:51PM +0300, Vasyl Saienko wrote:
> > Lucas, Andrew
> > 
> > Thanks for fast response.
> > 
> > On Fri, May 27, 2016 at 4:53 PM, Andrew Laski <andrew at lascii.com> wrote:
> > 
> > >
> > >
> > > On Fri, May 27, 2016, at 09:25 AM, Lucas Alvares Gomes wrote:
> > > > Hi,
> > > >
> > > > Thanks for bringing this up Vasyl!
> > > >
> > > > At the moment Nova with the ironic virt_driver considers an instance
> > > > deleted while on the Ironic side the server goes to cleaning, which
> > > > can take a while. As a result, the current implementation of the Nova
> > > > tempest tests doesn't work when Ironic is enabled.
> > >
> > > What is the actual failure? Is it a capacity issue because nodes do not
> > > become available again quickly enough?
> > >
> > >
> > The actual failure is that the tempest community doesn't want to accept
> > option 1: https://review.openstack.org/315422/
> > And I'm not sure that it is the right way.
> 
> No, Andrew is right, this is a resource limitation in the gate. The
> failures you're hitting are caused by resource constraints in the gate and
> not having enough available nodes to run all the tests, because deleted
> nodes are still cleaning (or doing another operation) and aren't available
> to nova for booting another guest.
> 
> I -2'd that patch because it's a workaround for the fundamental issue here
> and not actually an appropriate change for Tempest. What you've implemented
> in that patch is the equivalent of talking to libvirt or some other
> hypervisor directly to find out if something is actually deleted. It's a
> layer violation; there is never a reason that should be necessary,
> especially in a test of the nova api.
> 
> > 
> > > >
> > > > > There are two possible options for how to fix it:
> > > > >
> > > > > 1. Update the Nova tempest test scenarios for the Ironic case to
> > > > > wait until cleaning is finished and the Ironic node goes to the
> > > > > 'available' state.
> > > > >
> > > > > 2. Mark the instance as deleted in Nova only after cleaning is
> > > > > finished on the Ironic side.
> > > > >
> > > > > I'm personally inclined to option 2. From the user's side, a
> > > > > successful instance termination means that no instance data is
> > > > > available any more and nobody can access/restore that data. The
> > > > > current implementation breaks this rule: the instance is marked as
> > > > > successfully deleted while in fact it may not be cleaned, it may
> > > > > fail to clean, and the user will not know anything about it.
> > > > >
> > 
> > >
> > > > I don't really like option #2; cleaning can take several hours
> > > > depending on the configuration of the node. I think that it would be
> > > > a really bad experience if the user of the cloud had to wait a really
> > > > long time before their resources are available again once they delete
> > > > an instance. The idea of marking the instance as deleted in Nova
> > > > quickly is aligned with our idea of making bare metal deployments
> > > > look-and-feel like VMs for the end user. It is also (one of) the
> > > > reason(s) why we have separate states in Ironic for DELETING and
> > > > CLEANING.
> > >
> > 
> > The resources will be available only if there are other available
> > baremetal nodes in the cloud. A user doesn't have the ability to track
> > the status of available resources without admin access.
> > 
> > 
> > > I agree. From a user perspective once they've issued a delete their
> > > instance should be gone. Any delay in that actually happening is purely
> > > an internal implementation detail that they should not care about.
> > >
> 
> Delete is an async operation in Nova. There is never any immediacy here;
> it always takes an indeterminate amount of time between it being issued by
> the user and the server actually going away. The disconnect here is that
> when running with the ironic driver the server disappears from Nova but
> the resources aren't freed back until the cleaning is done. I'm pretty
> sure this is different from all the other Nova drivers.
> 
> I don't really have a horse in this race, so whatever ends up being
> decided for the behavior here is fine. But I think we need to be clear
> about what the behavior here is and what we actually want. Personally, I
> don't see an issue with the node being in the deleting task_state for a
> long time, because that's what is really happening while it's deleting. To
> me a delete is only finished when the resource is actually gone and its
> consumed resources return to the pool.

I wouldn't argue against an instance hanging around in a deleting state
for a long time. However, at this time quota usage is not reduced until
the instance is considered to have been deleted. I think those would
need to be decoupled in order to leave instances in a deleting state. A
user should not need to wait hours to get their quota back just because
they wanted a baremetal machine. The burden of a long cleanup should
fall on a deployer and their ability to manage capacity.

But the issue here is just capacity. Whether or not we keep an instance
in a deleting state, or when we release quota, doesn't change the
Tempest failures from what I can tell. The suggestions below address
that.
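
As an aside, for anyone not following the review: what option 1 / the -2'd
Tempest patch boils down to is a wait loop along these lines, run after
Nova reports the server gone. The helper name and the get_provision_state
callable are hypothetical placeholders here, not real Tempest or
python-ironicclient APIs; this is only a sketch of the mechanics being
debated:

    # Rough sketch only: poll the backing Ironic node until cleaning is
    # done and it is back in the schedulable pool.
    import time

    def wait_for_node_available(get_provision_state, timeout=1800,
                                interval=10):
        deadline = time.time() + timeout
        while time.time() < deadline:
            state = get_provision_state()
            if state == 'available':
                return
            if state == 'clean failed':
                raise RuntimeError('node failed cleaning')
            time.sleep(interval)
        raise RuntimeError('node not available in %s seconds' % timeout)

The debate is really about where, if anywhere, that wait belongs.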


> 
> > > >
> > > > I think we should go with #1, but instead of erasing the whole disk
> > > > for real, maybe we should have a "fake" clean step that runs quickly
> > > > for test purposes only?
> > > >
> 
> Disabling the cleaning step (or having a fake one that does nothing) for
> the gate would get around the failures at least. It would make things work
> again because the nodes would be available right after Nova deletes them.
> 
> -Matt Treinish
> 
> > >
> > 
> > At the gates we are just waiting for bootstrap and a callback from the
> > node when cleaning starts. All heavy operations are postponed. We can
> > disable automated_clean, which means it is not tested.
> > 
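
For reference, the automated_clean knob mentioned above is a conductor
option; a minimal sketch of turning it off in ironic.conf looks like this
(how a gate job would wire it in via devstack is a separate question):

    # ironic.conf (sketch): skip automated cleaning when nodes are
    # unprovisioned, so they return to 'available' right after deletion.
    [conductor]
    automated_clean = False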
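
And for the "fake clean step" idea, something along these lines might do.
This is only a rough sketch assuming Ironic's clean_step decorator and the
existing fake deploy interface, not a tested implementation:

    # Sketch of a quick no-op clean step for CI; untested.
    from ironic.drivers import base
    from ironic.drivers.modules import fake


    class FakeCleaningDeploy(fake.FakeDeploy):
        """Deploy interface whose only clean step returns immediately."""

        @base.clean_step(priority=10)
        def erase_devices(self, task):
            # Pretend to erase the disks so gate nodes go straight back to
            # 'available' without waiting for a real wipe.
            pass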


