[openstack-dev] [heat] Repeating stack-delete many times

Steven Hardy shardy at redhat.com
Tue Feb 10 13:04:58 UTC 2015


On Tue, Feb 10, 2015 at 03:04:39PM +0400, Kairat Kushaev wrote:
>    Hi all,
>    During the analysis of the following bug:
>    https://bugs.launchpad.net/heat/+bug/1418878
>    I figured out that the orchestration engine doesn't work properly in
>    some cases.
>    The case is the following:
>    trying to delete the same stack with resources n times in series.
>    It might happen if the stack deletion takes a long time and a user
>    sends a second delete request.
>    Orchestration engine behavior is the following:
>    1) When the first stack-delete command comes to the heat service,
>    it acquires the stack lock and sends delete requests for resources
>    to other clients.
>    Unfortunately, the command does not start to delete resources from the
>    heat db.
>    2) At that time a second stack-delete command for the same stack
>    comes to the heat engine. It steals the stack lock, waits 0.2 sec
>    (hard-coded constant!) to allow the previous stack-delete command to
>    finish its operations (of course, the first didn't manage to finish
>    deleting in time). After that the engine service starts the deleting
>    again:
>       - Request resources from heat DB (They exist!)
>       - Send requests for delete to other clients (They do not exist
>         because of point 1).

This is expected. Most resource handle_delete paths include the following
error path precisely so that any "does not exist" errors are ignored:

  self.client_plugin().ignore_not_found(e)
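
For anyone not familiar with that pattern, here is a minimal standalone
sketch of how it behaves. The exception classes, FakeClientPlugin and this
handle_delete are stand-ins made up for illustration, not heat's actual
code; the only thing taken from heat is the idea that ignore_not_found
swallows a "not found" error and re-raises anything else:

```python
class NotFound(Exception):
    """Stand-in for a client's 'resource does not exist' error."""


class Conflict(Exception):
    """Stand-in for a client's 'conflicting state' error."""


class FakeClientPlugin:
    """Hypothetical plugin that mirrors the pattern, not heat's code."""

    def is_not_found(self, ex):
        return isinstance(ex, NotFound)

    def ignore_not_found(self, ex):
        # Swallow "not found" errors; re-raise everything else.
        if not self.is_not_found(ex):
            raise ex


def handle_delete(plugin, delete_call):
    """Run delete_call, treating an already-deleted resource as success."""
    try:
        delete_call()
    except Exception as e:
        plugin.ignore_not_found(e)
    return "deleted"
```

With this shape a repeated delete that hits NotFound still succeeds, while
any other exception type (such as Conflict) propagates to the caller.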

>    Finally, we have the stack in DELETE_FAILED state because the clients
>    raise exceptions during stack delete.

This is the bug: the exception which is raised isn't getting ignored by the
nova client plugin, which by default only ignores NotFound exceptions:

https://github.com/openstack/heat/blob/master/heat/engine/clients/os/nova.py#L85

In this case, I think the problem is you're getting a Conflict exception
when attempting to re-delete the NovaFloatingIpAssociation:

https://github.com/openstack/heat/blob/master/heat/engine/resources/nova_floatingip.py#L148

So, I think this is probably a bug specific to NovaFloatingIpAssociation
rather than a problem we need to fix across all resources?

I'd probably suggest we either add another except clause which catches (and
ignores) this situation, or look at whether novaclient is raising the wrong
exception type, as "NotFound" would appear to be a saner error than
"Conflict" when trying to delete a non-existent association?

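A rough sketch of the first option, as a standalone simulation rather than
a patch. FakeNova, its remove_floating_ip method and both exception classes
are hypothetical stand-ins, and the sketch assumes that a Conflict on
re-delete only ever means the association is already gone:

```python
class NotFound(Exception):
    """Stand-in for a client's 'resource does not exist' error."""


class Conflict(Exception):
    """Stand-in for a client's 'conflicting state' error."""


class FakeNova:
    """Hypothetical client: a second disassociation raises Conflict."""

    def __init__(self):
        self.associated = True

    def remove_floating_ip(self):
        if not self.associated:
            raise Conflict("floating ip is not associated")
        self.associated = False


def handle_delete(nova):
    """Delete the association, tolerating an earlier successful delete."""
    try:
        nova.remove_floating_ip()
    except NotFound:
        # Already gone: nothing left to delete.
        pass
    except Conflict:
        # Assumption: Conflict here means the association no longer exists.
        pass


nova = FakeNova()
handle_delete(nova)  # first stack-delete removes the association
handle_delete(nova)  # repeated stack-delete no longer fails
```

The obvious downside is that a blanket "except Conflict" could also hide a
genuine conflict, which is one reason fixing the exception type on the
client side might be the cleaner option.
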
Steve


