[heat] Resource replacement terminates at DELETE_COMPLETE
Hi everyone!

I have a situation with a Heat stack that contains an Octavia load balancer resource. Heat thinks the resource has already been replaced, and so will not recreate it:

Resource api_lb with id 3978 already replaced by 3999; not checking check /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/check_resource.py:310

The resource goes to a DELETE_COMPLETE state and just sits there. The stack stays UPDATE_IN_PROGRESS and nothing else moves. It doesn't even time out after 4 hours.

Doing a stack check puts everything as CHECK_COMPLETE, even the non-existent load balancers. I can mark the LB and its components unhealthy and start another update, but this just repeats the cycle.

This all started with some Octavia shenanigans which ended with all the load balancers being deleted manually. I have 2 similar stacks which recreated fine, but this one went through the cycle several more times as we were trying to fix the LB problem. This is a super edge case, but hopefully someone has another idea how to get out of it.

Thanks!
Erik
On 22/06/19 11:30 AM, Erik McCormick wrote:
> Hi everyone!
> I have a situation with a heat stack where it has an Octavia Load Balancer resource which it thinks it's already replaced and so will not recreate it.
> Resource api_lb with id 3978 already replaced by 3999; not checking check /var/lib/kolla/venv/lib/python2.7/site-packages/heat/engine/check_resource.py:310
Ruh-roh. What version of Heat are you using? There has been at least one known bug related to that check. The one that I can find easily is https://storyboard.openstack.org/#!/story/2001974 (fixed in Rocky; backported to Queens and Pike). I think there might have been earlier issues found but they predated the existence of that log message (those were fun to debug).

The log message was added in Queens (https://review.opendev.org/533015) so in theory, whatever version you're running, the fix should be available in the latest stable release - though if memory serves that only prevents the issue rather than recovering from it.

You'll be happy to hear that the check was eliminated forever in Stein: https://review.opendev.org/600278
> It goes to a DELETE_COMPLETE state and just sits there. The stack stays UPDATE_IN_PROGRESS and nothing else moves. It doesn't even time out after 4 hours.
> Doing a stack check puts everything as CHECK_COMPLETE, even the non-existent load balancers. I can mark the LB and its components unhealthy and start another update, but this just repeats the cycle.
> This all started with some Octavia shenanigans which ended with all the load balancers being deleted manually. I have 2 similar stacks which recreated fine, but this one went through the cycle several other times as we were trying to fix the LB problem. This is a super edge case, but hopefully someone has another idea how to get out of it.
If you're up for some database hacking, removing that (DELETE_COMPLETE) resource ought to get you unblocked:
DELETE FROM resource WHERE id=3978;
Obviously take appropriate precautions, back up the DB first, &c.

cheers,
Zane.
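One way to apply the "appropriate precautions" to the DELETE above is to run it inside a transaction and make it conditional on the row really being in DELETE_COMPLETE. The following is only a rough sketch of that idea, not a tested procedure for a real Heat database. It assumes the resource table stores the state in separate action and status columns, and it uses an in-memory SQLite table as a stand-in for the real database. The helper names and the sample rows are invented for illustration.

```python
import sqlite3


def delete_if_delete_complete(conn, resource_id):
    """Remove one resource row, but only if it is in DELETE_COMPLETE.

    Returns the number of rows removed (0 or 1). The action/status
    column names are an assumption about the schema.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "DELETE FROM resource "
            "WHERE id = ? AND action = 'DELETE' AND status = 'COMPLETE'",
            (resource_id,),
        )
        return cur.rowcount


def make_demo_db():
    """Build a throwaway stand-in table holding invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource ("
        "id INTEGER PRIMARY KEY, name TEXT, action TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO resource VALUES (?, ?, ?, ?)",
        [
            (3978, "api_lb", "DELETE", "COMPLETE"),  # the stuck row
            (1, "other_resource", "CREATE", "COMPLETE"),  # must survive
        ],
    )
    conn.commit()
    return conn
```

Calling delete_if_delete_complete(make_demo_db(), 3978) removes only the stuck row. A resource in any other state is left alone, and a return value of 0 signals that nothing matched, which is worth checking before starting another stack update.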
Participants (2): Erik McCormick, Zane Bitter