On 11/6/2019 5:07 PM, Erik Olof Gunnar Andersson wrote:
Yea - this is our number one pain point with Nova and Rocky, and having this backported would be invaluable.
I posted [1] today. If that's accepted I can work on Rocky afterward.
Since we are on the topic, here are some additional issues we are having:
- Sometimes heal_allocations just fails without a good error (e.g. Compute host could not be found.)
- Errors are always sequential and always halt execution, so if you have a lot of errors, you'll end up fixing them all one-by-one.
- Better logging when unexpected errors do happen (maybe something more verbose like --debug would be good?).
Could you open a bug with more details about the issues you're hitting? For example, in what case do you hit ComputeHostNotFound?
The sequential errors problem is pretty obvious, but off the top of my head I'm not sure what to do about it, besides adding some kind of option to say "process as much as possible, storing up all of the errors to dump at the end".
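To make the "store up the errors and dump them at the end" idea concrete, here is a rough sketch. It is purely illustrative: process_all, heal_one, and fake_heal are hypothetical names, the instance IDs are made up, and none of this is nova code or an existing heal_allocations option.

```python
# Hypothetical sketch: none of these names come from nova.
from typing import Callable, Iterable, List, Tuple


def process_all(
    items: Iterable[str],
    heal_one: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Run heal_one on every item, collecting failures instead of halting."""
    healed: List[str] = []
    errors: List[Tuple[str, Exception]] = []
    for item in items:
        try:
            heal_one(item)
        except Exception as exc:  # deliberately broad: keep going on any failure
            errors.append((item, exc))
        else:
            healed.append(item)
    return healed, errors


def fake_heal(instance_id: str) -> None:
    # Stand-in for the real per-instance work; fails for made-up "bad" IDs.
    if instance_id.startswith("bad"):
        raise LookupError(f"Compute host could not be found for {instance_id}")


healed, errors = process_all(["inst-1", "bad-2", "inst-3", "bad-4"], fake_heal)

# Dump everything that went wrong in one pass at the end.
for instance_id, exc in errors:
    print(f"{instance_id}: {type(exc).__name__}: {exc}")
```

The trade-off is that a broad except can hide a systemic failure behind many per-instance errors, so a real option would probably want a cap on how many errors it tolerates before giving up.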
As for better logging about unexpected errors, it's hard to know what else would be useful to log when the error is unexpected, you know? If you have examples, can you throw those into the bug report?
[1] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:sta...