----- Original Message -----
From: "Eric K" <ekcs.openstack@gmail.com>
To: "openstack-discuss" <openstack-discuss@lists.openstack.org>
Sent: Wednesday, May 1, 2019 4:59:57 PM
Subject: [self-healing] live-migrate instance in response to fault signals

...
I just want to follow up to get more info on the context; specifically, which of the following pieces are the main difficulties?
- detecting the failure/soft-fail/early failure indication
- codifying how to respond to each failure scenario
- triggering/executing the desired workflow
- something else

We currently attempt to do all of the above using less-than-optimal custom scripts (using openstacksdk) and pipelines (running Ansible). I think there is tremendous value in developing at least one tested way to do all of this by connecting e.g. Monasca, Mistral and Nova together. Maybe it's already somewhat possible today, in which case it's more of a documentation issue, and fixing it would benefit operators.

One of the derivative issues is the quality of live-migration in Nova. (I don't have production-level experience with Rocky/Stein yet.) When we do lots of live migrations, there is obviously a limit on the number of live migrations happening at the same time (doing more would be counterproductive). These limits could be smarter/more dynamic in some cases. There is no immediate action item here right now though.

I would like to begin with putting together all the pieces that currently work together and go from there - see what's missing.

-Daniel
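
To make the point about limiting simultaneous live migrations concrete, here is a minimal sketch of a bounded-concurrency driver. It is not the scripts mentioned above and not a Nova feature: the function name `migrate_with_limit`, the `migrate` callable, and the `max_parallel` parameter are all hypothetical, and the sketch assumes `migrate` blocks until the migration of one instance has finished or failed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def migrate_with_limit(
    server_ids: Iterable[str],
    migrate: Callable[[str], None],
    max_parallel: int = 2,
) -> Dict[str, Optional[BaseException]]:
    """Call migrate(server_id) for each ID, at most max_parallel at a time.

    `migrate` is a hypothetical, caller-supplied function that is assumed to
    block until one instance's migration has completed or raised.
    Returns a dict mapping each server ID to None on success, or to the
    exception that migrate() raised for it.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    results: Dict[str, Optional[BaseException]] = {}
    # The pool size is the concurrency cap: no more than max_parallel
    # migrate() calls are ever running at once.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(migrate, sid): sid for sid in server_ids}
        for future in as_completed(futures):
            # future.exception() is None when migrate() returned normally.
            results[futures[future]] = future.exception()
    return results
```

In a real setup the `migrate` callable would presumably wrap an openstacksdk call such as `conn.compute.live_migrate_server(...)` plus a wait for the instance to settle; that wiring is an assumption and is deliberately left out. A fixed cap like this is the "static" version of the limit; a smarter variant could adjust `max_parallel` based on observed migration times.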