Re: [self-healing] live-migrate instance in response to fault signals

2 May 2019

      ----- Original Message -----
...
From: "Eric K" <ekcs.openstack@gmail.com>
To: "openstack-discuss" <openstack-discuss@lists.openstack.org>
Sent: Wednesday, May 1, 2019 4:59:57 PM
Subject: [self-healing] live-migrate instance in response to fault signals
...
I just want to follow up to get more info on the context;
specifically, which of the following pieces are the main difficulties?
- detecting the failure/soft-fail/early failure indication
- codifying how to respond to each failure scenario
- triggering/executing the desired workflow
- something else
[1] https://etherpad.openstack.org/p/DEN-self-healing-SIG
We currently attempt to do all of the above using less-than-optimal custom
scripts (using openstacksdk) and pipelines (running Ansible).

I think there is tremendous value in developing at least one tested
way to do all of the above by connecting e.g. Monasca, Mistral and Nova.
Maybe it's already somewhat possible today, in which case it's more of
a documentation issue, and fixing that would benefit operators.
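
For illustration only, here is a minimal sketch of the "trigger/execute" piece as a custom script might do it. It is not our actual script. It assumes `compute` behaves like the openstacksdk compute proxy (`conn.compute`), that admin credentials are in use so servers can be listed across projects and filtered by host, and that the fault signal has already been reduced to a hypervisor host name. The function name and the host name in the usage note are hypothetical.

```python
def live_migrate_off_host(compute, faulty_host):
    """Request a live migration for every ACTIVE server on faulty_host.

    `compute` is assumed to behave like openstacksdk's `conn.compute` proxy.
    Returns the IDs of the servers for which a migration was requested.
    """
    requested = []
    # Listing across projects and filtering by host is assumed to need admin rights.
    for server in compute.servers(all_projects=True, host=faulty_host):
        if server.status != "ACTIVE":
            # This sketch only handles running instances; others are skipped.
            continue
        # host=None leaves the choice of destination to the Nova scheduler.
        compute.live_migrate_server(server, host=None)
        requested.append(server.id)
    return requested
```

With a real cloud this would be called roughly as `live_migrate_off_host(openstack.connect(cloud="mycloud").compute, "compute-12")`, where the cloud and host names are placeholders. The call only requests the migrations; it does not wait for them to finish or verify the outcome.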

One of the derivative issues is the quality of live-migration in Nova.
(I don't have production-level experience with Rocky/Stein yet.)
When we do lots of live migrations, there is obviously a limit on the number
of live migrations happening at the same time (doing more would be
counterproductive). These limits could be smarter/more dynamic in some cases.
There is no immediate action item here right now though.
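
To make the concurrency point concrete, here is a rough sketch of a static client-side cap, i.e. the simple kind of limit that could be made smarter. It assumes a hypothetical callable `migrate_one(server_id)` that requests one live migration and blocks until it completes or fails; nothing here is Nova's own limiting mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(migrate_one, server_ids, max_concurrent=2):
    """Call migrate_one(server_id) for each ID, at most max_concurrent at a time.

    Returns a dict mapping each server ID to None on success, or to the
    exception that migrate_one raised for it.
    """
    results = {}
    # The pool size is the cap: no more than max_concurrent calls run at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {sid: pool.submit(migrate_one, sid) for sid in server_ids}
        for sid, future in futures.items():
            # exception() waits for the call to finish; None means it succeeded.
            results[sid] = future.exception()
    return results
```

A more dynamic variant would adjust `max_concurrent` between batches based on something like observed migration duration, which is the part that is missing today.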

I would like to begin by putting together all the pieces that currently
work together and go from there to see what's missing.

-Daniel

Daniel Speichert