On 22 Apr 2024, at 21:19, smooney@redhat.com wrote:
If you have a server on host A and that host has a failure, when the operator evacuates it to host B there is a period of time where a user could issue API actions like stop that will be queued for host A.
When the operator restores host A (e.g. replaces the power supply) and that service starts up, it may dequeue a message that was sent before the evacuate was done.
Nova needs to check, when we dequeue a message, that the instance is actively managed on this host, and avoid processing requests for instances it no longer manages. In this case it should avoid doing an instance.save to record that the instance should be powered off.
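
The check described above could look roughly like the following. This is only a minimal sketch to illustrate the idea, not Nova's actual code: the Instance class, the handle_stop function and the host names are all hypothetical stand-ins, and the real compute manager and object model are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for an instance record (not Nova's object model)."""
    uuid: str
    host: str                    # host that currently manages the instance
    power_state: str = "running"
    saves: int = 0               # counts writes, standing in for the database

    def save(self) -> None:
        # Stand-in for persisting the record to the database.
        self.saves += 1


def handle_stop(instance: Instance, this_host: str) -> bool:
    """Apply a dequeued stop request only if this host manages the instance.

    Returns True if the request was applied, False if it was dropped as stale.
    """
    if instance.host != this_host:
        # The instance was evacuated elsewhere while the message sat in the
        # queue: skip the power-off and, importantly, skip the save.
        return False
    instance.power_state = "shutdown"
    instance.save()
    return True


if __name__ == "__main__":
    # The instance was evacuated to host-b before host-a came back up.
    evacuated = Instance(uuid="hypothetical-uuid", host="host-b")
    applied = handle_stop(evacuated, this_host="host-a")
    print(applied, evacuated.power_state, evacuated.saves)
```

The point of the sketch is that the ownership check happens before any state change, so a stale message on the restored host never reaches the save that would later cause the destination to power the instance off.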
The last section very accurately describes the issue: we process messages even if we no longer manage the instance. We also handle the power state and issue a save to the database, which makes the destination host, where the instance was placed by the evacuation, erroneously power off the instance, causing another “outage” for that instance after the fact.

Best regards
Tobias