On 07-01-21 11:58:05, Jean-Philippe Méthot wrote:
Considering that this issue happened in our production environment, it’s not exactly possible to try to reproduce it without shutting down servers that are currently in use. That said, if the current logs I have are enough, I will try opening a bug on the bugtracker.
Yup, appreciate that. If you still have logs, then using the event list to determine the request-id for the evacuation and then providing any n-api/n-cpu logs referencing that request-id in the bug would be great. Lots more detail in the following doc: https://docs.openstack.org/api-guide/compute/faults.html
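Once the request-id is known, pulling the matching lines out of the n-api/n-cpu logs could look roughly like the sketch below. It assumes only that request IDs have the usual `req-<uuid>` form and appear verbatim in each relevant log line; the function name `lines_for_request` and the sample lines are made up for illustration and are not real Nova output.

```python
import re

# Request IDs are assumed to look like "req-" followed by a UUID.
REQUEST_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def lines_for_request(lines, request_id):
    """Return the log lines that mention the given request ID.

    Hypothetical helper: `lines` is any iterable of strings, for
    example an open log file.
    """
    if not REQUEST_ID.fullmatch(request_id):
        raise ValueError(f"not a request ID: {request_id!r}")
    return [line for line in lines if request_id in line]


# Placeholder log lines, invented for this example only.
sample = [
    "<timestamp> n-api [req-11111111-2222-3333-4444-555555555555] placeholder message A",
    "<timestamp> n-cpu [req-99999999-8888-7777-6666-555555555555] placeholder message B",
    "<timestamp> n-cpu [req-11111111-2222-3333-4444-555555555555] placeholder message C",
]

matched = lines_for_request(sample, "req-11111111-2222-3333-4444-555555555555")
for line in matched:
    print(line)
```

Passing an open log file instead of `sample` should work the same way, since the helper only iterates over lines.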
Compute22, the source host, was completely dead. It refused to boot up through IPMI.
ACK.
It is possible that that stein fix prevented me from reproducing the problem in my staging environment (production is on rocky, staging is on stein).
Also, it may be important to note that our neutron is split, as we use neutron-rpc-server to answer rpc calls. It’s also HA, as we have two controllers with neutron-rpc-server and the api running (and that won’t work anymore when we upgrade production to stein, but that’s another problem entirely and probably off-topic here).
I doubt that played a part. We've fixed many, many bugs with Nova's evacuation logic over the releases, so for now I'm going to assume it's something within Nova.
On 7 Jan 2021, at 09:26, Lee Yarwood <lyarwood@redhat.com> wrote:
Would you be able to trace an example evacuation request fully and pastebin it somewhere, using the output of `openstack server event list $instance` [1] to determine the request-id etc.? Feel free to also open a bug about this and we can just triage there instead of the ML.
The fact that q-api has sent the network-vif-plugged:80371c01-930d-4ea2-9d28-14438e948b65 event to n-api suggests that the q-agt is actually alive on compute22. Was that the case? Note that a pre-condition of calling the evacuation API is that the source host has been fenced [2].
That all said, I wonder if this is somehow related to the following Stein change:
https://review.opendev.org/c/openstack/nova/+/603844
-- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76