Considering that this issue happened in our production environment, it's not really possible to try to reproduce it without shutting down servers that are currently in use. That said, if the logs I currently have are enough, I will try opening a bug on the bug tracker.

Compute22, the source host, was completely dead: it refused to boot up even through IPMI. It is possible that the Stein fix prevented me from reproducing the problem in my staging environment (production is on Rocky, staging is on Stein).

Also, it may be important to note that our Neutron is split, as we use neutron-rpc-server to answer RPC calls. It's also HA, as we have two controllers running neutron-rpc-server and the API (and that won't work anymore when we upgrade production to Stein, but that's another problem entirely and probably off-topic here).

Jean-Philippe Méthot
Senior OpenStack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
On 7 Jan 2021, at 09:26, Lee Yarwood <lyarwood@redhat.com> wrote:
Would you be able to trace an example evacuation request fully and pastebin it somewhere, using the output of `openstack server event list $instance` [1] to determine the request-id, etc.? Feel free to also open a bug about this and we can triage there instead of on the ML.
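Something along these lines, with $instance, <request-id> and the log paths below being placeholders to fill in for your deployment:

    # List the API actions recorded against the instance; the evacuate
    # row carries the request-id to trace through the logs.
    $ openstack server event list $instance

    # Show the details of that one action.
    $ openstack server event show $instance <request-id>

    # Then grep that request-id across the nova/neutron logs on the
    # controllers and the destination host.
    $ grep -r '<request-id>' /var/log/nova /var/log/neutron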
The fact that q-api has sent the network-vif-plugged:80371c01-930d-4ea2-9d28-14438e948b65 event to n-api suggests that the q-agt was actually alive on compute22. Was that the case? Note that a pre-condition of calling the evacuation API is that the source host has been fenced [2].
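For context, this is roughly what neutron-server posts to Nova's os-server-external-events API when the vif is plugged; just a sketch, with $TOKEN, $NOVA_API and the server UUID as placeholders (the tag is the port UUID from the event name above):

    $ curl -s -X POST "$NOVA_API/v2.1/os-server-external-events" \
        -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
        -d '{"events": [{"name": "network-vif-plugged",
                         "server_uuid": "<instance-uuid>",
                         "tag": "80371c01-930d-4ea2-9d28-14438e948b65",
                         "status": "completed"}]}'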
That all said, I wonder if this is somehow related to the following Stein change:
https://review.opendev.org/c/openstack/nova/+/603844