[nova] Nova evacuate issue

Lee Yarwood lyarwood at redhat.com
Fri Jan 8 13:05:31 UTC 2021


On 07-01-21 11:58:05, Jean-Philippe Méthot wrote:
> Considering that this issue happened in our production environment,
> it’s not exactly possible to try to reproduce without shutting down
> servers that are currently in use. That said, if the current logs I
> have are enough, I will try opening a bug on the bugtracker.

Yup, appreciate that. If you still have logs, then using the event list to
determine the request-id of the evacuation and providing any n-api/n-cpu
logs that reference that request-id in the bug would be great.

Lots more detail in the following doc:

https://docs.openstack.org/api-guide/compute/faults.html
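A rough sketch of that workflow in Python. It makes a few assumptions: the JSON output of `openstack server event list $instance -f json` uses the same column names as the table output ("Request ID", "Action"), Nova records evacuations under the action name "evacuate", and request IDs take the usual `req-<uuid>` form. The script and helper names are hypothetical; this is not an official Nova tool.

```python
import json
import re
import sys
from typing import Iterable, List

# Assumed request-id format: "req-" followed by a lowercase UUID.
REQ_ID_RE = re.compile(r"^req-[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}$")


def evacuate_request_ids(event_list_json: str) -> List[str]:
    """Return request IDs of 'evacuate' actions from `server event list -f json` output.

    Assumption: each event is a JSON object with "Request ID" and "Action" keys.
    """
    events = json.loads(event_list_json)
    return [e["Request ID"] for e in events if e.get("Action") == "evacuate"]


def lines_for_request(log_lines: Iterable[str], request_id: str) -> List[str]:
    """Return the log lines (e.g. from nova-api or nova-compute logs) mentioning request_id."""
    if not REQ_ID_RE.match(request_id):
        raise ValueError(f"not a request ID: {request_id!r}")
    return [line.rstrip("\n") for line in log_lines if request_id in line]


def main(argv: List[str]) -> None:
    # argv[1]: file holding the saved `openstack server event list $instance -f json` output
    # argv[2:]: n-api / n-cpu log files to search
    with open(argv[1]) as f:
        req_ids = evacuate_request_ids(f.read())
    for path in argv[2:]:
        with open(path, errors="replace") as f:
            lines = f.readlines()
        for req_id in req_ids:
            for line in lines_for_request(lines, req_id):
                print(f"{path}: {line}")


if __name__ == "__main__":
    main(sys.argv)
```

For example, save the event list with `openstack server event list $instance -f json > events.json`, then run something like `python trace_evacuation.py events.json nova-api.log nova-compute.log` (file names are illustrative) and attach the matching lines to the bug.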
 
> Compute22, the source host, was completely dead. It refused to boot up
> through IPMI.

ACK.
 
> It is possible that that stein fix prevented me from reproducing the
> problem in my staging environment (production is on rocky, staging is
> on stein).
>
> Also, it may be important to note that our neutron is split, as we use
> neutron-rpc-server to answer rpc calls. It’s also HA, as we have two
> controllers with neutron-rpc-server and the api running (and that
> won’t work anymore when we upgrade production to stein, but that’s
> another problem entirely and probably off-topic here).

I doubt that played a part. We've fixed many, many bugs in Nova's
evacuation logic over the releases, so for now I'm going to assume it's
something within Nova.

> > On 7 Jan 2021 at 09:26, Lee Yarwood <lyarwood at redhat.com> wrote:
> > 
> > Would you be able to trace an example evacuation request fully and
> > pastebin it somewhere, using the `openstack server event list $instance` [1]
> > output to determine the request-id etc.? Feel free to also open a bug
> > about this and we can just triage there instead of the ML.
> > 
> > The fact that q-api has sent the
> > network-vif-plugged:80371c01-930d-4ea2-9d28-14438e948b65 to n-api
> > suggests that the q-agt is actually alive on compute22, was that the
> > case? Note that a pre-condition of calling the evacuation API is that
> > the source host has been fenced [2].
> > 
> > That all said, I wonder if this is somehow related to the following
> > stein change:
> > 
> > https://review.opendev.org/c/openstack/nova/+/603844

-- 
Lee Yarwood                 A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76


More information about the openstack-discuss mailing list