Hi Rudolfo,

Well, I sent logs because same LOG error I see in my test environment, and thought that it would be useful for you.

Michal



Dňa št 25. 11. 2021, 11:41 Rodolfo Alonso Hernandez <ralonsoh@redhat.com> napísal(a):
Hello Michal:

I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).

Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0:
2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0

Nova compute agent receives this server event:
2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:10205

Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.

Regards.

[1]https://paste.opendev.org/show/811273/

On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io> wrote:
Hello,

In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test).
No OVN used, this stack is based on OVS.

Thank you,
Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897 
michal.arbet@ultimum.io
https://ultimum.io



st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <ralonsoh@redhat.com> napsal:
Hello Tony:

Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event should call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.

Of course, this Neutron method will be called only if the logical switch port is UP.

Regards.


On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Hi,

I see such problem from time to time. It's not consistently reproduceable.
======================
2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip.         
2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
======================
The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't
capture the update or didn't send message back to nova-compute. Is there any known fix for
this problem?


Thanks!
Tony