[Neutron][OVN] nova-compute timeout while waiting for VIF activation confirmed by Neutron
Hi, I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem? Thanks! Tony
Hello Tony: Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event should call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs. Of course, this Neutron method will be called only if the logical switch port is UP. Regards. On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
Hello, In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS. Thank you, Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>* LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline> st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez < ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event should call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
Hello Michal: I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611). Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0 Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:10205 Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts. Regards. [1]https://paste.opendev.org/show/811273/ On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io> wrote:
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez < ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event should call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
Hi Rudolfo, Well, I sent logs because same LOG error I see in my test environment, and thought that it would be useful for you. Michal Dňa št 25. 11. 2021, 11:41 Rodolfo Alonso Hernandez <ralonsoh@redhat.com> napísal(a):
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:10205
Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io> wrote:
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez < ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event should call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
Hi, On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso Hernandez wrote:
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py: 10205
Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.
Thx Rodolfo. I also took a look at those logs (for server 09eff2ce-356f-430f- ab30-5de58f58d698 which had port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 but I can confirm what You found actually. Port was pretty quickly provisioned and it was reported to Nova. I don't know why Nova didn't unpause that vm. Port creation in Nova: 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a- d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br- int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7- c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd- dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f- ff') Neutron finished provisioning of that port and sent notification to nova about 34 seconds later: 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0 update_device_up / var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:133 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195'] for port, before_update _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif- plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0'}] send_events /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/notifiers/nova.py:262 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels [PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f- ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6- c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)] get_binding_level_objs /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/plugins/ml2/db.py:75 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0', 'code': 200}
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io>
wrote:
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event
should
call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com>
wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
-- Slawek Kaplonski Principal Software Engineer Red Hat
On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso Hernandez wrote:
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event
/var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py: 10205
Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.
Thx Rodolfo. I also took a look at those logs (for server 09eff2ce-356f-430f- ab30-5de58f58d698 which had port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 but I can confirm what You found actually. Port was pretty quickly provisioned and it was reported to Nova. I don't know why Nova didn't unpause that vm.
I took a look at the logs (thanks for providing it!). Nova did not unpause the VM a there was more than one port to be plugged an from one, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, there was no vif-plugged event received. See my grepping of the log to show this https://paste.opendev.org/show/811348/ Cheers, gibi
Port creation in Nova: 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a- d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br- int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7- c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd- dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f- ff')
Neutron finished provisioning of that port and sent notification to nova about 34 seconds later: 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0 update_device_up / var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:133 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195'] for port, before_update _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif- plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0'}] send_events /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/notifiers/nova.py:262 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels [PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f- ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6- c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)] get_binding_level_objs /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/plugins/ml2/db.py:75 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0', 'code': 200}
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io>
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will
wrote: trigger the
port binding process and the "vif-plugged" event. This OVN SB event
should
call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com>
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state
wrote: the instance
has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
-- Slawek Kaplonski Principal Software Engineer Red Hat
On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer <balazs.gibizer@est.tech> wrote:
On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso Hernandez wrote:
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event
/var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py: 10205
Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.
Thx Rodolfo. I also took a look at those logs (for server 09eff2ce-356f-430f- ab30-5de58f58d698 which had port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 but I can confirm what You found actually. Port was pretty quickly provisioned and it was reported to Nova. I don't know why Nova didn't unpause that vm.
I took a look at the logs (thanks for providing it!). Nova did not unpause the VM a there was more than one port to be plugged an from one, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, there was no vif-plugged event received.
See my grepping of the log to show this https://paste.opendev.org/show/811348/
as a side note I pushed a small patch to nova to log the which event was timed out https://review.opendev.org/c/openstack/nova/+/819817
Cheers, gibi
Port creation in Nova: 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a- d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br- int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7- c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd- dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f- ff')
Neutron finished provisioning of that port and sent notification to nova about 34 seconds later: 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0 update_device_up / var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:133 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195'] for port, before_update _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif- plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0'}] send_events /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/notifiers/nova.py:262 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels [PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f- ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6- c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)] get_binding_level_objs /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/plugins/ml2/db.py:75 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0', 'code': 200}
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io>
wrote:
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will trigger the port binding process and the "vif-plugged" event. This OVN SB event
should
call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com>
wrote:
Hi,
I see such problem from time to time. It's not consistently reproduceable. ====================== 2021-11-23 22:16:28.532 7 INFO nova.compute.manager [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance has a pending task (spawning). Skip. 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds ====================== The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems that, either Neutron didn't capture the update or didn't send message back to nova-compute. Is there any known fix for this problem?
Thanks! Tony
-- Slawek Kaplonski Principal Software Engineer Red Hat
Hi, Thank you Balazs for investigation, so, the question is, why other VIFs events are received quickly and that last one no ? It looks like a bug, isn't it ? Sorry, I am not as good as you to review code in deep. Btw, did you see message from my colleague Jan Vondra ? It's from email chain [neutron][nova] [kolla] vif plugged timeout
>>>>>>>
Hi guys, I've been further investigating Michal's (OP) issue, since he is on his holiday, and I've found out that the issue is not really plugging the VIF but cleanup after previous port bindings. We are creating 6 servers with 2-4 vifs using heat template [0]. We were hitting some problems with placement so the stack sometimes failed to create and we had to delete the stack and recreate it. If we recreate it right after the deletion, the vif plugging timeout occurs. If we wait some time (approx. 10 minutes) the stack is created successfully. This brings me to believe that there is some issue with deferring the removal of security groups from unbound ports (somewhere around this part of code [1]) and it somehow affects the creation of new ports. However, I am unable to find any lock that could cause this behaviour. The only proof I have is that after the stack recreation scenario I have measured that the process_network_ports [2] function call could take up to 650 s (varies from 5 s to 651 s in our environment). Any idea what could be causing this? [0] https://pastebin.com/infvj4ai [1] https://github.com/openstack/neutron/blob/master/neutron/agent/firewall.py#L... [2] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers... Jan Vondra http://ultimum.io <<<<<<<<<<<<< Thank you, Michal Arbet (kevko) Michal Arbet Openstack Engineer Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic +420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>* LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline> út 30. 11. 2021 v 14:28 odesílatel Balazs Gibizer <balazs.gibizer@est.tech> napsal:
On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer <balazs.gibizer@est.tech> wrote:
On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso Hernandez wrote:
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key points in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event
/var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:
10205
Further triagging and log analysis should be done by Nova folks to understand why Nova compute didn't boot this VM. I fail to understand some parts.
Thx Rodolfo. I also took a look at those logs (for server 09eff2ce-356f-430f- ab30-5de58f58d698 which had port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 but I can confirm what You found actually. Port was pretty quickly provisioned and it was reported to Nova. I don't know why Nova didn't unpause that vm.
I took a look at the logs (thanks for providing it!). Nova did not unpause the VM a there was more than one port to be plugged an from one, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, there was no vif-plugged event received.
See my grepping of the log to show this https://paste.opendev.org/show/811348/
as a side note I pushed a small patch to nova to log the which event was timed out https://review.opendev.org/c/openstack/nova/+/819817
Cheers, gibi
Port creation in Nova: 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a- d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br- int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7- c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd-
dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f-
ff')
Neutron finished provisioning of that port and sent notification to nova about 34 seconds later: 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0 update_device_up /
var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278
2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:133 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195']
for port, before_update _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif- plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0'}] send_events /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/notifiers/nova.py:262 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels
[PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f-
ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6-
c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)]
get_binding_level_objs /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/plugins/ml2/db.py:75 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0', 'code': 200}
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io>
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
ralonsoh@redhat.com> napsal:
Hello Tony:
Do you have the Neutron server logs? Do you see the "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) event is issued and captured in the Neutron server. That will
wrote: trigger the
port binding process and the "vif-plugged" event. This OVN SB event
should
call "set_port_status_up" and that should write "OVN reports status up for port: %s" in the logs.
Of course, this Neutron method will be called only if the logical switch port is UP.
Regards.
On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com>
wrote: > Hi, > > I see such problem from time to time. It's not consistently > reproduceable. > ====================== > 2021-11-23 22:16:28.532 7 INFO nova.compute.manager > [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: > 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the instance > has a pending task (spawning). Skip. > 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver > [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a > 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: > 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for > [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for > instance with vm_state building and task_state spawning.: > eventlet.timeout.Timeout: 300 seconds > ====================== > The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems > that, either Neutron didn't > capture the update or didn't send message back to nova-compute. Is there > any known fix for > this problem? > > > Thanks! > Tony
-- Slawek Kaplonski Principal Software Engineer Red Hat
On Tue, Nov 30 2021 at 03:19:55 PM +0100, Michal Arbet <michal.arbet@ultimum.io> wrote:
Hi,
Thank you Balazs for investigation, so, the question is, why other VIFs events are received quickly and that last one no ?
Yes, It seems that most of the events came within a minute but the last one did not received within the timeout value.
It looks like a bug, isn't it ? Sorry, I am not as good as you to review code in deep.
It could be but I did not check the neutron side of the missing event. I hope Slaweq or Rodolfo can take that. (Yeah it is pretty unfortunate to ping-pong about this type of issues between nova and neutron :/)
Btw, did you see message from my colleague Jan Vondra ?
I saw it but it is mostly targeting neutron where I don't have deep enough knowledge to comment. Cheers, gibi
It's from email chain [neutron][nova] [kolla] vif plugged timeout
>>>>>>>>
Hi guys, I've been further investigating Michal's (OP) issue, since he is on his holiday, and I've found out that the issue is not really plugging the VIF but cleanup after previous port bindings.
We are creating 6 servers with 2-4 vifs using heat template [0]. We were hitting some problems with placement so the stack sometimes failed to create and we had to delete the stack and recreate it. If we recreate it right after the deletion, the vif plugging timeout occurs. If we wait some time (approx. 10 minutes) the stack is created successfully.
This brings me to believe that there is some issue with deferring the removal of security groups from unbound ports (somewhere around this part of code [1]) and it somehow affects the creation of new ports. However, I am unable to find any lock that could cause this behaviour.
The only proof I have is that after the stack recreation scenario I have measured that the process_network_ports [2] function call could take up to 650 s (varies from 5 s to 651 s in our environment).
Any idea what could be causing this? [0] https://pastebin.com/infvj4ai [1] https://github.com/openstack/neutron/blob/master/neutron/agent/firewall.py#L... [2] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers...
Jan Vondra http://ultimum.io
<<<<<<<<<<<<<
Thank you, Michal Arbet (kevko)
Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io https://ultimum.io
LinkedIn | Twitter | Facebook
út 30. 11. 2021 v 14:28 odesílatel Balazs Gibizer <balazs.gibizer@est.tech> napsal:
On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer <balazs.gibizer@est.tech> wrote:
On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso
wrote:
Hello Michal:
I think this thread is related to OVN. In any case, I've analyzed your logs from the Neutron point of view. Those are basically the key
in your logs: [1]. The Neutron server receives the request to create and bind a new port. Both the DHCP agent and the OVS agent provision the port (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and receives the ACK from Nova at 08:51:05.704. Nova server creates the corresponding event for the instance on the compute0: 2021-11-25 08:51:05.692 23 INFO nova.api.openstack.compute.server_external_events [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
Nova compute agent receives this server event: 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance: 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611 _process_instance_event
/var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py: 10205
Further triagging and log analysis should be done by Nova
folks to
understand why Nova compute didn't boot this VM. I fail to understand some parts.
Thx Rodolfo. I also took a look at those logs (for server 09eff2ce-356f-430f- ab30-5de58f58d698 which had port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 but I can confirm what You found actually. Port was pretty quickly provisioned and it was reported to Nova. I don't know why Nova didn't unpause
Hernandez points that
vm.
I took a look at the logs (thanks for providing it!). Nova did not unpause the VM a there was more than one port to be plugged an from one, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, there was no vif-plugged event received.
See my grepping of the log to show this https://paste.opendev.org/show/811348/
as a side note I pushed a small patch to nova to log the which event was timed out https://review.opendev.org/c/openstack/nova/+/819817
Cheers, gibi
Port creation in Nova: 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a- d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif
int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7- c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd-
dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f-
ff')
Neutron finished provisioning of that port and sent notification to nova about 34 seconds later: 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0 update_device_up /
var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278
2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req- a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks
['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for
VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br- port
0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:133 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2. provisioning_complete /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/ db/provisioning_blocks.py:139 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
for port, provisioning_complete _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195']
for port, before_update _notify_loop /var/lib/kolla/venv/lib/python3.9/site- packages/neutron_lib/callbacks/manager.py:192 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif- plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0'}] send_events /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/notifiers/nova.py:262 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req- df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f- ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels
[PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f-
ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6-
c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)]
get_binding_level_objs /var/lib/kolla/venv/lib/python3.9/site-packages/ neutron/plugins/ml2/db.py:75 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7- c9b40386cbe0', 'code': 200}
Regards.
[1]https://paste.opendev.org/show/811273/
On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet <michal.arbet@ultimum.io>
wrote:
Hello,
In attachment you can find logs for compute0 and controller0 (other computes and controllers were turned off for this test). No OVN used, this stack is based on OVS.
Thank you, Michal Arbet Openstack Engineer
Ultimum Technologies a.s. Na Poříčí 1047/26, 11000 Praha 1 Czech Republic
+420 604 228 897 michal.arbet@ultimum.io *https://ultimum.io <https://ultimum.io/>*
<https://www.linkedin.com/company/ultimum-technologies>
<https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
ralonsoh@redhat.com> napsal: > Hello Tony: > > Do you have the Neutron server logs? Do you see the > "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB) > event is issued and captured in the Neutron server. That will
> port binding process and the "vif-plugged" event. This OVN SB event should > call "set_port_status_up" and that should write "OVN reports status up for > port: %s" in the logs. > > Of course, this Neutron method will be called only if the logical switch > port is UP. > > Regards. > > > On Tue, Nov 23, 2021 at 11:59 PM Tony Liu <tonyliu0592@hotmail.com> > > wrote: >> Hi, >> >> I see such problem from time to time. It's not consistently >> reproduceable. >> ====================== >> 2021-11-23 22:16:28.532 7 INFO nova.compute.manager >> [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance: >> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state
| Twitter trigger the the instance
>> has a pending task (spawning). Skip. >> 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver >> [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a >> 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance: >> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for >> [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for >> instance with vm_state building and task_state spawning.: >> eventlet.timeout.Timeout: 300 seconds >> ====================== >> The VIF/port is activated by OVN ovn-controller to ovn-sb. It seems >> that, either Neutron didn't >> capture the update or didn't send message back to nova-compute. Is there >> any known fix for >> this problem? >> >> >> Thanks! >> Tony
-- Slawek Kaplonski Principal Software Engineer Red Hat
participants (5)
-
Balazs Gibizer
-
Michal Arbet
-
Rodolfo Alonso Hernandez
-
Slawek Kaplonski
-
Tony Liu