Hi,

Thank you Balazs for investigation, so, the question is, why other VIFs events are received quickly and that last one no ? 
It looks like a bug, isn't it ?  Sorry, I am not as good as you to review code in deep.

Btw, did you see message from my colleague Jan Vondra ? 
It's from email chain  [neutron][nova] [kolla] vif plugged timeout

>>>>>>>>>>>>>


Hi guys,
I've been further investigating Michal's (OP) issue, since he is on his holiday, and I've found out that the issue is not really plugging the VIF but cleanup after previous port bindings.

We are creating 6 servers with 2-4 vifs using heat template [0]. We were hitting some problems with placement so the stack sometimes failed to create and we had to delete the stack and recreate it.
If we recreate it right after the deletion, the vif plugging timeout occurs. If we wait some time (approx. 10 minutes) the stack is created successfully.

This brings me to believe that there is some issue with deferring the removal of security groups from unbound ports (somewhere around this part of code [1]) and it somehow affects the creation of new ports. However, I am unable to find any lock that could cause this behaviour.

The only proof I have is that after the stack recreation scenario I have measured that the process_network_ports [2] function call could take up to 650 s (varies from 5 s to 651 s in our environment).

Any idea what could be causing this?
[0] https://pastebin.com/infvj4ai
[1] https://github.com/openstack/neutron/blob/master/neutron/agent/firewall.py#L133
[2] https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2079

Jan Vondra
http://ultimum.io

<<<<<<<<<<<<<

Thank you,
Michal Arbet (kevko)


Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897 
michal.arbet@ultimum.io
https://ultimum.io



út 30. 11. 2021 v 14:28 odesílatel Balazs Gibizer <balazs.gibizer@est.tech> napsal:


On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer
<balazs.gibizer@est.tech> wrote:
>
>
> On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski
> <skaplons@redhat.com> wrote:
>> Hi,
>>
>> On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso Hernandez
>>  wrote:
>>>  Hello Michal:
>>>
>>>  I think this thread is related to OVN. In any case, I've analyzed
>>>   your logs
>>>  from the Neutron point of view. Those are basically the key points
>>>   in your
>>>  logs: [1]. The Neutron server receives the request to create and
>>>   bind a new
>>>  port. Both the DHCP agent and the OVS agent provision the port
>>>  (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
>>>
>>>  Neutron server sends the "network-vif-plugged" event at
>>>   08:51:05.635 and
>>>  receives the ACK from Nova at 08:51:05.704. Nova server creates the
>>>  corresponding event for the instance on the compute0:
>>>  2021-11-25 08:51:05.692 23 INFO
>>>  nova.api.openstack.compute.server_external_events
>>>  [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb
>>>   01a75e3a9a9148218916d3beafae2120
>>>  7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event
>>>  network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for
>>>   instance
>>>  09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
>>>
>>>  Nova compute agent receives this server event:
>>>  2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager
>>>  [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb
>>>   01a75e3a9a9148218916d3beafae2120
>>>  7d16babef0524b7dade9a59b0a3569e1 - default default] [instance:
>>>  09eff2ce-356f-430f-ab30-5de58f58d698] Processing event
>>>  network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611
>>>  _process_instance_event
>>> 
>>>   /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:
>> 10205
>>>
>>>  Further triagging and log analysis should be done by Nova folks to
>>>  understand why Nova compute didn't boot this VM. I fail to
>>>   understand some
>>>  parts.
>>
>> Thx Rodolfo. I also took a look at those logs (for server
>>  09eff2ce-356f-430f-
>> ab30-5de58f58d698 which had port
>> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0  but I
>> can confirm what You found actually. Port was pretty quickly
>>  provisioned and
>> it was reported to Nova. I don't know why Nova didn't unpause that
>> vm.
>
> I took a look at the logs (thanks for providing it!). Nova did not
> unpause the VM a there was more than one port to be plugged an from
> one, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, there was no vif-plugged
> event received.
>
> See my grepping of the log to show this
> https://paste.opendev.org/show/811348/


as a side note I pushed a small patch to nova to log the which event
was timed out https://review.opendev.org/c/openstack/nova/+/819817

>
> Cheers,
> gibi
>
>>
>> Port creation in Nova:
>> 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a-
>> d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576
>> 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully
>>  plugged vif
>> VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br-
>> int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7-
>> c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd-
>> dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f-
>> ff')
>>
>> Neutron finished provisioning of that port and sent notification to
>>  nova about
>> 34 seconds later:
>> 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f-
>> ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0
>>  update_device_up /
>> var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278
>> 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks [req-
>> a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning
>> complete  for port
>> 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2.
>> provisioning_complete
>>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>> db/provisioning_blocks.py:139
>> 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager [req-
>> a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks
>> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
>>  for port,
>> provisioning_complete _notify_loop
>>  /var/lib/kolla/venv/lib/python3.9/site-
>> packages/neutron_lib/callbacks/manager.py:192
>> 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port
>> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2.
>> provisioning_complete
>>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>> db/provisioning_blocks.py:133
>> 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning
>> complete  for port
>> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2.
>> provisioning_complete
>>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>> db/provisioning_blocks.py:139
>> 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
>>  for port,
>> provisioning_complete _notify_loop
>>  /var/lib/kolla/venv/lib/python3.9/site-
>> packages/neutron_lib/callbacks/manager.py:192
>> 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>> ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195']
>> for port, before_update _notify_loop
>>  /var/lib/kolla/venv/lib/python3.9/site-
>> packages/neutron_lib/callbacks/manager.py:192
>> 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending
>>  events:
>> [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name':
>>  'network-vif-
>> plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7-
>> c9b40386cbe0'}] send_events
>>  /var/lib/kolla/venv/lib/python3.9/site-packages/
>> neutron/notifiers/nova.py:262
>> 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req-
>> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 0761ff2f-
>> ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels
>> [PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f-
>> ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6-
>> c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)]
>> get_binding_level_objs
>>  /var/lib/kolla/venv/lib/python3.9/site-packages/
>> neutron/plugins/ml2/db.py:75
>> 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event
>> response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698',
>>  'name':
>> 'network-vif-plugged', 'status': 'completed', 'tag':
>>  '0761ff2f-ff6a-40a5-85b7-
>> c9b40386cbe0', 'code': 200}
>>
>>>
>>>  Regards.
>>>
>>>  [1]https://paste.opendev.org/show/811273/
>>>
>>>  On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet
>>>   <michal.arbet@ultimum.io>
>>>
>>>  wrote:
>>>  > Hello,
>>>  >
>>>  > In attachment you can find logs for compute0 and controller0
>>>   (other
>>>  > computes and controllers were turned off for this test).
>>>  > No OVN used, this stack is based on OVS.
>>>  >
>>>  > Thank you,
>>>  > Michal Arbet
>>>  > Openstack Engineer
>>>  >
>>>  > Ultimum Technologies a.s.
>>>  > Na Poříčí 1047/26, 11000 Praha 1
>>>  > Czech Republic
>>>  >
>>>  > +420 604 228 897
>>>  > michal.arbet@ultimum.io
>>>  > *https://ultimum.io <https://ultimum.io/>*
>>>  >
>>>  > LinkedIn <https://www.linkedin.com/company/ultimum-technologies>
>>>   | Twitter
>>>  > <https://twitter.com/ultimumtech> | Facebook
>>>  > <https://www.facebook.com/ultimumtechnologies/timeline>
>>>  >
>>>  >
>>>  > st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez <
>>>  >
>>>  > ralonsoh@redhat.com> napsal:
>>>  >> Hello Tony:
>>>  >>
>>>  >> Do you have the Neutron server logs? Do you see the
>>>  >> "PortBindingChassisUpdateEvent"? When a port is bound, a
>>>   Port_Bound (SB)
>>>  >> event is issued and captured in the Neutron server. That will
>>>   trigger the
>>>  >> port binding process and the "vif-plugged" event. This OVN SB
>>>   event
>> should
>>>  >> call "set_port_status_up" and that should write "OVN reports
>>>   status up
>> for
>>>  >> port: %s" in the logs.
>>>  >>
>>>  >> Of course, this Neutron method will be called only if the
>>>   logical switch
>>>  >> port is UP.
>>>  >>
>>>  >> Regards.
>>>  >>
>>>  >>
>>>  >> On Tue, Nov 23, 2021 at 11:59 PM Tony Liu
>>>   <tonyliu0592@hotmail.com>
>>>  >>
>>>  >> wrote:
>>>  >>> Hi,
>>>  >>>
>>>  >>> I see such problem from time to time. It's not consistently
>>>  >>> reproduceable.
>>>  >>> ======================
>>>  >>> 2021-11-23 22:16:28.532 7 INFO nova.compute.manager
>>>  >>> [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance:
>>>  >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state
>>>   the
>> instance
>>>  >>> has a pending task (spawning). Skip.
>>>  >>> 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver
>>>  >>> [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650
>> 3a4c320d64664d9cb6784b7ea52d618a
>>>  >>> 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance:
>>>  >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for
>>>  >>> [('network-vif-plugged',
>>>   '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for
>>>  >>> instance with vm_state building and task_state spawning.:
>>>  >>> eventlet.timeout.Timeout: 300 seconds
>>>  >>> ======================
>>>  >>> The VIF/port is activated by OVN ovn-controller to ovn-sb. It
>>>   seems
>>>  >>> that, either Neutron didn't
>>>  >>> capture the update or didn't send message back to
>>> nova-compute.   Is there
>>>  >>> any known fix for
>>>  >>> this problem?
>>>  >>>
>>>  >>>
>>>  >>> Thanks!
>>>  >>> Tony
>>
>>
>> --
>> Slawek Kaplonski
>> Principal Software Engineer
>> Red Hat
>
>
>