[Neutron][OVN] nova-compute timeout while waiting for VIF activation confirmed by Neutron

Balazs Gibizer balazs.gibizer at est.tech
Tue Nov 30 14:32:31 UTC 2021



On Tue, Nov 30 2021 at 03:19:55 PM +0100, Michal Arbet 
<michal.arbet at ultimum.io> wrote:
> Hi,
> 
> Thank you Balazs for the investigation. So the question is: why are the 
> other VIF events received quickly, but not that last one?

Yes, it seems that most of the events arrived within a minute, but the 
last one was not received within the timeout.
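For reference, the wait in question is governed by two nova-compute 
options. A minimal nova.conf fragment with the upstream defaults (not 
necessarily the values in this deployment) would be:

```ini
[DEFAULT]
# How long nova-compute waits for all expected network-vif-plugged
# events from neutron before giving up (seconds).
vif_plugging_timeout = 300
# If true, a missing event fails the boot instead of proceeding anyway.
vif_plugging_is_fatal = true
```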

> It looks like a bug, doesn't it? Sorry, I am not as good as you at 
> reviewing code in depth.

It could be, but I did not check the neutron side of the missing event. 
I hope Slaweq or Rodolfo can take that. (Yeah, it is pretty unfortunate 
to ping-pong this type of issue between nova and neutron :/)
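To illustrate the failure mode: nova-compute waits for the whole set of 
expected events under one shared deadline, so a single missing event 
fails the wait even when every other event arrived promptly. The 
following is a simplified sketch of that pattern, not nova's actual 
implementation (the class and method names here are made up):

```python
import threading
import time


class EventWaiter:
    """Toy model of waiting for N events under one deadline."""

    def __init__(self, expected_tags):
        self._pending = set(expected_tags)
        self._cond = threading.Condition()

    def event_received(self, tag):
        # Called when the external service delivers one event.
        with self._cond:
            self._pending.discard(tag)
            self._cond.notify_all()

    def wait_all(self, timeout):
        # Block until every expected event arrived, or raise naming
        # exactly which events were still outstanding at the deadline.
        deadline = time.monotonic() + timeout
        with self._cond:
            while self._pending:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError(
                        "timed out waiting for: %s" % sorted(self._pending))
                self._cond.wait(remaining)
```

With two expected VIF events and only one delivered, `wait_all` times 
out and names the missing one, which is exactly the diagnostic 
information that was missing from nova's original timeout message.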

> 
> Btw, did you see message from my colleague Jan Vondra ?

I saw it, but it mostly targets neutron, where I don't have deep enough 
knowledge to comment.

Cheers,
gibi

> It's from email chain  [neutron][nova] [kolla] vif plugged timeout
> 
> >>>>>>>>>>>>>
> 
> 
> Hi guys,
> I've been investigating Michal's (OP) issue further, since he is on 
> holiday, and I've found that the issue is not really plugging the VIF 
> but the cleanup after previous port bindings.
> 
> We are creating 6 servers with 2-4 VIFs using a heat template [0]. We 
> were hitting some problems with placement, so the stack sometimes 
> failed to create and we had to delete and recreate it.
> If we recreate it right after the deletion, the VIF plugging timeout 
> occurs. If we wait some time (approx. 10 minutes), the stack is 
> created successfully.
> 
> This leads me to believe that there is some issue with deferring the 
> removal of security groups from unbound ports (somewhere around this 
> part of the code [1]) and that it somehow affects the creation of new 
> ports. However, I am unable to find any lock that could cause this 
> behaviour.
> 
> The only proof I have is that, after the stack recreation scenario, I 
> have measured that the process_network_ports [2] function call can 
> take up to 651 s (it varies from 5 s to 651 s in our environment).
> 
> Any idea what could be causing this?
> [0] https://pastebin.com/infvj4ai
> [1] 
> https://github.com/openstack/neutron/blob/master/neutron/agent/firewall.py#L133
> [2] 
> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2079
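As an aside, a duration spread like the 5 s to 651 s above can be 
collected by wrapping the suspect call with a simple timing decorator. 
This is an illustrative helper, not neutron code; the decorator name 
and the logging format are assumptions:

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def log_duration(func):
    """Log how long each call to the wrapped function takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            LOG.info("%s took %.1f s", func.__name__,
                     time.monotonic() - start)
    return wrapper
```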
> 
> Jan Vondra
> http://ultimum.io
> 
> <<<<<<<<<<<<<
> 
> Thank you,
> Michal Arbet (kevko)
> 
> 
> Michal Arbet
> Openstack Engineer
> 
> Ultimum Technologies a.s.
> Na Poříčí 1047/26, 11000 Praha 1
> Czech Republic
> 
> +420 604 228 897
> michal.arbet at ultimum.io
> https://ultimum.io
> 
> LinkedIn | Twitter | Facebook
> 
> 
> út 30. 11. 2021 v 14:28 odesílatel Balazs Gibizer 
> <balazs.gibizer at est.tech> napsal:
>> 
>> 
>>  On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer
>>  <balazs.gibizer at est.tech> wrote:
>>  >
>>  >
>>  > On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski
>>  > <skaplons at redhat.com> wrote:
>>  >> Hi,
>>  >>
>>  >> On czwartek, 25 listopada 2021 10:41:14 CET Rodolfo Alonso 
>> Hernandez
>>  >>  wrote:
>>  >>>  Hello Michal:
>>  >>>
>>  >>>  I think this thread is related to OVN. In any case, I've 
>> analyzed
>>  >>>   your logs
>>  >>>  from the Neutron point of view. Those are basically the key 
>> points
>>  >>>   in your
>>  >>>  logs: [1]. The Neutron server receives the request to create 
>> and
>>  >>>   bind a new
>>  >>>  port. Both the DHCP agent and the OVS agent provision the port
>>  >>>  (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
>>  >>>
>>  >>>  Neutron server sends the "network-vif-plugged" event at
>>  >>>   08:51:05.635 and
>>  >>>  receives the ACK from Nova at 08:51:05.704. Nova server 
>> creates the
>>  >>>  corresponding event for the instance on the compute0:
>>  >>>  2021-11-25 08:51:05.692 23 INFO
>>  >>>  nova.api.openstack.compute.server_external_events
>>  >>>  [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb
>>  >>>   01a75e3a9a9148218916d3beafae2120
>>  >>>  7d16babef0524b7dade9a59b0a3569e1 - default default] Creating 
>> event
>>  >>>  network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for
>>  >>>   instance
>>  >>>  09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
>>  >>>
>>  >>>  Nova compute agent receives this server event:
>>  >>>  2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager
>>  >>>  [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb
>>  >>>   01a75e3a9a9148218916d3beafae2120
>>  >>>  7d16babef0524b7dade9a59b0a3569e1 - default default] [instance:
>>  >>>  09eff2ce-356f-430f-ab30-5de58f58d698] Processing event
>>  >>>  network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611
>>  >>>  _process_instance_event
>>  >>>
>>  >>>   
>> /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:
>>  >> 10205
>>  >>>
>>  >>>  Further triaging and log analysis should be done by Nova 
>> folks to
>>  >>>  understand why Nova compute didn't boot this VM. I fail to
>>  >>>   understand some
>>  >>>  parts.
>>  >>
>>  >> Thx Rodolfo. I also took a look at those logs (for server
>>  >> 09eff2ce-356f-430f-ab30-5de58f58d698, which had port
>>  >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0), and I can confirm what you
>>  >> found. The port was provisioned pretty quickly and it was
>>  >> reported to Nova. I don't know why Nova didn't unpause that VM.
>>  >
>>  > I took a look at the logs (thanks for providing them!). Nova did
>>  > not unpause the VM as there was more than one port to be plugged,
>>  > and for one of them, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, no
>>  > vif-plugged event was received.
>>  >
>>  > See my grepping of the log to show this
>>  > https://paste.opendev.org/show/811348/
>> 
>> 
>>  as a side note, I pushed a small patch to nova to log which event
>>  timed out: https://review.opendev.org/c/openstack/nova/+/819817
>> 
>>  >
>>  > Cheers,
>>  > gibi
>>  >
>>  >>
>>  >> Port creation in Nova:
>>  >> 2021-11-25 08:50:24.250 8 INFO os_vif [req-6eab0f2a-
>>  >> d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576
>>  >> 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully
>>  >>  plugged vif
>>  >> 
>> VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br-
>>  >> int',has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7-
>>  >> c9b40386cbe0,network=Network(cb4e6be4-9bc7-4c34-95dd-
>>  >> 
>> dfc3c8e9d93c),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tap0761ff2f-
>>  >> ff')
>>  >>
>>  >> Neutron finished provisioning of that port and sent notification 
>> to
>>  >>  nova about
>>  >> 34 seconds later:
>>  >> 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device 0761ff2f-
>>  >> ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0
>>  >>  update_device_up /
>>  >> 
>> var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278
>>  >> 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks 
>> [req-
>>  >> a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning
>>  >> complete  for port
>>  >> 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2.
>>  >> provisioning_complete
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>>  >> db/provisioning_blocks.py:139
>>  >> 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager 
>> [req-
>>  >> a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks
>>  >> 
>> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
>>  >>  for port,
>>  >> provisioning_complete _notify_loop
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-
>>  >> packages/neutron_lib/callbacks/manager.py:192
>>  >> 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks 
>> [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for 
>> port
>>  >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2.
>>  >> provisioning_complete
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>>  >> db/provisioning_blocks.py:133
>>  >> 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks 
>> [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning
>>  >> complete  for port
>>  >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2.
>>  >> provisioning_complete
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/
>>  >> db/provisioning_blocks.py:139
>>  >> 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager 
>> [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>>  >> 
>> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308']
>>  >>  for port,
>>  >> provisioning_complete _notify_loop
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-
>>  >> packages/neutron_lib/callbacks/manager.py:192
>>  >> 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager 
>> [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>>  >> 
>> ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195']
>>  >> for port, before_update _notify_loop
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-
>>  >> packages/neutron_lib/callbacks/manager.py:192
>>  >> 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] 
>> Sending
>>  >>  events:
>>  >> [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name':
>>  >>  'network-vif-
>>  >> plugged', 'status': 'completed', 'tag': '0761ff2f-ff6a-40a5-85b7-
>>  >> c9b40386cbe0'}] send_events
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-packages/
>>  >> neutron/notifiers/nova.py:262
>>  >> 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db [req-
>>  >> df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port 
>> 0761ff2f-
>>  >> ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels
>>  >> 
>> [PortBindingLevel(driver='openvswitch',host='compute0',level=0,port_id=0761ff2f-
>>  >> ff6a-40a5-85b7-c9b40386cbe0,segment=NetworkSegment(ea3a27b6-
>>  >> 
>> c3fa-40e3-9937-71b652cba855),segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)]
>>  >> get_binding_level_objs
>>  >>  /var/lib/kolla/venv/lib/python3.9/site-packages/
>>  >> neutron/plugins/ml2/db.py:75
>>  >> 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova 
>> event
>>  >> response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698',
>>  >>  'name':
>>  >> 'network-vif-plugged', 'status': 'completed', 'tag':
>>  >>  '0761ff2f-ff6a-40a5-85b7-
>>  >> c9b40386cbe0', 'code': 200}
>>  >>
>>  >>>
>>  >>>  Regards.
>>  >>>
>>  >>>  [1]https://paste.opendev.org/show/811273/
>>  >>>
>>  >>>  On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet
>>  >>>   <michal.arbet at ultimum.io>
>>  >>>
>>  >>>  wrote:
>>  >>>  > Hello,
>>  >>>  >
>>  >>>  > In attachment you can find logs for compute0 and controller0
>>  >>>   (other
>>  >>>  > computes and controllers were turned off for this test).
>>  >>>  > No OVN used, this stack is based on OVS.
>>  >>>  >
>>  >>>  > Thank you,
>>  >>>  > Michal Arbet
>>  >>>  > Openstack Engineer
>>  >>>  >
>>  >>>  > Ultimum Technologies a.s.
>>  >>>  > Na Poříčí 1047/26, 11000 Praha 1
>>  >>>  > Czech Republic
>>  >>>  >
>>  >>>  > +420 604 228 897
>>  >>>  > michal.arbet at ultimum.io
>>  >>>  > *https://ultimum.io <https://ultimum.io/>*
>>  >>>  >
>>  >>>  > LinkedIn 
>> <https://www.linkedin.com/company/ultimum-technologies>
>>  >>>   | Twitter
>>  >>>  > <https://twitter.com/ultimumtech> | Facebook
>>  >>>  > <https://www.facebook.com/ultimumtechnologies/timeline>
>>  >>>  >
>>  >>>  >
>>  >>>  > st 24. 11. 2021 v 15:59 odesílatel Rodolfo Alonso Hernandez 
>> <
>>  >>>  >
>>  >>>  > ralonsoh at redhat.com> napsal:
>>  >>>  >> Hello Tony:
>>  >>>  >>
>>  >>>  >> Do you have the Neutron server logs? Do you see the
>>  >>>  >> "PortBindingChassisUpdateEvent"? When a port is bound, a
>>  >>>   Port_Bound (SB)
>>  >>>  >> event is issued and captured in the Neutron server. That 
>> will
>>  >>>   trigger the
>>  >>>  >> port binding process and the "vif-plugged" event. This OVN 
>> SB
>>  >>>   event
>>  >> should
>>  >>>  >> call "set_port_status_up" and that should write "OVN reports
>>  >>>   status up
>>  >> for
>>  >>>  >> port: %s" in the logs.
>>  >>>  >>
>>  >>>  >> Of course, this Neutron method will be called only if the
>>  >>>   logical switch
>>  >>>  >> port is UP.
>>  >>>  >>
>>  >>>  >> Regards.
>>  >>>  >>
>>  >>>  >>
>>  >>>  >> On Tue, Nov 23, 2021 at 11:59 PM Tony Liu
>>  >>>   <tonyliu0592 at hotmail.com>
>>  >>>  >>
>>  >>>  >> wrote:
>>  >>>  >>> Hi,
>>  >>>  >>>
>>  >>>  >>> I see this problem from time to time. It's not consistently
>>  >>>  >>> reproducible.
>>  >>>  >>> ======================
>>  >>>  >>> 2021-11-23 22:16:28.532 7 INFO nova.compute.manager
>>  >>>  >>> [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] 
>> [instance:
>>  >>>  >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During 
>> sync_power_state
>>  >>>   the
>>  >> instance
>>  >>>  >>> has a pending task (spawning). Skip.
>>  >>>  >>> 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver
>>  >>>  >>> [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650
>>  >> 3a4c320d64664d9cb6784b7ea52d618a
>>  >>>  >>> 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] 
>> [instance:
>>  >>>  >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for
>>  >>>  >>> [('network-vif-plugged',
>>  >>>   '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for
>>  >>>  >>> instance with vm_state building and task_state spawning.:
>>  >>>  >>> eventlet.timeout.Timeout: 300 seconds
>>  >>>  >>> ======================
>>  >>>  >>> The VIF/port is reported as active by ovn-controller to
>>  >>>  >>> ovn-sb. It seems that either Neutron didn't capture the
>>  >>>  >>> update or didn't send the message back to nova-compute.
>>  >>>  >>> Is there any known fix for this problem?
>>  >>>  >>>
>>  >>>  >>>
>>  >>>  >>> Thanks!
>>  >>>  >>> Tony
>>  >>
>>  >>
>>  >> --
>>  >> Slawek Kaplonski
>>  >> Principal Software Engineer
>>  >> Red Hat
>>  >
>>  >
>>  >
>> 
>> 

More information about the openstack-discuss mailing list