[Neutron][OVN] nova-compute timeout while waiting for VIF activation confirmed by Neutron
Balazs Gibizer
balazs.gibizer at est.tech
Tue Nov 30 14:32:31 UTC 2021
On Tue, Nov 30 2021 at 03:19:55 PM +0100, Michal Arbet
<michal.arbet at ultimum.io> wrote:
> Hi,
>
> Thank you Balazs for the investigation. So the question is: why are the
> other VIF events received quickly but not that last one?
Yes, it seems that most of the events arrived within a minute, but the last
one was not received within the timeout.
> It looks like a bug, doesn't it? Sorry, I am not as good as you at
> reviewing code in depth.
It could be, but I did not check the neutron side of the missing event.
I hope Slaweq or Rodolfo can take that. (Yeah, it is pretty unfortunate
to ping-pong this type of issue between nova and neutron :/)
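For context, the waiting on the nova-compute side is basically an eventlet
timeout wrapped around the set of expected events. A heavily simplified
sketch of that pattern follows; the function and variable names are mine,
not the actual nova code:

    # Simplified illustration of waiting for network-vif-plugged events with
    # a timeout; this is not the real nova implementation.
    from eventlet import timeout

    def wait_for_vif_plugged_events(expected_port_ids, event_queue, deadline=300):
        """Block until every expected port got its network-vif-plugged event,
        or raise eventlet.timeout.Timeout after `deadline` seconds.

        `event_queue` is assumed to be an eventlet.queue.Queue fed with
        (event_name, port_id) tuples by the external events API handler.
        """
        pending = set(expected_port_ids)
        try:
            with timeout.Timeout(deadline):
                while pending:
                    name, port_id = event_queue.get()
                    if name == 'network-vif-plugged':
                        pending.discard(port_id)
        except timeout.Timeout:
            # This is the point where nova-compute logs "Timeout waiting for
            # [('network-vif-plugged', ...)]"; `pending` tells us which port
            # never got its event.
            raise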
>
> Btw, did you see the message from my colleague Jan Vondra?
I saw it, but it is mostly targeting neutron, where I don't have deep
enough knowledge to comment.
Cheers,
gibi
> It's from the email chain "[neutron][nova][kolla] vif plugged timeout".
>
> >>>>>>>>>>>>>
>
>
> Hi guys,
> I've been further investigating Michal's (OP) issue, since he is on
> holiday, and I've found out that the issue is not really the plugging of
> the VIF but the cleanup after previous port bindings.
>
> We are creating 6 servers with 2-4 VIFs each using a heat template [0]. We
> were hitting some problems with placement, so the stack sometimes failed
> to create and we had to delete the stack and recreate it.
> If we recreate it right after the deletion, the VIF plugging timeout
> occurs. If we wait some time (approx. 10 minutes), the stack is created
> successfully.
>
> This leads me to believe that there is some issue with deferring the
> removal of security groups from unbound ports (somewhere around this part
> of the code [1]) and that it somehow affects the creation of new ports.
> However, I am unable to find any lock that could cause this behaviour.
>
> The only proof I have is that, in the stack recreation scenario, I measured
> that a process_network_ports [2] function call can take up to 650 s (it
> varies from 5 s to 651 s in our environment).
>
> Any idea what could be causing this?
> [0] https://pastebin.com/infvj4ai
> [1]
> https://github.com/openstack/neutron/blob/master/neutron/agent/firewall.py#L133
> [2]
> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2079
>
> Jan Vondra
> http://ultimum.io
>
> <<<<<<<<<<<<<
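Regarding Jan's measurement of process_network_ports above: one way to
confirm where the time goes would be to temporarily wrap the suspected
method with a timing decorator, roughly like this. This is just a sketch
for local debugging; the decorator name and the threshold are made up:

    # Rough local-debugging sketch: log a warning when a wrapped call is slow.
    import functools
    import logging
    import time

    LOG = logging.getLogger(__name__)

    def warn_if_slow(threshold=5.0):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.monotonic()
                try:
                    return func(*args, **kwargs)
                finally:
                    elapsed = time.monotonic() - start
                    if elapsed > threshold:
                        LOG.warning("%s took %.1f s", func.__name__, elapsed)
            return wrapper
        return decorator

    # e.g. applied locally to the agent's process_network_ports method to see
    # whether the 5 s .. 650 s spread Jan measured shows up in the agent log.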
>
> Thank you,
> Michal Arbet (kevko)
>
>
> Michal Arbet
> Openstack Engineer
>
> Ultimum Technologies a.s.
> Na Poříčí 1047/26, 11000 Praha 1
> Czech Republic
>
> +420 604 228 897
> michal.arbet at ultimum.io
> https://ultimum.io
>
> LinkedIn | Twitter | Facebook
>
>
> On Tue, 30 Nov 2021 at 14:28, Balazs Gibizer
> <balazs.gibizer at est.tech> wrote:
>>
>>
>> On Tue, Nov 30 2021 at 11:16:00 AM +0100, Balazs Gibizer
>> <balazs.gibizer at est.tech> wrote:
>> >
>> >
>> > On Thu, Nov 25 2021 at 11:57:45 AM +0100, Slawek Kaplonski
>> > <skaplons at redhat.com> wrote:
>> >> Hi,
>> >>
>> >> On Thursday, 25 November 2021 at 10:41:14 CET, Rodolfo Alonso Hernandez
>> >> wrote:
>> >>> Hello Michal:
>> >>>
>> >>> I think this thread is related to OVN. In any case, I've analyzed your
>> >>> logs from the Neutron point of view. Those are basically the key points
>> >>> in your logs: [1]. The Neutron server receives the request to create and
>> >>> bind a new port. Both the DHCP agent and the OVS agent provision the port
>> >>> (8a472e87-4a7a-4ad4-9fbb-fd9785136611).
>> >>>
>> >>> Neutron server sends the "network-vif-plugged" event at 08:51:05.635 and
>> >>> receives the ACK from Nova at 08:51:05.704. Nova server creates the
>> >>> corresponding event for the instance on compute0:
>> >>> 2021-11-25 08:51:05.692 23 INFO
>> >>> nova.api.openstack.compute.server_external_events
>> >>> [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120
>> >>> 7d16babef0524b7dade9a59b0a3569e1 - default default] Creating event
>> >>> network-vif-plugged:8a472e87-4a7a-4ad4-9fbb-fd9785136611 for instance
>> >>> 09eff2ce-356f-430f-ab30-5de58f58d698 on compute0
>> >>>
>> >>> Nova compute agent receives this server event:
>> >>> 2021-11-25 08:51:05.705 8 DEBUG nova.compute.manager
>> >>> [req-393fcb9a-5c68-4b0a-af1b-661e73d947cb 01a75e3a9a9148218916d3beafae2120
>> >>> 7d16babef0524b7dade9a59b0a3569e1 - default default] [instance:
>> >>> 09eff2ce-356f-430f-ab30-5de58f58d698] Processing event
>> >>> network-vif-plugged-8a472e87-4a7a-4ad4-9fbb-fd9785136611
>> >>> _process_instance_event
>> >>> /var/lib/kolla/venv/lib/python3.9/site-packages/nova/compute/manager.py:10205
>> >>>
>> >>> Further triaging and log analysis should be done by Nova folks to
>> >>> understand why Nova compute didn't boot this VM. I fail to understand
>> >>> some parts.
>> >>
>> >> Thx Rodolfo. I also took a look at those logs (for server
>> >> 09eff2ce-356f-430f-ab30-5de58f58d698, which had port
>> >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0) and I can actually confirm what
>> >> you found. The port was provisioned pretty quickly and that was reported
>> >> to Nova. I don't know why Nova didn't unpause that VM.
>> >
>> > I took a look at the logs (thanks for providing them!). Nova did not
>> > unpause the VM as there was more than one port to be plugged, and for
>> > one of them, bf0f9dd2-4fd3-4028-a3dc-e2581ff081d0, no vif-plugged event
>> > was received.
>> >
>> > See my grepping of the log that shows this:
>> > https://paste.opendev.org/show/811348/
>>
>>
>> As a side note, I pushed a small patch to nova to log which event timed
>> out: https://review.opendev.org/c/openstack/nova/+/819817
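Until that patch lands, a quick way to spot the missing event is to compare
the ports listed in the timeout message with the ports that actually got a
"Processing event network-vif-plugged-<port-id>" line in the nova-compute
log. A tiny sketch of that kind of grepping; the log message strings are the
ones visible in this thread, everything else is illustrative:

    # Illustrative only: collect the port ids that did get a
    # "Processing event network-vif-plugged-<port-id>" line in a nova-compute
    # log, so they can be compared with the ports the
    # "Timeout waiting for [('network-vif-plugged', ...)]" message lists.
    import re
    import sys

    PROCESSED = re.compile(r'Processing event network-vif-plugged-([0-9a-f-]{36})')

    def plugged_ports(logfile):
        seen = set()
        with open(logfile) as f:
            for line in f:
                match = PROCESSED.search(line)
                if match:
                    seen.add(match.group(1))
        return seen

    if __name__ == '__main__':
        # usage: python3 find_plugged.py <path to nova-compute log file>
        for port_id in sorted(plugged_ports(sys.argv[1])):
            print(port_id)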
>>
>> >
>> > Cheers,
>> > gibi
>> >
>> >>
>> >> Port creation in Nova:
>> >> 2021-11-25 08:50:24.250 8 INFO os_vif
>> >> [req-6eab0f2a-d7f4-4251-96f4-177fc0c62212 647a328aef9b4a8b890f102d8f018576
>> >> 49231caa42f34da8a9d10229b5c7a5d8 - default default] Successfully plugged vif
>> >> VIFOpenVSwitch(active=False,address=fa:16:3e:3d:cd:b9,bridge_name='br-int',
>> >> has_traffic_filtering=True,id=0761ff2f-ff6a-40a5-85b7-c9b40386cbe0,
>> >> network=Network(cb4e6be4-9bc7-4c34-95dd-dfc3c8e9d93c),plugin='ovs',
>> >> port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,
>> >> vif_name='tap0761ff2f-ff')
>> >>
>> >> Neutron finished provisioning of that port and sent notification to nova
>> >> about 34 seconds later:
>> >> 2021-11-25 08:51:00.403 30 DEBUG neutron.plugins.ml2.rpc
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Device
>> >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 up at agent ovs-agent-compute0
>> >> update_device_up
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/rpc.py:278
>> >> 2021-11-25 08:51:00.503 29 DEBUG neutron.db.provisioning_blocks
>> >> [req-a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Provisioning complete
>> >> for port 67b32ddf-dddd-439e-92be-6a89df49b4af triggered by entity L2.
>> >> provisioning_complete
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/db/provisioning_blocks.py:139
>> >> 2021-11-25 08:51:00.504 29 DEBUG neutron_lib.callbacks.manager
>> >> [req-a5711e9f-9b15-4a1a-a58d-07952fb8ac34 - - - - -] Notify callbacks
>> >> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port,
>> >> provisioning_complete _notify_loop
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron_lib/callbacks/manager.py:192
>> >> 2021-11-25 08:51:00.505 30 DEBUG neutron.db.provisioning_blocks
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning for port
>> >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 completed by entity L2.
>> >> provisioning_complete
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/db/provisioning_blocks.py:133
>> >> 2021-11-25 08:51:00.515 30 DEBUG neutron.db.provisioning_blocks
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Provisioning complete
>> >> for port 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0 triggered by entity L2.
>> >> provisioning_complete
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/db/provisioning_blocks.py:139
>> >> 2021-11-25 08:51:00.516 30 DEBUG neutron_lib.callbacks.manager
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>> >> ['neutron.plugins.ml2.plugin.Ml2Plugin._port_provisioned-8372308'] for port,
>> >> provisioning_complete _notify_loop
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron_lib/callbacks/manager.py:192
>> >> 2021-11-25 08:51:00.701 30 DEBUG neutron_lib.callbacks.manager
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] Notify callbacks
>> >> ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-7972195']
>> >> for port, before_update _notify_loop
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron_lib/callbacks/manager.py:192
>> >> 2021-11-25 08:51:00.746 30 DEBUG neutron.notifiers.nova [-] Sending events:
>> >> [{'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name':
>> >> 'network-vif-plugged', 'status': 'completed', 'tag':
>> >> '0761ff2f-ff6a-40a5-85b7-c9b40386cbe0'}] send_events
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/notifiers/nova.py:262
>> >> 2021-11-25 08:51:00.762 30 DEBUG neutron.plugins.ml2.db
>> >> [req-df864eb8-0c2e-48ca-a3e7-d90dc70e2aa0 - - - - -] For port
>> >> 0761ff2f-ff6a-40a5-85b7-c9b40386cbe0, host compute0, got binding levels
>> >> [PortBindingLevel(driver='openvswitch',host='compute0',level=0,
>> >> port_id=0761ff2f-ff6a-40a5-85b7-c9b40386cbe0,
>> >> segment=NetworkSegment(ea3a27b6-c3fa-40e3-9937-71b652cba855),
>> >> segment_id=ea3a27b6-c3fa-40e3-9937-71b652cba855)] get_binding_level_objs
>> >> /var/lib/kolla/venv/lib/python3.9/site-packages/neutron/plugins/ml2/db.py:75
>> >> 2021-11-25 08:51:00.818 30 INFO neutron.notifiers.nova [-] Nova event
>> >> response: {'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698', 'name':
>> >> 'network-vif-plugged', 'status': 'completed', 'tag':
>> >> '0761ff2f-ff6a-40a5-85b7-c9b40386cbe0', 'code': 200}
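(For readers following along: the "Sending events" and "Nova event response"
lines above correspond to neutron's nova notifier calling nova's
os-server-external-events API. A purely illustrative sketch of that call
with plain requests; the endpoint and token are placeholders, not values
from this environment:)

    # Illustration of the os-server-external-events call behind the
    # "Sending events: [...]" log line above.
    import requests

    NOVA_ENDPOINT = 'http://controller0:8774/v2.1'   # placeholder
    TOKEN = '<keystone token>'                       # placeholder

    body = {'events': [{
        'server_uuid': '09eff2ce-356f-430f-ab30-5de58f58d698',
        'name': 'network-vif-plugged',
        'status': 'completed',
        'tag': '0761ff2f-ff6a-40a5-85b7-c9b40386cbe0',
    }]}
    resp = requests.post(NOVA_ENDPOINT + '/os-server-external-events',
                         json=body, headers={'X-Auth-Token': TOKEN})
    # A per-event 'code': 200 in the response is what shows up above as
    # "Nova event response: {... 'code': 200}".
    print(resp.status_code, resp.json())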
>> >>
>> >>>
>> >>> Regards.
>> >>>
>> >>> [1] https://paste.opendev.org/show/811273/
>> >>>
>> >>> On Thu, Nov 25, 2021 at 10:15 AM Michal Arbet
>> >>> <michal.arbet at ultimum.io> wrote:
>> >>> > Hello,
>> >>> >
>> >>> > In the attachment you can find the logs for compute0 and controller0
>> >>> > (other computes and controllers were turned off for this test).
>> >>> > No OVN used, this stack is based on OVS.
>> >>> >
>> >>> > Thank you,
>> >>> > Michal Arbet
>> >>> > Openstack Engineer
>> >>> >
>> >>> > Ultimum Technologies a.s.
>> >>> > Na Poříčí 1047/26, 11000 Praha 1
>> >>> > Czech Republic
>> >>> >
>> >>> > +420 604 228 897
>> >>> > michal.arbet at ultimum.io
>> >>> > https://ultimum.io
>> >>> >
>> >>> > LinkedIn <https://www.linkedin.com/company/ultimum-technologies> |
>> >>> > Twitter <https://twitter.com/ultimumtech> |
>> >>> > Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
>> >>> >
>> >>> >
>> >>> > On Wed, 24 Nov 2021 at 15:59, Rodolfo Alonso Hernandez
>> >>> > <ralonsoh at redhat.com> wrote:
>> >>> >> Hello Tony:
>> >>> >>
>> >>> >> Do you have the Neutron server logs? Do you see the
>> >>> >> "PortBindingChassisUpdateEvent"? When a port is bound, a Port_Bound (SB)
>> >>> >> event is issued and captured in the Neutron server. That will trigger
>> >>> >> the port binding process and the "vif-plugged" event. This OVN SB event
>> >>> >> should call "set_port_status_up" and that should write "OVN reports
>> >>> >> status up for port: %s" in the logs.
>> >>> >>
>> >>> >> Of course, this Neutron method will be called only if the logical
>> >>> >> switch port is UP.
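(For readers less familiar with the OVN driver: the flow Rodolfo describes
is roughly the following. This is only an illustration of the described
mechanism, not the actual neutron code; for example, is_lsp_up() is a
made-up helper:)

    # Illustration only: an OVN SB Port_Binding event handler that, once the
    # port is bound to a chassis and the logical switch port is up, calls
    # set_port_status_up(), which is what ultimately makes neutron send the
    # network-vif-plugged event to nova.
    class PortBindingChassisUpdateEvent(object):   # name taken from the thread
        def __init__(self, driver):
            self.driver = driver

        def run(self, event, row, old):
            # 'row' stands for the SB Port_Binding row of the port.
            if row.chassis and self.driver.is_lsp_up(row.logical_port):
                # neutron logs "OVN reports status up for port: %s" here and
                # then notifies nova with a network-vif-plugged event.
                self.driver.set_port_status_up(row.logical_port)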
>> >>> >>
>> >>> >> Regards.
>> >>> >>
>> >>> >>
>> >>> >> On Tue, Nov 23, 2021 at 11:59 PM Tony Liu
>> >>> >> <tonyliu0592 at hotmail.com> wrote:
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> I see such a problem from time to time. It's not consistently
>> >>> >>> reproducible.
>> >>> >>> ======================
>> >>> >>> 2021-11-23 22:16:28.532 7 INFO nova.compute.manager
>> >>> >>> [req-50817b01-e7ae-4991-94fe-e4c5672c481b - - - - -] [instance:
>> >>> >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] During sync_power_state the
>> >>> >>> instance has a pending task (spawning). Skip.
>> >>> >>> 2021-11-23 22:21:28.342 7 WARNING nova.virt.libvirt.driver
>> >>> >>> [req-814c98a4-3fd9-4607-9abb-5fbe5cef8650 3a4c320d64664d9cb6784b7ea52d618a
>> >>> >>> 21384fe21ebb4339b1a5f2a8f9000ea3 - default default] [instance:
>> >>> >>> 7021bf37-348d-4dfa-b0c5-6a251fceb6cd] Timeout waiting for
>> >>> >>> [('network-vif-plugged', '735d97a3-db7c-42ed-b3be-1596d4cc7f4b')] for
>> >>> >>> instance with vm_state building and task_state spawning.:
>> >>> >>> eventlet.timeout.Timeout: 300 seconds
>> >>> >>> ======================
>> >>> >>> The VIF/port is activated by the OVN ovn-controller in ovn-sb. It
>> >>> >>> seems that either Neutron didn't capture the update or didn't send
>> >>> >>> the message back to nova-compute. Is there any known fix for this
>> >>> >>> problem?
>> >>> >>>
>> >>> >>>
>> >>> >>> Thanks!
>> >>> >>> Tony
>> >>
>> >>
>> >> --
>> >> Slawek Kaplonski
>> >> Principal Software Engineer
>> >> Red Hat
>> >
>> >
>> >
>>
>>