Wallaby on Ubuntu 20.04, Neutron 18.6.0 neutron-dhcp-agent RPC unusually slow

Zakhar Kirpichenko zakhar at gmail.com
Tue Mar 14 06:34:48 UTC 2023


Hi!

We're running OpenStack Wallaby on Ubuntu 20.04, with 3 high-performance
infra nodes and a RabbitMQ cluster. I updated the Neutron components to
version 18.6.0, which recently became available in the cloud repository
(http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby
main). The exact package versions updated are as follows:

Install: libunbound8:amd64 (1.9.4-2ubuntu1.4, automatic),
openvswitch-common:amd64 (2.15.2-0ubuntu1~cloud0, automatic)
Upgrade: neutron-common:amd64 (2:18.5.0-0ubuntu1~cloud0,
2:18.6.0-0ubuntu1~cloud1), python3-werkzeug:amd64 (0.16.1+dfsg1-2,
0.16.1+dfsg1-2ubuntu0.1), neutron-dhcp-agent:amd64
(2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1),
neutron-l3-agent:amd64 (2:18.5.0-0ubuntu1~cloud0,
2:18.6.0-0ubuntu1~cloud1), python3-neutron:amd64 (2:18.5.0-0ubuntu1~cloud0,
2:18.6.0-0ubuntu1~cloud1), neutron-server:amd64 (2:18.5.0-0ubuntu1~cloud0,
2:18.6.0-0ubuntu1~cloud1), neutron-plugin-ml2:amd64
(2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1),
neutron-metadata-agent:amd64 (2:18.5.0-0ubuntu1~cloud0,
2:18.6.0-0ubuntu1~cloud1), neutron-linuxbridge-agent:amd64
(2:18.5.0-0ubuntu1~cloud0, 2:18.6.0-0ubuntu1~cloud1)

Installed Neutron packages:

ii  neutron-common             2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - common
ii  neutron-dhcp-agent         2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - DHCP agent
 Firewall-as-a-Service driver for OpenStack Neutron
ii  neutron-l3-agent           2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - l3 agent
ii  neutron-linuxbridge-agent  2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - linuxbridge agent
ii  neutron-metadata-agent     2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - metadata agent
ii  neutron-plugin-ml2         2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - ML2 plugin
ii  neutron-server             2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - server
ii  python3-neutron            2:18.6.0-0ubuntu1~cloud1  all  Neutron is a virtual network service for Openstack - Python library
ii  python3-neutron-lib        2.10.1-0ubuntu1~cloud0    all  Neutron shared routines and utilities - Python 3.x
ii  python3-neutronclient      1:7.2.1-0ubuntu1~cloud0   all  client API library for Neutron - Python 3.x

Normally this would be an easy update, but this time neutron-dhcp-agent
doesn't work properly:

2023-03-14 05:44:27.572 2534501 INFO neutron.agent.dhcp.agent
[req-4a362701-cc1f-4b9d-87e6-045b6a388709 - - - - -] Synchronizing state
complete
2023-03-14 05:44:38.868 2534501 ERROR neutron_lib.rpc
[req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout in RPC method
dhcp_ready_on_ports. Waiting for 55 seconds before next attempt. If the
server is not down, consider increasing the rpc_response_timeout option as
Neutron server(s) may be overloaded and unable to respond quickly enough.:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:44:38.871 2534501 WARNING neutron_lib.rpc
[req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Increasing timeout for
dhcp_ready_on_ports calls to 120 seconds. Restart the agent to restore it
to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed
out waiting for a reply to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:45:34.244 2534501 ERROR neutron.agent.dhcp.agent
[req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - - - -] Timeout notifying
server of ports ready. Retrying...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
to message ID bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:10.876 2534501 INFO oslo_messaging._drivers.amqpdriver [-]
No calling threads waiting for msg_id : bd97110b004e413cb2d6b05d9fb3b57c
2023-03-14 05:47:34.353 2534501 ERROR neutron_lib.rpc
[req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout in RPC method
dhcp_ready_on_ports. Waiting for 27 seconds before next attempt. If the
server is not down, consider increasing the rpc_response_timeout option as
Neutron server(s) may be overloaded and unable to respond quickly enough.:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:34.354 2534501 WARNING neutron_lib.rpc
[req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Increasing timeout for
dhcp_ready_on_ports calls to 240 seconds. Restart the agent to restore it
to the default value.: oslo_messaging.exceptions.MessagingTimeout: Timed
out waiting for a reply to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:47:46.681 2534501 INFO oslo_messaging._drivers.amqpdriver [-]
No calling threads waiting for msg_id : f254f735998243c4b0a58ce95c974534
2023-03-14 05:48:01.086 2534501 ERROR neutron.agent.dhcp.agent
[req-607a9252-49b1-4043-aa0d-2457b78dc99e - - - - -] Timeout notifying
server of ports ready. Retrying...:
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply
to message ID f254f735998243c4b0a58ce95c974534
2023-03-14 05:49:45.035 2534501 INFO neutron.agent.dhcp.agent
[req-5935a0d0-a981-463c-a4ea-23ccbb54c896 - - - - -] DHCP configuration for
ports ... (A successful configuration here).
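
The timeout escalation in the agent log above (120 seconds, then 240) can be
pulled out mechanically. This is a minimal sketch, assuming each log entry
sits on a single line in the real log file (the excerpt above is wrapped by
the mail client). The function name timeout_escalations is illustrative
only and not part of Neutron.

```python
import re

# Matches the neutron_lib.rpc warning quoted above, for example:
# "Increasing timeout for dhcp_ready_on_ports calls to 120 seconds."
_ESCALATION = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Increasing timeout for (?P<method>\w+) calls to (?P<secs>\d+) seconds"
)


def timeout_escalations(lines):
    """Return (timestamp, rpc_method, new_timeout_seconds) for each warning."""
    found = []
    for line in lines:
        match = _ESCALATION.search(line)
        if match:
            found.append((match["ts"], match["method"], int(match["secs"])))
    return found
```

Passing an open log file object to timeout_escalations works as well as a
list of strings, since the function only iterates over lines.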

While neutron-dhcp-agent is waiting, the neutron-server log fills up with:

neutron-server.log:2023-03-14 05:47:05.761 4171971 INFO
neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - -
- -] Attempt 1 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76
...
neutron-server.log:2023-03-14 05:47:10.727 4171971 INFO
neutron.plugins.ml2.plugin [req-cb1dc604-1372-44cd-bc06-09496ed5f68f - - -
- -] Attempt 10 to provision port 18cddbb8-f3ed-4b49-9c6f-c0c67b4f7c76

This repeats for each port of each network neutron-dhcp-agent needs to
configure.
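
To quantify the repeated provisioning attempts per port, something like the
following could be run over neutron-server.log. It is only a sketch: it
assumes one log entry per line, tolerates a grep-style "filename:" prefix
as in the excerpt above, and the name provisioning_spans is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# "... INFO neutron.plugins.ml2.plugin [...] Attempt 10 to provision port <uuid>"
_ATTEMPT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"Attempt (?P<n>\d+) to provision port (?P<port>[0-9a-f-]{36})"
)


def provisioning_spans(lines):
    """Map port ID -> (highest attempt number, seconds from first to last attempt)."""
    first, last, attempts = {}, {}, {}
    for line in lines:
        match = _ATTEMPT.search(line)
        if not match:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        port = match["port"]
        first.setdefault(port, ts)  # keep the earliest timestamp seen
        last[port] = ts             # overwrite with the latest timestamp
        attempts[port] = max(attempts.get(port, 0), int(match["n"]))
    return {
        port: (attempts[port], (last[port] - first[port]).total_seconds())
        for port in first
    }
```

For the port shown above, attempts 1 through 10 span roughly 5 seconds
(05:47:05.761 to 05:47:10.727).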

Each subsequent configuration for each network takes about 1-2 minutes,
depending on the network size. With earlier Neutron versions the whole
process of configuring all networks would finish in under a minute, i.e.
DHCP configuration per port (and network) is several orders of magnitude
slower than it should be. Once neutron-dhcp-agent finishes synchronization,
it seems to work without issues. There aren't many changes in our cloud, so
it's hard to tell whether it's fast or slow, but individual port updates
seem to happen quickly.

All other services and the RabbitMQ cluster are working well, the infra
nodes are not overloaded, and there are no apparent issues other than this
one with Neutron. I am therefore inclined to think that the issue is
specific to version 18.6.0 of neutron-dhcp-agent or neutron-server.

I would appreciate any advice!

Best regards,
Zakhar