Hi neutron team,

We've been troubleshooting an issue with neutron's dhcp-client for sometime now.


We were previously on neutron 12.0.5 and observed that upon reloading 5-8 baremetals simultaneously almost 
always led to a few baremetals failing PXE boot during provisioning and/or cleaning.

This was traced to a fixed bug.
https://bugs.launchpad.net/neutron/+bug/1760047
which was applied to Queens in April 2019.
https://review.opendev.org/#/c/649580/


We patched the above fix but found out the problem was not resolved.
The fix gets rid of the semaphores by serializing the multiple messages into a Priority Queue.
The Priority Queue then drains the messages serially one by one making sure not to yield during the processing
of each message.  All in all this just seems like a more elegant way of getting rid of the semaphores but does not really 
fix the issue at hand.



Below are the logs from dhcp-agent in neutron release 12.0.5.  As can be seen the semaphore locks all threads for almost 6 seconds.
While the below has been fixed using 
the underlying problem has not been fixed.  The semaphore has been removed but instead the message is being serialized and does not yield
resulting in PXE boot failures on the baremetal nodes.

Any pointers would be appreciated.

thanks,
Fred


2019-10-03 18:07:37.454 318956 DEBUG oslo_concurrency.lockutils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Acquired semaphore "dhcp-agent-network-lock-077aa2d1-605c-48ec-842d-7dd6767bfd01" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212

2019-10-03 18:07:37.455 318956 DEBUG neutron.agent.dhcp.agent [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Calling driver for network: 077aa2d1-605c-48ec-842d-7dd6767bfd01 action: reload_allocations call_driver /usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py:135

2019-10-03 18:07:37.456 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1', '10.33.27.77', '9c:71:3a:cb:7c:43', '01:9c:71:3a:cb:7c:43'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:38.101 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1', '10.33.27.75', '9c:71:3a:cb:7b:fb'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:38.717 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1', '10.33.27.75', '9c:71:3a:cb:7b:fb', 'ff:3a:cb:7b:fb:00:04:8a:ef:2f:58:b4:20:45:03:80:27:0f:15:84:a4:70:7b'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:39.631 318956 DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Building host file: /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:695

2019-10-03 18:07:39.632 318956 DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Done building host file /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host _output_hosts_file /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:734

2019-10-03 18:07:39.633 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip', 'addr', 'show', 'ns-8387b854-d1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:40.263 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['kill', '-HUP', '319109'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:40.843 318956 DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Reloading allocations for network: 077aa2d1-605c-48ec-842d-7dd6767bfd01 reload_allocations /usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:524

2019-10-03 18:07:40.843 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip', '-4', 'route', 'list', 'dev', 'ns-8387b854-d1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:41.462 318956 DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip', '-6', 'route', 'list', 'dev', 'ns-8387b854-d1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108

2019-10-03 18:07:42.101 318956 DEBUG oslo_concurrency.lockutils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Releasing semaphore "dhcp-agent-network-lock-077aa2d1-605c-48ec-842d-7dd6767bfd01" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228