[neutron]: Latest Queens release: dhcp-client takes too long processing messages and falls behind.
Hi neutron team, We've been troubleshooting an issue with neutron's dhcp-client for sometime now. We were previously on neutron 12.0.5 and observed that upon reloading 5-8 baremetals simultaneously almost always led to a few baremetals failing PXE boot during provisioning and/or cleaning. This was traced to afixed bug. https://bugs.launchpad.net/neutron/+bug/1760047 which was applied to Queens in April 2019. https://review.opendev.org/#/c/649580/ We patched the above fix but found out the problem was not resolved. The fix gets rid of the semaphores by serializing the multiple messages into aPriority Queue. The Priority Queue then drains the messages serially one by one making sure notto yield during the processing of each message. All in all this just seems like a more elegant way ofgetting rid of the semaphores but does not really fix the issue at hand. Below are the logs from dhcp-agent in neutron release 12.0.5. As can be seen the semaphore locks all threads for almost 6 seconds.While the below has been fixed using https://review.opendev.org/#/c/649580/ the underlying problem has not been fixed. The semaphore has been removed but instead the message is being serialized and does not yieldresulting in PXE boot failures on the baremetal nodes. Any pointers would be appreciated. thanks,Fred 2019-10-03 18:07:37.454 318956DEBUG oslo_concurrency.lockutils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Acquired semaphore"dhcp-agent-network-lock-077aa2d1-605c-48ec-842d-7dd6767bfd01" lock/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212 2019-10-03 18:07:37.455 318956DEBUG neutron.agent.dhcp.agent [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f86519f6fb7446dc47dd88c63cf03c1cce94 - - -] Calling driver for network:077aa2d1-605c-48ec-842d-7dd6767bfd01 action: reload_allocations call_driver/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py:135 2019-10-03 18:07:37.456 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['ip', 'netns', 'exec','qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1','10.33.27.77', '9c:71:3a:cb:7c:43', '01:9c:71:3a:cb:7c:43']execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:38.101 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['ip', 'netns', 'exec','qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1','10.33.27.75', '9c:71:3a:cb:7b:fb'] execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:38.717 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['ip', 'netns', 'exec','qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'dhcp_release', 'ns-8387b854-d1','10.33.27.75', '9c:71:3a:cb:7b:fb', 'ff:3a:cb:7b:fb:00:04:8a:ef:2f:58:b4:20:45:03:80:27:0f:15:84:a4:70:7b']execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:39.631 318956DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Building host file: /var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host_output_hosts_file/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:695 2019-10-03 18:07:39.632 318956DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Donebuilding host file/var/lib/neutron/dhcp/077aa2d1-605c-48ec-842d-7dd6767bfd01/host_output_hosts_file/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:734 2019-10-03 18:07:39.633 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['ip', 'netns', 'exec','qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip', 'addr', 'show','ns-8387b854-d1'] execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:40.263 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['kill', '-HUP', '319109']execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:40.843 318956DEBUG neutron.agent.linux.dhcp [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Reloading allocations for network: 077aa2d1-605c-48ec-842d-7dd6767bfd01reload_allocations/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py:524 2019-10-03 18:07:40.843 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -]Running command (rootwrap daemon): ['ip', 'netns', 'exec','qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip', '-4', 'route', 'list','dev', 'ns-8387b854-d1'] execute_rootwrap_daemon/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:41.462 318956DEBUG neutron.agent.linux.utils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa5 8941137e383548bda725e74a93b2f86519f6fb7446dc47dd88c63cf03c1cce94 - - -] Running command (rootwrap daemon):['ip', 'netns', 'exec', 'qdhcp-077aa2d1-605c-48ec-842d-7dd6767bfd01', 'ip','-6', 'route', 'list', 'dev', 'ns-8387b854-d1'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:108 2019-10-03 18:07:42.101 318956DEBUG oslo_concurrency.lockutils [req-eac79995-3846-46e6-b946-c5b5ccdb7aa58941137e383548bda725e74a93b2f865 19f6fb7446dc47dd88c63cf03c1cce94 - - -] Releasing semaphore"dhcp-agent-network-lock-077aa2d1-605c-48ec-842d-7dd6767bfd01" lock/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
participants (1)
- 
                
                fsbiz@yahoo.com