Port creation times out for some VMs in large group
When I create 100 VMs in our prod cluster:

openstack server create --flavor s1.tiny --network it-network --image cirros-0.4.0-x86_64 --min 100 --max 100 alberttest

Most of them build successfully in about a minute. Five or ten will stay in BUILD status for 5 minutes and then fail with "BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling."

If I build smaller numbers, I see fewer failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but was not able to duplicate the issue. The neutron servers don't appear to be overloaded during the failure.

What config variables should I be looking at?

Here are the relevant log entries from the HV:

2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1')

2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9] Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds

More logs and data: http://paste.openstack.org/show/779524/
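For context, the "Timeout: 300 seconds" in that warning is nova-compute giving up on neutron's network-vif-plugged event. A sketch of the options involved (option names and defaults as I understand them for a Queens-era nova.conf; verify against your release — note that raising the timeout only hides slow port wiring, it does not fix it):

```ini
# nova.conf on the hypervisor (sketch)
[DEFAULT]
# How long nova-compute waits for neutron's network-vif-plugged event
# before giving up -- this is the 300-second timeout in the log above.
vif_plugging_timeout = 300
# If true, a timeout aborts the build, producing the
# BuildAbortException "not rescheduling" failure above.
vif_plugging_is_fatal = true
```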
Albert,

Do this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengi...

The problem will go away. I'm of the opinion that daemon mode for rootwrap should be the default, since the performance improvement is an order of magnitude, but privsep may obviate that concern once it's fully implemented. Either way, that should solve your problem.

r

Chris Apsey
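In config terms, the change described in the linked post boils down to pointing the neutron agents at the rootwrap daemon. A minimal sketch (section name and paths assumed to match a typical distro-packaged Queens deployment — adjust to yours):

```ini
# neutron.conf (and/or the per-agent ini files) on each node -- sketch
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
# Daemon mode: one long-lived rootwrap process instead of a fork+sudo+
# Python-interpreter start per privileged command, which is where the
# order-of-magnitude speedup comes from.
root_helper_daemon = sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
```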
Thanks. This definitely helps. I am running a stable release of Queens. Even after this change I still see 10-15 failures when I create 100 VMs in our cluster.

I have tracked this down (to a reasonable degree of certainty) to the SIGHUPs caused by dnsmasq reloads every time a new MAC entry is added, deleted or updated. It seems to be related to https://bugs.launchpad.net/neutron/+bug/1598078

The fix for the above bug was abandoned: https://review.opendev.org/#/c/336462/

Any further fine tuning that can be done?

Thanks,
Fred.
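One way to quantify that reload churn is to count how often dnsmasq logs a re-read of its per-network host file (each SIGHUP triggers one). A self-contained sketch against sample syslog lines — the log lines, path, and file name below are illustrative, not taken from the paste above:

```shell
# Write a small syslog excerpt to grep against (illustrative lines).
cat > /tmp/dnsmasq.sample <<'EOF'
Oct  2 10:00:01 hv1 dnsmasq-dhcp[123]: read /var/lib/neutron/dhcp/49cbe55d/host
Oct  2 10:00:01 hv1 dnsmasq-dhcp[123]: read /var/lib/neutron/dhcp/49cbe55d/opts
Oct  2 10:00:02 hv1 dnsmasq-dhcp[123]: read /var/lib/neutron/dhcp/49cbe55d/host
EOF
# Each SIGHUP makes dnsmasq re-read the host file; a high count during a
# 100-VM burst points at the reload storm described above.
grep -c 'read .*neutron/dhcp.*/host$' /tmp/dnsmasq.sample
# prints: 2
```

Run against the real syslog on a network node during a burst to see how many reloads a 100-VM create generates.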
Is that still spitting out a vif plug failure, or are your instances spawning but not getting addresses?

I've found that adding the no-ping option to dnsmasq lowers load significantly, but it can be dangerous if you've got potentially conflicting sources of address allocation. While it doesn't address that bug report specifically, it may breathe some more CPU cycles into dnsmasq so it can handle other items better.

R

CA
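For reference, one way to wire that in (file paths here are assumptions; the mechanism is the DHCP agent's extra dnsmasq config file):

```ini
# dhcp_agent.ini (sketch) -- point neutron's dnsmasq at an extra config file
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf would then contain the single line:
#   no-ping
# which skips dnsmasq's ICMP probe before offering a lease -- cheaper per
# lease, but only safe when no other system allocates from the same ranges.
```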
Thanks. Instances are spawning but not getting addresses. We have Infoblox as the IPAM, so --no-ping should be fine. Will run the tests and update.

Thanks,
Fred.
Thanks. Yes, it helps breathe some CPU cycles.

This was traced to a fixed bug, https://bugs.launchpad.net/neutron/+bug/1760047, which was applied to Queens in April 2019: https://review.opendev.org/#/c/649580/

Unfortunately, the patch simply makes the code more elegant by removing the semaphores. It does not really fix the real issue: the DHCP agent serializes all the port update messages, and each message is processed too slowly, resulting in PXE boot timeouts. The issue still remains open.

thanks,
Fred.
We tested this in dev and QA and then implemented it in production, and it did make a difference. But two weeks later we started seeing an issue, first in dev and then in QA. In syslog we see neutron-linuxbridge-agent.service stopping and starting [1]. In neutron-linuxbridge-agent.log we see a rootwrap error [2]: "Exception: Failed to spawn rootwrap process." If I comment out 'root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"' and restart the neutron services, the error goes away. How can I use the root_helper_daemon setting without hitting this new error?

[1]
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 2.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 3.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 4.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Start request repeated too quickly.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Failed to start Openstack Neutron Linux Bridge Agent.

[2]
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] Logging enabled!
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 13.0.4 2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'physnet1': 'eno1'} 2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {} 2019-10-09 17:05:24.522 5803 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/linuxbridge_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpmdyxcD/privsep.sock'] 2019-10-09 17:05:25.071 5803 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap 2019-10-09 17:05:25.022 5828 INFO oslo.privsep.daemon [-] privsep daemon starting 2019-10-09 17:05:25.025 5828 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0 2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none 2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep daemon running as pid 5828 2019-10-09 17:05:25.125 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Agent initialized successfully, now running... 2019-10-09 17:05:25.193 5803 ERROR neutron.agent.linux.utils [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process. 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Error starting thread.: Exception: Failed to spawn rootwrap process. 
stderr: sudo: no tty present and no askpass program specified 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Traceback (most recent call last): 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 794, in run_service 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service service.start() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service result = f(*args, **kwargs) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 86, in start 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self.setup_rpc() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service result = f(*args, **kwargs) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 153, in setup_rpc 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self.context, self.sg_plugin_rpc, defer_refresh_firewall=True) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 58, in __init__ 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self.init_firewall(defer_refresh_firewall, integration_bridge) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 83, in init_firewall 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self.firewall = firewall_class() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File 
"/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 88, in __init__ 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service zone_per_port=self.CONNTRACK_ZONE_PER_PORT) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service return f(*args, **kwargs) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 58, in get_conntrack 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service execute, namespace, zone_per_port) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 75, in __init__ 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self._populate_initial_zone_map() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 182, in _populate_initial_zone_map 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service rules = self.get_rules_for_table_func('raw') 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 477, in get_rules_for_table 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service return self.execute(args, run_as_root=True).split('\n') 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 122, in execute 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service execute_rootwrap_daemon(cmd, process_input, addl_env)) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service LOG.error("Rootwrap error running command: 
%s", cmd) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self.force_reraise() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service six.reraise(self.type_, self.value, self.tb) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service return client.execute(cmd, process_input) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 148, in execute 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self._ensure_initialized() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 115, in _ensure_initialized 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service self._initialize() 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 85, in _initialize 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service (stderr,)) 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Exception: Failed to spawn rootwrap process. 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service stderr: 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service sudo: no tty present and no askpass program specified 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service 2019-10-09 17:05:25.194 5803 ERROR oslo_service.service 2019-10-09 17:05:25.197 5803 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent. 
2019-10-09 17:05:25.198 5803 CRITICAL neutron [-] Unhandled error: AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc' 2019-10-09 17:05:25.198 5803 ERROR neutron Traceback (most recent call last): 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/bin/neutron-linuxbridge-agent", line 10, in <module> 2019-10-09 17:05:25.198 5803 ERROR neutron sys.exit(main()) 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/cmd/eventlet/plugins/linuxbridge_neutron_agent.py", line 21, in main 2019-10-09 17:05:25.198 5803 ERROR neutron agent_main.main() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1051, in main 2019-10-09 17:05:25.198 5803 ERROR neutron launcher.wait() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 392, in wait 2019-10-09 17:05:25.198 5803 ERROR neutron status, signo = self._wait_for_exit_or_signal() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 377, in _wait_for_exit_or_signal 2019-10-09 17:05:25.198 5803 ERROR neutron self.stop() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 292, in stop 2019-10-09 17:05:25.198 5803 ERROR neutron self.services.stop() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 760, in stop 2019-10-09 17:05:25.198 5803 ERROR neutron service.stop() 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper 2019-10-09 17:05:25.198 5803 ERROR neutron result = f(*args, **kwargs) 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 117, in stop 2019-10-09 
17:05:25.198 5803 ERROR neutron self.set_rpc_timeout(self.quitting_rpc_timeout) 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper 2019-10-09 17:05:25.198 5803 ERROR neutron result = f(*args, **kwargs) 2019-10-09 17:05:25.198 5803 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 476, in set_rpc_timeout 2019-10-09 17:05:25.198 5803 ERROR neutron self.state_rpc): 2019-10-09 17:05:25.198 5803 ERROR neutron AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc' 2019-10-09 17:05:25.198 5803 ERROR neutron
5 or 10 will stay in BUILD status for 5 minutes and then fail with “ BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling.” If I build smaller numbers, I see less failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but am not able to duplicate the issue. The neutron servers don’t appear to be overloaded during the failure. What config variables should I be looking at? Here are the relevant log entries from the HV: 2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1') 2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9] Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds More logs and data: http://paste.openstack.org/show/779524/
From: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Sent: Wednesday, October 9, 2019 6:40 PM
To: Albert Braden <albertb@synopsys.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: Re: Port creation times out for some VMs in large group

You are probably missing an entry in your sudoers file. You need something like:

neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

________________________________
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Wednesday, October 9, 2019 5:20 PM
To: Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

We tested this in dev and qa and then implemented in production, and it did make a difference, but 2 weeks later we started seeing an issue, first in dev and then in qa. In syslog we see neutron-linuxbridge-agent.service stopping and starting [1]. In neutron-linuxbridge-agent.log we see a rootwrap error [2]: "Exception: Failed to spawn rootwrap process." If I comment out 'root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"' and restart neutron services, then the error goes away. How can I use the root_helper_daemon setting without creating this new error?

[1]
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 2.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 3.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 4.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Start request repeated too quickly.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Failed to start Openstack Neutron Linux Bridge Agent.

[2]
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] Logging enabled!
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 13.0.4
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'physnet1': 'eno1'}
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
2019-10-09 17:05:24.522 5803 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/linuxbridge_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpmdyxcD/privsep.sock']
2019-10-09 17:05:25.071 5803 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-10-09 17:05:25.022 5828 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-10-09 17:05:25.025 5828 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep daemon running as pid 5828
2019-10-09 17:05:25.125 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Agent initialized successfully, now running...
2019-10-09 17:05:25.193 5803 ERROR neutron.agent.linux.utils [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Error starting thread.: Exception: Failed to spawn rootwrap process.
stderr: sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Traceback (most recent call last):
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 794, in run_service
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     service.start()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 86, in start
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.setup_rpc()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 153, in setup_rpc
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.context, self.sg_plugin_rpc, defer_refresh_firewall=True)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 58, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.init_firewall(defer_refresh_firewall, integration_bridge)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 83, in init_firewall
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.firewall = firewall_class()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 88, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     zone_per_port=self.CONNTRACK_ZONE_PER_PORT)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 58, in get_conntrack
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute, namespace, zone_per_port)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 75, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._populate_initial_zone_map()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 182, in _populate_initial_zone_map
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     rules = self.get_rules_for_table_func('raw')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 477, in get_rules_for_table
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return self.execute(args, run_as_root=True).split('\n')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 122, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute_rootwrap_daemon(cmd, process_input, addl_env))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     LOG.error("Rootwrap error running command: %s", cmd)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.force_reraise()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     six.reraise(self.type_, self.value, self.tb)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return client.execute(cmd, process_input)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 148, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._ensure_initialized()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 115, in _ensure_initialized
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._initialize()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 85, in _initialize
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     (stderr,))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service stderr:
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.197 5803 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.
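Erik's rule for the daemon can be checked before it goes live. Below is a minimal sketch, not from the thread: it writes the combined rules (classic rootwrap plus the daemon) to a temp file so the check runs without root, and greps that both binaries are whitelisted. On a live controller, the real file would sit in /etc/sudoers.d/, should pass `visudo -cf <file>`, and `sudo -l -U neutron` shows what sudo actually parsed.

```shell
# Combined sudoers rules for neutron rootwrap (classic + daemon mode).
# Written to a temp file here so the check can run unprivileged.
f=$(mktemp)
cat > "$f" <<'EOF'
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
EOF

# Both helpers need their own NOPASSWD rule; count the matching lines.
grep -c 'NOPASSWD: /usr/bin/neutron-rootwrap' "$f"
grep -q 'neutron-rootwrap-daemon' "$f" && echo "daemon rule present"
```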
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 10, 2019 11:05 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

I have the neutron sudoers line under sudoers.d:

root@us01odc-qa-ctrl1:/etc/sudoers.d# cat neutron_sudoers
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *

Whatever is causing this didn't start until I had been running the rootwrap daemon for 2 weeks, and it has not started in our prod cluster.
From: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Sent: Thursday, October 10, 2019 11:08 AM
To: Albert Braden <albertb@synopsys.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

Yea - if you look at your sudoers, it's only allowing the old traditional rootwrap, and not the new daemon. You need both:

Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

Best Regards, Erik Olof Gunnar Andersson
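The two sudoers rules correspond to the two helper settings on the agent side, one of which Albert quoted when he disabled daemon mode. A reference fragment of neutron.conf (section layout assumed from the stock config; values as used in this thread):

```ini
[agent]
# Classic rootwrap: spawns sudo + a Python interpreter for every command.
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
# Daemon mode: one long-lived root helper; roughly an order of magnitude
# faster per command, which is what fixed the original port timeouts.
root_helper_daemon = sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
```

Both commands are invoked through sudo as the neutron user, which is why each binary needs its own NOPASSWD entry.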
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 10, 2019 12:13 PM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

It looks like something is still missing. I added the line to /etc/sudoers.d/neutron_sudoers:

root@us01odc-qa-ctrl3:/var/log/neutron# cat /etc/sudoers.d/neutron_sudoers
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf *

Then I restarted neutron services and the error was gone... for a few minutes, and then it came back on ctrl3. Ctrl1/2 aren't erroring at this time. I changed neutron's shell and tested the daemon command, and it seems to work:

root@us01odc-qa-ctrl3:~# su - neutron
neutron@us01odc-qa-ctrl3:~$ /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
/tmp/rootwrap-5b1QoP/rootwrap.sock
Z%▒"▒▒▒Vs▒▒5-▒,a▒▒▒▒G▒▒▒▒v▒▒

But neutron-linuxbridge-agent.log still scrolls errors: http://paste.openstack.org/show/782740/

It appears that there is another factor besides the config, because even when the sudoers line was missing, it would work for hours or days before the error started. It has been working in our prod cluster for about a week now, without the sudoers line. It seems like it should not work that way. What am I missing?
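A side note on the restart loop: the syslog excerpt earlier in the thread ends with systemd's rate limiter giving up ("Start request repeated too quickly"), which means the agent stops retrying entirely until it is started by hand. While debugging an intermittent failure like this, loosening that limit with a drop-in can keep the agent retrying. This is an illustrative sketch: the drop-in path is assumed, and on older systemd versions the StartLimit* settings go under [Service] (as StartLimitInterval/StartLimitBurst) rather than [Unit]:

```ini
# /etc/systemd/system/neutron-linuxbridge-agent.service.d/restart-limit.conf
[Unit]
# Allow up to 10 starts per 60 s before systemd refuses further restarts.
StartLimitIntervalSec=60
StartLimitBurst=10

[Service]
# Pause between attempts instead of hammering the failing agent.
RestartSec=5
```

Run `systemctl daemon-reload` afterwards so the drop-in takes effect.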
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 10, 2019 1:45 PM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

The errors appear to start with this line:

2019-10-10 13:42:48.261 1211336 ERROR neutron.agent.linux.utils [req-42c530f6-6e08-47c1-8ed4-dcb31c9cd972 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.

We're not running iptables. Do we need it to use the rootwrap daemon?
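The iptables-save call does not come from an operator-configured firewall: the earlier traceback runs through iptables_firewall.py and ip_conntrack.py, i.e. it is the agent's security-group firewall driver populating its conntrack zone map at startup, and the rootwrap failure is what actually aborts it. Which driver loads is set in the linuxbridge agent config; an illustrative fragment (option shortnames as commonly used in stock neutron configs; verify the valid values for your release):

```ini
[securitygroup]
# This driver shells out to iptables-save/iptables-restore via rootwrap,
# so iptables must be callable even if no rules were configured by hand.
firewall_driver = iptables
# If this agent should not enforce security groups at all, the noop
# driver avoids the iptables calls entirely:
# firewall_driver = noop
```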
Maybe double check that your rootwrap config is up to date? /etc/neutron/rootwrap.conf and /etc/neutron/rootwrap.d (Make sure to pick the appropriate branch in github)

https://github.com/openstack/neutron/blob/master/etc/rootwrap.conf
https://github.com/openstack/neutron/tree/master/etc/neutron/rootwrap.d

________________________________
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 10, 2019 1:45 PM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

The errors appear to start with this line:

2019-10-10 13:42:48.261 1211336 ERROR neutron.agent.linux.utils [req-42c530f6-6e08-47c1-8ed4-dcb31c9cd972 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.

We're not running iptables. Do we need it, to use the rootwrap daemon?

From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 10, 2019 12:13 PM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

It looks like something is still missing. I added the line to /etc/sudoers.d/neutron_sudoers:

root@us01odc-qa-ctrl3:/var/log/neutron# cat /etc/sudoers.d/neutron_sudoers
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf *

Then I restarted neutron services and the error was gone… for a few minutes, and then it came back on ctrl3. Ctrl1/2 aren't erroring at this time.
I changed neutron's shell and tested the daemon command and it seems to work:

root@us01odc-qa-ctrl3:~# su - neutron
neutron@us01odc-qa-ctrl3:~$ /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
/tmp/rootwrap-5b1QoP/rootwrap.sock Z%▒"▒▒▒Vs▒▒5-▒,a▒▒▒▒G▒▒▒▒v▒▒

But neutron-linuxbridge-agent.log still scrolls errors: http://paste.openstack.org/show/782740/

It appears that there is another factor besides the config, because even when the sudoers line was missing, it would work for hours or days before the error started. It has been working in our prod cluster for about a week now, without the sudoers line. It seems like it should not work that way. What am I missing?
From: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Sent: Thursday, October 10, 2019 6:21 PM
To: Albert Braden <albertb@synopsys.com>; Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: Re: Port creation times out for some VMs in large group

Btw I still think your sudoers is slightly incorrect. I feel like that is significant, but I'm not a hundred percent sure. Drop the star at the end of the last line.

root@us01odc-qa-ctrl3:/var/log/neutron# cat /etc/sudoers.d/neutron_sudoers
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
It appears that the extra * was the issue. After removing it I can run the rootwrap daemon without errors. I'm not 100% sure because the issue took 2 weeks to show up after the initial config change, but this seems to have fixed the problem.
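The failure mode is consistent with how sudoers matches command arguments: matching is glob-style (fnmatch, per the sudoers manual), so a rule ending in " *" requires at least one extra argument after rootwrap.conf. The daemon is launched with no extra arguments, the rule fails to match, sudo falls back to prompting for a password, and with no terminal that produces the "sudo: no tty present and no askpass program specified" seen in the logs. A minimal sketch of the glob semantics (Python's fnmatch is used here only to illustrate; sudo has its own matcher):

```python
from fnmatch import fnmatchcase

# sudo joins the command's arguments into one string and matches it
# against the glob pattern from the sudoers rule.
daemon_args = "/etc/neutron/rootwrap.conf"  # the only argument the agent passes

# With the extra trailing star: the literal space before '*' must match a
# real character, so a single-argument invocation never matches and sudo
# denies the command (then tries to ask for a password).
assert not fnmatchcase(daemon_args, "/etc/neutron/rootwrap.conf *")

# Without the star, the rule matches the exact single argument.
assert fnmatchcase(daemon_args, "/etc/neutron/rootwrap.conf")

# The classic rootwrap rule keeps its star because it always receives the
# wrapped command as additional arguments, e.g.:
assert fnmatchcase("/etc/neutron/rootwrap.conf iptables-save -t raw",
                   "/etc/neutron/rootwrap.conf *")
```

This also explains why only the daemon rule needed the star dropped while the classic rootwrap rule keeps it.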
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Wednesday, October 9, 2019 5:20 PM
To: Chris Apsey <bitskrieg@bitskrieg.net>
Cc: openstack-discuss@lists.openstack.org
Subject: RE: Port creation times out for some VMs in large group

[1]
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 2.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 3.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct 9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 4.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Start request repeated too quickly.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct 9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Failed to start Openstack Neutron Linux Bridge Agent.

[2]
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] Logging enabled!
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 13.0.4
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'physnet1': 'eno1'}
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
2019-10-09 17:05:24.522 5803 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/linuxbridge_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpmdyxcD/privsep.sock']
2019-10-09 17:05:25.071 5803 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-10-09 17:05:25.022 5828 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-10-09 17:05:25.025 5828 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep daemon running as pid 5828
2019-10-09 17:05:25.125 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Agent initialized successfully, now running...
2019-10-09 17:05:25.193 5803 ERROR neutron.agent.linux.utils [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Error starting thread.: Exception: Failed to spawn rootwrap process.
stderr: sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Traceback (most recent call last):
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 794, in run_service
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     service.start()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 86, in start
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.setup_rpc()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 153, in setup_rpc
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.context, self.sg_plugin_rpc, defer_refresh_firewall=True)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 58, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.init_firewall(defer_refresh_firewall, integration_bridge)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 83, in init_firewall
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.firewall = firewall_class()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 88, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     zone_per_port=self.CONNTRACK_ZONE_PER_PORT)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 58, in get_conntrack
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute, namespace, zone_per_port)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 75, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._populate_initial_zone_map()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 182, in _populate_initial_zone_map
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     rules = self.get_rules_for_table_func('raw')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 477, in get_rules_for_table
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return self.execute(args, run_as_root=True).split('\n')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 122, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute_rootwrap_daemon(cmd, process_input, addl_env))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     LOG.error("Rootwrap error running command: %s", cmd)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.force_reraise()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     six.reraise(self.type_, self.value, self.tb)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return client.execute(cmd, process_input)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 148, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._ensure_initialized()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 115, in _ensure_initialized
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._initialize()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 85, in _initialize
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     (stderr,))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service stderr:
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service
2019-10-09 17:05:25.197 5803 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.
2019-10-09 17:05:25.198 5803 CRITICAL neutron [-] Unhandled error: AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'
2019-10-09 17:05:25.198 5803 ERROR neutron Traceback (most recent call last):
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/bin/neutron-linuxbridge-agent", line 10, in <module>
2019-10-09 17:05:25.198 5803 ERROR neutron     sys.exit(main())
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/cmd/eventlet/plugins/linuxbridge_neutron_agent.py", line 21, in main
2019-10-09 17:05:25.198 5803 ERROR neutron     agent_main.main()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1051, in main
2019-10-09 17:05:25.198 5803 ERROR neutron     launcher.wait()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 392, in wait
2019-10-09 17:05:25.198 5803 ERROR neutron     status, signo = self._wait_for_exit_or_signal()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 377, in _wait_for_exit_or_signal
2019-10-09 17:05:25.198 5803 ERROR neutron     self.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 292, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     self.services.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 760, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     service.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 117, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     self.set_rpc_timeout(self.quitting_rpc_timeout)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 476, in set_rpc_timeout
2019-10-09 17:05:25.198 5803 ERROR neutron     self.state_rpc):
2019-10-09 17:05:25.198 5803 ERROR neutron AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'
2019-10-09 17:05:25.198 5803 ERROR neutron
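[Editor's note] The "sudo: no tty present and no askpass program specified" line at the bottom of the rootwrap traceback is sudo itself refusing to run: it needed to prompt for a password (or was blocked by a requiretty policy) and a systemd service has no terminal to prompt on. A hedged sketch of a sudoers fragment that would let the agent's service user spawn the rootwrap daemon non-interactively; the neutron user name and the file paths are assumptions based on a stock distro package layout, not taken from this thread:

```
# /etc/sudoers.d/neutron_rootwrap -- hypothetical fragment, not from the thread.
# NOPASSWD is what avoids the "no tty present" failure: without it, sudo tries
# to prompt for a password and fails because the service has no terminal.
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *, /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
```

Distro packages usually ship a sudoers entry for the non-daemon rootwrap command; the point of checking is whether the daemon command is also covered.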
We tested this in dev and QA and then implemented it in production, and it did make a difference. But two weeks later we started seeing an issue, first in dev and then in QA. In syslog we see neutron-linuxbridge-agent.service stopping and starting [1]. In neutron-linuxbridge-agent.log we see a rootwrap error [2]: "Exception: Failed to spawn rootwrap process."

If I comment out 'root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"' and restart the neutron services, the error goes away. How can I use the root_helper_daemon setting without creating this new error?

The message with logs got moderated, so the logs are here: http://paste.openstack.org/show/782622/

From: Chris Apsey <bitskrieg@bitskrieg.net>
Sent: Friday, September 27, 2019 9:34 AM
To: Albert Braden <albertb@synopsys.com>
Cc: openstack-discuss@lists.openstack.org
Subject: Re: Port creation times out for some VMs in large group

Albert,

Do this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/

The problem will go away. I'm of the opinion that daemon mode for rootwrap should be the default since the performance improvement is an order of magnitude, but privsep may obviate that concern once it's fully implemented. Either way, that should solve your problem.

r

Chris Apsey

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 27, 2019 12:17 PM, Albert Braden <Albert.Braden@synopsys.com> wrote:

When I create 100 VMs in our prod cluster:

openstack server create --flavor s1.tiny --network it-network --image cirros-0.4.0-x86_64 --min 100 --max 100 alberttest

Most of them build successfully in about a minute.
5 or 10 will stay in BUILD status for 5 minutes and then fail with "BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling."

If I build smaller numbers, I see fewer failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but am not able to duplicate the issue. The neutron servers don't appear to be overloaded during the failure. What config variables should I be looking at?

Here are the relevant log entries from the HV:

2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1')

2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9] Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds

More logs and data: http://paste.openstack.org/show/779524/
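[Editor's note] The daemon-mode change under discussion is a one-line setting in the Neutron agent configuration. A hedged sketch of what the relevant [agent] section might look like; the file path is an assumption (it could equally live in the agent's own ini file), and the value is shown unquoted, the common form in deployment guides:

```ini
# /etc/neutron/neutron.conf -- hedged sketch, not a verbatim copy of Albert's
# config. The sudoers policy must allow the service user to run these commands
# without a password, or the daemon spawn fails as in the traceback above.
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
```

With root_helper_daemon set, the agent spawns one long-lived privileged rootwrap process and sends commands to it over a socket, instead of paying the sudo + Python interpreter startup cost on every iptables call, which is the order-of-magnitude improvement Chris refers to.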
participants (4)
- Albert Braden
- Chris Apsey
- Erik Olof Gunnar Andersson
- fsbiz@yahoo.com