Port creation times out for some VMs in large group

Erik Olof Gunnar Andersson eandersson at blizzard.com
Thu Oct 10 01:40:17 UTC 2019


You are probably missing an entry in your sudoers file.

You need something like


neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

________________________________
From: Albert Braden <Albert.Braden at synopsys.com>
Sent: Wednesday, October 9, 2019 5:20 PM
To: Chris Apsey <bitskrieg at bitskrieg.net>
Cc: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
Subject: RE: Port creation times out for some VMs in large group


We tested this in dev and qa and then implemented in production and it did make a difference, but 2 weeks later we started seeing an issue, first in dev, and then in qa. In syslog we see neutron-linuxbridge-agent.service stopping and starting[1]. In neutron-linuxbridge-agent.log we see a rootwrap error[2]: “Exception: Failed to spawn rootwrap process.”



If I comment out ‘root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"’ and restart neutron services then the error goes away.



How can I use the root_helper_daemon setting without creating this new error?



[1]

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 2.

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...

Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 3.

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...

Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 4.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Start request repeated too quickly.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.

Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Failed to start Openstack Neutron Linux Bridge Agent.


[2]

2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] Logging enabled!

2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 13.0.4

2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'physnet1': 'eno1'}

2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}

2019-10-09 17:05:24.522 5803 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/linuxbridge_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpmdyxcD/privsep.sock']

2019-10-09 17:05:25.071 5803 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap

2019-10-09 17:05:25.022 5828 INFO oslo.privsep.daemon [-] privsep daemon starting

2019-10-09 17:05:25.025 5828 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0

2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none

2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep daemon running as pid 5828

2019-10-09 17:05:25.125 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Agent initialized successfully, now running...

2019-10-09 17:05:25.193 5803 ERROR neutron.agent.linux.utils [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Error starting thread.: Exception: Failed to spawn rootwrap process.

stderr:

sudo: no tty present and no askpass program specified

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Traceback (most recent call last):

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 794, in run_service

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     service.start()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 86, in start

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.setup_rpc()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 153, in setup_rpc

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.context, self.sg_plugin_rpc, defer_refresh_firewall=True)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 58, in __init__

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.init_firewall(defer_refresh_firewall, integration_bridge)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 83, in init_firewall

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.firewall = firewall_class()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 88, in __init__

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     zone_per_port=self.CONNTRACK_ZONE_PER_PORT)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return f(*args, **kwargs)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 58, in get_conntrack

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute, namespace, zone_per_port)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 75, in __init__

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._populate_initial_zone_map()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 182, in _populate_initial_zone_map

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     rules = self.get_rules_for_table_func('raw')

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 477, in get_rules_for_table

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return self.execute(args, run_as_root=True).split('\n')

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 122, in execute

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute_rootwrap_daemon(cmd, process_input, addl_env))

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     LOG.error("Rootwrap error running command: %s", cmd)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.force_reraise()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     six.reraise(self.type_, self.value, self.tb)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return client.execute(cmd, process_input)

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 148, in execute

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._ensure_initialized()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 115, in _ensure_initialized

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._initialize()

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 85, in _initialize

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     (stderr,))

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Exception: Failed to spawn rootwrap process.

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service stderr:

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service sudo: no tty present and no askpass program specified

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service

2019-10-09 17:05:25.194 5803 ERROR oslo_service.service

2019-10-09 17:05:25.197 5803 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.

2019-10-09 17:05:25.198 5803 CRITICAL neutron [-] Unhandled error: AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'

2019-10-09 17:05:25.198 5803 ERROR neutron Traceback (most recent call last):

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/bin/neutron-linuxbridge-agent", line 10, in <module>

2019-10-09 17:05:25.198 5803 ERROR neutron     sys.exit(main())

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/cmd/eventlet/plugins/linuxbridge_neutron_agent.py", line 21, in main

2019-10-09 17:05:25.198 5803 ERROR neutron     agent_main.main()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1051, in main

2019-10-09 17:05:25.198 5803 ERROR neutron     launcher.wait()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 392, in wait

2019-10-09 17:05:25.198 5803 ERROR neutron     status, signo = self._wait_for_exit_or_signal()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 377, in _wait_for_exit_or_signal

2019-10-09 17:05:25.198 5803 ERROR neutron     self.stop()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 292, in stop

2019-10-09 17:05:25.198 5803 ERROR neutron     self.services.stop()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 760, in stop

2019-10-09 17:05:25.198 5803 ERROR neutron     service.stop()

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper

2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 117, in stop

2019-10-09 17:05:25.198 5803 ERROR neutron     self.set_rpc_timeout(self.quitting_rpc_timeout)

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper

2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)

2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 476, in set_rpc_timeout

2019-10-09 17:05:25.198 5803 ERROR neutron     self.state_rpc):

2019-10-09 17:05:25.198 5803 ERROR neutron AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'

2019-10-09 17:05:25.198 5803 ERROR neutron



From: Chris Apsey <bitskrieg at bitskrieg.net>
Sent: Friday, September 27, 2019 9:34 AM
To: Albert Braden <albertb at synopsys.com>
Cc: openstack-discuss at lists.openstack.org
Subject: Re: Port creation times out for some VMs in large group



Albert,



Do this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/<https://urldefense.proofpoint.com/v2/url?u=https-3A__cloudblog.switch.ch_2017_08_28_starting-2D1000-2Dinstances-2Don-2Dswitchengines_&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=izFVaT90rpGh_939STWbrLr4vnSwK2KBtqFKv_J8Gfs&s=UO9bh6wArCWKbxRWfWjt8egNaw9cxrbDwCZ8-2t0GmE&e=>



The problem will go away.  I'm of the opinion that daemon mode for rootwrap should be the default since the performance improvement is an order of magnitude, but privsep may obviate that concern once its fully implemented.



Either way, that should solve your problem.



r



Chris Apsey





‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, September 27, 2019 12:17 PM, Albert Braden <Albert.Braden at synopsys.com<mailto:Albert.Braden at synopsys.com>> wrote:



When I create 100 VMs in our prod cluster:



openstack server create --flavor s1.tiny --network it-network --image cirros-0.4.0-x86_64 --min 100 --max 100 alberttest



Most of them build successfully in about a minute. 5 or 10 will stay in BUILD status for 5 minutes and then fail with “ BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling.”



If I build smaller numbers, I see less failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but am not able to duplicate the issue. The neutron servers don’t appear to be overloaded during the failure.



What config variables should I be looking at?



Here are the relevant log entries from the HV:



2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1')

2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9] Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds



More logs and data:



http://paste.openstack.org/show/779524/<https://urldefense.com/v3/__http://paste.openstack.org/show/779524/__;!2E0gRdhhnqPNNL0!wGtOA-OJ_wlWtnIouvVoT4cU2U8Lxu_O3U6DeY1vBMUEV0iavwJCBL9kUcLkfiUm4w$>




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20191010/a34b7c6d/attachment-0001.html>


More information about the openstack-discuss mailing list