Port creation times out for some VMs in large group

Albert Braden Albert.Braden at synopsys.com
Thu Oct 10 00:20:24 UTC 2019


We tested the root_helper_daemon change in dev and QA and then implemented it in production, and it did make a difference. But two weeks later we started seeing a new issue, first in dev and then in QA. In syslog we see neutron-linuxbridge-agent.service repeatedly stopping and restarting [1]. In neutron-linuxbridge-agent.log we see a rootwrap error [2]: “Exception: Failed to spawn rootwrap process.”

If I comment out ‘root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"’ and restart the neutron services, the error goes away.

How can I use the root_helper_daemon setting without creating this new error?
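
For reference, the relevant pieces as I understand them (a sketch rather than my exact files; the sudoers path and the neutron service user are assumptions from a stock package install):

# neutron.conf (or the linuxbridge agent ini), [agent] section - what I have enabled:
[agent]
root_helper = sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

The “sudo: no tty present and no askpass program specified” stderr in [2] looks like sudo asking for a password or a tty when the agent spawns neutron-rootwrap-daemon, which makes me suspect a sudoers rule that covers neutron-rootwrap but not the daemon binary, i.e. something along the lines of:

# /etc/sudoers.d/neutron_sudoers (assumed path/packaging)
Defaults:neutron !requiretty
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf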

[1]
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 2.
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct  9 13:48:38 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 3.
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Starting Openstack Neutron Linux Bridge Agent...
Oct  9 13:48:41 us01odc-qa-ctrl1 systemd[1]: Started Openstack Neutron Linux Bridge Agent.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Main process exited, code=exited, status=1/FAILURE
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Service hold-off time over, scheduling restart.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Scheduled restart job, restart counter is at 4.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Stopped Openstack Neutron Linux Bridge Agent.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Start request repeated too quickly.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: neutron-linuxbridge-agent.service: Failed with result 'exit-code'.
Oct  9 13:48:43 us01odc-qa-ctrl1 systemd[1]: Failed to start Openstack Neutron Linux Bridge Agent.

[2]
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] Logging enabled!
2019-10-09 17:05:24.519 5803 INFO neutron.common.config [-] /usr/bin/neutron-linuxbridge-agent version 13.0.4
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Interface mappings: {'physnet1': 'eno1'}
2019-10-09 17:05:24.520 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Bridge mappings: {}
2019-10-09 17:05:24.522 5803 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/linuxbridge_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpmdyxcD/privsep.sock']
2019-10-09 17:05:25.071 5803 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap
2019-10-09 17:05:25.022 5828 INFO oslo.privsep.daemon [-] privsep daemon starting
2019-10-09 17:05:25.025 5828 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
2019-10-09 17:05:25.027 5828 INFO oslo.privsep.daemon [-] privsep daemon running as pid 5828
2019-10-09 17:05:25.125 5803 INFO neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [-] Agent initialized successfully, now running...
2019-10-09 17:05:25.193 5803 ERROR neutron.agent.linux.utils [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Rootwrap error running command: ['iptables-save', '-t', 'raw']: Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service [req-8aaf64a2-8f0d-44ce-888f-09ae3d1acd78 - - - - -] Error starting thread.: Exception: Failed to spawn rootwrap process.
stderr:
sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Traceback (most recent call last):
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 794, in run_service
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     service.start()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 86, in start
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.setup_rpc()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 153, in setup_rpc
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.context, self.sg_plugin_rpc, defer_refresh_firewall=True)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 58, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.init_firewall(defer_refresh_firewall, integration_bridge)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 83, in init_firewall
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.firewall = firewall_class()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 88, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     zone_per_port=self.CONNTRACK_ZONE_PER_PORT)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return f(*args, **kwargs)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 58, in get_conntrack
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute, namespace, zone_per_port)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 75, in __init__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._populate_initial_zone_map()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_conntrack.py", line 182, in _populate_initial_zone_map
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     rules = self.get_rules_for_table_func('raw')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/iptables_manager.py", line 477, in get_rules_for_table
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return self.execute(args, run_as_root=True).split('\n')
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 122, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     execute_rootwrap_daemon(cmd, process_input, addl_env))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 109, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     LOG.error("Rootwrap error running command: %s", cmd)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self.force_reraise()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     six.reraise(self.type_, self.value, self.tb)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 106, in execute_rootwrap_daemon
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     return client.execute(cmd, process_input)
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 148, in execute
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._ensure_initialized()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 115, in _ensure_initialized
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     self._initialize()
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 85, in _initialize
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service     (stderr,))
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service Exception: Failed to spawn rootwrap process.
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service stderr:
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service sudo: no tty present and no askpass program specified
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service
2019-10-09 17:05:25.194 5803 ERROR oslo_service.service
2019-10-09 17:05:25.197 5803 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Linux bridge agent agent.
2019-10-09 17:05:25.198 5803 CRITICAL neutron [-] Unhandled error: AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'
2019-10-09 17:05:25.198 5803 ERROR neutron Traceback (most recent call last):
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/bin/neutron-linuxbridge-agent", line 10, in <module>
2019-10-09 17:05:25.198 5803 ERROR neutron     sys.exit(main())
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/cmd/eventlet/plugins/linuxbridge_neutron_agent.py", line 21, in main
2019-10-09 17:05:25.198 5803 ERROR neutron     agent_main.main()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 1051, in main
2019-10-09 17:05:25.198 5803 ERROR neutron     launcher.wait()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 392, in wait
2019-10-09 17:05:25.198 5803 ERROR neutron     status, signo = self._wait_for_exit_or_signal()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 377, in _wait_for_exit_or_signal
2019-10-09 17:05:25.198 5803 ERROR neutron     self.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 292, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     self.services.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 760, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     service.stop()
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 117, in stop
2019-10-09 17:05:25.198 5803 ERROR neutron     self.set_rpc_timeout(self.quitting_rpc_timeout)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-10-09 17:05:25.198 5803 ERROR neutron     result = f(*args, **kwargs)
2019-10-09 17:05:25.198 5803 ERROR neutron   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 476, in set_rpc_timeout
2019-10-09 17:05:25.198 5803 ERROR neutron     self.state_rpc):
2019-10-09 17:05:25.198 5803 ERROR neutron AttributeError: 'CommonAgentLoop' object has no attribute 'state_rpc'
2019-10-09 17:05:25.198 5803 ERROR neutron

From: Chris Apsey <bitskrieg at bitskrieg.net>
Sent: Friday, September 27, 2019 9:34 AM
To: Albert Braden <albertb at synopsys.com>
Cc: openstack-discuss at lists.openstack.org
Subject: Re: Port creation times out for some VMs in large group

Albert,

Do this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/

The problem will go away. I'm of the opinion that daemon mode for rootwrap should be the default, since the performance improvement is an order of magnitude, but privsep may obviate that concern once it's fully implemented.
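
In practice the change that article describes boils down to turning on the rootwrap daemon in the neutron agent config and restarting the agents - roughly this (the exact file and section depend on your deployment):

[agent]
root_helper_daemon = sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

Every call the agent would otherwise shell out through sudo/rootwrap then goes over a socket to a long-lived root daemon instead of forking a fresh python interpreter per command, which is where the order-of-magnitude difference comes from.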

Either way, that should solve your problem.

r

Chris Apsey


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 27, 2019 12:17 PM, Albert Braden <Albert.Braden at synopsys.com> wrote:


When I create 100 VMs in our prod cluster:



openstack server create --flavor s1.tiny --network it-network --image cirros-0.4.0-x86_64 --min 100 --max 100 alberttest



Most of them build successfully in about a minute. Five or ten will stay in BUILD status for 5 minutes and then fail with “BuildAbortException: Build of instance <UUID> aborted: Failed to allocate the network(s), not rescheduling.”



If I build smaller numbers I see fewer failures, and no failures if I build one at a time. This does not happen in dev or QA; it appears that we are exhausting a resource in prod. I tried reducing various config values in dev but was not able to duplicate the issue. The neutron servers don’t appear to be overloaded during the failure.



What config variables should I be looking at?



Here are the relevant log entries from the HV:



2019-09-26 10:10:43.001 57008 INFO os_vif [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:8b:45:07,bridge_name='brq49cbe55d-51',has_traffic_filtering=True,id=18f4e419-b19c-4b62-b6e4-152ec78e72bc,network=Network(49cbe55d-5188-4183-b5ad-e65f9b46f8f2),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap18f4e419-b1')

2019-09-26 10:15:44.029 57008 WARNING nova.virt.libvirt.driver [req-dea54d9a-3f3e-4d47-b901-a4f41b1947a8 d28c3871f61e4c8c8f8c7600417f7b14 e9621e3b105245ba8660f434ab21016c - default 4fb72165eee4468e8033cdc7d506ddf0] [instance: dc58f154-00f9-4c45-8986-94b10821cbc9] Timeout waiting for [('network-vif-plugged', u'18f4e419-b19c-4b62-b6e4-152ec78e72bc')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds
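
The 300 seconds above looks like nova’s vif_plugging_timeout on the HV, which I believe defaults to 300; together with vif_plugging_is_fatal (default true) that would also explain why the build aborts with “not rescheduling” instead of retrying:

# nova.conf on the HV, [DEFAULT] section - defaults, as I understand them
vif_plugging_timeout = 300
vif_plugging_is_fatal = true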



More logs and data:



http://paste.openstack.org/show/779524/


