neutron-metadata-agent broken pipe
Is anyone else seeing this? We are running Rocky. As our production cluster grows in size we are starting to have trouble with neutron-metadata-agent. After a restart it is happy for a minute and then it complains:

2019-12-06 17:54:24.615 664587 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20

It increases the threshold a couple of times, and then after increasing to 40 we start to see errors:

2019-12-06 17:55:10.119 664578 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

Raising the threshold doesn't help; the call queues keep growing and the threshold is raised again:

2019-12-06 17:55:17.452 664597 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40

And the errors increase until the log is nothing but errors, and VMs fail to boot:

root@us01odc-p02-ctrl1:~# tail -f /var/log/neutron/neutron-metadata-agent.log
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
2019-12-06 17:59:23.664 664597 INFO eventlet.wsgi.server [-] 10.195.73.174,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 0 time: 62.1517761
2019-12-06 17:59:23.756 664583 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
2019-12-06 17:59:23.757 664583 INFO eventlet.wsgi.server [-] 10.195.65.69,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 0 time: 63.0419171
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent [-] Unexpected error.: MessagingTimeout: Timed out waiting for a reply to message ID 77551d4cf4394b7b9cdfad68c0be46e8
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 89, in __call__
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 162, in _get_instance_and_tenant_id
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     ports = self._get_ports(remote_address, network_id, router_id)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 155, in _get_ports
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     return self._get_ports_for_remote_address(remote_address, networks)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/cache_utils.py", line 116, in __call__
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 137, in _get_ports_for_remote_address
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     ip_address=remote_address)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 106, in _get_ports_from_server
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 72, in get_ports
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 173, in call
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     self.force_reraise()
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     six.reraise(self.type_, self.value, self.tb)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 150, in call
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     retry=self.retry)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     retry=retry)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     call_monitor_timeout, retry=retry)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     call_monitor_timeout)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent     'to message ID %s' % msg_id)
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent MessagingTimeout: Timed out waiting for a reply to message ID 77551d4cf4394b7b9cdfad68c0be46e8
2019-12-06 17:59:23.776 664595 ERROR neutron.agent.metadata.agent
2019-12-06 17:59:23.778 664595 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)

What's causing this? Are we overloading RMQ?
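(For anyone trying to confirm whether the broker itself is the bottleneck, a queue-depth check along these lines should show it. This is only a sketch: it assumes rabbitmqctl is available on a controller and that Neutron uses the default "/" vhost, so adjust -p if yours is different.)

    # list the 20 deepest queues; a large and growing backlog usually means
    # the consumers (neutron-server RPC workers) cannot keep up
    rabbitmqctl list_queues -p / name messages consumers | sort -k2 -nr | head -20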
When I try to build a VM I see this in the VM logs:

2019-12-09 20:02:21,396 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: bad status code [503]
2019-12-09 20:03:41,084 - util.py[WARNING]: Failed fetching userdata from url http://169.254.169.254/2009-04-04/user-data
2019-12-09 12:03:53,041 - util.py[WARNING]: Failed running /var/lib/cloud/scripts/per-boot/config_instance.sh [1]
2019-12-09 12:03:53,043 - cc_scripts_per_boot.py[WARNING]: Failed to run module scripts-per-boot (per-boot in /var/lib/cloud/scripts/per-boot)

This is the failing line from the script:

name=`curl -s http://169.254.169.254/2009-04-04/meta-data/hostname`

When I try this from the VM I get this error:

albertb@<html><body><h1>503:~ $ curl -s http://169.254.169.254/2009-04-04/meta-data/hostname
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>

When I check neutron-metadata-agent.log for the time when the VM was failing I see the "broken pipe" errors:

2019-12-09 11:56:00.075 664593 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

Why is my neutron-metadata server failing? Has anyone else seen this problem? We are running Rocky with about 200 hypervisors; it started after we added 100.

From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Monday, December 9, 2019 10:40 AM
To: openstack-discuss@lists.openstack.org
Subject: neutron-metadata-agent broken pipe

<deleted the rest>
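(Side note: the shell prompt above, albertb@<html><body><h1>503, suggests the hostname has already been set from a 503 error page by an earlier run of that script. Not a fix for the metadata agent, but a hedged sketch of a more defensive version of that per-boot line, so a transient 503 fails the script instead of poisoning the hostname; the retry counts are arbitrary.)

    # --fail makes curl exit non-zero on the 503 instead of handing the HTML error
    # page to the script; --retry/--retry-delay ride out short outages (curl treats
    # HTTP 5xx responses as transient for retry purposes)
    name=$(curl --fail --silent --show-error --retry 10 --retry-delay 6 \
        http://169.254.169.254/2009-04-04/meta-data/hostname) || exit 1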
From my VM, if I try the metadata server repeatedly, it eventually gets a response:
albertb@<html><body><h1>503:~ $ curl http://169.254.169.254/2009-04-04/meta-data/instance-id
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
albertb@<html><body><h1>503:~ $ curl http://169.254.169.254/2009-04-04/meta-data/instance-id
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
albertb@<html><body><h1>503:~ $ curl http://169.254.169.254/2009-04-04/meta-data/instance-id
i-000017ccalbertb@<html><body><h1>503:~ $

Then I see this in neutron-metadata-agent.log:

2019-12-09 12:41:55.213 266833 INFO eventlet.wsgi.server [-] 10.195.72.75,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200 len: 146 time: 0.3922560

When it fails (503/504) nothing is logged by Neutron, but I see haproxy logs:

Dec 9 13:06:14 us01odc-p02-ctrl3 haproxy-metadata-proxy-569e2c42-3935-48e9-ae41-074650e57b2b[267096]: 10.195.72.75:53266 [09/Dec/2019:13:06:14.475] listener listener/metadata 0/0/-1/-1/0 503 212 - - SC-- 58/58/57/57/3 0/0 "GET /2009-04-04/meta-data/instance-id HTTP/1.1"
Dec 9 13:06:59 us01odc-p02-ctrl3 haproxy-metadata-proxy-569e2c42-3935-48e9-ae41-074650e57b2b[267096]: 10.195.72.75:53268 [09/Dec/2019:13:06:27.067] listener listener/metadata 0/0/0/-1/32001 504 194 - - sH-- 90/90/89/89/0 0/0 "GET /2009-04-04/meta-data/instance-id HTTP/1.1"

The load on my controllers is around 15, but they have 48 CPUs, so 15 should be OK. When I look at RMQ everything looks fine except for the q-plugin queue; it has 3 consumers, but there are thousands of unacked messages and the number gradually increases. Could that be causing the neutron-metadata issue?

From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Monday, December 9, 2019 12:20 PM
To: openstack-discuss@lists.openstack.org
Subject: RE: neutron-metadata-agent broken pipe
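(To put a number on "gradually increases", something along these lines, run on a controller, should show whether the q-plugin backlog tracks the metadata failures over time. Sketch only; it assumes rabbitmqctl on the controller and the default vhost.)

    # poll the q-plugin queue every 10s: total messages, unacked messages, consumers
    watch -n 10 'rabbitmqctl list_queues name messages messages_unacknowledged consumers | grep "^q-plugin"'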
It looks like we may be encountering this bug: https://bugs.launchpad.net/neutron/+bug/1853071

I'm testing this patch now: https://review.opendev.org/#/c/697405/

From: Albert Braden
Sent: Monday, December 9, 2019 1:11 PM
To: openstack-discuss@lists.openstack.org
Subject: RE: neutron-metadata-agent broken pipe
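(Regarding testing that review: a typical way to pull a Gerrit change onto a checkout looks roughly like the sketch below. The branch and patchset number are placeholders rather than details taken from the review, and on a packaged install the modified files would still need to be installed over /usr/lib/python2.7/dist-packages/neutron/.)

    git clone https://opendev.org/openstack/neutron && cd neutron
    git checkout stable/rocky        # assumption: testing against Rocky
    # Gerrit change refs follow refs/changes/<last 2 digits>/<change>/<patchset>
    git fetch https://review.opendev.org/openstack/neutron refs/changes/05/697405/1
    git cherry-pick FETCH_HEAD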
That patch didn't fix it. It looks like we have a different issue. Can anyone help?

From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Tuesday, December 10, 2019 10:10 AM
To: openstack-discuss@lists.openstack.org
Subject: RE: neutron-metadata-agent broken pipe

It looks like we may be encountering this bug: https://bugs.launchpad.net/neutron/+bug/1853071

I'm testing this patch now: https://review.opendev.org/#/c/697405/

From: Albert Braden
Sent: Monday, December 9, 2019 1:11 PM
To: openstack-discuss@lists.openstack.org
Subject: RE: neutron-metadata-agent broken pipe
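In case it helps anyone point at an answer, the settings that seem most relevant to this path are the neutron-server RPC workers (which answer the metadata agent's get_ports calls that are timing out) and the metadata agent's own worker/backlog settings. These are standard options, but the file locations below assume the stock Ubuntu packaging, and the grep only shows values that are explicitly set:

    # what neutron-server is configured with (it handles the get_ports RPC)
    grep -E '^(rpc_workers|rpc_response_timeout)' /etc/neutron/neutron.conf
    # what the metadata agent itself is configured with
    grep -E '^(metadata_workers|metadata_backlog)' /etc/neutron/metadata_agent.ini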