Swift issues in one cluster
Albert Braden
ozzzo at yahoo.com
Fri Jun 17 17:33:27 UTC 2022
We have multiple Swift clusters, all configured identically. One of them started failing this week. The symptom is that Swift commands take a long time to execute and sometimes fail:
$ openstack container list
Unable to establish connection to https://swift.<region>.<domain>:8080/v1/AUTH_<project>: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')
When we look at the logs (from Splunk), we see many swift-proxy-server errors:
Payload: swift-proxy-server: STDERR: File "/usr/lib64/python3.6/socket.py", line 604, in write#012 return self._sock.send(b)
Payload: swift-proxy-server: STDERR: BlockingIOError
Payload: swift-proxy-server: STDERR: os.read(self.rfd, 1)
Payload: swift-proxy-server: STDERR: File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 818, in process_request#012 proto.__init__(conn_state, self)
Payload: swift-proxy-server: STDERR: File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 397, in send
Payload: swift-proxy-server: STDERR: return self._sock.send(b)
When we look at network connections, we see haproxy connections to port 8080 stacking up with large receive queues (many lines like this):
# netstat -ntup | sort -b -k2 -n -r | head -n +100
tcp 5976932 0 127.0.0.1:60738 127.0.0.1:8080 ESTABLISHED 13045/haproxy
tcp 5976446 0 127.0.0.1:58480 127.0.0.1:8080 ESTABLISHED 13045/haproxy
tcp 5973217 0 127.0.0.1:33244 127.0.0.1:8080 ESTABLISHED 13045/haproxy
tcp 5973120 0 127.0.0.1:51836 127.0.0.1:8080 ESTABLISHED 13045/haproxy
tcp 5971968 0 127.0.0.1:58516 127.0.0.1:8080 ESTABLISHED 13045/haproxy
...
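To track whether the backlog grows between restarts, the netstat output can be totalled per process and remote address. The following is only a minimal sketch: it assumes the usual `netstat -ntup` column order (proto, Recv-Q, Send-Q, local address, foreign address, state, PID/program), and the function name `summarize_recv_q` is hypothetical.

```python
from collections import defaultdict


def summarize_recv_q(netstat_text):
    """Return {(program, foreign_addr): (connection_count, total_recv_q_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # skip headers, the "..." line, and non-TCP rows
        try:
            recv_q = int(fields[1])
        except ValueError:
            continue  # Recv-Q was not numeric, so this is not a data row
        key = (fields[6], fields[4])
        totals[key][0] += 1
        totals[key][1] += recv_q
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    # Two rows copied from the output above
    sample = (
        "tcp 5976932 0 127.0.0.1:60738 127.0.0.1:8080 ESTABLISHED 13045/haproxy\n"
        "tcp 5976446 0 127.0.0.1:58480 127.0.0.1:8080 ESTABLISHED 13045/haproxy\n"
    )
    for (program, remote), (count, queued) in summarize_recv_q(sample).items():
        print(f"{program} -> {remote}: {count} connections, {queued} bytes in Recv-Q")
```

Running this against periodic netstat captures would show whether the queued bytes climb steadily after a restart or jump all at once.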
If we restart the swift_haproxy and swift_proxy_server containers, the problem goes away, but it comes back within a few minutes. Where should we be looking for the root cause of this issue?