Swift issues in one cluster

Albert Braden ozzzo at yahoo.com
Fri Jun 17 17:33:27 UTC 2022


We have multiple Swift clusters, all configured identically. One of them started failing this week. The symptom is that Swift commands take a long time to execute and sometimes fail:
 
$ openstack container list
Unable to establish connection to https://swift.<region>.<domain>:8080/v1/AUTH_<project>: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')
 
When we look at the logs we see lots of swift-proxy-server errors:
 
(from Splunk):
Payload: swift-proxy-server: STDERR: File "/usr/lib64/python3.6/socket.py", line 604, in write#012 return self._sock.send(b)
Payload: swift-proxy-server: STDERR: BlockingIOError
Payload: swift-proxy-server: STDERR: os.read(self.rfd, 1)
Payload: swift-proxy-server: STDERR: File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 818, in process_request#012 proto.__init__(conn_state, self)
Payload: swift-proxy-server: STDERR: File "/usr/lib/python3.6/site-packages/eventlet/greenio/base.py", line 397, in send
Payload: swift-proxy-server: STDERR: return self._sock.send(b)
 
When we look at network connections, we see haproxy connections stacking up (many lines like these):
 
# netstat -ntup | sort -b -k2 -n -r | head -n +100
tcp   5976932      0 127.0.0.1:60738         127.0.0.1:8080          ESTABLISHED 13045/haproxy      
tcp   5976446      0 127.0.0.1:58480         127.0.0.1:8080          ESTABLISHED 13045/haproxy      
tcp   5973217      0 127.0.0.1:33244         127.0.0.1:8080          ESTABLISHED 13045/haproxy      
tcp   5973120      0 127.0.0.1:51836         127.0.0.1:8080          ESTABLISHED 13045/haproxy      
tcp   5971968      0 127.0.0.1:58516         127.0.0.1:8080          ESTABLISHED 13045/haproxy      
 ...
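
The second column in the netstat output above is Recv-Q, which for an established socket is the number of bytes received by the kernel but not yet read by the owning program. To track whether that backlog grows again after a restart, the per-process totals can be summed from the same output. This is a minimal sketch only: the function name `summarize_recv_q` and the `SAMPLE` text are hypothetical, and it assumes the column layout of `netstat -ntup` shown above.

```python
from collections import defaultdict


def summarize_recv_q(netstat_text):
    """Sum Recv-Q bytes per (program, remote address) from `netstat -ntup` text.

    Hypothetical helper: assumes the columns are
    proto, recv-q, send-q, local, remote, state, pid/program.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [connection count, Recv-Q bytes]
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP row.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if not fields[1].isdigit():
            continue
        entry = totals[(fields[6], fields[4])]
        entry[0] += 1
        entry[1] += int(fields[1])
    return {key: tuple(value) for key, value in totals.items()}


# Two rows copied from the output above, used only as sample input.
SAMPLE = """\
tcp   5976932      0 127.0.0.1:60738         127.0.0.1:8080          ESTABLISHED 13045/haproxy
tcp   5976446      0 127.0.0.1:58480         127.0.0.1:8080          ESTABLISHED 13045/haproxy
"""

if __name__ == "__main__":
    for (program, remote), (count, total) in summarize_recv_q(SAMPLE).items():
        print(f"{program} -> {remote}: {count} connections, {total} bytes in Recv-Q")
```

Running this periodically against fresh `netstat -ntup` output would show how quickly the unread data accumulates after the containers are restarted.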

If we restart the swift_haproxy and swift_proxy_server containers, the problem goes away, but it comes back within a few minutes. Where should we be looking for the root cause of this issue?


