[SWIFT] ssync receiver crash
Hello,

We recently upgraded our Swift servers from Ubuntu 18.04 to 20.04. Since the upgrades completed, we are seeing one server where the ssync receiver keeps crashing. The crash is preventing handoffs from being cleaned up in that cluster (50 servers). I am looking for advice on how to fix or work around the issue.

We have 20 clusters but see this issue in only one of them. It is strange to me that the ssync receiver error shows up on just one server out of the ~1000 we are running.

OS: Ubuntu 20.04.5
Swift version: 2.25.2

On the receiver side we are seeing the following trace:

    object-server: senderIp/d1/2166 EXCEPTION in ssync.Receiver:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 209, in _chunked_read
        self.chunk_length = int(rfile.readline().split(b";",1)[0], 16)
    ValueError: invalid literal for int() with base 16: b''

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/swift/obj/ssync_receiver.py", line 166, in __call__
        for data in self.missing_check():
      File "/usr/lib/python3/dist-packages/swift/obj/ssync_receiver.py", line 340, in missing_check
        line = self.fp.readline(self.app.network_chunk_size)
      File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 226, in readline
        return self._chunked_read(self.rfile, size, True)
      File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 211, in _chunked_read
        raise ChunkReadError(err)
    eventlet.wsgi.ChunkReadError: invalid literal for int() with base 16: b''

On the sender side, this is what I see when I run the reconstructor manually with debug logging:

    root@use1-saas-p6-paco-9:~# swift-object-reconstructor object-server.conf -p 27999 -v -o
    object-reconstructor: Starting 271723
    object-reconstructor: Spawned worker 271755 with {'override_partitions': [27999], 'override_devices': ['d10', 'd17', 'd19', 'd9', 'd29', 'd15', 'd2', 'd24', 'd32', 'd1', 'd8', 'd14', 'd11', 'd28', 'd27', 'd6', 'd7', 'd16', 'd13', 'd25', 'd26', 'd20', 'd18', 'd12', 'd4', 'd0', 'd22', 'd3', 'd5', 'd21', 'd33', 'd23', 'd30', 'd34', 'd31'], 'multiprocess_worker_index': 0}
    object-reconstructor: [worker 1/1 pid=271755] Running object reconstructor in script mode.
    object-reconstructor: [worker 1/1 pid=271755] Run listdir on /srv/node/d22/objects-4/27999
    object-reconstructor: [worker 1/1 pid=271755] recieverIP:6200/d31/27999 10.0 seconds: connect receive
    object-reconstructor: [worker 1/1 pid=271755] 1/26820 (0.00%) partitions reconstructed in 10.14s (0.10/sec, 75h remaining)
    object-reconstructor: [worker 1/1 pid=271755] Object reconstruction complete (once). (0.17 minutes)
    object-reconstructor: Forked worker 271755 finished
    object-reconstructor: Worker 271755 exited
    object-reconstructor: Finished 271723
    object-reconstructor: Exited 271723

Thanks!

Reid Guyett
Staff Storage Engineer
Datto, A Kaseya Company
www.datto.com
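
For reference, the ValueError at the top of the receiver trace can be reproduced in isolation. The sketch below is not eventlet or Swift code: `read_chunk_length` is a hypothetical helper that copies only the parsing expression shown in the traceback, and the in-memory streams are stand-ins for the socket. It assumes the empty bytes come from the stream being at end-of-file when the receiver expects the next chunk-size line, which would be consistent with the sender giving up after its 10-second "connect receive" timeout, but that cause is not confirmed here.

```python
import io


def read_chunk_length(rfile):
    """Parse an HTTP chunk-size line using the expression from the traceback.

    Hypothetical helper for illustration; not part of eventlet or Swift.
    """
    line = rfile.readline()
    return int(line.split(b";", 1)[0], 16)


# A well-formed chunk-size line: hex length followed by CRLF.
ok_stream = io.BytesIO(b"1a\r\n")
length = read_chunk_length(ok_stream)

# A stream already at end-of-file: readline() returns b"".
closed_stream = io.BytesIO(b"")
try:
    read_chunk_length(closed_stream)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(length)
print(error_message)
```

If that assumption holds, the receiver-side exception is a symptom of the request body ending early rather than of malformed data on the wire, so the sender-side timeout may be the more useful place to look.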