Hello,

We recently upgraded our Swift servers from Ubuntu 18.04 to 20.04. Since the upgrades completed, the ssync receiver keeps crashing on one server. The crash is preventing handoffs from being cleaned up in that cluster (50 servers). We are looking for advice on how to fix or work around the issue. We run 20 clusters but see this issue in only one of them, and it seems strange that only one server out of the ~1000 we run shows this ssync receiver error.

OS: Ubuntu 20.04.5
Swift version: 2.25.2

On the receiver side we are seeing the following trace:

object-server: senderIp/d1/2166 EXCEPTION in ssync.Receiver: 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 209, in _chunked_read
    self.chunk_length = int(rfile.readline().split(b";",1)[0], 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/swift/obj/ssync_receiver.py", line 166, in __call__
    for data in self.missing_check():
  File "/usr/lib/python3/dist-packages/swift/obj/ssync_receiver.py", line 340, in missing_check
    line = self.fp.readline(self.app.network_chunk_size)
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 226, in readline
    return self._chunked_read(self.rfile, size, True)
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 211, in _chunked_read
    raise ChunkReadError(err)
eventlet.wsgi.ChunkReadError: invalid literal for int() with base 16: b''
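
For reference, the inner ValueError can be reproduced in isolation. The sketch below reuses only the expression quoted from eventlet/wsgi.py in the trace; the helper name read_chunk_length and the io.BytesIO stand-ins are hypothetical and are not eventlet or Swift APIs. It suggests (but does not prove) that the receiver read an empty line, i.e. end of stream, at a point where it expected a chunk-size header:

```python
import io


def read_chunk_length(rfile):
    # Same parsing expression as the eventlet/wsgi.py line in the traceback.
    # The function name itself is made up for this illustration.
    return int(rfile.readline().split(b";", 1)[0], 16)


# A well-formed chunk-size line (hex length followed by CRLF) parses normally.
ok = read_chunk_length(io.BytesIO(b"1a\r\n"))

# An exhausted stream makes readline() return b'', which int() rejects.
try:
    read_chunk_length(io.BytesIO(b""))
    message = None
except ValueError as err:
    message = str(err)

print(ok)
print(message)
```

If that reading is right, the receiver-side exception would be a symptom of the sender ending the request body early rather than of malformed data, but that is an assumption on my part.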


On the sender side, running the reconstructor manually with debug logging shows the following:
root@use1-saas-p6-paco-9:~# swift-object-reconstructor object-server.conf -p 27999 -v -o
object-reconstructor: Starting 271723
object-reconstructor: Spawned worker 271755 with {'override_partitions': [27999], 'override_devices': ['d10', 'd17', 'd19', 'd9', 'd29', 'd15', 'd2', 'd24', 'd32', 'd1', 'd8', 'd14', 'd11', 'd28', 'd27', 'd6', 'd7', 'd16', 'd13', 'd25', 'd26', 'd20', 'd18', 'd12', 'd4', 'd0', 'd22', 'd3', 'd5', 'd21', 'd33', 'd23', 'd30', 'd34', 'd31'], 'multiprocess_worker_index': 0}
object-reconstructor: [worker 1/1 pid=271755] Running object reconstructor in script mode.
object-reconstructor: [worker 1/1 pid=271755] Run listdir on /srv/node/d22/objects-4/27999
object-reconstructor: [worker 1/1 pid=271755] recieverIP:6200/d31/27999 10.0 seconds: connect receive
object-reconstructor: [worker 1/1 pid=271755] 1/26820 (0.00%) partitions reconstructed in 10.14s (0.10/sec, 75h remaining)
object-reconstructor: [worker 1/1 pid=271755] Object reconstruction complete (once). (0.17 minutes)
object-reconstructor: Forked worker 271755 finished
object-reconstructor: Worker 271755 exited
object-reconstructor: Finished 271723
object-reconstructor: Exited 271723

Thanks!

Reid Guyett
Staff Storage Engineer
Datto, A Kaseya Company
www.datto.com
