[Swift] Object replication failures on newly upgraded servers
Hi,
I'm in the process of upgrading a Swift cluster from 2.7/Mitaka to 2.23/Train. While in general it seems to be going well, I'm noticing non-zero object replication failure counts on the upgraded nodes only, e.g.:
$ curl http://localhost:6000/recon/replication/object
{
  "replication_last": 1622156911.019487,
  "replication_stats": {
    "rsync": 40580,
    "success": 4141229,
    "attempted": 2081856,
    "remove": 4083,
    "suffix_count": 14960481,
    "failure": 26550,
    "hashmatch": 4127197,
    "failure_nodes": {
      "10.11.18.67": {"obj08": 2348, "obj09": 60, "obj10": 3030, "obj02": 34, "obj03": 25, "obj01": 44, "obj06": 1498, "obj07": 28, "obj04": 69, "obj05": 36},
      "10.11.18.68": {"obj03": 6901, "obj01": 293, "obj06": 1901, "obj04": 10281, "obj10": 1},
      "10.12.18.76": {"obj10": 1}
    },
    "suffix_sync": 1785,
    "suffix_hash": 2778
  },
  "object_replication_last": 1622156911.019487,
  "replication_time": 1094.7836411476135,
  "object_replication_time": 1094.7836411476135
}
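(To pull out just the per-node failure counts from that recon output, piping it through jq works - assuming jq is available on the node:

$ curl -s http://localhost:6000/recon/replication/object | jq '.replication_stats.failure_nodes'

)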
Examining the logs (/var/log/swift/object.log and /var/log/syslog), these aren't throwing up any red flags (i.e. no failed rsyncs noted). Any suggestions on how to get more information about what went wrong? E.g. for "10.11.18.67": {"obj08": 2348}, how do I find out what those 2348 failures were?
regards
Mark
P.S. Basic sanity checking is OK - uploaded objects go where they should and can be retrieved from either 2.7 or 2.23 servers (the old- and new-version servers agree about object placement).
On Fri, 28 May 2021 16:58:10 +1200 Mark Kirkwood mark.kirkwood@catalyst.net.nz wrote:
Examining the logs (/var/log/swift/object.log and /var/log/syslog), these aren't throwing up any red flags (i.e. no failed rsyncs noted).
You should be seeing tracebacks and "Error syncing partition", "Error syncing handoff partition", or "Exception in top-level replication loop".
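A quick grep along these lines should turn them up, assuming the log locations you mentioned:

grep -E 'Error syncing (handoff )?partition|Exception in top-level replication loop' /var/log/swift/object.log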
-- Pete
On 3/06/21 6:22 pm, Pete Zaitcev wrote:
On Fri, 28 May 2021 16:58:10 +1200 Mark Kirkwood mark.kirkwood@catalyst.net.nz wrote:
Examining the logs (/var/log/swift/object.log and /var/log/syslog), these aren't throwing up any red flags (i.e. no failed rsyncs noted).
You should be seeing tracebacks and "Error syncing partition", "Error syncing handoff partition", or "Exception in top-level replication loop".
Thanks Pete!
Debugging during the upgrade was tricky, as errors were clearly being caused while each storage node was down being rebuilt. However, the upgrade process is now complete, so I'm looking at this more closely.
Picking on one storage node, I do see a reasonable number (63 in the last 5 days) of:
Jun 30 04:07:29 cat-hlz-ostor003 object-server: Error syncing with node: {'index': 2, u'replication_port': 6000, u'weight': 6.0, u'zone': 10, u'ip': u'x.x.x.x', u'region': 10, u'id': 18, u'replication_ip': u'x.x.x.x', u'meta': u'', u'device': u'obj03', u'port': 6000}: Timeout (60s)
So this looks like the source (or at least *one* source) of the issue - and it also explains why I'm not seeing any failed rsyncs (we're not getting that far).
Also seeing a small number (1 in the last 5 days) of:
Jun 30 06:40:34 cat-hlz-ostor003 object-server: Error syncing partition: LockTimeout (10s) /srv/node/obj06/objects-20/544096/.lock
So, I need to figure out why we are timing out!
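To get a feel for how widespread this is, I'm counting the timeout errors per target device with something like the following (assuming the same log location as above):

grep 'Error syncing with node' /var/log/swift/object.log | grep -o "'device': u'obj[0-9]*'" | sort | uniq -c | sort -rn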
regards
Mark
On Wed, 30 Jun 2021 13:32:54 +1200 Mark Kirkwood mark.kirkwood@catalyst.net.nz wrote:
Jun 30 06:40:34 cat-hlz-ostor003 object-server: Error syncing partition: LockTimeout (10s) /srv/node/obj06/objects-20/544096/.lock
Why do you have 20 policies? Sounds rather unusual.
So, I need to figure out why we are timing out!
Sorry, I don't have enough operator experience with this. In my case it's just not enough workers for the number of nodes, but I'm sure your setup is more complex.
-- Pete
On 30/06/21 4:11 pm, Pete Zaitcev wrote:
On Wed, 30 Jun 2021 13:32:54 +1200 Mark Kirkwood mark.kirkwood@catalyst.net.nz wrote:
Jun 30 06:40:34 cat-hlz-ostor003 object-server: Error syncing partition: LockTimeout (10s) /srv/node/obj06/objects-20/544096/.lock
Why do you have 20 policies? Sounds rather unusual.
So, I need to figure out why we are timing out!
Sorry, I don't have enough operator experience with this. In my case it's just not enough workers for the number of nodes, but I'm sure your setup is more complex.
Thanks Pete! No, we only have 4 policies - but we numbered the additional ones to match the region numbers (10, 20, 30)!
Hah! I was just looking at the object-server worker count (e.g. 16 on a node with 32 cores) and thinking 'hmmm...' when I saw your message. Will experiment with increasing that.
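For the record, the knobs I'm planning to experiment with are in /etc/swift/object-server.conf - the values below are just a first guess to try, not recommendations:

[DEFAULT]
# currently 16 on a 32-core node; try one worker per core
workers = 32

[object-replicator]
# run more replication jobs in parallel (the default is 1)
concurrency = 4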
Cheers
Mark