[Swift] Object replication failures on newly upgraded servers
Mark Kirkwood
mark.kirkwood at catalyst.net.nz
Fri May 28 04:58:10 UTC 2021
Hi,
I'm in the process of upgrading a Swift cluster from 2.7/Mitaka to
2.23/Train. While in general it seems to be going well, I'm noticing
non-zero object replication failures on the upgraded nodes only, e.g.:
$ curl http://localhost:6000/recon/replication/object
{"replication_last": 1622156911.019487, "replication_stats": {"rsync":
40580, "success": 4141229, "attempted": 2081856, "remove": 4083,
"suffix_count": 14960481, "failure": 26550, "hashmatch": 4127197,
"failure_nodes": {"10.11.18.67": {"obj08": 2348, "obj09": 60, "obj10":
3030, "obj02": 34, "obj03": 25, "obj01": 44, "obj06": 1498, "obj07": 28,
"obj04": 69, "obj05": 36}, "10.11.18.68": {"obj03": 6901, "obj01": 293,
"obj06": 1901, "obj04": 10281, "obj10": 1}, "10.12.18.76": {"obj10":
1}}, "suffix_sync": 1785, "suffix_hash": 2778},
"object_replication_last": 1622156911.019487, "replication_time":
1094.7836411476135, "object_replication_time": 1094.7836411476135}
The logs (/var/log/swift/object.log and /var/log/syslog) are not
throwing up any red flags (i.e. no failing rsyncs are noted). Any
suggestions on how to get more information about what went wrong? E.g.
for "10.11.18.67": {"obj08": 2348}, how can I find out what those 2348
errors were?
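
To narrow down where to look, the per-device counts in "failure_nodes"
can be ranked, largest first. In the output above they add up to the
overall "failure" total of 26550. The following is only a rough sketch
that works on the recon JSON shown above: the function names are
hypothetical helpers and not part of Swift, and the sample dict is a
small excerpt of the real output.

```python
import json


def rank_replication_failures(recon):
    """Return (node, device, failures) tuples, largest count first."""
    stats = recon.get("replication_stats", {})
    failure_nodes = stats.get("failure_nodes") or {}
    rows = [
        (node, device, count)
        for node, devices in failure_nodes.items()
        for device, count in devices.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def node_totals(recon):
    """Return a dict mapping each remote node to its total failure count."""
    totals = {}
    for node, _device, count in rank_replication_failures(recon):
        totals[node] = totals.get(node, 0) + count
    return totals


if __name__ == "__main__":
    # Excerpt of the recon output above (not the full data set).
    sample = json.loads(
        '{"replication_stats": {"failure_nodes": {'
        '"10.11.18.67": {"obj08": 2348, "obj10": 3030},'
        '"10.11.18.68": {"obj03": 6901, "obj04": 10281}}}}'
    )
    for node, device, count in rank_replication_failures(sample):
        print(f"{node:15} {device:8} {count}")
    print(node_totals(sample))
```

The devices at the top of the ranking are the ones I'd want to dig into
first on the remote nodes.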
regards
Mark
P.S.: basic sanity checking is OK: uploaded objects go where they should
and can be retrieved from either the 2.7 or the 2.23 servers (the old
and new version servers agree about object placement).