[Swift] Object replication failures on newly upgraded servers
Mark Kirkwood
mark.kirkwood at catalyst.net.nz
Wed Jun 30 01:32:54 UTC 2021
On 3/06/21 6:22 pm, Pete Zaitcev wrote:
> On Fri, 28 May 2021 16:58:10 +1200
> Mark Kirkwood <mark.kirkwood at catalyst.net.nz> wrote:
>
>> Examining the logs (/var/log/swift/object.log and /var/log/syslog) these
>> are not throwing up any red flags (i.e. no failing rsyncs noted).
> You should be seeing tracebacks and "Error syncing partition",
> "Error syncing handoff partition", or "Exception in top-level
> replication loop".
>
Thanks, Pete!
Debugging during the upgrade was tricky, as errors were clearly being
caused whenever a storage node was down for rebuilding. However, the
upgrade is now complete, so I'm looking at this more closely.
Picking one storage node, I do see a fair number (63 in the last 5
days) of:
Jun 30 04:07:29 cat-hlz-ostor003 object-server: Error syncing with node:
{'index': 2, u'replication_port': 6000, u'weight': 6.0, u'zone': 10,
u'ip': u'x.x.x.x', u'region': 10, u'id': 18, u'replication_ip':
u'x.x.x.x', u'meta': u'', u'device': u'obj03', u'port': 6000}: Timeout (60s)
So this looks like the source (or at least *one* source) of the issue.
It would also explain why I'm not seeing any failing rsyncs: replication
is not getting that far.
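To see whether these timeouts cluster on particular remote devices, a small script can tally them. This is only a sketch: `count_node_timeouts` is a hypothetical helper, and it assumes each log entry sits on a single line (the mail wrap above is not in the real log) and that the node is logged as a Python dict repr, as in the line quoted above.

```python
import ast
import re
from collections import Counter

# Pattern taken from the quoted log line; assumes one entry per line.
NODE_TIMEOUT_RE = re.compile(
    r"Error syncing with node: (\{.*\}): Timeout \(\d+s\)"
)


def count_node_timeouts(lines):
    """Tally 'Error syncing with node ... Timeout' entries by remote device.

    Hypothetical helper: returns a Counter keyed on
    (replication_ip, device) of the node that timed out.
    """
    counts = Counter()
    for line in lines:
        m = NODE_TIMEOUT_RE.search(line)
        if not m:
            continue
        # The node dict is a Python repr with u'' prefixes, which
        # ast.literal_eval accepts on Python 3.3+.
        node = ast.literal_eval(m.group(1))
        counts[(node["replication_ip"], node["device"])] += 1
    return counts
```

Feeding it the object log (e.g. `count_node_timeouts(open("/var/log/swift/object.log"))`) and looking at `.most_common()` would show whether one target device accounts for most of the 63 timeouts, or whether they are spread evenly across the ring.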
I'm also seeing a small number (1 in the last 5 days) of:
Jun 30 06:40:34 cat-hlz-ostor003 object-server: Error syncing partition:
LockTimeout (10s) /srv/node/obj06/objects-20/544096/.lock
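The lock timeouts can be tallied the same way, keyed on the local device, policy directory and partition taken from the `.lock` path. Again a sketch: `count_lock_timeouts` is a hypothetical helper, and the pattern is based only on the single line quoted above.

```python
import re
from collections import Counter

# Pattern based on the quoted LockTimeout line; captures the partition
# directory that holds the .lock file.
LOCK_TIMEOUT_RE = re.compile(
    r"Error syncing partition: LockTimeout \(\d+s\) (\S+)/\.lock"
)


def count_lock_timeouts(lines):
    """Tally partition LockTimeout entries by (device, policy dir, partition)."""
    counts = Counter()
    for line in lines:
        m = LOCK_TIMEOUT_RE.search(line)
        if not m:
            continue
        # e.g. /srv/node/obj06/objects-20/544096 -> obj06, objects-20, 544096
        device, policy_dir, partition = m.group(1).split("/")[-3:]
        counts[(device, policy_dir, int(partition))] += 1
    return counts
```

If the same partitions or devices keep showing up here and in the node timeouts, that would point at specific slow disks rather than the network.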
So, I need to figure out why we are timing out!
regards
Mark
More information about the openstack-discuss mailing list