Open Stack

Wed Jun 30 01:32:54 UTC 2021

On 3/06/21 6:22 pm, Pete Zaitcev wrote:

> On Fri, 28 May 2021 16:58:10 +1200
> Mark Kirkwood <mark.kirkwood at catalyst.net.nz> wrote:
>
>> Examining the logs (/var/log/swift/object.log and /var/log/syslog) these
>> are not throwing up any red flags (i.e. no failing rsyncs noted).
> You should be seeing tracebacks and "Error syncing partition",
> "Error syncing handoff partition", or "Exception in top-level
> replication loop".
>

Thanks Pete!

Debugging during the upgrade was tricky, as taking each storage node down 
to rebuild it was clearly causing errors of its own. However, the upgrade 
is now complete, so I'm looking at this more closely.

Picking on one storage node, I do see a fair number (63 in the last 5 
days) of:

Jun 30 04:07:29 cat-hlz-ostor003 object-server: Error syncing with node: 
{'index': 2, u'replication_port': 6000, u'weight': 6.0, u'zone': 10, 
u'ip': u'x.x.x.x', u'region': 10, u'id': 18, u'replication_ip': 
u'x.x.x.x', u'meta': u'', u'device': u'obj03', u'port': 6000}: Timeout (60s)

So this looks like the source (or at least *one* source) of the issue. It 
also explains why I'm not seeing any failing rsyncs: we never get that far.
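
To see whether these timeouts cluster on particular target devices, a 
small Python sketch like the one below could tally them. It assumes each 
entry occupies a single line in the log file (the wrapping above is from 
the mail client) and that the message format matches the sample line; 
`tally_sync_timeouts` is a hypothetical helper, not part of Swift.

```python
import ast
import re
from collections import Counter

# Pattern modelled on the sample log line above (an assumption: other
# Swift versions may word the message differently). Each entry is
# assumed to sit on a single line in the log file.
SYNC_TIMEOUT_RE = re.compile(
    r"Error syncing with node: (\{.*\}): Timeout \(\d+(?:\.\d+)?s\)"
)


def tally_sync_timeouts(lines):
    """Count 'Error syncing with node ... Timeout' entries per target device."""
    counts = Counter()
    for line in lines:
        match = SYNC_TIMEOUT_RE.search(line)
        if not match:
            continue
        # The node is logged as a Python dict repr; u'' prefixes are still
        # valid string literals in Python 3, so literal_eval can parse it.
        node = ast.literal_eval(match.group(1))
        counts[(node["replication_ip"], node["device"])] += 1
    return counts
```

For example, passing an open handle on /var/log/swift/object.log and 
calling `.most_common()` on the result would list the target devices 
with the most timeouts first. If one or two devices dominate, that 
would suggest looking at those nodes rather than at the replicator 
itself.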

I'm also seeing a much smaller number (just 1 in the last 5 days) of:

Jun 30 06:40:34 cat-hlz-ostor003 object-server: Error syncing partition: 
LockTimeout (10s) /srv/node/obj06/objects-20/544096/.lock

So, I need to figure out why we are timing out!
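
To check whether LockTimeout errors keep hitting the same partition 
(which might point at contention on that partition rather than a 
general slowdown), the lock path in each entry can be split into device, 
policy directory and partition. This is a sketch assuming the 
/srv/node/<device>/<policy dir>/<partition> layout seen in the sample 
line; `tally_lock_timeouts` is a hypothetical helper.

```python
import re
from collections import Counter

# Pattern modelled on the sample LockTimeout line above (an assumption).
LOCK_TIMEOUT_RE = re.compile(
    r"Error syncing partition: LockTimeout \(\d+(?:\.\d+)?s\) (\S+)/\.lock"
)


def tally_lock_timeouts(lines):
    """Count LockTimeout entries per (device, policy dir, partition)."""
    counts = Counter()
    for line in lines:
        match = LOCK_TIMEOUT_RE.search(line)
        if not match:
            continue
        # Assumes /srv/node/<device>/<policy dir>/<partition>, so the
        # last three path components identify the partition.
        device, policy_dir, partition = match.group(1).split("/")[-3:]
        counts[(device, policy_dir, partition)] += 1
    return counts
```

With only one such entry in five days this is unlikely to be the main 
problem, but repeated hits on one partition would be worth a closer look.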

regards

Mark

[Swift] Object replication failures on newly upgraded servers