[Openstack] [swift] Object replication failure counts confusing in 2.7.0
Mark Kirkwood
mark.kirkwood at catalyst.net.nz
Tue May 17 05:43:36 UTC 2016
I'm looking at a freshly started 2-node, 6-device cluster with only 1
object uploaded:
$ swift stat
Account: AUTH_592baa6ee20c491c984ae4a16af31aa6
Containers: 1
Objects: 1
Bytes: 524288000
Objects in policy "policy-0": 1
Bytes in policy "policy-0": 524288000
X-Account-Project-Domain-Id: default
Connection: keep-alive
Server: nginx/1.10.0
X-Timestamp: 1463461640.04395
X-Trans-Id: txe41582a92e344d7e9a32c-00573aa9a6
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
I'm seeing some replication errors in the object server log:
May 17 05:27:36 markir-dev-ostor001 object-server: Starting object replication pass.
May 17 05:27:36 markir-dev-ostor001 object-server: 1/1 (100.00%) partitions replicated in 0.03s (38.19/sec, 0s remaining)
May 17 05:27:36 markir-dev-ostor001 object-server: 2 successes, 0 failures
May 17 05:27:36 markir-dev-ostor001 object-server: 1 suffixes checked - 0.00% hashed, 0.00% synced
May 17 05:27:36 markir-dev-ostor001 object-server: Partition times: max 0.0210s, min 0.0210s, med 0.0210s
May 17 05:27:36 markir-dev-ostor001 object-server: Object replication complete. (0.00 minutes)
May 17 05:27:36 markir-dev-ostor001 object-server: Replication sleeping for 30 seconds.
May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit "forever" mode (ALL)
May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit "forever" mode (ZBF)
May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF). Since Tue May 17 05:27:40 2016: Locally: 1 passed, 0 quarantined, 0 errors, files/sec: 83.24, bytes/sec: 0.00, Total time: 0.01, Auditing time: 0.00, Rate: 0.00
May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF) "forever" mode completed: 0.01s. Total quarantined: 0, Total errors: 0, Total files/sec: 66.89, Total bytes/sec: 0.00, Auditing time: 0.01, Rate: 0.75
May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - - [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" "object-replicator 18131" 0.0014 "-" 29108 0
May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - - [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" "object-replicator 18131" 0.0016 "-" 29109 0
May 17 05:28:06 markir-dev-ostor001 object-server: Starting object replication pass.
May 17 05:28:06 markir-dev-ostor001 object-server: 1/1 (100.00%) partitions replicated in 0.02s (49.85/sec, 0s remaining)
May 17 05:28:06 markir-dev-ostor001 object-server: 2 successes, 6 failures   <==============================
May 17 05:28:06 markir-dev-ostor001 object-server: 1 suffixes checked - 0.00% hashed, 0.00% synced
May 17 05:28:06 markir-dev-ostor001 object-server: Partition times: max 0.0155s, min 0.0155s, med 0.0155s
May 17 05:28:06 markir-dev-ostor001 object-server: Object replication complete. (0.00 minutes)
May 17 05:28:06 markir-dev-ostor001 object-server: Replication sleeping for 30 seconds.
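As a cross-check on those counters, the replicator also publishes its
stats via the recon cache, so (assuming the recon middleware is enabled
in the object-server pipeline) the same success/failure numbers should
be retrievable from each node with:

$ swift-recon object -r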
Yet, it appears that everything has been replicated fine (3 replicas):
$ sudo swift-ring-builder /etc/swift/object.builder
/etc/swift/object.builder, build version 8
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.00
balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1
(0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file /etc/swift/object.ring.gz is up-to-date
Devices:   id  region  zone  ip address   port  replication ip  replication port  name  weight  partitions  balance  flags  meta
            0       1     1  10.0.2.241   6000      10.0.3.241              6000     0    1.00         512     0.00
            1       1     1  10.0.2.241   6000      10.0.3.241              6000     2    1.00         512     0.00
            2       1     1  10.0.2.241   6000      10.0.3.241              6000     1    1.00         512     0.00
            3       1     1  10.0.2.242   6000      10.0.3.242              6000     1    1.00         512     0.00
            5       1     1  10.0.2.242   6000      10.0.3.242              6000     2    1.00         512     0.00
            6       1     1  10.0.2.242   6000      10.0.3.242              6000     0    1.00         512     0.00
markir-dev-ostor001:~$ df -m
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/loop0 1490 33 1458 3% /srv/node/0
/dev/loop1 1490 533 958 36% /srv/node/1
/dev/loop2 1490 33 1458 3% /srv/node/2
markir-dev-ostor002:~$ df -m
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/loop0 1490 533 958 36% /srv/node/0
/dev/loop1 1490 33 1458 3% /srv/node/1
/dev/loop2 1490 533 958 36% /srv/node/2
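The df output is consistent with 3 replicas, too: the object is
524288000 bytes (~500 MB), which matches the 533 - 33 = 500 MB usage
delta on exactly three of the six devices. Placement can also be
confirmed from the ring; with <container> and <object> standing in for
my test container/object names:

$ swift-get-nodes /etc/swift/object.ring.gz \
      AUTH_592baa6ee20c491c984ae4a16af31aa6 <container> <object>

prints the partition and the three primary device/node pairs, and the
on-disk copy for partition 899 (the one in the REPLICATE lines above)
can be listed directly:

$ sudo ls -R /srv/node/1/objects/899

with the resulting .data file inspectable via swift-object-info.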
Any thoughts? I wondered if the error counter was being overly
enthusiastic here, or whether rsync is being retried (though I'm not
seeing that in the rsync logs).
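For reference, I'm checking the rsync side with something like:

$ sudo grep -i error /var/log/rsyncd.log

(the exact path depends on the "log file" setting in rsyncd.conf), and
nothing relevant turns up there.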
Regards
Mark