[Openstack] [swift] Object replication failure counts confusing in 2.7.0

Mark Kirkwood mark.kirkwood at catalyst.net.nz
Tue May 17 05:43:36 UTC 2016


I'm looking at a freshly started 2-node, 6-device cluster with only one 
object uploaded:

$ swift stat
                      Account: AUTH_592baa6ee20c491c984ae4a16af31aa6
                   Containers: 1
                      Objects: 1
                        Bytes: 524288000
Objects in policy "policy-0": 1
   Bytes in policy "policy-0": 524288000
  X-Account-Project-Domain-Id: default
                   Connection: keep-alive
                       Server: nginx/1.10.0
                  X-Timestamp: 1463461640.04395
                   X-Trans-Id: txe41582a92e344d7e9a32c-00573aa9a6
                 Content-Type: text/plain; charset=utf-8
                Accept-Ranges: bytes

I'm seeing some replication errors in the object server log:

May 17 05:27:36 markir-dev-ostor001 object-server: Starting object 
replication pass.
May 17 05:27:36 markir-dev-ostor001 object-server: 1/1 (100.00%) 
partitions replicated in 0.03s (38.19/sec, 0s remaining)
May 17 05:27:36 markir-dev-ostor001 object-server: 2 successes, 0 failures
May 17 05:27:36 markir-dev-ostor001 object-server: 1 suffixes checked - 
0.00% hashed, 0.00% synced
May 17 05:27:36 markir-dev-ostor001 object-server: Partition times: max 
0.0210s, min 0.0210s, med 0.0210s
May 17 05:27:36 markir-dev-ostor001 object-server: Object replication 
complete. (0.00 minutes)
May 17 05:27:36 markir-dev-ostor001 object-server: Replication sleeping 
for 30 seconds.
May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit 
"forever" mode (ALL)
May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit 
"forever" mode (ZBF)
May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF). 
Since Tue May 17 05:27:40 2016: Locally: 1 passed, 0 quarantined, 0 
errors, files/sec: 83.24, bytes/sec: 0.00, Total time: 0.01, Auditing 
time: 0.00, Rate: 0.00
May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF) 
"forever" mode completed: 0.01s. Total quarantined: 0, Total errors: 0, 
Total files/sec: 66.89, Total bytes/sec: 0.00, Auditing time: 0.01, 
Rate: 0.75
May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - - 
[17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" 
"object-replicator 18131" 0.0014 "-" 29108 0
May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - - 
[17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" 
"object-replicator 18131" 0.0016 "-" 29109 0
May 17 05:28:06 markir-dev-ostor001 object-server: Starting object 
replication pass.
May 17 05:28:06 markir-dev-ostor001 object-server: 1/1 (100.00%) 
partitions replicated in 0.02s (49.85/sec, 0s remaining)
May 17 05:28:06 markir-dev-ostor001 object-server: 2 successes, 6 
failures <==============================
May 17 05:28:06 markir-dev-ostor001 object-server: 1 suffixes checked - 
0.00% hashed, 0.00% synced
May 17 05:28:06 markir-dev-ostor001 object-server: Partition times: max 
0.0155s, min 0.0155s, med 0.0155s
May 17 05:28:06 markir-dev-ostor001 object-server: Object replication 
complete. (0.00 minutes)
May 17 05:28:06 markir-dev-ostor001 object-server: Replication sleeping 
for 30 seconds.
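
To keep an eye on these counters across passes, I'm scraping the 
"N successes, M failures" summary lines out of syslog with a few lines 
of Python. This is just my own helper (the log path and regex are mine, 
not anything from Swift itself):

import re

# Tally the replicator's per-pass success/failure counts from a syslog
# extract in the format shown above.
PATTERN = re.compile(r'(\d+) successes, (\d+) failures')

def tally_passes(log_path):
    """Yield (successes, failures) for each replication pass logged."""
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                yield int(m.group(1)), int(m.group(2))

for ok, bad in tally_passes('/var/log/syslog'):  # path assumes default syslog
    print('pass: %d successes, %d failures' % (ok, bad))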


Yet, it appears that everything has been replicated fine (3 replicas):

$ sudo swift-ring-builder /etc/swift/object.builder
/etc/swift/object.builder, build version 8
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.00 
balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1 
(0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file /etc/swift/object.ring.gz is up-to-date
Devices:   id region zone  ip address  port  replication ip  replication port  name  weight  partitions  balance  flags  meta
            0      1    1  10.0.2.241  6000      10.0.3.241              6000     0    1.00         512     0.00
            1      1    1  10.0.2.241  6000      10.0.3.241              6000     2    1.00         512     0.00
            2      1    1  10.0.2.241  6000      10.0.3.241              6000     1    1.00         512     0.00
            3      1    1  10.0.2.242  6000      10.0.3.242              6000     1    1.00         512     0.00
            5      1    1  10.0.2.242  6000      10.0.3.242              6000     2    1.00         512     0.00
            6      1    1  10.0.2.242  6000      10.0.3.242              6000     0    1.00         512     0.00
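
As a cross-check on placement, the ring API shows which devices hold 
partition 899 (the partition in the REPLICATE requests above), and for 
a 3-replica ring it should list three distinct devices. A minimal 
sketch, assuming the rings live in the default /etc/swift:

from swift.common.ring import Ring

# Load the object ring and print the primary nodes for partition 899.
ring = Ring('/etc/swift', ring_name='object')
for node in ring.get_part_nodes(899):
    print('%d %s:%d %s' % (node['id'], node['ip'], node['port'],
                           node['device']))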


markir-dev-ostor001:~$ df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/loop0          1490    33      1458   3% /srv/node/0
/dev/loop1          1490   533       958  36% /srv/node/1
/dev/loop2          1490    33      1458   3% /srv/node/2

markir-dev-ostor002:~$ df -m
Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/loop0          1490   533       958  36% /srv/node/0
/dev/loop1          1490    33      1458   3% /srv/node/1
/dev/loop2          1490   533       958  36% /srv/node/2
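
And to confirm the copies on disk, a quick walk over the device mounts 
counting *.data files (run on each node; paths assume my /srv/node 
layout):

import os

# Count on-disk object files per device; across both nodes the totals
# should add up to three *.data files for the one object's 3 replicas.
for dev in sorted(os.listdir('/srv/node')):
    count = 0
    for root, dirs, files in os.walk(os.path.join('/srv/node', dev)):
        count += sum(1 for f in files if f.endswith('.data'))
    print('%s: %d .data file(s)' % (dev, count))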


Any thoughts? I wondered if the error counter was being overly 
enthusiastic here, or whether rsync is being retried (though I'm not 
seeing that in the rsync logs).

Regards

Mark




