[Openstack] [swift] Object replication failure counts confusing in 2.7.0

Mark Kirkwood mark.kirkwood at catalyst.net.nz
Wed May 18 04:46:05 UTC 2016


On 17/05/16 17:43, Mark Kirkwood wrote:
> (snippage)
>
> I'm seeing some replication errors in the object server log:
>
> May 17 05:27:36 markir-dev-ostor001 object-server: Starting object 
> replication pass.
> May 17 05:27:36 markir-dev-ostor001 object-server: 1/1 (100.00%) 
> partitions replicated in 0.03s (38.19/sec, 0s remaining)
> May 17 05:27:36 markir-dev-ostor001 object-server: 2 successes, 0 
> failures
> May 17 05:27:36 markir-dev-ostor001 object-server: 1 suffixes checked 
> - 0.00% hashed, 0.00% synced
> May 17 05:27:36 markir-dev-ostor001 object-server: Partition times: 
> max 0.0210s, min 0.0210s, med 0.0210s
> May 17 05:27:36 markir-dev-ostor001 object-server: Object replication 
> complete. (0.00 minutes)
> May 17 05:27:36 markir-dev-ostor001 object-server: Replication 
> sleeping for 30 seconds.
> May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit 
> "forever" mode (ALL)
> May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit 
> "forever" mode (ZBF)
> May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF). 
> Since Tue May 17 05:27:40 2016: Locally: 1 passed, 0 quarantined, 0 
> errors, files/sec: 83.24, bytes/sec: 0.00, Total time: 0.01, Auditing 
> time: 0.00, Rate: 0.00
> May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF) 
> "forever" mode completed: 0.01s. Total quarantined: 0, Total errors: 
> 0, Total files/sec: 66.89, Total bytes/sec: 0.00, Auditing time: 0.01, 
> Rate: 0.75
> May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - 
> - [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" 
> "object-replicator 18131" 0.0014 "-" 29108 0
> May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - 
> - [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-" 
> "object-replicator 18131" 0.0016 "-" 29109 0
> May 17 05:28:06 markir-dev-ostor001 object-server: Starting object 
> replication pass.
> May 17 05:28:06 markir-dev-ostor001 object-server: 1/1 (100.00%) 
> partitions replicated in 0.02s (49.85/sec, 0s remaining)
> May 17 05:28:06 markir-dev-ostor001 object-server: 2 successes, 6 
> failures <==============================
> May 17 05:28:06 markir-dev-ostor001 object-server: 1 suffixes checked 
> - 0.00% hashed, 0.00% synced
> May 17 05:28:06 markir-dev-ostor001 object-server: Partition times: 
> max 0.0155s, min 0.0155s, med 0.0155s
> May 17 05:28:06 markir-dev-ostor001 object-server: Object replication 
> complete. (0.00 minutes)
> May 17 05:28:06 markir-dev-ostor001 object-server: Replication 
> sleeping for 30 seconds.
>
>

I've figured out one case:

Adding some debugging code and traceback gives more interesting output 
(see attached diff):

May 18 04:31:17 markir-dev-ostor002 object-server: object replication 
failure 4, detail Traceback (most recent call last):#012  File 
"/opt/cat/openstack/swift/local/lib/python2.7/site-packages/swift/obj/replicator.py", 
line 622, in build_replication_jobs#012 int(partition))#012ValueError: 
invalid literal for int() with base 10: 'auditor_status_ALL.json'#012
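
(A side note on reading the line above: the `#012` sequences are how rsyslog typically escapes an embedded newline, as an octal character code. The sketch below is one way to turn such a line back into a readable traceback. The function name `unescape_syslog` is made up for illustration, and it would also rewrite any literal `#` followed by three octal digits in the message, so treat it as a debugging aid only.)

```python
import re


def unescape_syslog(text):
    """Replace '#ooo' octal escapes (e.g. '#012') with the character they encode."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


# Fragment copied from the log line quoted above.
SAMPLE = ("int(partition))#012ValueError: invalid literal for int() "
          "with base 10: 'auditor_status_ALL.json'#012")

if __name__ == "__main__":
    print(unescape_syslog(SAMPLE))
```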

The code is doing:

    try:
        job_path = join(obj_path, partition)
        part_nodes = policy.object_ring.get_part_nodes(
            int(partition))                      <=== line 622

Looking at what is in my object dirs:

    $ ls /srv/node/2/objects/
    899  auditor_status_ALL.json

Yep, that's gotta hurt! We either shouldn't be writing the audit json 
file there, or we should make the replicator code ignore it. Shall I 
raise an issue?
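
To illustrate the second option (having the replicator ignore the file), here is a minimal, standalone sketch. It is not the actual Swift code or a proposed patch: `partition_dirs` and `demo` are hypothetical names, and the only assumption carried over from the traceback is that partition directories are named with plain integers. Anything under the objects directory that `int()` cannot parse, such as `auditor_status_ALL.json`, is skipped instead of being counted as a failure.

```python
import os
import shutil
import tempfile


def partition_dirs(obj_path):
    """Yield (name, partition_number) for entries that look like partitions.

    Hypothetical helper: skips anything that is not a directory or whose
    name is not an integer, rather than letting int() raise ValueError.
    """
    for name in sorted(os.listdir(obj_path)):
        if not os.path.isdir(os.path.join(obj_path, name)):
            continue  # e.g. auditor_status_ALL.json is a regular file
        try:
            partition = int(name)
        except ValueError:
            continue  # a directory, but not named like a partition
        yield name, partition


def demo():
    """Recreate the directory listing from the post in a temp dir."""
    obj_path = tempfile.mkdtemp()
    try:
        os.mkdir(os.path.join(obj_path, "899"))
        open(os.path.join(obj_path, "auditor_status_ALL.json"), "w").close()
        return list(partition_dirs(obj_path))
    finally:
        shutil.rmtree(obj_path)


if __name__ == "__main__":
    print(demo())
```

With the layout shown above, only the `899` partition is returned and the auditor status file is silently passed over.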

Cheers

Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: replicator.py.diff
Type: text/x-patch
Size: 3774 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20160518/4c064d57/attachment.bin>

