[Openstack] [swift] Object replication failure counts confusing in 2.7.0

Clay Gerrard clay.gerrard at gmail.com
Wed May 18 18:54:09 UTC 2016


Yeah, that's a few undesirable behaviors there.

https://bugs.launchpad.net/swift/+bug/1583305

#willfix

On Tue, May 17, 2016 at 11:04 PM, Mark Kirkwood <
mark.kirkwood at catalyst.net.nz> wrote:

> On 17/05/16 17:43, Mark Kirkwood wrote:
>
>>
>> I'm seeing some replication errors in the object server log:
>>
>> May 17 05:27:36 markir-dev-ostor001 object-server: Starting object
>> replication pass.
>> May 17 05:27:36 markir-dev-ostor001 object-server: 1/1 (100.00%)
>> partitions replicated in 0.03s (38.19/sec, 0s remaining)
>> May 17 05:27:36 markir-dev-ostor001 object-server: 2 successes, 0 failures
>> May 17 05:27:36 markir-dev-ostor001 object-server: 1 suffixes checked -
>> 0.00% hashed, 0.00% synced
>> May 17 05:27:36 markir-dev-ostor001 object-server: Partition times: max
>> 0.0210s, min 0.0210s, med 0.0210s
>> May 17 05:27:36 markir-dev-ostor001 object-server: Object replication
>> complete. (0.00 minutes)
>> May 17 05:27:36 markir-dev-ostor001 object-server: Replication sleeping
>> for 30 seconds.
>> May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit
>> "forever" mode (ALL)
>> May 17 05:27:40 markir-dev-ostor001 object-server: Begin object audit
>> "forever" mode (ZBF)
>> May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF).
>> Since Tue May 17 05:27:40 2016: Locally: 1 passed, 0 quarantined, 0 errors,
>> files/sec: 83.24, bytes/sec: 0.00, Total time: 0.01, Auditing time: 0.00,
>> Rate: 0.00
>> May 17 05:27:40 markir-dev-ostor001 object-server: Object audit (ZBF)
>> "forever" mode completed: 0.01s. Total quarantined: 0, Total errors: 0,
>> Total files/sec: 66.89, Total bytes/sec: 0.00, Auditing time: 0.01, Rate:
>> 0.75
>> May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - -
>> [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-"
>> "object-replicator 18131" 0.0014 "-" 29108 0
>> May 17 05:27:45 markir-dev-ostor001 object-server: ::ffff:10.0.3.242 - -
>> [17/May/2016:05:27:45 +0000] "REPLICATE /1/899" 200 56 "-" "-"
>> "object-replicator 18131" 0.0016 "-" 29109 0
>> May 17 05:28:06 markir-dev-ostor001 object-server: Starting object
>> replication pass.
>> May 17 05:28:06 markir-dev-ostor001 object-server: 1/1 (100.00%)
>> partitions replicated in 0.02s (49.85/sec, 0s remaining)
>> May 17 05:28:06 markir-dev-ostor001 object-server: 2 successes, 6
>> failures <==============================
>> May 17 05:28:06 markir-dev-ostor001 object-server: 1 suffixes checked -
>> 0.00% hashed, 0.00% synced
>> May 17 05:28:06 markir-dev-ostor001 object-server: Partition times: max
>> 0.0155s, min 0.0155s, med 0.0155s
>> May 17 05:28:06 markir-dev-ostor001 object-server: Object replication
>> complete. (0.00 minutes)
>> May 17 05:28:06 markir-dev-ostor001 object-server: Replication sleeping
>> for 30 seconds.
>>
>
>
> The other case (with a bit of extra debug logging added, but it's trivial
> so I'll inline it):
>
> Log:
> May 18 05:59:50 markir-dev-ostor002 object-server: object replication
> failure 1 detail no error
> May 18 05:59:50 markir-dev-ostor002 object-server: object replication
> failure 1 detail no error
> May 18 05:59:50 markir-dev-ostor002 object-server: 2/2 (100.00%)
> partitions replicated in 0.04s (47.15/sec, 0s remaining)
> May 18 05:59:50 markir-dev-ostor002 object-server: 4 successes, 12 failures
>
>
> Code (around line 492 of replicator.py, with my debug warning added):
>
>         except (Exception, Timeout):
>             trace = traceback.format_exc()
>             failure_devs_info.update(target_devs_info)
>             self.logger.exception(_("Error syncing partition"))
>         else:
>             trace = "no error"
>         finally:
>             self.stats['success'] += len(target_devs_info -
>                                          failure_devs_info)
>             self.logger.warning('object replication failure 1 detail %s',
>                                 trace)
>             self._add_failure_stats(failure_devs_info)  # <===============
>             self.partition_times.append(time.time() - begin)
>             self.logger.timing_since('partition.update.timing', begin)
>
>
> That 'finally' block is going to increment the failure count even when no
> exception occurred, I think - it should probably only record failures when
> an exception actually happened!
>
> Cheers
>
>
> Mark
>
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
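To make the accounting problem concrete, here is a minimal, self-contained
sketch of the fix Mark suggests - illustrative names only (sync_fn, stats,
target_devs), not the actual swift replicator code. Moving the failure
accounting into the except path means a clean pass records successes only:

```python
import time

def sync_partition(sync_fn, stats, target_devs):
    """Hypothetical reshaping of the quoted update() tail: failures are
    counted only when an exception actually occurred, while the success
    count and timing still run unconditionally in the finally clause."""
    failure_devs = set()
    begin = time.time()
    try:
        sync_fn()
    except Exception:
        failure_devs.update(target_devs)
        # Only the genuine error path bumps the failure stat.
        stats['failure'] += len(failure_devs)
    finally:
        # Success count and per-partition timing still belong in finally,
        # since they apply on both paths.
        stats['success'] += len(target_devs - failure_devs)
        stats.setdefault('partition_times', []).append(time.time() - begin)
    return stats
```

The point of the split is that `finally` runs on every exit, so anything
placed there is effectively unconditional; per-outcome counters belong in
the `except` (or `else`) clause instead.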

