[Openstack] Replication error

Piotr Kopec pkopec17 at gmail.com
Thu Sep 26 09:19:43 UTC 2013


Replication relies on rsync. Check that rsync is working correctly on all
swift nodes.
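As a first sanity check, something like this can confirm that rsyncd is at least reachable on each node (a rough sketch; the hosts are taken from the ring output below, and port 873 assumes the default rsync daemon port):

```python
import socket

STORAGE_NODES = ["10.20.15.51", "10.20.15.52", "10.20.15.53", "10.20.15.54"]

def rsyncd_reachable(host, port=873, timeout=1.0):
    """Crude liveness probe: can we open a TCP connection to rsyncd?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in STORAGE_NODES:
    print(host, "ok" if rsyncd_reachable(host) else "UNREACHABLE")
```

This only proves the daemon answers; running `rsync rsync://<node>/` will additionally list the exported modules, which must match what the replicator config expects.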
If you can, please provide your account-server.conf, container-server.conf,
and proxy-server.conf.
I've had plenty of problems with replicators too, so I'll try to help.

Regards
Piotr

P.S.

Try http://markdown-here.com/ when attaching .conf files. Just a
suggestion. :)


2013/9/26 Mike Preston <mike.preston at synety.com>

>  I know it is poor form to reply to yourself, but I would appreciate it
> if anyone has any insight on this problem.
>
>
> *Mike Preston*
>
> Infrastructure Team  |  SYNETY
>
> www.synety.com
>
> direct: 0116 424 4016
>
> mobile: 07950 892038
>
> main: 0116 424 4000
>
>
> *From:* Mike Preston [mailto:mike.preston at synety.com]
> *Sent:* 24 September 2013 09:52
> *To:* openstack at lists.openstack.org
>
> *Subject:* Re: [Openstack] Replication error
>
> root at storage-proxy-01:~/swift# swift-ring-builder object.builder validate
>
> root at storage-proxy-01:~/swift# echo $?
>
> 0
>
>
> I ran md5sum on the ring files on both the proxy (where we generate them)
> and the nodes and confirmed that they are identical.
>
>
> root at storage-proxy-01:~/swift# swift-ring-builder object.builder
>
> object.builder, build version 72
> 65536 partitions, 3 replicas, 4 zones, 32 devices, 999.99 balance
> The minimum number of hours before a partition can be reassigned is 3
> Devices:    id  zone      ip address  port      name weight partitions balance meta
>              0     1     10.20.15.51  6000      sdb1 3000.00       7123    1.44
>              1     1     10.20.15.51  6000      sdc1 3000.00       7123    1.44
>              2     1     10.20.15.51  6000      sdd1 3000.00       7122    1.43
>              3     1     10.20.15.51  6000      sde1 3000.00       7123    1.44
>              4     1     10.20.15.51  6000      sdf1 3000.00       7122    1.43
>              5     1     10.20.15.51  6000      sdg1 3000.00       7123    1.44
>              6     3     10.20.15.51  6000      sdh1    0.00       1273  999.99
>              7     3     10.20.15.51  6000      sdi1    0.00       1476  999.99
>              8     2     10.20.15.52  6000      sdb1 3000.00       7122    1.43
>              9     2     10.20.15.52  6000      sdc1 3000.00       7122    1.43
>             10     2     10.20.15.52  6000      sdd1 3000.00       7122    1.43
>             11     2     10.20.15.52  6000      sde1 3000.00       7122    1.43
>             12     2     10.20.15.52  6000      sdf1 3000.00       7122    1.43
>             13     2     10.20.15.52  6000      sdg1 3000.00       7122    1.43
>             14     3     10.20.15.52  6000      sdh1    0.00       1378  999.99
>             15     3     10.20.15.52  6000      sdi1    0.00        997  999.99
>             16     3     10.20.15.53  6000      sas0 3000.00       6130  -12.70
>             17     3     10.20.15.53  6000      sas1 3000.00       6130  -12.70
>             18     3     10.20.15.53  6000      sas2 3000.00       6129  -12.71
>             19     3     10.20.15.53  6000      sas3 3000.00       6130  -12.70
>             20     3     10.20.15.53  6000      sas4 3000.00       6130  -12.70
>             21     3     10.20.15.53  6000      sas5 3000.00       6130  -12.70
>             22     3     10.20.15.53  6000      sas6 3000.00       6129  -12.71
>             23     3     10.20.15.53  6000      sas7 3000.00       6129  -12.71
>             24     4     10.20.15.54  6000      sas0 3000.00       7122    1.43
>             25     4     10.20.15.54  6000      sas1 3000.00       7122    1.43
>             26     4     10.20.15.54  6000      sas2 3000.00       7123    1.44
>             27     4     10.20.15.54  6000      sas3 3000.00       7123    1.44
>             28     4     10.20.15.54  6000      sas4 3000.00       7122    1.43
>             29     4     10.20.15.54  6000      sas5 3000.00       7122    1.43
>             30     4     10.20.15.54  6000      sas6 3000.00       7123    1.44
>             31     4     10.20.15.54  6000      sas7 3000.00       7122    1.43
>
>
> (We are currently migrating data between boxes due to cluster hardware
> replacement, which is why zone 3 is weighted as it is on the first 2 nodes.)
>
> Filelist attached (for the objects/ directory on the devices), but I see
> nothing out of place.
>
> I'll run a full fsck on the drives tonight to try to rule that out.
>
> Thanks for your help.
>
>
>
> *From:* Clay Gerrard [mailto:clay.gerrard at gmail.com]
>
> *Sent:* 23 September 2013 20:34
> *To:* Mike Preston
> *Cc:* openstack at lists.openstack.org
> *Subject:* Re: [Openstack] Replication error
>
> Run `swift-ring-builder /etc/swift/object.builder validate` - it should
> have no errors and exit 0.  Can you provide a paste of the output from
> `swift-ring-builder /etc/swift/object.builder` as well - it should list
> some general info about the ring (number of replicas, and a list of
> devices).  Rebalance the ring and make sure it's been distributed to all
> nodes.
>
>
> The particular line you're seeing pop up in the traceback seems to be
> looking for all of the nodes for a particular partition it found in the
> objects dir.  I'm not seeing any local sanitization [1] around those
> top-level directory names, so maybe it's just some garbage that was
> created there outside of swift, or some file system corruption?
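A quick way to look for such garbage (a sketch; the device path is hypothetical, so point it at whatever your devices setting is): every entry under objects/ should be a purely numeric partition directory, and its number should be below the ring's partition count (65536 here, per the builder output above).

```python
import os

def suspicious_partition_dirs(objects_dir, part_count=65536):
    """Return entries that aren't valid partition directories:
    non-numeric names (lost+found, editor droppings, fsck artifacts)
    or numbers at/above the ring's partition count."""
    return sorted(
        name for name in os.listdir(objects_dir)
        if not name.isdigit() or int(name) >= part_count
    )

# Example (path is hypothetical):
# print(suspicious_partition_dirs("/srv/node/sdb1/objects"))
```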
>
>
> Can you provide the output from `ls /srv/node/objects` (or wherever you
> have devices configured)?
>
> -Clay
>
> 1. https://bugs.launchpad.net/swift/+bug/1229372
>
>
> On Mon, Sep 23, 2013 at 2:34 AM, Mike Preston <mike.preston at synety.com>
> wrote:
>
> Hi,
>
> We are seeing a replication error on swift. The error is only seen on a
> single node; the other nodes appear to be working fine.
>
> The installed version is Debian wheezy with swift 1.4.8-2+deb7u1.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Starting object
> replication pass.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Exception in top-level
> replication loop:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 564, in replicate
>     jobs = self.collect_jobs()
>   File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 536, in collect_jobs
>     self.object_ring.get_part_nodes(int(partition))
>   File "/usr/lib/python2.7/dist-packages/swift/common/ring/ring.py", line 103, in get_part_nodes
>     return [self.devs[r[part]] for r in self._replica2part2dev_id]
> IndexError: array index out of range
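For what it's worth, the failing comprehension can be reduced to roughly this (an illustrative toy, not actual Swift code; Swift stores the replica-to-partition mapping as `array.array`s, which is where the "array index out of range" wording comes from):

```python
from array import array

# Toy ring: 3 devices, 2 replicas, 4 partitions. Each array maps a
# partition number to the id of the device holding that replica.
devs = [{'id': 0}, {'id': 1}, {'id': 2}]
replica2part2dev_id = [
    array('H', [0, 1, 2, 0]),  # replica 0
    array('H', [1, 2, 0, 1]),  # replica 1
]

def get_part_nodes(part):
    # Mirrors ring.py line 103: indexing with a partition number that is
    # >= the ring's partition count raises IndexError.
    return [devs[r[part]] for r in replica2part2dev_id]

print(get_part_nodes(2))   # a valid partition: one device dict per replica
# get_part_nodes(7)        # IndexError: array index out of range
```

So a partition directory whose numeric name is at or above the ring's partition count - for example, one left over from a ring built with a different part power - would produce exactly this traceback.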
>
> Sep 23 10:33:03 storage-node-01 object-replicator Nothing replicated for
> 0.728466033936 seconds.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Object replication
> complete. (0.01 minutes)
>
> Can anyone shed any light on this, or suggest next steps for debugging
> or fixing it?
>
>
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
>

