[Openstack] Replication error
Piotr Kopec
pkopec17 at gmail.com
Thu Sep 26 09:19:43 UTC 2013
Replication relies on rsync. Check that rsync is working correctly on all
swift nodes.
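For example, a quick hand-sketched check, assuming the stock rsyncd setup on
port 873 and the node addresses shown in the ring output quoted below:

    for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do
        echo "== $node =="
        rsync rsync://$node/    # lists the exported modules if rsyncd answers
    done

Each node should list the object/container/account modules from its
rsyncd.conf, and none of the connections should hang or be refused.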
If you can, please provide your account-server.conf, container-server.conf,
and proxy-server.conf.
I've had plenty of problems with replicators too, so I'll try to help you.
Regards
Piotr
P.S.
Try out http://markdown-here.com/ when attaching .conf files. Just a
suggestion. :)
2013/9/26 Mike Preston <mike.preston at synety.com>
> I know it is poor form to reply to yourself, but I would appreciate it
> if anyone has any insight on this problem.
>
> *Mike Preston*
> Infrastructure Team | SYNETY
> www.synety.com
>
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
>
> *From:* Mike Preston [mailto:mike.preston at synety.com]
> *Sent:* 24 September 2013 09:52
> *To:* openstack at lists.openstack.org
> *Subject:* Re: [Openstack] Replication error
>
> root at storage-proxy-01:~/swift# swift-ring-builder object.builder validate
>
> root at storage-proxy-01:~/swift# echo $?
>
> 0
>
> I ran md5sum on the ring files on both the proxy (where we generate them)
> and the nodes and confirmed that they are identical.
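>
> (For reference, a minimal sketch of that comparison, assuming password-less
> ssh from the proxy and the default /etc/swift path:
>
>     md5sum /etc/swift/object.ring.gz
>     for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do
>         ssh root@$node md5sum /etc/swift/object.ring.gz
>     done
>
> All of the checksums should match the one printed on the proxy.)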
>
> root at storage-proxy-01:~/swift# swift-ring-builder object.builder
>
> object.builder, build version 72
> 65536 partitions, 3 replicas, 4 zones, 32 devices, 999.99 balance
> The minimum number of hours before a partition can be reassigned is 3
> Devices:   id  zone   ip address  port  name   weight  partitions  balance  meta
>             0     1  10.20.15.51  6000  sdb1  3000.00        7123     1.44
>             1     1  10.20.15.51  6000  sdc1  3000.00        7123     1.44
>             2     1  10.20.15.51  6000  sdd1  3000.00        7122     1.43
>             3     1  10.20.15.51  6000  sde1  3000.00        7123     1.44
>             4     1  10.20.15.51  6000  sdf1  3000.00        7122     1.43
>             5     1  10.20.15.51  6000  sdg1  3000.00        7123     1.44
>             6     3  10.20.15.51  6000  sdh1     0.00        1273   999.99
>             7     3  10.20.15.51  6000  sdi1     0.00        1476   999.99
>             8     2  10.20.15.52  6000  sdb1  3000.00        7122     1.43
>             9     2  10.20.15.52  6000  sdc1  3000.00        7122     1.43
>            10     2  10.20.15.52  6000  sdd1  3000.00        7122     1.43
>            11     2  10.20.15.52  6000  sde1  3000.00        7122     1.43
>            12     2  10.20.15.52  6000  sdf1  3000.00        7122     1.43
>            13     2  10.20.15.52  6000  sdg1  3000.00        7122     1.43
>            14     3  10.20.15.52  6000  sdh1     0.00        1378   999.99
>            15     3  10.20.15.52  6000  sdi1     0.00         997   999.99
>            16     3  10.20.15.53  6000  sas0  3000.00        6130   -12.70
>            17     3  10.20.15.53  6000  sas1  3000.00        6130   -12.70
>            18     3  10.20.15.53  6000  sas2  3000.00        6129   -12.71
>            19     3  10.20.15.53  6000  sas3  3000.00        6130   -12.70
>            20     3  10.20.15.53  6000  sas4  3000.00        6130   -12.70
>            21     3  10.20.15.53  6000  sas5  3000.00        6130   -12.70
>            22     3  10.20.15.53  6000  sas6  3000.00        6129   -12.71
>            23     3  10.20.15.53  6000  sas7  3000.00        6129   -12.71
>            24     4  10.20.15.54  6000  sas0  3000.00        7122     1.43
>            25     4  10.20.15.54  6000  sas1  3000.00        7122     1.43
>            26     4  10.20.15.54  6000  sas2  3000.00        7123     1.44
>            27     4  10.20.15.54  6000  sas3  3000.00        7123     1.44
>            28     4  10.20.15.54  6000  sas4  3000.00        7122     1.43
>            29     4  10.20.15.54  6000  sas5  3000.00        7122     1.43
>            30     4  10.20.15.54  6000  sas6  3000.00        7123     1.44
>            31     4  10.20.15.54  6000  sas7  3000.00        7122     1.43
>
> (We are currently migrating data between boxes due to cluster hardware
> replacement, which is why zone 3 is weighted as such on the first 2 nodes.)
>
> Filelist attached (for the objects/ directory on the devices)…
> but I see nothing out of place.
>
> I’ll run a full fsck on the drives tonight to try to rule that out.
>
> Thanks for your help.
>
> *Mike Preston*
> Infrastructure Team | SYNETY
> www.synety.com
>
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
>
> *From:* Clay Gerrard [mailto:clay.gerrard at gmail.com]
> *Sent:* 23 September 2013 20:34
> *To:* Mike Preston
> *Cc:* openstack at lists.openstack.org
> *Subject:* Re: [Openstack] Replication error
>
> Run `swift-ring-builder /etc/swift/object.builder validate` - it should
> have no errors and exit 0. Can you provide a paste of the output from
> `swift-ring-builder /etc/swift/object.builder` as well - it should list
> some general info about the ring (number of replicas, and a list of devices).
> Rebalance the ring and make sure it's been distributed to all nodes.
>
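> A rough sketch of that cycle, assuming the builder files live in /etc/swift
> on the proxy and the storage nodes are reachable over ssh (adjust hosts and
> paths to your deployment):
>
>     cd /etc/swift
>     swift-ring-builder object.builder validate
>     swift-ring-builder object.builder rebalance   # may move nothing while min_part_hours has not elapsed
>     for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do
>         scp object.ring.gz root@$node:/etc/swift/
>     done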
>
> The particular line you're seeing pop up in the traceback seems to be
> looking for all of the nodes for a particular partition it found in the
> objects' dir. I'm not seeing any local sanitization [1] around those top
> level directory names, so maybe it's just some garbage that was created
> there outside of swift, or some file system corruption?
>
>
> Can you provide the output from `ls /srv/node/objects` (or wherever you
> have devices configured)?
>
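> A hedged way to scan for out-of-range names, assuming the devices are mounted
> under /srv/node and the 65536-partition ring shown above (valid partition
> directories are 0-65535):
>
>     for part in /srv/node/*/objects/*; do
>         name=$(basename "$part")
>         if ! [[ "$name" =~ ^[0-9]+$ ]] || [ "$name" -ge 65536 ]; then
>             echo "suspicious: $part"
>         fi
>     done
>
> A numeric directory name at or above the partition count is exactly what
> would make get_part_nodes() raise the IndexError in the traceback above.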
>
> -Clay
>
> 1. https://bugs.launchpad.net/swift/+bug/1229372
>
> On Mon, Sep 23, 2013 at 2:34 AM, Mike Preston <mike.preston at synety.com>
> wrote:
>
> Hi,
>
> We are seeing a replication error on swift. The error is only seen on a
> single node; the other nodes appear to be working fine.
>
> Installed version is Debian wheezy with swift 1.4.8-2+deb7u1.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Starting object
> replication pass.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Exception in top-level
> replication loop:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 564, in replicate
>     jobs = self.collect_jobs()
>   File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 536, in collect_jobs
>     self.object_ring.get_part_nodes(int(partition))
>   File "/usr/lib/python2.7/dist-packages/swift/common/ring/ring.py", line 103, in get_part_nodes
>     return [self.devs[r[part]] for r in self._replica2part2dev_id]
> IndexError: array index out of range
>
> Sep 23 10:33:03 storage-node-01 object-replicator Nothing replicated for
> 0.728466033936 seconds.
>
> Sep 23 10:33:03 storage-node-01 object-replicator Object replication
> complete. (0.01 minutes)
>
> Can anyone shed any light on this, or suggest next steps for debugging or
> fixing it?
>
> *Mike Preston*
> Infrastructure Team | SYNETY
> www.synety.com
>
> direct: 0116 424 4016
> mobile: 07950 892038
> main: 0116 424 4000
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack