[Openstack] Replication error
Mike Preston
mike.preston at synety.com
Mon Sep 30 07:57:46 UTC 2013
Sure,
Rsync appears to be fine on all nodes.
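In case it helps, a quick way to sanity-check that is something like the following (a rough sketch only; the node IPs are taken from the ring output further down and the module names from the rsyncd.conf below, so adjust for your own layout):

# Rough sketch: probe each rsync module on every storage node.
# Node IPs and module names are assumptions based on the configs/ring below.
import subprocess

nodes = ["10.20.15.51", "10.20.15.52", "10.20.15.53", "10.20.15.54"]
modules = ["account", "container", "object"]

for ip in nodes:
    for module in modules:
        # Listing the module root fails quickly if rsyncd is down or the
        # module is missing/misconfigured on that node.
        rc = subprocess.call(["rsync", "--timeout=5",
                              "rsync://%s/%s/" % (ip, module)])
        print("%s/%s: %s" % (ip, module, "ok" if rc == 0 else "rc=%d" % rc))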
Proxy-server.conf
[DEFAULT]
cert_file = /etc/swift/cert.crt
key_file = /etc/swift/cert.key
bind_port = 8080
workers = 8
user = swift
memcache_servers = 10.20.15.50:11211
[pipeline:main]
pipeline = healthcheck cache tempauth proxy-server
[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true
node_timeout = 30
conn_timeout = 1
recheck_account_existence = 240
recheck_container_existence = 240
[filter:tempauth]
use = egg:swift#tempauth
set log_name = tempauth
set log_facility = LOG_LOCAL0
set log_level = DEBUG
# <various tempauth users elided>
[filter:healthcheck]
use = egg:swift#healthcheck
[filter:cache]
use = egg:swift#memcache
memcache_servers = 10.20.15.50:11211
Object-server.conf
[DEFAULT]
bind_ip = 0.0.0.0
workers = 8
[pipeline:main]
pipeline = recon object-server
[app:object-server]
use = egg:swift#object
[object-replicator]
run_pause = 30
concurrency = 1
[object-updater]
[object-auditor]
files_per_second = 2
bytes_per_second = 1500000
zero_byte_files_per_second = 20
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
Account-server.conf
[DEFAULT]
bind_ip = 0.0.0.0
workers = 2
[pipeline:main]
pipeline = recon account-server
[app:account-server]
use = egg:swift#account
[account-replicator]
[account-auditor]
accounts_per_second=5
[account-reaper]
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
Container-server.conf
[DEFAULT]
bind_ip = 0.0.0.0
workers = 2
[pipeline:main]
pipeline = recon container-server
[app:container-server]
use = egg:swift#container
[container-replicator]
[container-updater]
[container-auditor]
containers_per_second=5
[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
rsyncd.conf
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 10.20.15.51
[account]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/account.lock
[container]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/container.lock
[object]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/object.lock
Mike Preston
Infrastructure Team | SYNETY
www.synety.com
direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000
From: Piotr Kopec [mailto:pkopec17 at gmail.com]
Sent: 26 September 2013 10:20
To: Mike Preston
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] Replication error
Replication relies on rsync. Check that rsync is working correctly on all Swift nodes.
If you can, please provide your account-server.conf, container-server.conf, and proxy-server.conf.
I've had plenty of problems with replicators too, so I'll try to help you.
Regards
Piotr
P.S.
Try out http://markdown-here.com/ when attaching .conf files. Just a suggestion. :)
2013/9/26 Mike Preston <mike.preston at synety.com>
I know it is poor form to reply to yourself, but I would appreciate any insight anyone has into this problem.
Mike Preston
Infrastructure Team | SYNETY
www.synety.com
direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000
From: Mike Preston [mailto:mike.preston at synety.com]
Sent: 24 September 2013 09:52
To: openstack at lists.openstack.org
Subject: Re: [Openstack] Replication error
root at storage-proxy-01:~/swift# swift-ring-builder object.builder validate
root at storage-proxy-01:~/swift# echo $?
0
I ran md5sum on the ring files on both the proxy (where we generate them) and the nodes and confirmed that they are identical.
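(Something like this, in case it is useful to anyone else; the /etc/swift paths below are the standard locations and may differ on your install:)

# Minimal sketch: print md5 sums of the ring files so they can be
# compared across the proxy and the storage nodes.
import hashlib

for path in ["/etc/swift/object.ring.gz",
             "/etc/swift/container.ring.gz",
             "/etc/swift/account.ring.gz"]:
    with open(path, "rb") as f:
        print("%s  %s" % (hashlib.md5(f.read()).hexdigest(), path))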
root at storage-proxy-01:~/swift# swift-ring-builder object.builder
object.builder, build version 72
65536 partitions, 3 replicas, 4 zones, 32 devices, 999.99 balance
The minimum number of hours before a partition can be reassigned is 3
Devices: id zone ip address port name weight partitions balance meta
0 1 10.20.15.51 6000 sdb1 3000.00 7123 1.44
1 1 10.20.15.51 6000 sdc1 3000.00 7123 1.44
2 1 10.20.15.51 6000 sdd1 3000.00 7122 1.43
3 1 10.20.15.51 6000 sde1 3000.00 7123 1.44
4 1 10.20.15.51 6000 sdf1 3000.00 7122 1.43
5 1 10.20.15.51 6000 sdg1 3000.00 7123 1.44
6 3 10.20.15.51 6000 sdh1 0.00 1273 999.99
7 3 10.20.15.51 6000 sdi1 0.00 1476 999.99
8 2 10.20.15.52 6000 sdb1 3000.00 7122 1.43
9 2 10.20.15.52 6000 sdc1 3000.00 7122 1.43
10 2 10.20.15.52 6000 sdd1 3000.00 7122 1.43
11 2 10.20.15.52 6000 sde1 3000.00 7122 1.43
12 2 10.20.15.52 6000 sdf1 3000.00 7122 1.43
13 2 10.20.15.52 6000 sdg1 3000.00 7122 1.43
14 3 10.20.15.52 6000 sdh1 0.00 1378 999.99
15 3 10.20.15.52 6000 sdi1 0.00 997 999.99
16 3 10.20.15.53 6000 sas0 3000.00 6130 -12.70
17 3 10.20.15.53 6000 sas1 3000.00 6130 -12.70
18 3 10.20.15.53 6000 sas2 3000.00 6129 -12.71
19 3 10.20.15.53 6000 sas3 3000.00 6130 -12.70
20 3 10.20.15.53 6000 sas4 3000.00 6130 -12.70
21 3 10.20.15.53 6000 sas5 3000.00 6130 -12.70
22 3 10.20.15.53 6000 sas6 3000.00 6129 -12.71
23 3 10.20.15.53 6000 sas7 3000.00 6129 -12.71
24 4 10.20.15.54 6000 sas0 3000.00 7122 1.43
25 4 10.20.15.54 6000 sas1 3000.00 7122 1.43
26 4 10.20.15.54 6000 sas2 3000.00 7123 1.44
27 4 10.20.15.54 6000 sas3 3000.00 7123 1.44
28 4 10.20.15.54 6000 sas4 3000.00 7122 1.43
29 4 10.20.15.54 6000 sas5 3000.00 7122 1.43
30 4 10.20.15.54 6000 sas6 3000.00 7123 1.44
31 4 10.20.15.54 6000 sas7 3000.00 7122 1.43
(We are currently migrating data between boxes due to cluster hardware replacement, which is why the zone 3 devices on the first two nodes are weighted to zero.)
Filelist attached (for the objects/ directory on the devices), but I see nothing out of place.
I'll run a full fsck on the drives tonight to try to rule that out.
Thanks for your help.
Mike Preston
Infrastructure Team | SYNETY
www.synety.com
direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000
From: Clay Gerrard [mailto:clay.gerrard at gmail.com]
Sent: 23 September 2013 20:34
To: Mike Preston
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] Replication error
Run `swift-ring-builder /etc/swift/object.builder validate` - it should have no errors and exit 0. Can you provide a paste of the output from `swift-ring-builder /etc/swift/object.builder` as well - it should list some general info about the ring (number of replicas, and list of devices). Rebalance the ring and make sure it's been distributed to all nodes.
The particular line you're seeing pop up in the traceback seems to be looking up all of the nodes for a particular partition it found in the objects dir. I'm not seeing any local sanitization [1] around those top-level directory names, so maybe it's just some garbage that got created there outside of Swift, or some file system corruption?
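To make that concrete, here's a toy illustration of the failure mode (not Swift's actual code): judging by the traceback, collect_jobs() effectively calls int() on each top-level directory name under objects/ and then indexes the ring's replica-to-partition-to-device table with it, so a numeric directory name outside your ring's 65536-partition range blows up with exactly the message in your log:

# Toy illustration only, not Swift's actual code.
from array import array

part_count = 65536  # your ring has 65536 partitions
replicas = 3

# Stand-in for the ring's _replica2part2dev_id: one array('H') per replica.
replica2part2dev_id = [array('H', [0] * part_count) for _ in range(replicas)]

partition = int("70000")  # e.g. a stray directory named "70000" under objects/
try:
    [r[partition] for r in replica2part2dev_id]
except IndexError as e:
    print(e)  # "array index out of range", same message as in your traceback

(A non-numeric name would die earlier with a ValueError from int(), so if it is stray data it is probably a numeric directory name outside 0-65535.)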
Can you provide the output from `ls /srv/node/objects` (or wherever you have devices configured)?
-Clay
1. https://bugs.launchpad.net/swift/+bug/1229372
On Mon, Sep 23, 2013 at 2:34 AM, Mike Preston <mike.preston at synety.com> wrote:
Hi,
We are seeing a replication error on Swift. The error is only seen on a single node; the other nodes appear to be working fine.
The installed version is Debian Wheezy with Swift 1.4.8-2+deb7u1.
Sep 23 10:33:03 storage-node-01 object-replicator Starting object replication pass.
Sep 23 10:33:03 storage-node-01 object-replicator Exception in top-level replication loop:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 564, in replicate
    jobs = self.collect_jobs()
  File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 536, in collect_jobs
    self.object_ring.get_part_nodes(int(partition))
  File "/usr/lib/python2.7/dist-packages/swift/common/ring/ring.py", line 103, in get_part_nodes
    return [self.devs[r[part]] for r in self._replica2part2dev_id]
IndexError: array index out of range
Sep 23 10:33:03 storage-node-01 object-replicator Nothing replicated for 0.728466033936 seconds.
Sep 23 10:33:03 storage-node-01 object-replicator Object replication complete. (0.01 minutes)
Can anyone shed any light on this, or suggest next steps for debugging or fixing it?
Mike Preston
Infrastructure Team | SYNETY
www.synety.com
direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack