[Openstack] Replication error

Mike Preston mike.preston at synety.com
Mon Sep 30 07:57:46 UTC 2013


Sure,

Rsync appears to be fine on all nodes.
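For reference, a check along these lines lists the modules each node exports (node IPs taken from the ring further down; this assumes rsyncd is listening on the default port):

# Each node should list the account, container and object modules.
for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do
    echo "== $node =="
    rsync rsync://$node/
done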


Proxy-server.conf

[DEFAULT]
cert_file = /etc/swift/cert.crt
key_file = /etc/swift/cert.key
bind_port = 8080
workers = 8
user = swift
memcache_servers = 10.20.15.50:11211

[pipeline:main]
pipeline = healthcheck cache tempauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
allow_account_management = true
account_autocreate = true
node_timeout = 30
conn_timeout = 1
recheck_account_existence = 240
recheck_container_existence = 240

[filter:tempauth]
use = egg:swift#tempauth

set log_name = tempauth
set log_facility = LOG_LOCAL0
set log_level = DEBUG

# ... (various tempauth users omitted) ...

[filter:healthcheck]
use = egg:swift#healthcheck

[filter:cache]
use = egg:swift#memcache
memcache_servers = 10.20.15.50:11211


Object-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = recon object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
run_pause = 30
concurrency = 1

[object-updater]

[object-auditor]
files_per_second = 2
bytes_per_second = 1500000
zero_byte_files_per_second = 20

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift


Account-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 2

[pipeline:main]
pipeline = recon account-server

[app:account-server]
use = egg:swift#account

[account-replicator]

[account-auditor]
accounts_per_second=5

[account-reaper]

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift


Container-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 2

[pipeline:main]
pipeline = recon container-server

[app:container-server]
use = egg:swift#container

[container-replicator]

[container-updater]

[container-auditor]
containers_per_second=5

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift


rsyncd.conf

uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 10.20.15.51

[account]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/account.lock

[container]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/container.lock

[object]
max connections = 50
path = /srv/node/
read only = false
lock file = /var/lock/object.lock


Mike Preston
Infrastructure Team  |  SYNETY
www.synety.com

direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000


From: Piotr Kopec [mailto:pkopec17 at gmail.com]
Sent: 26 September 2013 10:20
To: Mike Preston
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] Replication error


Replication relies on rsync, so check that rsync is working correctly on all swift nodes.
If you can, please provide your account-server.conf, container-server.conf and proxy-server.conf.
I had plenty of problems with the replicators too, so I'll try to help you.

Regards
Piotr

P.S.

Try out http://markdown-here.com/ when attaching .conf files. Just a suggestion. :)

2013/9/26 Mike Preston <mike.preston at synety.com>
I know it is poor form to reply to yourself, but I would appreciate it if anyone has any insight on this problem.

Mike Preston
Infrastructure Team  |  SYNETY
www.synety.com

direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000


From: Mike Preston [mailto:mike.preston at synety.com]
Sent: 24 September 2013 09:52
To: openstack at lists.openstack.org

Subject: Re: [Openstack] Replication error

root at storage-proxy-01:~/swift# swift-ring-builder object.builder validate
root at storage-proxy-01:~/swift# echo $?
0

I ran md5sum on the ring files on both the proxy (where we generate them) and the nodes and confirmed that they are identical.
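Roughly like this (paths are an assumption; this sketch presumes the deployed rings live in /etc/swift on every box):

# Compare the ring checksum on the proxy against each storage node.
md5sum /etc/swift/object.ring.gz
for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do
    ssh $node md5sum /etc/swift/object.ring.gz
done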

root at storage-proxy-01:~/swift# swift-ring-builder object.builder
object.builder, build version 72
65536 partitions, 3 replicas, 4 zones, 32 devices, 999.99 balance
The minimum number of hours before a partition can be reassigned is 3
Devices:    id  zone      ip address  port      name weight partitions balance meta
             0     1     10.20.15.51  6000      sdb1 3000.00       7123    1.44
             1     1     10.20.15.51  6000      sdc1 3000.00       7123    1.44
             2     1     10.20.15.51  6000      sdd1 3000.00       7122    1.43
             3     1     10.20.15.51  6000      sde1 3000.00       7123    1.44
             4     1     10.20.15.51  6000      sdf1 3000.00       7122    1.43
             5     1     10.20.15.51  6000      sdg1 3000.00       7123    1.44
             6     3     10.20.15.51  6000      sdh1   0.00       1273  999.99
             7     3     10.20.15.51  6000      sdi1   0.00       1476  999.99
             8     2     10.20.15.52  6000      sdb1 3000.00       7122    1.43
             9     2     10.20.15.52  6000      sdc1 3000.00       7122    1.43
            10     2     10.20.15.52  6000      sdd1 3000.00       7122    1.43
            11     2     10.20.15.52  6000      sde1 3000.00       7122    1.43
            12     2     10.20.15.52  6000      sdf1 3000.00       7122    1.43
            13     2     10.20.15.52  6000      sdg1 3000.00       7122    1.43
            14     3     10.20.15.52  6000      sdh1   0.00       1378  999.99
            15     3     10.20.15.52  6000      sdi1   0.00        997  999.99
            16     3     10.20.15.53  6000      sas0 3000.00       6130  -12.70
            17     3     10.20.15.53  6000      sas1 3000.00       6130  -12.70
            18     3     10.20.15.53  6000      sas2 3000.00       6129  -12.71
            19     3     10.20.15.53  6000      sas3 3000.00       6130  -12.70
            20     3     10.20.15.53  6000      sas4 3000.00       6130  -12.70
            21     3     10.20.15.53  6000      sas5 3000.00       6130  -12.70
            22     3     10.20.15.53  6000      sas6 3000.00       6129  -12.71
            23     3     10.20.15.53  6000      sas7 3000.00       6129  -12.71
            24     4     10.20.15.54  6000      sas0 3000.00       7122    1.43
            25     4     10.20.15.54  6000      sas1 3000.00       7122    1.43
            26     4     10.20.15.54  6000      sas2 3000.00       7123    1.44
            27     4     10.20.15.54  6000      sas3 3000.00       7123    1.44
            28     4     10.20.15.54  6000      sas4 3000.00       7122    1.43
            29     4     10.20.15.54  6000      sas5 3000.00       7122    1.43
            30     4     10.20.15.54  6000      sas6 3000.00       7123    1.44
            31     4     10.20.15.54  6000      sas7 3000.00       7122    1.43

(We are currently migrating data between boxes due to cluster hardware replacement, which is why the zone 3 devices on the first two nodes are weighted as they are.)
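For anyone following along: draining devices for a migration like this is normally just a matter of zero-weighting them and rebalancing. A sketch, not necessarily the exact commands we ran, using the device ids of the zero-weight devices from the listing above:

# Zero-weight the old zone 3 devices on the first two nodes, then rebalance.
swift-ring-builder object.builder set_weight d6 0
swift-ring-builder object.builder set_weight d7 0
swift-ring-builder object.builder set_weight d14 0
swift-ring-builder object.builder set_weight d15 0
swift-ring-builder object.builder rebalance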

File list attached (for the objects/ directory on the devices), but I see nothing out of place.

I'll run a full fsck on the drives tonight to try to rule that out.

Thanks for your help.



Mike Preston
Infrastructure Team  |  SYNETY
www.synety.com

direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000


From: Clay Gerrard [mailto:clay.gerrard at gmail.com]
Sent: 23 September 2013 20:34
To: Mike Preston
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] Replication error

Run `swift-ring-builder /etc/swift/object.builder validate` - it should have no errors and exit 0.  Can you provide a paste of the output from `swift-ring-builder /etc/swift/object.builder` as well - it should list some general info about the ring (number of replicas, and list of devices).  Rebalance the ring and make sure it's been distributed to all nodes.
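Roughly, something like this (assuming the builder and rings live under /etc/swift on the proxy, and the storage nodes are reachable over ssh):

swift-ring-builder /etc/swift/object.builder validate && echo "validate ok"
swift-ring-builder /etc/swift/object.builder            # replicas, zones, devices
swift-ring-builder /etc/swift/object.builder rebalance
# rebalance writes object.ring.gz next to the builder; push it to every node
for node in 10.20.15.51 10.20.15.52 10.20.15.53 10.20.15.54; do   # your storage node IPs
    scp /etc/swift/object.ring.gz $node:/etc/swift/
done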

The particular line you're seeing pop up in the traceback seems to be looking up all of the nodes for a particular partition it found in the objects dir.  I'm not seeing any sanitization [1] of those top-level directory names, so maybe it's just some garbage that was created there outside of swift, or some file system corruption?
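A quick way to spot that kind of garbage is to look for partition directory names that are not plain integers below the ring's partition count (this sketch assumes devices mounted under /srv/node and 65536 partitions, i.e. a 16-bit part power):

# Partition directory names must be plain integers below the partition count;
# a larger number triggers the IndexError in get_part_nodes(), and a
# non-numeric name would fail at int(partition) instead.
for d in /srv/node/*/objects/*; do
    p=$(basename "$d")
    case "$p" in
        *[!0-9]*|'') echo "non-numeric: $d" ;;
        *) [ "$p" -ge 65536 ] && echo "out of range: $d" ;;
    esac
done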

Can you provide the output from `ls /srv/node/objects` (or wherever you have your devices configured)?

-Clay

1. https://bugs.launchpad.net/swift/+bug/1229372

On Mon, Sep 23, 2013 at 2:34 AM, Mike Preston <mike.preston at synety.com> wrote:
Hi,

We are seeing a replication error on swift. The error is only seen on a single node; the other nodes appear to be working fine.
The installed version is Debian wheezy with swift 1.4.8-2+deb7u1.

Sep 23 10:33:03 storage-node-01 object-replicator Starting object replication pass.
Sep 23 10:33:03 storage-node-01 object-replicator Exception in top-level replication loop:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 564, in replicate
    jobs = self.collect_jobs()
  File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 536, in collect_jobs
    self.object_ring.get_part_nodes(int(partition))
  File "/usr/lib/python2.7/dist-packages/swift/common/ring/ring.py", line 103, in get_part_nodes
    return [self.devs[r[part]] for r in self._replica2part2dev_id]
IndexError: array index out of range
Sep 23 10:33:03 storage-node-01 object-replicator Nothing replicated for 0.728466033936 seconds.
Sep 23 10:33:03 storage-node-01 object-replicator Object replication complete. (0.01 minutes)

Can anyone shed any light on this, or suggest next steps for debugging or fixing it?



Mike Preston
Infrastructure Team  |  SYNETY
www.synety.com

direct: 0116 424 4016
mobile: 07950 892038
main: 0116 424 4000



_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


