Thanks Andi, that helps. It's true that my expectations were misplaced; I was expecting all disks to "rebalance" until each one stored the same amount of data.<div><br></div><div>What's weird, though, is that some folders are missing on the newly created c0d4p1 device. Here's what I get:</div>

<div><br></div><div><div>root@storage3:/srv/node# ls c0d1p1/</div><div>accounts  async_pending  containers  objects  tmp</div><div><br></div><div><div>root@storage3:/srv/node# ls c0d4p1/</div><div>accounts  tmp</div></div>

<div><br></div><div>Is that normal?</div><div><br></div><div>And when I check /var/log/rsyncd.log for transfers between storage nodes, I see a lot of entries like the following, which again makes me wonder whether something is wrong:</div>

<div><br></div><div><div>2012/10/24 19:22:56 [6514] rsync to container/c0d4p1/tmp/e49cf526-1d53-4069-bbea-b74f6dbec5f1 from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:56 [6514] receiving file list</div><div>2012/10/24 19:22:56 [6514] sent 54 bytes  received 17527 bytes  total size 17408</div>

<div>2012/10/24 21:22:56 [6516] connect from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:56 [6516] rsync to container/c0d4p1/tmp/4b8b0618-077b-48e2-a7a0-fb998fcf11bc from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:56 [6516] receiving file list</div>

<div>2012/10/24 19:22:56 [6516] sent 54 bytes  received 26743 bytes  total size 26624</div><div>2012/10/24 21:22:56 [6518] connect from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:56 [6518] rsync to container/c0d4p1/tmp/53452ee6-c52c-4e3b-abe2-a31a2c8d65ba from storage2 (192.168.1.4)</div>

<div>2012/10/24 19:22:56 [6518] receiving file list</div><div>2012/10/24 19:22:57 [6518] sent 54 bytes  received 24695 bytes  total size 24576</div><div>2012/10/24 21:22:57 [6550] connect from storage2 (192.168.1.4)</div>

<div>2012/10/24 19:22:57 [6550] rsync to container/c0d4p1/tmp/b858126d-3152-4d71-a0e8-eea115f69fc8 from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:57 [6550] receiving file list</div><div>2012/10/24 19:22:57 [6550] sent 54 bytes  received 24695 bytes  total size 24576</div>

<div>2012/10/24 21:22:57 [6552] connect from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:57 [6552] rsync to container/c0d4p1/tmp/f3ce8205-84ac-4236-baea-3a3aef2da6ab from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:57 [6552] receiving file list</div>

<div>2012/10/24 19:22:58 [6552] sent 54 bytes  received 25719 bytes  total size 25600</div><div>2012/10/24 21:22:58 [6554] connect from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:58 [6554] rsync to container/c0d4p1/tmp/91b4f046-eacb-4a1d-aed1-727d0c982742 from storage2 (192.168.1.4)</div>

<div>2012/10/24 19:22:58 [6554] receiving file list</div><div>2012/10/24 19:22:58 [6554] sent 54 bytes  received 18551 bytes  total size 18432</div><div>2012/10/24 21:22:58 [6556] connect from storage2 (192.168.1.4)</div>

<div>2012/10/24 19:22:58 [6556] rsync to container/c0d4p1/tmp/94d223f9-b84d-4911-be6b-bb28f89b6647 from storage2 (192.168.1.4)</div><div>2012/10/24 19:22:58 [6556] receiving file list</div><div>2012/10/24 19:22:58 [6556] sent 54 bytes  received 24695 bytes  total size 24576</div>
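Judging by the paths, these entries look like container database files being pushed from storage2 into the tmp directory of c0d4p1 through the "container" rsync module. To see how much data such transfers actually carry, a small script can total the "received" bytes per destination. This is a minimal sketch that assumes the exact line format shown above (an "[pid] rsync to MODULE/DEVICE/... from" line followed by an "[pid] sent N bytes  received M bytes" line); the function name summarize_rsync_log is hypothetical.

```python
import re
from collections import defaultdict

# "[6516] rsync to container/c0d4p1/tmp/4b8b... from storage2 (192.168.1.4)"
TRANSFER_RE = re.compile(r"\[(\d+)\] rsync to (\S+) from ")
# "[6516] sent 54 bytes  received 26743 bytes  total size 26624"
SUMMARY_RE = re.compile(r"\[(\d+)\] sent (\d+) bytes\s+received (\d+) bytes")


def summarize_rsync_log(lines):
    """Total bytes received per "<module>/<device>" for completed sessions."""
    dest_by_pid = {}           # rsync session pid -> "<module>/<device>"
    totals = defaultdict(int)  # "<module>/<device>" -> bytes received
    for line in lines:
        m = TRANSFER_RE.search(line)
        if m:
            pid, path = m.groups()
            # Keep only the rsync module and device, e.g. "container/c0d4p1".
            dest_by_pid[pid] = "/".join(path.split("/")[:2])
            continue
        m = SUMMARY_RE.search(line)
        if m and m.group(1) in dest_by_pid:
            pid, _sent, received = m.groups()
            # Pop the pid so a later session reusing it starts fresh.
            totals[dest_by_pid.pop(pid)] += int(received)
    return dict(totals)
```

Usage: pass any iterable of log lines, for example summarize_rsync_log(open("/var/log/rsyncd.log")). Lines such as "connect from" or "receiving file list" are simply ignored.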

</div><div><br></div><div><br></div><div><br></div><br><div class="gmail_quote">On Tue, Oct 23, 2012 at 11:17 AM, andi abes <span dir="ltr"><<a href="mailto:andi.abes@gmail.com" target="_blank">andi.abes@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Tue, Oct 23, 2012 at 12:16 PM, Emre Sokullu <<a href="mailto:emre@groups-inc.com">emre@groups-inc.com</a>> wrote:<br>


> Folks,<br>

><br>

> This is the third day and I see little or no change (only a few KB) on the new<br>

> disks.<br>

><br>

> Is this normal? Is there a long computation step that has to finish<br>

> before the newly added disks actually start filling?<br>

><br>

> Or should I just start from scratch with the "create" command this time? The<br>

> last time I did it, I didn't run the "swift-ring-builder create 20 3 1 .."<br>

> command first; I just started with "swift-ring-builder add ..." and reused the<br>

> existing ring.gz files, thinking that otherwise I could end up reformatting the whole<br>

> stack. I'm not sure if that's the case.<br>

><br>

<br>

</div>That is correct: you don't want to recreate the rings, since that is<br>

likely to cause unnecessary partition movement.<br>

<br>

> Please advise. Thanks,<br>

><br>

<br>

I think your expectations might be misplaced. The ring builder tries<br>

not to move partitions needlessly. In your cluster, you had 3<br>

zones (and I'm assuming 3 replicas). Swift placed the partitions as<br>

efficiently as it could, spread across the 3 zones (servers). As<br>

things stand, there's no real reason for partitions to move across the<br>

servers. I'm guessing that the data growth you've seen is from new<br>

data, not from existing data movement (though there are some calls to<br>

random in the code that might have produced some partition movement).<br>

<br>

If you truly want to move things around forcefully, you could:<br>

* decrease the weight of the old devices. This would make them<br>

overweighted, so partitions would be reassigned away from them.<br>

* delete and re-add devices to the ring. This will cause all the<br>

partitions from the deleted devices to be spread across the new set of<br>

devices.<br>

<br>

After you run your ring manipulation commands, run the<br>

rebalance command and copy the resulting ring files to all nodes.<br>

This is likely to cause *lots* of activity in your cluster, which<br>

seems to be the desired outcome. It's likely to have a negative impact on<br>

requests served by the proxy, so it's something you probably want to be<br>

careful about.<br>
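The first option (lowering the old devices' weights and rebalancing) can be scripted. This is a minimal dry-run sketch: it only builds and prints swift-ring-builder command lines using the real set_weight and rebalance subcommands, with search values in the same form as the add commands quoted further down. The helper name reweight_commands and the target weight of 50 are hypothetical choices for illustration.

```python
import shlex

BUILDERS = ("account.builder", "container.builder", "object.builder")


def reweight_commands(old_devices, new_weight, builders=BUILDERS):
    """Build (but do not run) set_weight commands for each old device in
    every builder, followed by one rebalance per builder."""
    cmds = []
    for builder in builders:
        for search_value in old_devices:
            cmds.append(["swift-ring-builder", builder, "set_weight",
                         search_value, str(new_weight)])
        # Rebalance once per builder, after all weight changes are staged.
        cmds.append(["swift-ring-builder", builder, "rebalance"])
    return cmds


if __name__ == "__main__":
    # The nine pre-existing devices: c0d1p1..c0d3p1 on each of the three nodes.
    old = [f"z{z}-{ip}:6002/{dev}"
           for z, ip in ((1, "192.168.1.3"), (2, "192.168.1.4"), (3, "192.168.1.5"))
           for dev in ("c0d1p1", "c0d2p1", "c0d3p1")]
    for cmd in reweight_commands(old, 50):
        print(shlex.join(cmd))  # review, then run by hand
```

Printing instead of executing keeps a human in the loop. Since the ring reports a minimum of 1 hour before a partition can be reassigned, lowering weights in a few smaller steps, each followed by copying the new ring files out, would spread the resulting data movement over time.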

<br>

If you leave things as they are, new data will be distributed onto<br>

the new devices, and as old data gets deleted, usage will rebalance<br>

over time.<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

> --<br>

> Emre<br>

><br>

> On Mon, Oct 22, 2012 at 12:09 PM, Emre Sokullu <<a href="mailto:emre@groups-inc.com">emre@groups-inc.com</a>> wrote:<br>

>><br>

>> Hi Samuel,<br>

>><br>

>> Thanks for the quick reply.<br>

>><br>

>> They're all 100. And here's the output of swift-ring-builder:<br>

>><br>

>> root@proxy1:/etc/swift# swift-ring-builder account.builder<br>

>> account.builder, build version 13<br>

>> 1048576 partitions, 3 replicas, 3 zones, 12 devices, 0.00 balance<br>

>> The minimum number of hours before a partition can be reassigned is 1<br>

>> Devices:    id  zone      ip address  port      name weight partitions balance meta<br>

>>              0     1     192.168.1.3  6002    c0d1p1 100.00     262144    0.00<br>

>>              1     1     192.168.1.3  6002    c0d2p1 100.00     262144    0.00<br>

>>              2     1     192.168.1.3  6002    c0d3p1 100.00     262144    0.00<br>

>>              3     2     192.168.1.4  6002    c0d1p1 100.00     262144    0.00<br>

>>              4     2     192.168.1.4  6002    c0d2p1 100.00     262144    0.00<br>

>>              5     2     192.168.1.4  6002    c0d3p1 100.00     262144    0.00<br>

>>              6     3     192.168.1.5  6002    c0d1p1 100.00     262144    0.00<br>

>>              7     3     192.168.1.5  6002    c0d2p1 100.00     262144    0.00<br>

>>              8     3     192.168.1.5  6002    c0d3p1 100.00     262144    0.00<br>

>>              9     1     192.168.1.3  6002    c0d4p1 100.00     262144    0.00<br>

>>             10     2     192.168.1.4  6002    c0d4p1 100.00     262144    0.00<br>

>>             11     3     192.168.1.5  6002    c0d4p1 100.00     262144    0.00<br>
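The numbers in this output are self-consistent: 2^20 = 1048576 partitions times 3 replicas, spread over 12 devices of equal weight, is 262144 partition-replicas per device, which is why every device, including the three new c0d4p1 devices, lists 262144 partitions and 0.00 balance. The sketch below recomputes a per-device balance figure under the assumption that balance means "percent over or under the weight-proportional share"; it is modeled on, not copied from, swift-ring-builder.

```python
PART_POWER = 20  # "swift-ring-builder create 20 3 1": 2**20 = 1048576 partitions
REPLICAS = 3

# (device id, weight, partitions) as listed in the account.builder output above.
DEVICES = [(dev_id, 100.0, 262144) for dev_id in range(12)]


def device_balances(devices, part_power=PART_POWER, replicas=REPLICAS):
    """Percent each device is over (+) or under (-) its weight-based share."""
    total_weight = sum(weight for _, weight, _ in devices)
    # Partition-replicas that one unit of weight is expected to hold.
    weight_of_one_part = (2 ** part_power) * replicas / total_weight
    return {
        dev_id: 100.0 * parts / (weight * weight_of_one_part) - 100.0
        for dev_id, weight, parts in devices
        if weight > 0  # zero-weight devices have no target share; skipped here
    }


balances = device_balances(DEVICES)
print(max(abs(b) for b in balances.values()))  # at or very near 0.0
```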

>><br>

>> On Mon, Oct 22, 2012 at 12:03 PM, Samuel Merritt <<a href="mailto:sam@swiftstack.com">sam@swiftstack.com</a>> wrote:<br>

>> > On 10/22/12 9:38 AM, Emre Sokullu wrote:<br>

>> >><br>

>> >> Hi folks,<br>

>> >><br>

>> >> At <a href="http://GROU.PS" target="_blank">GROU.PS</a>, we've been an OpenStack SWIFT user for more than 1.5 years<br>

>> >> now. Currently, we hold about 18TB of data on 3 storage nodes. Since<br>

>> >> we hit 84% in utilization, we have recently decided to expand the<br>

>> >> storage with more disks.<br>

>> >><br>

>> >> In order to do that, after creating a new c0d4p1 partition in each of<br>

>> >> the storage nodes, we ran the following commands on our proxy server:<br>

>> >><br>

>> >> swift-ring-builder account.builder add z1-192.168.1.3:6002/c0d4p1 100<br>

>> >> swift-ring-builder container.builder add z1-192.168.1.3:6002/c0d4p1 100<br>

>> >> swift-ring-builder object.builder add z1-192.168.1.3:6002/c0d4p1 100<br>

>> >> swift-ring-builder account.builder add z2-192.168.1.4:6002/c0d4p1 100<br>

>> >> swift-ring-builder container.builder add z2-192.168.1.4:6002/c0d4p1 100<br>

>> >> swift-ring-builder object.builder add z2-192.168.1.4:6002/c0d4p1 100<br>

>> >> swift-ring-builder account.builder add z3-192.168.1.5:6002/c0d4p1 100<br>

>> >> swift-ring-builder container.builder add z3-192.168.1.5:6002/c0d4p1 100<br>

>> >> swift-ring-builder object.builder add z3-192.168.1.5:6002/c0d4p1 100<br>

>> >><br>

>> >> [snip]<br>

>> ><br>

>> >><br>

>> >> So right now, the problem is: the disk growth on each of the storage<br>

>> >> nodes seems to have stalled,<br>

>> ><br>

>> > So you've added 3 new devices to each ring and assigned a weight of 100<br>

>> > to each one. What are the weights of the other devices in the ring? If<br>

>> > they're much larger than 100, then that will cause the new devices to end up<br>

>> > with a small fraction of the data you want on them.<br>
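To put numbers on this: the ring builder aims to give each device a share of the partition-replicas proportional to its weight divided by the total weight. The sketch below assumes that proportional rule and uses hypothetical weights (nine old devices at 1000, three new ones at 100) for contrast with the equal weights of 100 this cluster actually turned out to use; the function name parts_wanted is illustrative.

```python
def parts_wanted(weights, part_count=2 ** 20, replicas=3):
    """Weight-proportional target of partition-replicas for each device."""
    total = sum(weights.values())
    return {dev: part_count * replicas * w / total for dev, w in weights.items()}


# Hypothetical weights: nine existing devices at 1000, three new ones at 100.
skewed = {f"old{i}": 1000.0 for i in range(9)}
skewed.update({f"new{i}": 100.0 for i in range(3)})

# The weights this cluster actually uses: all twelve devices at 100.
equal = {f"dev{i}": 100.0 for i in range(12)}

print(round(parts_wanted(skewed)["new0"]))  # 33825, far below 262144
print(parts_wanted(equal)["dev0"])          # 262144.0
```

With the skewed weights, the three new devices together would be targeted for only about 3% of the data, which is the situation described above.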

>> ><br>

>> > Running "swift-ring-builder <thing>.builder" will show you information,<br>

>> > including weights, of all the devices in the ring.<br>

>> ><br>

>> ><br>

>> ><br>

>> >> * Bonus question: why do we copy the ring.gz files to the storage nodes, and<br>

>> >> how critical are they? It's not clear to me how Swift can afford to<br>

>> >> wait (even if it's just a few seconds) for the .ring.gz files to reach<br>

>> >> the storage nodes after rebalancing, if those files are so critical.<br>

>> ><br>

>> ><br>

>> > The ring.gz files contain the mapping from Swift partitions to disks. As you<br>

>> > know, the proxy server uses it to determine which backends have the data for<br>

>> > a given request. The replicators also use the ring to determine where data<br>

>> > belongs so that they can ensure the right number of replicas, etc.<br>
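As a simplified sketch of that mapping: the path is hashed with MD5 (salted with a cluster-wide suffix), the top part-power bits of the digest become the partition number, and the partition is looked up in a per-replica table of device ids. The suffix value, the function names, and the replica2part2dev table shape below are assumptions for illustration; real Swift versions differ in details such as an optional hash prefix.

```python
import struct
from hashlib import md5

PART_POWER = 20                 # from "swift-ring-builder create 20 3 1"
PART_SHIFT = 32 - PART_POWER
HASH_PATH_SUFFIX = b"changeme"  # hypothetical; each cluster sets its own secret suffix


def get_partition(account, container=None, obj=None):
    """Map an account/container/object path to a partition (simplified)."""
    parts = [p for p in (account, container, obj) if p]
    digest = md5(("/" + "/".join(parts)).encode("utf-8") + HASH_PATH_SUFFIX).digest()
    # Keep the top PART_POWER bits of the first 4 bytes of the MD5 digest.
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT


def devices_for(part, replica2part2dev):
    """One device id per replica: replica2part2dev[replica][partition]."""
    return [table[part] for table in replica2part2dev]


part = get_partition("cof", "fee", "cup")  # the /cof/fee/cup example below
print(0 <= part < 2 ** PART_POWER)          # True
```

Because every node computes the same partition for /cof/fee/cup, the only thing that can make two nodes disagree about where it lives is the device table, which is exactly what differs between an old and a new ring.gz.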

>> ><br>

>> > When two storage nodes have different versions of a ring.gz file, you can<br>

>> > get replicator fights. They look like this:<br>

>> ><br>

>> > - node1's (old) ring says that the partition for a replica of /cof/fee/cup<br>

>> > belongs on node2's /dev/sdf.<br>

>> > - node2's (new) ring says that the same partition belongs on node1's<br>

>> > /dev/sdd.<br>

>> ><br>

>> > When the replicator on node1 runs, it will see that it has the partition for<br>

>> > /cof/fee/cup on its disk. It will then consult the ring, push that<br>

>> > partition's contents to node2, and then delete its local copy (since node1's<br>

>> > ring says that this data does not belong on node1).<br>

>> ><br>

>> > When the replicator on node2 runs, it will do the converse: push to node1,<br>

>> > then delete its local copy.<br>

>> ><br>

>> > If you leave the rings out of sync for a long time, then you'll end up<br>

>> > consuming disk and network IO ping-ponging a set of data around. If they're<br>

>> > out of sync for a few seconds, then it's not a big deal.<br>
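The ping-pong can be reproduced with a toy model: one partition, one replica, and each node's ring reduced to "which node owns /cof/fee/cup". On each pass, the node holding the data pushes it to whichever node its own ring names and deletes its local copy, as described above. The function names and the dict-based "rings" are hypothetical simplifications, not Swift's replicator code.

```python
def replicate_once(holder, rings):
    """One replicator pass on the node currently holding the partition.

    rings maps node -> the node that node's ring says owns the partition.
    If the holder's ring points elsewhere, the data is pushed there and
    the local copy deleted, so the new holder is the ring's target.
    """
    return rings[holder]


def simulate(rings, start, passes):
    """Return which node holds the data after each pass."""
    holder, history = start, []
    for _ in range(passes):
        holder = replicate_once(holder, rings)
        history.append(holder)
    return history


# Out of sync: node1's old ring says node2, node2's new ring says node1.
out_of_sync = {"node1": "node2", "node2": "node1"}
# After the new ring reaches node1, both agree the data belongs on node1.
in_sync = {"node1": "node1", "node2": "node1"}

print(simulate(out_of_sync, "node1", 4))  # ['node2', 'node1', 'node2', 'node1']
print(simulate(in_sync, "node2", 3))      # ['node1', 'node1', 'node1']
```

Every pass with mismatched rings moves the whole partition again, while a single consistent ring settles it after one move, which is why a few seconds of skew is harmless but a long one is not.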

>> ><br>

>> > _______________________________________________<br>

>> > Mailing list: <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>

>> > Post to     : <a href="mailto:openstack@lists.launchpad.net">openstack@lists.launchpad.net</a><br>

>> > Unsubscribe : <a href="https://launchpad.net/~openstack" target="_blank">https://launchpad.net/~openstack</a><br>

>> > More help   : <a href="https://help.launchpad.net/ListHelp" target="_blank">https://help.launchpad.net/ListHelp</a><br>

><br>


</div></div></blockquote></div><br></div>