<div dir="ltr"><div><div>Hi John,<br><br></div>Thanks for the explanation. Have a couple of more questions on this subject though.<br><br></div><div></div><div>1. "pretend_min_hours_passed" sounds like something that I could use. I'm okay if there is a chance of interruption in services to the user at this time, as long as it does not cause any data-loss or data-corruption.<br>
</div><div>2. It would be really useful if Swift could record pending rebalance operations somewhere and run them automatically later, once min_part_hours has elapsed.<br><br></div><div>Regards,<br></div><div>Shyam<br></div></div>
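In the meantime, a crude workaround along these lines might work as a sketch of point 2; the builder file path, node IPs, and the idea of driving it from cron are all assumptions, relying only on the fact that swift-ring-builder exits non-zero when it refuses a rebalance:

```shell
#!/bin/sh
# retry-rebalance.sh -- a hypothetical wrapper, run periodically (e.g. from cron).
# It retries the rebalance until min_part_hours has elapsed and the move is
# actually performed; the paths and IPs below are illustrative assumptions.
BUILDER=/etc/swift/object.builder

if swift-ring-builder "$BUILDER" rebalance; then
    # Rebalance succeeded; push the new ring out to the other nodes.
    for node in 10.3.0.212 10.3.0.222 10.3.0.232; do
        scp /etc/swift/object.ring.gz "$node":/etc/swift/
    done
else
    echo "rebalance deferred (min_part_hours not yet elapsed); will retry" >&2
fi
```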
<div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, May 1, 2014 at 11:15 PM, John Dickinson <span dir="ltr"><<a href="mailto:me@not.mn" target="_blank">me@not.mn</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class=""><br>
On May 1, 2014, at 10:32 AM, Shyam Prasad N <<a href="mailto:nspmangalore@gmail.com">nspmangalore@gmail.com</a>> wrote:<br>
<br>
> Hi Chuck,<br>
> Thanks for the reply.<br>
><br>
> The reason for this weight distribution seems to be the ring rebalance command. I've scripted the disk addition (and rebalance) process using a wrapper command. When I trigger a rebalance after each disk addition, only the first rebalance seems to take effect.<br>
><br>
> Is there any other way to adjust the weights other than a rebalance? Or is there a way to force a rebalance even when less than an hour (the min_part_hours value set at ring creation) has passed since the previous one?<br>
<br>
</div>Rebalancing only moves one replica at a time to ensure that your data remains available, even if you have a hardware failure while you are adding capacity. This is why it may take multiple rebalances to get everything evenly balanced.<br>
<br>
The min_part_hours setting (perhaps poorly named) should match how long a replication pass takes in your cluster. This follows from the point above: by ensuring that replication has completed before putting another partition "in flight", Swift can keep your data highly available.<br>
<br>
For completeness, to answer your question: there is an (intentionally) undocumented option in swift-ring-builder called "pretend_min_part_hours_passed", but it should ALMOST NEVER be used in a production cluster unless you really, really know what you are doing. Using that option will very likely cause service interruptions for your users. The better option is to set min_part_hours to match your replication pass time (with set_min_part_hours), and then wait for Swift to move things around.<br>
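As a sketch of the recommended path (the 3-hour value and the builder path are illustrative assumptions; measure your own replication pass time first):

```shell
# Match min_part_hours to the measured replication pass time
# (3 hours here is an example value, not a recommendation).
swift-ring-builder /etc/swift/object.builder set_min_part_hours 3

# Then rebalance; the builder will refuse to move further replicas of a
# partition until the min_part_hours window has passed.
swift-ring-builder /etc/swift/object.builder rebalance

# The discouraged escape hatch mentioned above -- avoid in production:
# swift-ring-builder /etc/swift/object.builder pretend_min_part_hours_passed
```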
<br>
Here's some more info on how and why to add capacity to a running Swift cluster: <a href="https://swiftstack.com/blog/2012/04/09/swift-capacity-management/" target="_blank">https://swiftstack.com/blog/2012/04/09/swift-capacity-management/</a><br>
<span class="HOEnZb"><font color="#888888"><br>
--John<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<br>
<br>
<br>
> On May 1, 2014 9:00 PM, "Chuck Thier" <<a href="mailto:cthier@gmail.com">cthier@gmail.com</a>> wrote:<br>
> Hi Shyam,<br>
><br>
> If I am reading your ring output correctly, it looks like only the devices on node .202 have a weight set, which is why all of your objects are going to that one node. You can update the weight of the other devices, rebalance, and things should get distributed correctly.<br>
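Concretely, that could look something like this (the device IDs, weight value, and builder path are illustrative; check them against your own ring's swift-ring-builder output):

```shell
# Inspect the current weights and partition assignment.
swift-ring-builder /etc/swift/object.builder

# Set a weight on each device that needs one (d3..d11 are example device IDs,
# addressed by the "d<id>" search-value syntax).
for dev in d3 d4 d5 d6 d7 d8 d9 d10 d11; do
    swift-ring-builder /etc/swift/object.builder set_weight "$dev" 1.0
done

# Rebalance, then distribute the resulting ring file to every node.
swift-ring-builder /etc/swift/object.builder rebalance
```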
><br>
> --<br>
> Chuck<br>
><br>
><br>
> On Thu, May 1, 2014 at 5:28 AM, Shyam Prasad N <<a href="mailto:nspmangalore@gmail.com">nspmangalore@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
> I created a swift cluster and configured the rings like this...<br>
><br>
> swift-ring-builder object.builder create 10 3 1<br>
><br>
> ubuntu-202:/etc/swift$ swift-ring-builder object.builder<br>
> object.builder, build version 12<br>
> 1024 partitions, 3.000000 replicas, 1 regions, 4 zones, 12 devices, 300.00 balance<br>
> The minimum number of hours before a partition can be reassigned is 1<br>
> Devices: id region zone ip address port replication ip replication port name weight partitions balance meta<br>
> 0 1 1 10.3.0.202 6010 10.3.0.202 6010 xvdb 1.00 1024 300.00<br>
> 1 1 1 10.3.0.202 6020 10.3.0.202 6020 xvdc 1.00 1024 300.00<br>
> 2 1 1 10.3.0.202 6030 10.3.0.202 6030 xvde 1.00 1024 300.00<br>
> 3 1 2 10.3.0.212 6010 10.3.0.212 6010 xvdb 1.00 0 -100.00<br>
> 4 1 2 10.3.0.212 6020 10.3.0.212 6020 xvdc 1.00 0 -100.00<br>
> 5 1 2 10.3.0.212 6030 10.3.0.212 6030 xvde 1.00 0 -100.00<br>
> 6 1 3 10.3.0.222 6010 10.3.0.222 6010 xvdb 1.00 0 -100.00<br>
> 7 1 3 10.3.0.222 6020 10.3.0.222 6020 xvdc 1.00 0 -100.00<br>
> 8 1 3 10.3.0.222 6030 10.3.0.222 6030 xvde 1.00 0 -100.00<br>
> 9 1 4 10.3.0.232 6010 10.3.0.232 6010 xvdb 1.00 0 -100.00<br>
> 10 1 4 10.3.0.232 6020 10.3.0.232 6020 xvdc 1.00 0 -100.00<br>
> 11 1 4 10.3.0.232 6030 10.3.0.232 6030 xvde 1.00 0 -100.00<br>
><br>
> Container and account rings have a similar configuration.<br>
> Once the rings were created and all the disks were added as above, I ran a rebalance on each ring. (I ran a rebalance after adding each of the nodes above.)<br>
> Then I immediately scp'd the rings to all the other nodes in the cluster.<br>
><br>
> I now observe that the objects are all going to 10.3.0.202. I don't see the objects being replicated to the other nodes. So much so that 202 is approaching 100% disk usage, while the other nodes are almost completely empty.<br>
> What am I doing wrong? Am I not supposed to run rebalance operation after addition of each disk/node?<br>
><br>
> Thanks in advance for the help.<br>
><br>
> --<br>
> -Shyam<br>
><br>
> _______________________________________________<br>
> OpenStack-dev mailing list<br>
> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
><br>
><br>
><br>
><br>
<br>
</div></div><br>_______________________________________________<br>
OpenStack-dev mailing list<br>
<a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>-Shyam
</div>