(sending a second time because I forgot to reply-all to the list)

Not scary! Because you have a 15/4 EC policy, we say each partition has 19 "replicas". And since a rebalance will move at most one "replica" of any given partition, up to 100% of your partitions may have at least one replica assignment move. That means, after you push out this ring, 100% of your object GET requests may find at most one "replica" out of place. But that's ok! In a 15/4 you only need 15 EC fragments to respond successfully, and you have 18 fragments that did NOT get reassigned.

It's unfortunate the language is a little ambiguous, but it is talking about the % of *partitions* that had a replica moved. Since each object resides in a single partition, the % of partitions affected most directly communicates the % of client objects affected by the rebalance. We do NOT display the % of *partition-replicas* moved because, while that number would be smaller, it could never be 100% given the restriction that only one "replica" may move.

When doing a large topology change - particularly with EC - more than one replica of each part may need to move (imagine doubling your capacity into a second zone on an 8+4 ring), so it'll take a few cranks. Eventually you'll want to have moved 6 replicas of each part (6 in z1 and 6 in z2), but if we allowed you to move six replicas of 100% of your parts in one shot you'd only have 6 of the 8 required fragments in place to service reads!

Protip: when you push out the new ring you can turn on handoffs_only mode for the reconstructor for a little while to get things rebalanced MUCH more quickly - just don't forget to turn it off!
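To spell out the arithmetic, here's a tiny illustrative Python sketch - just the reasoning above, not Swift internals - with the 15/4 policy and the hypothetical 8+4 second-zone expansion plugged in:

    # Illustrative arithmetic only, not Swift code.
    def reads_ok_after_one_rebalance(num_data, num_parity):
        # A single rebalance moves at most one fragment ("replica") per
        # partition, so the fragments still in place must cover num_data.
        total_fragments = num_data + num_parity
        still_in_place = total_fragments - 1
        return still_in_place >= num_data

    print(reads_ok_after_one_rebalance(15, 4))  # True: 18 of 19 in place, 15 needed
    print(reads_ok_after_one_rebalance(8, 4))   # True: 11 of 12 in place, 8 needed

    # Doubling an 8+4 ring into a second zone eventually wants 6 of the 12
    # fragments of every partition in the new zone, but only one may move per
    # rebalance - so budget roughly 6 push/wait/rebalance cranks.

And for the protip: if memory serves, the switch is handoffs_only = True under the [object-reconstructor] section of object-server.conf on the object nodes (double-check the option name against the docs for your release), followed by a restart of the reconstructors. Flip it back off once the ring has settled, because in that mode the reconstructor skips its normal reconstruction work.

On Thu, Sep 16, 2021 at 11:35 AM Reid Guyett <rguyett@datto.com> wrote: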
Hello,
We were working on expanding one of our clusters (Ussuri on Ubuntu 18.04) and are wondering about the rebalance behavior of swift-ring-builder. When we run it in debug mode on a 15/4 EC ring, we see this message about "Unable to finish rebalance plan after 2 attempts" and are seeing 100% of partitions reassigned.
DEBUG: Placed 10899/2 onto dev r1z3-10.40.48.72/d10
DEBUG: Placed 2183/3 onto dev r1z5-10.40.48.76/d11
DEBUG: Placed 1607/1 onto dev r1z3-10.40.48.70/d28
DEBUG: Assigned 32768 parts
DEBUG: Gather start is 10278 (Last start was 25464)
DEBUG: Unable to finish rebalance plan after 2 attempts
Reassigned 32768 (100.00%) partitions. Balance is now 63.21. Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 63.21 indicates you should push this ring, wait at least 1 hours, and rebalance/repush.
-------------------------------------------------------------------------------
Moving 100% seems scary; what does that mean in this situation? Is this message shown because one fragment from every partition is moved, and that is the most it can do per rebalance since they are technically the same partition? When we compare the swift-ring-builder output (partitions per device) between rebalances we can see some partitions move each time, until we no longer see the push/wait/rebalance message. So it's not really moving 100% of partitions.
Reid
--
Clay Gerrard