Not scary!

Because you have a 15/4 EC policy, we say each partition has 19 "replicas" (15 data + 4 parity fragments).  And since a rebalance will move at most one "replica" of any given partition, up to 100% of your partitions may have at least one replica assignment move.

That means, after you push out this ring, 100% of your object GET requests may find at most one "replica" out of place.  But that's ok!  In a 15/4 you only need 15 EC fragments to respond successfully, and you have 18 fragments that did NOT get reassigned.
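Just to make the arithmetic concrete - this is nothing but back-of-the-envelope Python, not anything Swift runs internally:

# 15/4 EC policy: check that one moved replica per partition is harmless
ndata, nparity = 15, 4
replicas = ndata + nparity                       # 19 fragment "replicas" per partition
max_moved_per_partition = 1                      # a rebalance moves at most one replica of any partition
in_place = replicas - max_moved_per_partition    # 18 fragments still where the old ring says they are
assert in_place >= ndata                         # 18 >= 15: a GET can still be served from in-place fragments
print(replicas, in_place, ndata)                 # 19 18 15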

It's unfortunate the language is a little ambiguous, but it is talking about the % of *partitions* that had a replica moved.  Since each object resides in a single partition, the % of partitions affected most directly communicates the % of client objects affected by the rebalance.  We do NOT display the % of *partition-replicas* moved because, while that number would be smaller, it could never reach 100% given the restriction that only one "replica" of each partition may move.
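Here's the same event counted both ways - hypothetical numbers matching your output, plain Python arithmetic only:

# Every one of the 32768 partitions gets exactly one of its 19 replicas reassigned
partitions = 32768
replicas = 19
replica_assignments_moved = partitions * 1       # one replica per partition, the maximum allowed

pct_partitions = 100.0 * partitions / partitions                                      # what the tool reports
pct_partition_replicas = 100.0 * replica_assignments_moved / (partitions * replicas)
print("%.2f%% of partitions, %.2f%% of partition-replicas"
      % (pct_partitions, pct_partition_replicas))
# -> 100.00% of partitions, 5.26% of partition-replicas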

When doing a large topology change - particularly with EC - more than one replica of each part may need to move (imagine doubling your capacity into a second zone on an 8+4 ring), so it'll take a few cranks (rebalance/push cycles).  Eventually you'll want to have moved 6 replicas of each part (6 in z1 and 6 in z2), but if we allowed you to move six replicas of 100% of your parts at once you'd only have 6 of the 8 fragments required to service reads!
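A quick sketch of that thought experiment - hypothetical 8+4 numbers, again just arithmetic:

# Doubling an 8+4 ring into a second zone: eventually half the replicas of each part move
ndata, nparity = 8, 4
replicas = ndata + nparity               # 12
moves_per_part = replicas // 2           # 6 replicas of each part end up in z2

remaining_in_place = replicas - moves_per_part   # 6 fragments left where the old ring says
if remaining_in_place < ndata:
    print("moving all %d at once leaves only %d of the %d fragments a GET needs"
          % (moves_per_part, remaining_in_place, ndata))

# at one replica of a part per rebalance, the minimum number of cranks is:
print("minimum rebalance/push cycles:", moves_per_part)   # 6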

Protip: when you push out the new ring you can turn on handoffs_only mode for the reconstructor for a little while to get things rebalanced MUCH more quickly - just don't forget to turn it off!
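If you do that, the knob lives in the [object-reconstructor] section of object-server.conf - something along these lines (please double-check the option name against the docs for your Swift release before relying on it):

[object-reconstructor]
# temporarily only process handoff partitions to drain the rebalance faster;
# set this back to false (or remove it) once everything has settled
handoffs_only = true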

(sending second time because I forgot to reply all to the list)

On Thu, Sep 16, 2021 at 11:35 AM Reid Guyett <rguyett@datto.com> wrote:
Hello,

We were working on expanding one of our clusters (Ussuri on Ubuntu
18.04) and are wondering about the rebalance behavior of
swift-ring-builder. When we run it in debug mode on a 15/4 EC ring, we
see this message about "Unable to finish rebalance plan after 2
attempts" and are seeing 100% partitions reassigned.

DEBUG: Placed 10899/2 onto dev r1z3-10.40.48.72/d10
DEBUG: Placed 2183/3 onto dev r1z5-10.40.48.76/d11
DEBUG: Placed 1607/1 onto dev r1z3-10.40.48.70/d28
DEBUG: Assigned 32768 parts
DEBUG: Gather start is 10278 (Last start was 25464)
DEBUG: Unable to finish rebalance plan after 2 attempts
Reassigned 32768 (100.00%) partitions. Balance is now 63.21.
Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 63.21 indicates you should push this
      ring, wait at least 1 hours, and rebalance/repush.
-------------------------------------------------------------------------------

Moving 100% seems scary - what does that mean in this situation? Is
this message because 1 fragment from every partition is moved and that
is the most that it can do per rebalance because they are technically
the same partition?
When we compare the swift-ring-builder output (partitions per device)
between rebalances we can see some partitions move each time until we
no longer see the push/wait/rebalance message again. So it's not
really moving 100% of partitions.

Reid




--
Clay Gerrard