(sending a second time because I forgot to reply-all to the list)

Not scary! Because you have a 15/4 EC policy, we say each partition has 19 "replicas". And since a rebalance will move at most one "replica" of any given partition, up to 100% of your partitions may have at least one replica assignment move. That means, after you push out this ring, 100% of your object GET requests may find at most one "replica" out of place. But that's ok! In a 15/4 you only need 15 EC fragments to respond successfully, and you have 18 fragments that did NOT get reassigned.

It's unfortunate the language is a little ambiguous, but it is talking about the % of *partitions* that had a replica moved. Since each object resides in a single partition, the % of partitions affected most directly communicates the % of client objects affected by the rebalance. We do NOT display the % of *partition-replicas* moved because, while that number would be smaller, it could never be 100% given the restriction that only one "replica" may move.

When doing a large topology change - particularly with EC - more than one replica of each part may need to move (imagine doubling your capacity into a second zone on an 8+4 ring), so it'll take a few cranks. Eventually you'll want to have moved 6 replicas of each part (6 in z1 and 6 in z2), but if we allowed you to move six replicas of 100% of your parts in one shot you'd only have 6 of the 8 required fragments in place to service reads!

Protip: when you push out the new ring you can turn on handoffs_only mode for the reconstructor for a little while to get things rebalanced MUCH more quickly - just don't forget to turn it off!
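To spell out the arithmetic, here's a tiny illustrative Python sketch - just the reasoning above, not Swift internals - with the 15/4 policy and the hypothetical 8+4 second-zone expansion plugged in:

    # Illustrative arithmetic only, not Swift code.
    def reads_ok_after_one_rebalance(num_data, num_parity):
        # A single rebalance moves at most one fragment ("replica") per
        # partition, so the fragments still in place must cover num_data.
        total_fragments = num_data + num_parity
        still_in_place = total_fragments - 1
        return still_in_place >= num_data

    print(reads_ok_after_one_rebalance(15, 4))  # True: 18 of 19 in place, 15 needed
    print(reads_ok_after_one_rebalance(8, 4))   # True: 11 of 12 in place, 8 needed

    # Doubling an 8+4 ring into a second zone eventually wants 6 of the 12
    # fragments of every partition in the new zone, but only one may move per
    # rebalance - so budget roughly 6 push/wait/rebalance cranks.

And for the protip: if memory serves, the switch is handoffs_only = True under the [object-reconstructor] section of object-server.conf on the object nodes (double-check the option name against the docs for your release), followed by a restart of the reconstructors. Flip it back off once the ring has settled, because in that mode the reconstructor skips its normal reconstruction work.

On Thu, Sep 16, 2021 at 11:35 AM Reid Guyett <rguyett@datto.com> wrote: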
Hello,
We were working on expanding one of our clusters (Ussuri on Ubuntu 18.04) and are wondering about the rebalance behavior of swift-ring-builder. When we run it in debug mode on a 15/4 EC ring, we see this message about "Unable to finish rebalance plan after 2 attempts" and are seeing 100% of partitions reassigned.
DEBUG: Placed 10899/2 onto dev r1z3-10.40.48.72/d10
DEBUG: Placed 2183/3 onto dev r1z5-10.40.48.76/d11
DEBUG: Placed 1607/1 onto dev r1z3-10.40.48.70/d28
DEBUG: Assigned 32768 parts
DEBUG: Gather start is 10278 (Last start was 25464)
DEBUG: Unable to finish rebalance plan after 2 attempts
Reassigned 32768 (100.00%) partitions. Balance is now 63.21. Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 63.21 indicates you should push this ring, wait at least 1 hours, and rebalance/repush.
-------------------------------------------------------------------------------
Moving 100% seems scary; what does that mean in this situation? Is this message shown because one fragment from every partition is moved, and that is the most it can do per rebalance since they are technically the same partition? When we compare the swift-ring-builder output (partitions per device) between rebalances we can see some partitions move each time, until we no longer see the push/wait/rebalance message. So it's not really moving 100% of partitions.
Reid
--
Clay Gerrard