[Swift] Rebalancing EC question

Reid Guyett rguyett at datto.com
Mon Sep 20 13:41:32 UTC 2021


Thanks for that explanation. It is now clear how rebalancing works
for the EC policies.

We have adjusted our reconstructor workers to speed up the rebalance,
and it seems to have helped: the rebalance went from weeks to days.
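
(For anyone following along, those settings live in the [object-reconstructor]
section of object-server.conf; the values below are illustrative only, not a
recommendation, and option names should be checked against the deployment
guide for your release:)

    [object-reconstructor]
    # fork multiple reconstructor worker processes so devices are processed in parallel
    reconstructor_workers = 4
    # per-worker concurrency; tune to your hardware
    concurrency = 8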

Reid Guyett



On Thu, Sep 16, 2021 at 2:02 PM Clay Gerrard <clay.gerrard at gmail.com> wrote:
>
> Not scary!
>
> Because you have a 15/4 EC policy, we say each partition has 19 "replicas".  And since a rebalance will move at most one "replica" of any given partition, up to 100% of your partitions may have a replica assignment moved.
>
> That means that after you push out this ring, 100% of your object GET requests may find at most one "replica" out of place.  But that's ok!  In a 15/4 policy you only need 15 EC fragments to respond successfully, and you still have 18 fragments that did NOT get reassigned.
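>
> To spell out that arithmetic (a toy Python sketch; the variable names are just illustrative):
>
>     ndata, nparity = 15, 4
>     total_fragments = ndata + nparity              # 19 "replicas" per partition
>     moved_per_rebalance = 1                        # at most one replica of a partition moves
>     still_in_place = total_fragments - moved_per_rebalance   # 18 fragments untouched
>     assert still_in_place >= ndata                 # 18 >= 15, so GETs can still be served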
>
> It's unfortunate the language is a little ambiguous, but it is talking about % of *partitions* that had a replica moved.  Since each object resides in a single partition, the % of partitions affected most directly communicates the % of client objects affected by the rebalance.  We do NOT display % of *partition-replicas* moved because, while that number would be smaller, it could never be 100% given the restriction that only one "replica" may move.
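>
> To put numbers on that using your own output: with 32768 partitions and 19 replicas each, moving one replica of every partition is 100% of partitions but only a small fraction of partition-replicas (illustrative Python):
>
>     parts, replicas_per_part = 32768, 19
>     total_assignments = parts * replicas_per_part      # 622592 partition-replicas
>     moved = parts * 1                                   # one replica of every partition
>     print(moved / parts)                                # 1.0    -> "100.00%" of partitions
>     print(moved / total_assignments)                    # ~0.053 -> ~5% of partition-replicas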
>
> When doing a large topology change - particularly with EC - it may be the case that more than one replica of each part will need to move (imagine doubling your capacity into a second zone on an 8+4 ring) - so it'll take a few cranks.  Eventually you'll want to have moved 6 replicas of each part (6 in z1 and 6 in z2), but if we allowed you to move six replicas of 100% of your parts you'd only have 6 of the 8 required fragments in place to service reads!
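>
> Spelled out for the 8+4 example (again just an illustrative sketch):
>
>     ndata, nparity = 8, 4
>     total = ndata + nparity            # 12 fragments per partition
>     to_move = total // 2               # 6 fragments must eventually land in the new zone
>     # if all 6 could move in a single rebalance, only 6 fragments would still be
>     # where the old ring says -- fewer than the 8 needed to serve a read:
>     print(total - to_move >= ndata)    # False
>     # moving at most one replica per rebalance keeps 11 >= 8 readable at every step,
>     # which is why it takes several push/wait/rebalance cycles to finish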
>
> Protip: when you push out the new ring you can turn on handoffs_only mode for the reconstructor for a little while to get things rebalanced MUCH more quickly - just don't forget to turn it off!
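>
> For reference, that mode is set in the [object-reconstructor] section of object-server.conf (option name as I know it; please verify against the deployment guide for your release):
>
>     [object-reconstructor]
>     # temporarily revert handoff fragments only -- much faster rebalance, but
>     # regular reconstruction of missing/corrupt fragments is skipped while set
>     handoffs_only = True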
>
> (sending a second time because I forgot to reply-all to the list)
>
> On Thu, Sep 16, 2021 at 11:35 AM Reid Guyett <rguyett at datto.com> wrote:
>>
>> Hello,
>>
>> We are working on expanding one of our clusters (Ussuri on Ubuntu
>> 18.04) and are wondering about the rebalance behavior of
>> swift-ring-builder. When we run it in debug mode on a 15/4 EC ring, we
>> see the message "Unable to finish rebalance plan after 2 attempts" and
>> 100% of partitions reassigned.
>>
>> DEBUG: Placed 10899/2 onto dev r1z3-10.40.48.72/d10
>> DEBUG: Placed 2183/3 onto dev r1z5-10.40.48.76/d11
>> DEBUG: Placed 1607/1 onto dev r1z3-10.40.48.70/d28
>> DEBUG: Assigned 32768 parts
>> DEBUG: Gather start is 10278 (Last start was 25464)
>> DEBUG: Unable to finish rebalance plan after 2 attempts
>> Reassigned 32768 (100.00%) partitions. Balance is now 63.21.
>> Dispersion is now 0.00
>> -------------------------------------------------------------------------------
>> NOTE: Balance of 63.21 indicates you should push this
>>       ring, wait at least 1 hours, and rebalance/repush.
>> -------------------------------------------------------------------------------
>>
>> Moving 100% seems scary; what does that mean in this situation? Does
>> this message appear because 1 fragment from every partition is moved,
>> and that is the most it can do per rebalance since the fragments
>> technically belong to the same partition?
>> When we compare the swift-ring-builder output (partitions per device)
>> between rebalances, we can see some partitions move each time, until
>> we no longer see the push/wait/rebalance message. So it's not really
>> moving 100% of partitions.
>>
>> Reid
>>
>>
>
>
> --
> Clay Gerrard



