[Openstack] Fwd: Re: [openstack] [swift] Erasure Coding - No reconstruction to other nodes/disks on disk failure

Hannes Fuchs hannes.fuchs at gmx.org
Sun Jan 28 11:19:59 UTC 2018


It seems that my university mail server bounces replies from the ML, so
I have to change my mail settings.

Maybe this information is helpful for anyone who runs into the same
question.


Cheers,
Hannes


-------- Forwarded Message --------
Subject: Re: [Openstack] [openstack] [swift] Erasure Coding - No
reconstruction to other nodes/disks on disk failure
Date: Tue, 23 Jan 2018 10:06:47 +0100
From: Hannes Fuchs <hannes.fuchs at student.htw-berlin.de>
To: Clay Gerrard <clay.gerrard at gmail.com>

Hello Clay,

Thank you for the quick reply and the explanation. This clears things up.
The link to the bug is also very helpful. (I did not find a hint about
this in the documentation.)

So I'll change my test workflow.

Is there any public information from Red Hat about their discussion of
ring automation?


Thanks,
Hannes

On 23.01.2018 00:03, Clay Gerrard wrote:
> It's debatable, but currently operating as intended [1].  The
> fail-in-place workflow for EC expects the operator to do a ring change
> [2].  While replicated fail-in-place workflows do allow the operator to
> unmount and postpone a rebalance, it's not a common workflow.  In
> practice the Swift deployers/operators I've talked to tend to follow
> the rebalance-after-disk-failure workflow for both replicated and EC
> policies.  While restoring data to full durability in reaction to drive
> failures is important, there's more than one way to get Swift to do
> that, and it seems operators/automation prefer to handle it with an
> explicit ring change.  That said, it's just a prioritization issue - I
> wouldn't imagine anyone would be opposed to rebuilding fragments to
> handoffs in response to a 507.  But there are some efficiency
> concerns... reassigning primaries is a lot simpler in many ways as long
> as you're able to do that in a reactive fashion.  Red Hat was recently
> discussing interest in doing more open source upstream work on ring
> automation...
> 
> -Clay
> 
> 1. https://bugs.launchpad.net/swift/+bug/1510342 - I don't think anyone
> is directly opposed to seeing this change, but as ring automation best
> practices have become more sophisticated it's less of a priority.
> 2. Essentially everyone has some sort of alert/trigger/automation
> around disk failure (or degraded disk performance); the operator/system
> immediately/automatically fails the device by removing it from the ring
> and pushes out the changed partition assignments, allowing the system
> to rebuild the partitions to the new primaries instead of a handoff (a
> rough sketch of this ring change follows after the quoted thread).
> 
> On Mon, Jan 22, 2018 at 2:01 PM, Hannes Fuchs <
> hannes.fuchs at student.htw-berlin.de> wrote:
> 
>> Hello all,
>>
>> For my master's thesis I'm analyzing different storage policies in
>> OpenStack Swift. I'm mainly interested in the reconstruction speed of
>> the different EC implementations.
>>
>> I've noticed in my tests that there is no reconstruction of
>> fragments/parity to other nodes/disks if a disk fails.
>>
>> My test setup consists of 8 nodes with 4 disks each. The OS is Ubuntu
>> 16.04 LTS, the Swift version is 2.15.1 (Pike), and here are my two
>> example policies:
>>
>> ---
>> [storage-policy:2]
>> name = liberasurecode-rs-vand-4-2
>> policy_type = erasure_coding
>> ec_type = liberasurecode_rs_vand
>> ec_num_data_fragments = 4
>> ec_num_parity_fragments = 2
>> ec_object_segment_size = 1048576
>>
>> [storage-policy:3]
>> name = liberasurecode-rs-vand-3-1
>> policy_type = erasure_coding
>> ec_type = liberasurecode_rs_vand
>> ec_num_data_fragments = 3
>> ec_num_parity_fragments = 1
>> ec_object_segment_size = 1048576
>> ---
>>
>> At the moment I've tested only the ec_type liberasurecode_rs_vand.
>> With other implementations the startup of Swift fails, but I think
>> this is another topic.
>>
>> To simulate a disk failure I'm using fault injection [1] (a sketch of
>> that setup follows after the quoted thread).
>>
>> Test run example:
>> 1. fill with objects (32,768 1 MB objects, 32 GB total)
>> 2. make a disk "fail"
>> 3. disk failure is detected, /but no reconstruction/
>> 4. replace "failed" disk, mount "new" empty disk
>> 5. missing fragments/parity are reconstructed on the new empty disk
>>
>> Expected:
>> 1. fill with objects (32,768 1 MB objects, 32 GB total)
>> 2. make a disk "fail"
>> 3. disk failure is detected, reconstruction to remaining disks/nodes
>> 4. replace "failed" disk, mount "new" empty disk
>> 5. rearrange data in the ring to the pre-failure state
>>
>>
>> Shouldn't the missing fragments/parity be reconstructed on the
>> remaining disks/nodes? (See point 3 in the test run example.)
>>
>>
>> [1]
>> https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt
>>
>>
>> Cheers,
>> Hannes Fuchs
>>
> 
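
For completeness, the ring change from Clay's footnote [2] comes down to a
few swift-ring-builder commands. This is only a sketch: object-2.builder is
the builder file belonging to storage-policy:2 above, and the device id d13
stands in for whichever disk actually failed in your ring.

---
# List the devices in the ring to find the id of the failed disk
swift-ring-builder object-2.builder

# Remove the failed device (d13 is just an example id) and rebalance
swift-ring-builder object-2.builder remove d13
swift-ring-builder object-2.builder rebalance

# Copy the rebalanced ring to every node, e.g.:
# scp object-2.ring.gz <node>:/etc/swift/
---

Once the new object-2.ring.gz is on all nodes, the object-reconstructor
rebuilds the missing fragments on the newly assigned primaries instead of
waiting for the failed disk to be replaced.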



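Likewise, the fault injection from [1] that makes a disk "fail" boils down
to the kernel's fail_make_request knobs. Again only a sketch (run as root):
sdb is a placeholder for the disk under test, and it assumes a kernel with
CONFIG_FAIL_MAKE_REQUEST enabled and debugfs mounted.

---
# Fail every I/O request submitted to /dev/sdb
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo -1 > /sys/kernel/debug/fail_make_request/times
echo 1 > /sys/block/sdb/make-it-fail

# Undo the simulated failure after the test
echo 0 > /sys/block/sdb/make-it-fail
echo 0 > /sys/kernel/debug/fail_make_request/probability
---

In my tests the resulting I/O errors were enough for the disk failure to
be detected (step 3 in the test run above).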