Open Stack

Mon Jan 22 22:01:22 UTC 2018

Hello all,

for my master's thesis I'm analyzing different storage policies in
OpenStack Swift. I'm mainly interested in the reconstruction speed of the
different EC implementations.

I've noticed in my tests that fragments/parity are not reconstructed
to other nodes/disks when a disk fails.

My test setup consists of 8 nodes with 4 disks each. The OS is Ubuntu 16.04
LTS, the Swift version is 2.15.1 (Pike), and here are my 2 example policies:

---
[storage-policy:2]
name = liberasurecode-rs-vand-4-2
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 4
ec_num_parity_fragments = 2
ec_object_segment_size = 1048576

[storage-policy:3]
name = liberasurecode-rs-vand-3-1
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 3
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576
---
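For reference, the sketch below turns the parameters of the two policies above into fragment count, storage overhead, loss tolerance and approximate fragment size. It is a back-of-the-envelope calculation, not Swift code: `ECPolicy` and its helpers are hypothetical, and the fragment size ignores the header liberasurecode adds to each fragment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ECPolicy:
    """Hypothetical summary of one [storage-policy:N] EC section."""
    name: str
    data_frags: int     # ec_num_data_fragments (k)
    parity_frags: int   # ec_num_parity_fragments (m)
    segment_size: int   # ec_object_segment_size, in bytes

    @property
    def total_frags(self) -> int:
        # Each object segment is stored as k + m fragments.
        return self.data_frags + self.parity_frags

    @property
    def overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return self.total_frags / self.data_frags

    @property
    def tolerated_losses(self) -> int:
        # Any k of the k + m fragments suffice to decode, so up to m may be lost.
        return self.parity_frags

    def approx_fragment_size(self) -> int:
        # Payload bytes per fragment for one full segment (header ignored).
        return math.ceil(self.segment_size / self.data_frags)


policies = [
    ECPolicy("liberasurecode-rs-vand-4-2", 4, 2, 1048576),
    ECPolicy("liberasurecode-rs-vand-3-1", 3, 1, 1048576),
]

for p in policies:
    print(f"{p.name}: {p.total_frags} fragments, overhead {p.overhead:.2f}x, "
          f"tolerates {p.tolerated_losses} lost fragment(s), "
          f"~{p.approx_fragment_size()} bytes per fragment")
```

The 4+2 policy trades 1.5x raw storage for tolerating two lost fragments; 3+1 costs about 1.33x but tolerates only one.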

At the moment I've only tested the ec_type liberasurecode_rs_vand. With other
implementations Swift fails to start, but I think that is a separate
topic.

To simulate a disk failure I'm using fault injection [1].
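One possible way to make a single disk "fail" with the kernel's fail_make_request facility described in [1] is sketched below. It assumes a kernel built with CONFIG_FAIL_MAKE_REQUEST, debugfs mounted at /sys/kernel/debug, and root privileges. The helper names, the example device `sdc`, and the chosen values (100% probability, no limit on the number of failures) are illustrative assumptions, not necessarily the settings used in these tests. The base paths are parameters so the helpers can be exercised against a scratch directory.

```python
from pathlib import Path


def make_disk_fail(device: str,
                   debugfs: Path = Path("/sys/kernel/debug"),
                   sysfs: Path = Path("/sys")) -> None:
    """Hypothetical helper: make block requests to `device` fail."""
    knobs = debugfs / "fail_make_request"
    # Inject on every eligible request (100 percent, every interval) ...
    (knobs / "probability").write_text("100\n")
    (knobs / "interval").write_text("1\n")
    # ... with no upper limit on the number of failures (-1).
    (knobs / "times").write_text("-1\n")
    # Only devices with make-it-fail set are affected; other disks keep working.
    (sysfs / "block" / device / "make-it-fail").write_text("1\n")


def restore_disk(device: str, sysfs: Path = Path("/sys")) -> None:
    """Hypothetical helper: stop injecting failures for `device`."""
    (sysfs / "block" / device / "make-it-fail").write_text("0\n")
```

On a test node this would be called as `make_disk_fail("sdc")` and undone with `restore_disk("sdc")`; to exercise a full replacement instead, the disk would be unmounted and a new one mounted.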

Test run example:
1. fill with objects (32,768 1M objects, 32GB in total)
2. make a disk "fail"
3. disk failure is detected, /but nothing is reconstructed/
4. replace the "failed" disk, mount a "new" empty disk
5. missing fragments/parity are reconstructed on the new empty disk
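Reconstruction as in step 5 works because any k of the k + m fragments are enough to recompute a lost one. The sketch below illustrates this for a 3+1 layout using a single XOR parity fragment. This is a deliberate simplification: liberasurecode_rs_vand uses Reed-Solomon coding, which generalizes the idea to m > 1 parity fragments, and `encode`/`reconstruct` are hypothetical helpers, not PyECLib or Swift APIs.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equally sized fragments.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segment: bytes, k: int) -> list[bytes]:
    """Split a segment into k data fragments plus one XOR parity fragment."""
    frag_len = -(-len(segment) // k)              # ceiling division
    padded = segment.ljust(frag_len * k, b"\0")   # zero-pad the last fragment
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return data + [reduce(xor_bytes, data)]


def reconstruct(frags: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild the one missing fragment (None) from the k survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) != 1:
        raise ValueError("a single-parity code can rebuild exactly one fragment")
    survivors = [f for f in frags if f is not None]
    rebuilt = list(frags)
    # XOR of all survivors equals the lost fragment, data or parity alike.
    rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


# Example: 3+1 layout where the "failed disk" held fragment 1.
frags = encode(b"hello erasure coding", k=3)
frags[1] = None
restored = reconstruct(frags)
print(b"".join(restored[:3]).rstrip(b"\0"))
```

Note that rebuilding one fragment requires reading all k surviving fragments, which is one reason reconstruction speed depends on the chosen k and m.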

Expected:
1. fill with objects (32,768 1M objects, 32GB in total)
2. make a disk "fail"
3. disk failure is detected, missing fragments/parity are reconstructed on the remaining disks/nodes
4. replace the "failed" disk, mount a "new" empty disk
5. data in the ring is rearranged back to its pre-failure state

Shouldn't the missing fragments/parity be reconstructed on the remaining
disks/nodes? (See point 3 of the test run example.)
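To put the question in numbers, the sketch below estimates how much data one failed disk holds in this test, i.e. what would have to be rebuilt on the remaining disks. It assumes each object is exactly 1 MiB (one 1048576-byte segment), an evenly balanced ring over 32 equally weighted disks (8 nodes x 4 disks), and it ignores fragment headers and metadata; `rebuild_estimate` is a hypothetical helper.

```python
from __future__ import annotations


def rebuild_estimate(objects: int, object_size: int, k: int, m: int,
                     devices: int) -> tuple[float, float, float]:
    """Return (fragments lost, bytes lost, bytes read to rebuild) for one device.

    Hypothetical helper: assumes one segment per object, an evenly balanced
    ring, and ignores fragment headers and object metadata.
    """
    frag_size = -(-object_size // k)                  # ceiling division
    frags_per_device = objects * (k + m) / devices    # even spread over devices
    lost_bytes = frags_per_device * frag_size
    # Decoding each missing fragment needs k surviving fragments.
    read_bytes = k * lost_bytes
    return frags_per_device, lost_bytes, read_bytes


GiB = 1024 ** 3
for name, k, m in [("rs-vand-4-2", 4, 2), ("rs-vand-3-1", 3, 1)]:
    frags, lost, reads = rebuild_estimate(32768, 1048576, k, m, devices=8 * 4)
    print(f"{name}: ~{frags:.0f} fragments, {lost / GiB:.2f} GiB lost, "
          f"{reads / GiB:.2f} GiB read to rebuild")
```

Under these assumptions a failed disk holds about 6,144 fragments (1.5 GiB) with the 4+2 policy and about 4,096 fragments (roughly 1.33 GiB) with the 3+1 policy, and rebuilding them reads 4x and 3x that amount from the surviving disks.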

[1] https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt

Cheers,
Hannes Fuchs


[Openstack] [openstack] [swift] Erasure Coding - No reconstruction to other nodes/disks on disk failure
