[Openstack] [Swift] Deciding on EC fragment config

Mark Kirkwood mark.kirkwood at catalyst.net.nz
Wed Apr 4 21:44:22 UTC 2018


Thanks John,

I was leaning towards '2 is not quite enough' for parity, but wanted to 
get a 2nd opinion. The level of detail and discussion in your answer is 
very helpful, much appreciated!

Mark


On 05/04/18 08:25, John Dickinson wrote:
> The answer always starts with "it depends...". It depends on your hardware, where it's physically located, the durability you need, the access patterns, and so on.
>
> There have been whole PhD dissertations on the right way to calculate durability. Two parity segments aren't exactly equivalent to three replicas, because in the EC case you've also got to figure out the chance of failing to retrieve enough of the remaining segments to satisfy a read request[1].
>
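> A rough back-of-envelope for that read-availability piece (assuming each fragment is independently available with some probability p, which real clusters only approximate):
>
>     # Probability that at least k of the k+m fragments are
>     # reachable, i.e. that a read can be satisfied. Independent
>     # fragment failures are assumed, so treat the output as
>     # illustrative rather than a real durability number.
>     from math import comb
>
>     def ec_availability(k, m, p):
>         n = k + m
>         return sum(comb(n, i) * p**i * (1 - p)**(n - i)
>                    for i in range(k, n + 1))
>
>     print(ec_availability(10, 2, 0.99))  # ~0.9998
>     print(ec_availability(10, 4, 0.99))  # ~0.9999998
>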
> In your case, using 3 or 4 parity fragments will probably get you better durability and availability than a 3x replica system and still use less overall drive space[2]. My company's product has three "canned" EC policy settings to make it simpler for customers to choose. We've got 4+3, 8+4, and 15+4 settings, and we steer people to one of them based on how many servers are in their cluster.
>
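> For reference, a scheme like that is declared as a storage policy in swift.conf, something like the stanza below (the index, name, and backend here are placeholder examples; use whichever liberasurecode backend is built on your systems):
>
>     # Example swift.conf stanza for an 8+4 EC policy.
>     # The index "2", name "ec84", and the rs_vand backend are
>     # illustrative choices, not recommendations.
>     [storage-policy:2]
>     name = ec84
>     policy_type = erasure_coding
>     ec_type = liberasurecode_rs_vand
>     ec_num_data_fragments = 8
>     ec_num_parity_fragments = 4
>     ec_object_segment_size = 1048576
>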
> Note that there's nothing special about the m=4 examples in Swift's docs, at least in the sense of recommending 4 parity fragments as better than 3 or 5 (or any other number).
>
> In your case, you'll want to take into account how many drives you can lose and how many servers you can lose. Suppose you have a 10+4 scheme and two servers with 12 drives in each server. You'll be able to lose 4 drives, yes, but if either server goes down, you'll not be able to access your data: each server holds only 7 of the 14 fragments (one per disk), and a read needs 10 of them. However, if you had 6 servers with 4 drives each, for the same total of 24 drives, you could lose four drives, like the other situation, but you could also lose up to two servers and still be able to read your data[3].
>
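> You can sanity-check a layout like that with a toy model (this assumes fragments land round-robin across servers, which only approximates Swift's real as-unique-as-possible ring placement):
>
>     # Toy check: place k+m fragments round-robin on servers and
>     # see whether a given set of lost servers still leaves the
>     # k fragments a read needs.
>     def readable_after_loss(k, m, num_servers, lost_servers):
>         placement = [i % num_servers for i in range(k + m)]
>         remaining = sum(1 for s in placement if s not in lost_servers)
>         return remaining >= k
>
>     print(readable_after_loss(10, 4, 2, {0}))     # False: only 7 left
>     print(readable_after_loss(10, 4, 6, {2, 3}))  # True: 10 remain
>     print(readable_after_loss(10, 4, 6, {0, 1}))  # False: those hold 3 each
>
> (With 14 fragments on 6 servers, two of the servers end up holding 3 fragments each, so which two servers you lose matters.)
>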
> Another consideration is how much overhead you want to have. Increasing the data segments lowers the storage overhead, but increasing the parity segments improves your durability and availability (up to the limits of your physical hardware failure domains).
>
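> The raw space math is easy to eyeball: the overhead factor is (k+m)/k. For example:
>
>     # Raw on-disk overhead factor for a few k+m schemes.
>     for k, m in [(4, 3), (8, 4), (15, 4), (10, 4)]:
>         print(f"{k}+{m}: {(k + m) / k:.2f}x raw space")
>     print("3x replica: 3.00x raw space")
>     # 4+3 -> 1.75x, 8+4 -> 1.50x, 15+4 -> 1.27x, 10+4 -> 1.40x
>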
> Finally, and probably most simply, you'll want to take into account the increased CPU and network cost for a particular EC scheme. A 3x replica write needs 3 network connections, and a read needs 1. For an EC policy, a write needs k+m connections, and a read needs k. If you're using something really large like an 18+3 scheme, you're looking at a 7x overhead in network requirements when compared to a 3x replica policy. The increased socket management and packet shuffling can add significant burden to your proxy servers[4]. Good news on the CPU though: the EC algorithms are old and well tuned, especially when using libraries like jerasure or ISA-L, and CPUs are really fast. Erasure code policies do not add significant overhead from the encode/decode steps.
>
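> To put numbers on the connection counts (a quick sketch):
>
>     # Proxy connections per request: a replica write opens one
>     # connection per copy; an EC write opens k+m and a read opens k.
>     for name, write_conns, read_conns in [
>             ("3x replica", 3, 1),
>             ("10+4 EC", 14, 10),
>             ("18+3 EC", 21, 18)]:
>         print(f"{name}: {write_conns} write / {read_conns} read")
>     # An 18+3 write opens 21 connections, 7x a 3-replica write.
>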
> So, in summary, it's complicated, there isn't a "right" answer, and it depends a lot on everything else about your cluster. But you've got this! You'll do great, and keep asking questions.
>
> I hope all this helps.
>
>



