[Openstack] [Swift] Erasure code durability and overhead in small clusters

Mark Kirkwood mark.kirkwood at catalyst.net.nz
Tue Mar 13 22:05:05 UTC 2018


Hi,

I'm looking at adding per-region erasure code policies to our Swift 
cluster. Currently I'm experimenting with a small one - 3 hosts per 
region (each with 6 devices). Some experimentation has highlighted a 
subtle tension between the desire to minimize overhead and the 
durability needed to survive a *host* outage. I'll work through some 
examples below - feel free to check my math :-)

For brevity, let k = number of data fragments and m = number of parity 
fragments.

Suppose I use a (k=4, m=2) policy in each region. My overhead is m/k = 
50% (i.e. 1G of data uses 1.5G on disk). The k+m = 6 fragments spread 
evenly, so each of my 3 hosts holds 2; if I lose a host I still have 4 
fragments - exactly k, so I can reassemble objects :-)


Suppose I use a (k=8, m=2) policy. Now my overhead is m/k = 25% (yay, 
better than 50%). However, the k+m = 10 fragments now get spread across 
the 3 hosts as 3, 3, 4. If I lose a host I am left with at most 7 
fragments - fewer than k=8, so not enough to reassemble objects :-(
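
To make the comparison concrete, here is a quick Python sketch (my own 
illustration, not Swift code). It assumes the ring spreads fragments as 
evenly as possible across hosts, which is what the ring builder aims for:

import math

def survives_host_loss(k, m, hosts):
    # True if losing the fullest host still leaves >= k fragments.
    total = k + m
    worst = math.ceil(total / hosts)  # most fragments on any one host
    return total - worst >= k

print(survives_host_loss(4, 2, 3))  # True  - 4 fragments remain
print(survives_host_loss(8, 2, 3))  # False - losing the 4-fragment host leaves 6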

To me this suggests that a certain minimum number of *hosts* per region 
is needed for a given EC policy to be durable in the event of a host 
outage (or destruction). Is this correct - or have I flubbed the 
calculations?
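
If that reasoning holds, the general rule falls out of the same 
arithmetic: losing the fullest host costs ceil((k+m)/hosts) fragments, 
and we can afford to lose at most m of them, so we need 
hosts >= ceil((k+m)/m). A small helper in the same hypothetical vein:

import math

def min_hosts(k, m):
    # Smallest host count where any single host loss leaves >= k fragments.
    return math.ceil((k + m) / m)

print(min_hosts(4, 2))  # 3 - so 4+2 is host-durable on my 3 hosts
print(min_hosts(8, 2))  # 5 - 8+2 would need at least 5 hosts per region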

regards

Mark



