[Openstack] [Swift] Erasure code durability and overhead in small clusters
Mark Kirkwood
mark.kirkwood at catalyst.net.nz
Tue Mar 13 22:05:05 UTC 2018
Hi,
I'm looking at adding per-region Erasure code policies to our Swift
cluster. Currently I'm experimenting with a small one - 3 hosts per
region (each with 6 devices). Doing some experimentation seems to have
highlighted a subtle tension between the desire to minimize overhead and
the durability needed to survive a *host* outage. I'll work through some
examples below; feel free to check my math :-)
For brevity, let k = number of data fragments and m = number of parity
fragments.
Suppose I use a (k=4, m=2) policy for each region. My overhead is m/k =
50% (i.e. 1G of data uses 1.5G on disk). Each of my 3 hosts holds 2
fragments, so if I lose a host I still have 4 in total and can
reassemble objects :-)
Suppose I use a (k=8, m=2) policy. Now my overhead is m/k = 25% (yay,
better than 50%). However, now my 10 fragments get spread around like
3, 3, 4. If I lose a host I have at most 7 fragments left, fewer than
the 8 needed to reassemble objects :-(
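The two examples above can be checked with a small sketch. It assumes
the k+m fragments are spread as evenly as possible across hosts, which
is my simplification rather than a statement of how the ring actually
places them; the function names are made up for illustration.

```python
def fragments_per_host(k, m, hosts):
    """Spread k+m fragments as evenly as possible over the hosts (assumed placement)."""
    base, extra = divmod(k + m, hosts)
    # The first `extra` hosts each carry one fragment more than the rest.
    return [base + 1 if i < extra else base for i in range(hosts)]


def survives_host_loss(k, m, hosts):
    """True if any single host can be lost and at least k fragments remain."""
    worst = max(fragments_per_host(k, m, hosts))
    return (k + m) - worst >= k


def overhead(k, m):
    """Extra space used by parity, as a fraction of the data size."""
    return m / k


for k, m in [(4, 2), (8, 2)]:
    print(k, m, overhead(k, m), fragments_per_host(k, m, 3),
          survives_host_loss(k, m, 3))
```

With 3 hosts, (4, 2) gives 2 fragments per host and survives a host
loss, while (8, 2) puts 4 fragments on one host and does not.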
To me this suggests that a certain minimum number of *hosts* per region
is needed for a given EC policy to be durable in the event of a host
outage (or destruction). Is this correct, or have I flubbed the
calculations?
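If fragments are spread evenly (again an assumption on my part, not
something I have verified against the ring builder), the fullest host
holds ceil((k+m)/hosts) fragments, and losing it is survivable only if
that number is at most m. A rough sketch of the resulting minimum host
count, with a hypothetical helper name:

```python
def min_hosts_for_host_durability(k, m):
    """Smallest host count such that no host holds more than m fragments,
    assuming the k+m fragments are spread evenly across hosts."""
    total = k + m
    # Integer ceiling of total / m.
    return (total + m - 1) // m


print(min_hosts_for_host_durability(4, 2))
print(min_hosts_for_host_durability(8, 2))
```

Under that assumption (4, 2) needs 3 hosts and (8, 2) needs 5 hosts to
tolerate the loss of any single host.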
regards
Mark