[Openstack] [Swift] Erasure code durability and overhead in small clusters
    Mark Kirkwood 
    mark.kirkwood at catalyst.net.nz
       
    Tue Mar 13 22:05:05 UTC 2018
    
    
  
Hi,
I'm looking at adding per-region Erasure code policies to our Swift 
cluster. Currently I'm experimenting with a small one - 3 hosts per 
region (each with 6 devices). Some experimentation has highlighted a 
subtle tension between the desire to minimize overhead and the ability 
to survive a *host* outage. I'll work through some examples below; feel 
free to check my math :-)
For brevity, let k = number of data fragments and m = number of parity 
fragments.
Suppose I use a (k=4, m=2) policy for each region. My overhead is m/k = 
50% (i.e. 1G uses 1.5G on disk). The 6 fragments land 2 per host across 
my 3 hosts, so if I lose a host I still have 4 fragments in total and 
can reassemble objects :-)
Suppose I use a (k=8, m=2) policy. Now my overhead is m/k = 25% (yay, 
better than 50%). However, the 10 fragments now get spread across the 3 
hosts as 3, 3, 4. If I lose a host I have at most 7 fragments left 
(only 6 if the host holding 4 dies) - not enough to 
reassemble objects :-(
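In case anyone wants to replay the arithmetic, here's a rough Python 
sketch I used (the helper is my own, nothing Swift-specific; it assumes 
the ring spreads the k+m fragments as evenly as possible across hosts, 
which matches the placements above):

    import math

    def check_policy(k, m, hosts):
        # Spread k + m fragments as evenly as possible; the busiest
        # host ends up holding ceil((k + m) / hosts) of them, and
        # losing that host is the worst case.
        total = k + m
        worst_host = math.ceil(total / hosts)
        survivors = total - worst_host
        ok = "OK" if survivors >= k else "cannot reassemble"
        print("k=%d m=%d hosts=%d: overhead=%.0f%%, busiest host holds "
              "%d, %d survive a host outage -> %s"
              % (k, m, hosts, 100.0 * m / k, worst_host, survivors, ok))

    check_policy(4, 2, 3)   # 6 fragments, 2 per host, 4 survive -> OK
    check_policy(8, 2, 3)   # 10 fragments as 3/3/4, worst case only 6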
To me this suggests that a certain minimum number of *hosts* per region 
is needed for a given EC policy to remain durable in the event of a 
host outage (or destruction). Is this correct - or have I flubbed the 
calculations?
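In case it helps with checking: assuming that even spread, surviving 
one host means (k + m) - ceil((k + m) / hosts) >= k, i.e. 
ceil((k + m) / hosts) <= m, which rearranges to 
hosts >= ceil((k + m) / m). A second sketch (again just my own 
illustration):

    import math

    def min_hosts(k, m):
        # Smallest host count where losing any single host still
        # leaves at least k fragments, assuming an even spread.
        return math.ceil((k + m) / m)

    print(min_hosts(4, 2))   # 3 - matches the first example above
    print(min_hosts(8, 2))   # 5 - so (8, 2) needs 5 hosts, not 3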
regards
Mark
    
    