[openstack-dev] [Swift] Erasure coding and geo replication

Kota TSUYUZAKI tsuyuzaki.kota at lab.ntt.co.jp
Mon Feb 15 10:29:09 UTC 2016


Hello Mark,

AFAIK, there are a few reasons why erasure code + geo replication is still a work in progress.

>> and expect to survive a region outage...
>>
>> With that in mind I did some experiments (Liberty swift) and it looks to me like if you have:
>>
>> - num_data_frags < num_nodes in (smallest) region
>>
>> and:
>>
>> - num_parity_frags = num_data_frags
>>
>>
>> then having a region fail does not result in service outage.

Good point, but note that PyECLib v1.0.7 (pinned for Kilo/Liberty stable) still has a problem: it cannot decode the original data when all of the fragments fed to it are parity fragments [1]. (i.e. if you set
num_parity_frags = num_data_frags and only parity fragments reach the proxy for a GET request, decoding will fail.) The problem has already been resolved on the master branch of PyECLib/liberasurecode,
and current Swift master depends on PyECLib>=1.0.7, so if you plan to use the newest Swift, it might not be
an issue.
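
To make that failure case concrete, here is a minimal Python sketch (not Swift or PyECLib code). It assumes fragment indexes 0..num_data-1 are data fragments and the rest are parity, and it uses a hypothetical worst-case placement in which one region holds only parity fragments. The function name and region names are made up for illustration.

```python
def region_outage_report(num_data, num_parity, placement):
    """Report what survives when each region fails in turn.

    placement maps a region name to the fragment indexes stored there.
    Assumption: indexes 0..num_data-1 are data, the rest are parity.
    """
    assert sum(len(f) for f in placement.values()) == num_data + num_parity
    report = {}
    for failed in placement:
        # Fragments held by every region except the failed one.
        survivors = sorted(
            idx
            for region, frags in placement.items()
            if region != failed
            for idx in frags
        )
        report[failed] = {
            "survivors": survivors,
            # Decoding needs at least num_data fragments of any kind.
            "enough_fragments": len(survivors) >= num_data,
            # True when a GET could only be fed parity fragments.
            "parity_only": bool(survivors)
            and all(idx >= num_data for idx in survivors),
        }
    return report


# Hypothetical 4+4 policy: r1 holds the data, r2 holds the parity.
placement = {"r1": [0, 1, 2, 3], "r2": [4, 5, 6, 7]}
report = region_outage_report(4, 4, placement)
for failed, info in report.items():
    print(failed, info)
```

In this placement, losing r1 still leaves enough fragments to decode, but all of them are parity, which is the case the older PyECLib could not handle.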

From the Swift perspective, I think we need more tests and discussion for geo replication, both around write/read affinity [2] (which is Swift's own geo replication feature) and around performance.

For write/read affinity, we deliberately left affinity control out of consideration, to simplify the implementation, until EC landed in Swift master [3]. I think it's now time to work out how we can use
affinity control with EC, but that's not done yet.

From the performance perspective, in my experiments, more parity fragments cause significant performance degradation [4]. To prevent the degradation, I am working on a spec that makes duplicate copies of
the data/parity fragments and spreads them out across geo regions.
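
As a rough sketch of the duplication idea (an illustration only, not the spec's actual design), the Python below assumes every region receives one full copy of all data and parity fragments. The function names and the 4+2 and 4+4 policies are hypothetical.

```python
def spread_duplicated_fragments(num_data, num_parity, regions):
    """Assumed layout: each region gets one copy of every fragment index."""
    unique = list(range(num_data + num_parity))
    return {region: list(unique) for region in regions}


def storage_overhead(num_data, num_parity, copies=1):
    """Stored bytes divided by original bytes, ignoring metadata."""
    return (num_data + num_parity) * copies / num_data


regions = ["r1", "r2"]
layout = spread_duplicated_fragments(4, 2, regions)

# With a full fragment set per region, losing either region still leaves
# at least num_data fragments, including data fragments.
for failed in regions:
    remaining = [r for r in regions if r != failed]
    assert all(len(layout[r]) >= 4 for r in remaining)

dup_overhead = storage_overhead(4, 2, copies=len(regions))  # 4+2, duplicated
wide_overhead = storage_overhead(4, 4)                      # 4+4, no duplication
print(dup_overhead, wide_overhead)
```

Under these assumptions the duplicated 4+2 layout stores more bytes than a single 4+4 set, but only two unique parity fragments have to be computed per segment instead of four.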

To summarize, we've not done the work yet, but discussion and contributions for EC + geo replication are welcome anytime, IMO.

Thanks,
Kota

1: https://bitbucket.org/tsg-/liberasurecode/commits/a01b1818c874a65d1d1fb8f11ea441e9d3e18771
2: http://docs.openstack.org/developer/swift/admin_guide.html#geographically-distributed-clusters
3: http://docs.openstack.org/developer/swift/overview_erasure_code.html#region-support
4: https://specs.openstack.org/openstack/swift-specs/specs/in_progress/global_ec_cluster.html



(2016/02/15 18:00), Mark Kirkwood wrote:
> After looking at:
> 
> https://www.youtube.com/watch?v=9YHvYkcse-k
> 
> I have a question (that follows on from Bruno's) about using erasure coding with geo replication.
> 
> Now the example given to show why you could/should not use erasure coding with geo replication is somewhat flawed as it is immediately clear that you cannot set:
> 
> - num_data_frags > num_devices (or nodes) in a region
> 
> and expect to survive a region outage...
> 
> With that in mind I did some experiments (Liberty swift) and it looks to me like if you have:
> 
> - num_data_frags < num_nodes in (smallest) region
> 
> and:
> 
> - num_parity_frags = num_data_frags
> 
> 
> then having a region fail does not result in service outage.
> 
> So my real question is - it looks like it *is* possible to use erasure coding in geo replicated situations - however I may well be missing something significant, so I'd love some clarification here [1]!
> 
> Cheers
> 
> Mark
> 
> [1] Reduction in disk usage and net traffic looks attractive
> 


-- 
----------------------------------------------------------
Kota Tsuyuzaki(露﨑 浩太)  <tsuyuzaki.kota at lab.ntt.co.jp>
NTT Software Innovation Center
Cloud Solution Project
Phone  0422-59-2837
Fax    0422-59-2965
-----------------------------------------------------------




