[openstack-dev] [Swift] Design note of geo-distributed Swift cluster

Oleg Gelbukh ogelbukh at mirantis.com
Mon Mar 4 18:35:45 UTC 2013


Yuzawa-san,

Your concept aligns very well with the concept of replication based on
container sync, modified to be service-wide, which we've seen with one of
our customers. I'm really excited to see that development of the
multi-region feature is going in several directions, as it's beneficial
for Swift and the whole ecosystem.

It appears that the 'proxy ring' is an additional database alongside the
global multi-region ring. You could consider including information about
region proxy servers in the main ring database, for example, as a device
dict entry.
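A minimal sketch of that suggestion in Python (Swift's implementation
language), assuming a hypothetical 'role' key to mark proxy entries — this
key is not part of Swift's actual ring format and is used here purely for
illustration:

```python
# Sketch: region proxy endpoints stored as entries in the main ring's
# device list, distinguished by an assumed 'role' key. The other keys
# (id, region, zone, ip, port, device, weight, meta) follow the shape
# of Swift ring device dicts.

def region_proxies(devs, region):
    """Return the proxy-server entries registered for a given region."""
    return [d for d in devs if d is not None
            and d.get('role') == 'proxy' and d['region'] == region]

devs = [
    {'id': 0, 'region': 1, 'zone': 1, 'ip': '10.0.1.10',
     'port': 6000, 'device': 'sda', 'weight': 100.0, 'meta': ''},
    {'id': 1, 'region': 2, 'zone': 1, 'ip': '10.0.2.10',
     'port': 6000, 'device': 'sda', 'weight': 100.0, 'meta': ''},
    # proxy entry: weight 0 so the ring builder assigns it no partitions
    {'id': 2, 'region': 2, 'zone': 0, 'ip': '10.0.2.1',
     'port': 8080, 'device': '-', 'weight': 0.0,
     'meta': 'region proxy', 'role': 'proxy'},
]
```

The zero weight keeps such entries from ever being chosen as partition
locations, so existing placement logic would be undisturbed.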

PS. Any chance to meet you at the Summit in Portland?

--
Best regards,
Oleg Gelbukh
Mirantis Inc.


On Fri, Mar 1, 2013 at 4:34 PM, YUZAWA Takahiko <
yuzawataka at intellilink.co.jp> wrote:

> Oleg-san
>
> Thank you for your reply. I think your idea about inter-region replication
> is very interesting.
>
> However, we think it isn't good to separate the namespace in a
> geo-distributed cluster (regarding #3) as a first step.
>
> So now we have another idea for a geo-distributed cluster, designed to
> meet the requirement that no proxy-server directly connects to the
> storage servers of foreign regions.
>
> Basic concept:
>  We would like to introduce a 'proxy ring' that contains, at minimum,
> information about the proxy-servers corresponding to each region. Any
> Swift server can learn the location of each region's proxy-servers from
> the 'proxy ring'.
>  The local proxy-server then connects directly to the foreign
> proxy-server (not to the foreign region's storage servers) when it needs
> to manipulate objects in the foreign region.
> In effect, the proxy-server works like a 'web forward proxy' for the
> other regions' proxy-servers.
>
> If the 'proxy ring' is introduced, we think we can implement
> geo-distributed clusters based on 'region-tier' and 'proxy affinity'
> without a separated namespace.
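The 'proxy ring' described above could be sketched, at its simplest, as a
per-region list of proxy endpoints consulted when a request must be relayed
to a foreign region. All names and addresses below are illustrative
assumptions, not part of the proposed patch:

```python
# Sketch of the 'proxy ring' idea: a mapping from region number to the
# proxy endpoints registered for that region.

PROXY_RING = {
    1: [('proxy1.region1.example', 8080)],
    2: [('proxy1.region2.example', 8080),
        ('proxy2.region2.example', 8080)],
}

def foreign_proxy(proxy_ring, region):
    """Pick an endpoint for relaying a request to the given region."""
    endpoints = proxy_ring.get(region)
    if not endpoints:
        raise KeyError('no proxy registered for region %d' % region)
    return endpoints[0]  # real code would balance and retry across entries
```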
>
> The following is the basic behavior of proxy-server.
>
> GET:
>  The proxy-server will get objects from local-region storage servers
> whenever possible. If there are no local nodes among the primary nodes,
> the proxy-server will relay the request to the proxy-server of the other
> region (included in the primary nodes).
>
> PUT:
>  The proxy-server will always store an object on storage-servers of the
> local region, and the object-replicator then replicates it toward the
> other region (same as before).
>
> DELETE:
>  Inspecting the primary nodes: if local nodes are found, the proxy-server
> sends the request to local storage servers as usual. If nodes of the
> other region are found, the proxy-server relays the request to the
> proxy-server of the other region.
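The GET/DELETE routing rule above can be sketched as follows; LOCAL_REGION
and the node dicts are illustrative assumptions, not the proposal's actual
data structures:

```python
# Sketch of the routing rule: prefer local primary nodes, otherwise
# relay through the foreign regions' proxy-servers.

LOCAL_REGION = 1

def route(primary_nodes, local_region=LOCAL_REGION):
    """Split primary nodes into local targets and foreign regions
    that must be reached through their proxy-servers."""
    local = [n for n in primary_nodes if n['region'] == local_region]
    foreign = sorted({n['region'] for n in primary_nodes
                      if n['region'] != local_region})
    if local:
        return ('local', local)   # talk to storage servers directly
    return ('relay', foreign)     # relay via foreign proxy-servers
```

For example, if all primary nodes for a partition live in regions 2 and 3,
`route` returns a relay decision rather than foreign storage nodes.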
>
>
> For a PoC, I made a brief patch against swift-1.7.6 so that we can check
> that the idea works (the proxy ring is not yet implemented, though, and
> accounts and containers are not yet handled).
>
> https://github.com/yuzawataka/swift/commit/1381b0713d8676ac1f6a2e48c55264037935e96a
>
> Any suggestions would be appreciated.
>
> Thank you.
>
>
> --
> Best regards,
> YUZAWA Takahiko
> NTTDATA INTELLILINK
>
> (2013/02/27 22:45), Oleg Gelbukh wrote:
>
>> Hello, Adrian, Yuzawa-san
>>
>> We are still pursuing the global ring implementation with minimal
>> changes to the replication algorithm. However, it has a number of
>> drawbacks. Some were obvious from the very beginning (for example, the
>> need to tweak rebalance to minimize data transfers between regions, or
>> the operational overhead required to distribute ring files in a
>> multi-region environment); others have been made visible by this very
>> discussion.
>>
>> We identified 3 basic ways to implement inter-region replication:
>>
>> 1. introduce replicator affinity, which in general resembles proxy
>> affinity in the sense that the original replicator handles replication
>> to devices in local and foreign regions differently. For example,
>> limit the number of REPLICATE calls to foreign regions to one in ten
>> replicator runs, and connect to only a single foreign-region server in
>> a given run. This is the approach we are going to take in the first
>> iteration.
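The throttle described in option 1 could look roughly like this; the
constants and node structure are illustrative assumptions, not the actual
implementation:

```python
# Sketch of replicator affinity: replicate to foreign regions only on
# every Nth run, and then to a single foreign node per run.

import random

FOREIGN_RUN_INTERVAL = 10  # one in ten runs touches foreign regions
LOCAL_REGION = 1

def replication_targets(nodes, run_count):
    """Return the nodes this replication run should contact."""
    local = [n for n in nodes if n['region'] == LOCAL_REGION]
    foreign = [n for n in nodes if n['region'] != LOCAL_REGION]
    targets = list(local)
    if foreign and run_count % FOREIGN_RUN_INTERVAL == 0:
        targets.append(random.choice(foreign))  # one foreign node only
    return targets
```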
>>
>> 2. implement a separate replicator process for cross-region
>> replication. The original replicator handles replication to devices in
>> the local region and ignores devices in foreign regions, while the
>> region-replicator acts symmetrically, ignoring local devices. This
>> approach is basically an extension of the first, but allows the
>> changes to be isolated from the core code.
>>
>> 3. create a replication-server to sit on the edge of a region's
>> replication network (or storage network, if a replication network is
>> not used) and control replication to foreign regions. This server
>> won't store any data, only a database of hashes in a sort of
>> 'ring-of-rings', used to determine whether replication to a foreign
>> region is required.
>> In this case, the global namespace moves to that 'ring-of-rings', and
>> the standard ring is used for intra-region replication.
>> The replication-server is represented as a special device with a very
>> large 'weight' parameter, so that it receives information about
>> replicas in the local cluster from the standard replicators. This
>> server will also have to 'proxy' replication traffic when it detects
>> that a partition was modified in the local cluster.
>> Unlike #1 and #2, this option supports only replication between
>> regions; no proxy-server can talk to a storage server in a foreign
>> region. However, it allows more sophisticated algorithms for
>> inter-region replication.
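The hash-comparison decision at the heart of option 3 might be sketched as
follows; the 'ring-of-rings' is modeled here as a plain dict mapping
partitions to per-region hashes, purely as an assumption for illustration:

```python
# Sketch of the option-3 decision: the edge replication-server records
# the last-known partition hash per region and forwards replication
# only when a foreign region's recorded hash differs from the current
# local hash.

def needs_foreign_replication(hash_db, partition, local_region, local_hash):
    """True if any foreign region holds a stale hash for the partition."""
    recorded = hash_db.get(partition, {})
    return any(h != local_hash
               for region, h in recorded.items()
               if region != local_region)
```

A positive result would trigger the 'proxy' replication traffic toward the
foreign region's own replication-server.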
>>
>> --
>> Best regards,
>> Oleg Gelbukh
>> Mirantis, Inc.
>>
>>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
