[openstack-dev] [Swift] Design note of geo-distributed Swift cluster

YUZAWA Takahiko yuzawataka at intellilink.co.jp
Tue Feb 19 05:36:54 UTC 2013


Oleg-san,

Has the implementation of the geo-distributed Swift cluster changed entirely
from proxy affinity and regions to inter-region replication?

I have the following questions.

* Has the namespace been split between the normal rings and the
ring-of-rings used by the inter-region replicator? If so, how do clients
reach objects in another region?
* Must the inter-region replicators store one replica of each region's
objects? Will that scale?

Could you tell us more about this idea?

Thank you.

(2013/02/18 17:14), Oleg Gelbukh wrote:
> Hello,
>
> I would like to continue this insightful discussion by dropping a couple
> of suggestions inline.
>
> On Tue, Feb 5, 2013 at 3:47 PM, Caitlin Bestler
> <caitlin.bestler at nexenta.com> wrote:
>
>     While we don't want to solve every possible topology, I think we
>     really need to pay attention to what multi-site really requires.
>
>     I haven't done any studies of the entire market, but in my
>     experience inter-site replication used by storage services is almost
>     always via dedicated or VPN tunnels, and when VPN tunnels are used
>     they are traffic shaped.
>
>     This is not just a matter of connecting a bunch of IP addresses on
>     the internet and then forming a vague impression as to which ones
>     are "far" away. It is more like the type of discovery routers do,
>     where each tunnel is a "link".
>
>     A proper remote replication solution will be aware of these links,
>     and take that into account in its replication strategy. One example
>     topology that I believe is very likely is a distributed corporate
>     intranet. The branch offices are very unlikely to connect with each
>     other, but rather mostly connect with the central office (and maybe
>     one alternate location).
>
>     If the communications capacity favors communicating with certain
>     sites, then we should favor replicating to those sites. Communications
>     capacity between corporate sites is typically provisioned (whether
>     with dedicated lines or just VPN) and not something you will be able
>     to just increase on demand instantly. Inter-site bandwidth is still
>     expensive.
>
>     That said, there are still two important things to reach a consensus on:
>
>     * Are we talking about enabling the Swift Proxy to access content
>       that is at multiple sites, with each object linked to a specific
>       site? Or are we creating a global namespace with eventual
>       consistency and smart assignment of objects to the sites where
>       they are actually referenced? The first goal is certainly easier.
>
> Our initial idea was to create a global namespace, i.e. have a single
> ring shared across all regions and containing all devices, and have
> proxy servers access data based on the ring location, with a preference
> for local servers. Now, after some work on the replication network
> feature, we understand that the most likely deployment topology is
> regions whose replication networks are connected by a VPN of some sort
> and whose storage networks are totally isolated. In such a deployment, no
> proxy server will ever access a remote region's storage server, so there
> is no need for a global namespace for accessing data. What we actually
> need a global namespace for is inter-region replication, which brings us
> to the second question:
>
>     * What forms of site-to-site replication are we going to support? Is
>       this something each system administrator specifies (such as by
>       adding policies along the lines of "all new objects created at a
>       branch office will be replicated to the two central sites on a
>       daily basis. Only objects actually referenced at a branch office
>       will be cached there.") or something more akin to how Swift
>       operates locally, where the user does not specify where specific
>       things are stored?
>
>
> It looks like we need a kind of 'ring-of-rings' and one or more servers
> controlling inter-region replication in every region. This server might
> be represented as a device with a very high weight, or as some special
> device, which holds at least one replica of most partitions (or of
> every partition) in the cluster. This ensures that the local replicators
> report the number of replicas in the local cluster to the inter-region
> replicator. Inter-region replicators, in turn, compare that number of
> replicas to the value recorded in the 'ring-of-rings' and initiate
> cross-region replication if the local region has lost all configured
> replicas of a partition.
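
The check described in the paragraph above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not code
from Swift: the names PartitionReport, plan_cross_region_pulls and the
shape of ring_of_rings (a dict mapping a region name to its configured
replica count) are hypothetical. It flags a partition for cross-region
replication only when a region that should hold replicas reports none,
and lists the other regions that still report at least one replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionReport:
    """Hypothetical report sent by local replicators for one partition."""
    region: str
    partition: int
    local_replicas: int  # replicas currently present in that region


def plan_cross_region_pulls(reports, ring_of_rings):
    """Return (region, partition, donor_regions) for partitions to restore.

    ring_of_rings is assumed to map region name -> configured number of
    replicas per partition in that region.
    """
    # Index which regions still hold at least one replica of each partition.
    holders = {}
    for report in reports:
        if report.local_replicas > 0:
            holders.setdefault(report.partition, []).append(report.region)

    pulls = []
    for report in reports:
        configured = ring_of_rings.get(report.region, 0)
        # Only a total loss triggers cross-region replication; a partial
        # loss is left to the region's own local replicators.
        if configured > 0 and report.local_replicas == 0:
            donors = sorted(holders.get(report.partition, []))
            pulls.append((report.region, report.partition, donors))
    return pulls


if __name__ == "__main__":
    ring_of_rings = {"region-a": 2, "region-b": 2}
    reports = [
        PartitionReport("region-a", 7, 0),  # lost every local replica
        PartitionReport("region-b", 7, 2),
        PartitionReport("region-a", 8, 1),  # partial loss, handled locally
        PartitionReport("region-b", 8, 2),
    ]
    print(plan_cross_region_pulls(reports, ring_of_rings))
```

Whether a partial loss should also trigger a cross-region copy is a
policy choice; the sketch follows the "lost all configured replicas"
condition from the quoted text.
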
>
>     _______________________________________________
>     OpenStack-dev mailing list
>     OpenStack-dev at lists.openstack.org
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> --
> Best regards,
> Oleg Gelbukh
> Mirantis, Inc.
>



