[openstack-dev] [Swift] Design note of geo-distributed Swift cluster
YUZAWA Takahiko
yuzawataka at intellilink.co.jp
Tue Feb 19 05:36:54 UTC 2013
Oleg-san,
Has the implementation of the geo-distributed Swift cluster entirely
changed from proxy affinity and regions to inter-region replication?
I have the following questions.
* Is the namespace split between the normal rings and the ring-of-rings
of the inter-region replicator? If so, how do clients reach objects in
another region?
* Must the inter-region replicators store one replica of the objects of
each region? Will that scale?
Could you share more details of this idea?
Thank you.
(2013/02/18 17:14), Oleg Gelbukh wrote:
> Hello,
>
> I would like to continue this insightful discussion by dropping a couple
> of suggestions inline.
>
> On Tue, Feb 5, 2013 at 3:47 PM, Caitlin Bestler
> <caitlin.bestler at nexenta.com <mailto:caitlin.bestler at nexenta.com>> wrote:
>
> While we don't want to solve every possible topology, I think we
> really need to pay attention to what multi-site really requires.
>
> I haven't done any studies of the entire market, but in my
> experience inter-site replication used by storage services is almost
> always
> via dedicated or VPN tunnels, and when VPN tunnels are used they are
> traffic shaped.
>
> This is not just a matter of connecting a bunch of IP addresses on
> the internet and then forming a vague impression as to which ones
> are "far" away. It is more like the type of discovery routers do,
> where each tunnel is a "link".
>
> A proper remote replication solution will be aware of these links,
> and take that into account in its replication strategy. One example
> topology that I believe is very likely is a distributed corporate
> intranet. The branch offices are very unlikely to connect with each
> other, but rather mostly connect with the central office (and maybe
> one alternate location).
>
> If the communications capacity favors communicating with certain
> sites, then we should favor replicating to those sites. Communications
> capacity between corporate sites is typically provisioned (whether
> with dedicated lines or just VPN) and not something you will be able
> to just increase on demand instantly. Inter-site bandwidth is still
> expensive.
>
> That said, there are still two important things to reach a consensus on:
>
> * Are we talking about enabling the Swift Proxy to access content
> that is at multiple sites, but each object is linked to a specific site.
> Or are we creating a global namespace with eventual consistency,
> and smart assignment of objects to the sites where they are
> actually referenced? The first goal is certainly easier.
>
> Our initial idea was to create a global namespace, i.e. have a single
> ring shared across all regions and containing all devices, and have
> proxy-servers access data based on the ring location, with a preference
> for local servers. Now, after some work done on the replication network
> feature, we understand that the most likely deployment topology is
> regions with replication networks connected by VPN of some sort, and
> storage networks totally isolated. In such a deployment, no proxy server
> will ever access a remote region's storage servers, so there is no need
> for a global namespace for accessing data. What we actually need a
> global namespace for is inter-region replication, which brings us to
> the second question:
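(As an aside, the local-preference read path mentioned above could be sketched roughly as follows. This is a minimal illustration, not Swift's actual API; the `region` field on the node dicts and the `LOCAL_REGION` value are assumptions.)

```python
# Hypothetical sketch: a proxy preferring local-region storage nodes
# when all devices share a single global ring. The node dicts and
# LOCAL_REGION constant are illustrative assumptions, not Swift's API.
LOCAL_REGION = 1  # assumed identifier for the proxy's own region

def sort_nodes_by_locality(nodes, local_region=LOCAL_REGION):
    """Order candidate storage nodes so local-region nodes are tried first.

    sorted() is stable, so the original ring order is preserved
    within the local and remote groups.
    """
    return sorted(nodes, key=lambda node: node['region'] != local_region)

nodes = [
    {'ip': '10.2.0.5', 'region': 2},
    {'ip': '10.1.0.7', 'region': 1},
    {'ip': '10.3.0.9', 'region': 3},
]
ordered = sort_nodes_by_locality(nodes)
# ordered[0] is the region-1 (local) node; remote nodes follow.
```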
>
> * What forms of site-to-site replication are we going to support? Is
> this something each system administrator specifies (such as
> by adding policies along the lines of "all new objects created
> at a branch office will be replicated to the two central sites on
> a daily basis. Only objects actually referenced at a branch
> office will be cached there.") or something more akin to how Swift
> operates locally where the user does not specify where specific
> things are stored?
>
>
> It looks like we need a kind of 'ring-of-rings' and a server (or
> servers) controlling inter-region replication in every region. This
> server might be represented as a device with a very high weight, or as
> some special device, which basically has at least one replica of most
> partitions (or each partition) in the cluster. This ensures the local
> replicators report the number of replicas in the local cluster to the
> inter-region replicator. The inter-region replicators, in turn, compare
> that value to the number recorded in the 'ring-of-rings' and initiate
> cross-region replication if the local region has lost all configured
> replicas of a partition.
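A minimal sketch of that replication check might look like the following. The data shapes and the `CONFIGURED_REPLICAS` constant are assumptions for illustration, not an actual Swift interface.

```python
# Hypothetical sketch: the inter-region replicator compares the replica
# counts reported by local replicators against the count recorded in
# the 'ring-of-rings', and flags partitions whose local replicas are
# all lost, so they can be re-fetched from a remote region.
CONFIGURED_REPLICAS = 3  # assumed value recorded in the ring-of-rings

def partitions_needing_cross_region_pull(local_counts,
                                         configured=CONFIGURED_REPLICAS):
    """Return partitions that have lost all local replicas.

    local_counts maps partition number -> replica count reported by
    the local replicators.
    """
    return sorted(part for part, count in local_counts.items()
                  if count == 0 and configured > 0)

# Example: partition 42 has lost all of its local replicas, so it is
# the only one that triggers a cross-region pull.
reported = {41: 3, 42: 0, 43: 1}
to_pull = partitions_needing_cross_region_pull(reported)
```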
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> --
> Best regards,
> Oleg Gelbukh
> Mirantis, Inc.