[openstack-dev] [Swift] Design note of geo-distributed Swift cluster

Oleg Gelbukh ogelbukh at mirantis.com
Mon Feb 18 08:14:33 UTC 2013


Hello,

I would like to continue this insightful discussion by dropping a couple of
suggestions inline.

On Tue, Feb 5, 2013 at 3:47 PM, Caitlin Bestler <caitlin.bestler at nexenta.com> wrote:

> While we don't want to solve every possible topology, I think we really
> need to pay attention to what multi-site really requires.
>
> I haven't done any studies of the entire market, but in my experience
> inter-site replication used by storage services is almost always
> via dedicated or VPN tunnels, and when VPN tunnels are used they are
> traffic shaped.
>
> This is not just a matter of connecting a bunch of IP addresses on the
> internet and then forming a vague impression as to which ones
> are "far" away. It is more like the type of discovery routers do, where
> each tunnel is a "link".
>
> A proper remote replication solution will be aware of these links, and
> take that into account in its replication strategy. One example
> topology that I believe is very likely is a distributed corporate
> intranet. The branch offices are very unlikely to connect with each
> other, but rather mostly connect with the central office (and maybe one
> alternate location).
>
> If the communications capacity favors communicating with certain sites,
> then we should favor replicating to those sites. Communications
> capacity between corporate sites is typically provisioned (whether with
> dedicated lines or just VPN) and not something you will be able
> to just increase on demand instantly. Inter-site bandwidth is still
> expensive.
>
> That said, there are still two important things to reach a consensus on:
>
> * Are we talking about enabling the Swift Proxy to access content that is
>   at multiple sites, with each object linked to a specific site?
>   Or are we creating a global namespace with eventual consistency, and
>   smart assignment of objects to the sites where they are
>   actually referenced? The first goal is certainly easier.
>

Our initial idea was to create a global namespace, i.e. have a single ring
shared across all regions and containing all devices, and have
proxy servers access data based on the ring location, with a preference for
local servers. Now, after some work on the replication network feature,
we understand that the most likely deployment topology is regions whose
replication networks are connected by a VPN of some sort and whose storage
networks are totally isolated. In such a deployment, no proxy server will
ever access a remote region's storage server, so there is no need for a
global namespace for accessing data. What we actually need a global
namespace for is inter-region replication, which brings us to the second
question:

> * What forms of site-to-site replication are we going to support? Is this
>   something each system administrator specifies (such as
>   by adding policies along the lines of "all new objects created at a
>   branch office will be replicated to the two central sites on
>   a daily basis. Only objects actually referenced at a branch office
>   will be cached there.") or something more akin to how Swift
>   operates locally, where the user does not specify where specific things
>   are stored?


It looks like we need a kind of 'ring-of-rings' and one or more servers
controlling inter-region replication in every region. Such a server might be
represented as a device with a very high weight, or as some special device,
which basically holds at least one replica of most partitions (or of each
partition) in the cluster. This ensures that the local replicators report the
number of replicas in the local cluster to the inter-region replicator.
Inter-region replicators, in turn, compare that number with the value
recorded in the 'ring-of-rings' and initiate cross-region replication if the
local region has lost all configured replicas of a partition.
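
To make the comparison step above a bit more concrete, here is a minimal
sketch in Python. It is not existing Swift code: the 'ring-of-rings' is
modelled as a plain dictionary, and the names partitions_to_pull,
candidate_sources, ring_of_rings and reported_in_a are all hypothetical. It
only assumes that local replicators report a per-partition replica count and
that the 'ring-of-rings' records how many replicas each region is configured
to hold.

```python
from typing import Dict, List

# Hypothetical 'ring-of-rings': region name -> {partition -> configured replica count}.
RingOfRings = Dict[str, Dict[int, int]]


def partitions_to_pull(ring_of_rings: RingOfRings, region: str,
                       reported: Dict[int, int]) -> List[int]:
    """Partitions for which `region` has lost all of its configured replicas.

    `reported` maps partition -> replica count reported by local replicators;
    a partition missing from it is treated as having zero local replicas.
    """
    configured = ring_of_rings[region]
    return sorted(
        part for part, want in configured.items()
        if want > 0 and reported.get(part, 0) == 0
    )


def candidate_sources(ring_of_rings: RingOfRings, region: str,
                      part: int) -> List[str]:
    """Other regions that are configured to hold at least one replica of `part`."""
    return sorted(
        other for other, configured in ring_of_rings.items()
        if other != region and configured.get(part, 0) > 0
    )


# Illustrative data only: two regions, three partitions.
ring_of_rings = {
    "region-a": {0: 2, 1: 2, 2: 0},
    "region-b": {0: 1, 1: 1, 2: 3},
}
reported_in_a = {0: 2, 1: 0}  # region-a has lost both replicas of partition 1

for part in partitions_to_pull(ring_of_rings, "region-a", reported_in_a):
    print(part, candidate_sources(ring_of_rings, "region-a", part))
```

In this sketch, cross-region traffic is triggered only when the local count
drops to zero, which matches the "lost all configured replicas" condition;
a partially degraded partition is left to the local replicators.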



--
Best regards,
Oleg Gelbukh
Mirantis, Inc.