[openstack-dev] [Swift] Design note of geo-distributed Swift cluster

John Dickinson me at not.mn
Mon Feb 4 23:11:35 UTC 2013


When solving the global cluster problem, I think two things need to be kept in mind. First, I don't need to try to solve (and indeed I can't solve) for every single deployment scenario. Second, I really want to keep the "pluggable" parts of Swift to a minimum.

That being said, proxy affinity (i.e., determining nearness) probably doesn't have to be any more difficult than "my region" vs "far". That would be the first step. If at some future stage we decide to figure out smart ways to determine nearness, great. But that doesn't mean it needs to be a requirement for the first iteration. Perhaps there's even some cool stuff that systems like Quantum could offer eventually. Deployers know what their clusters look like. Let's take advantage of that to get simpler code paths.
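
To make the "my region" vs "far" idea concrete, here is a minimal sketch of what that ordering could look like. It is only an illustration under assumptions: the Node class, its region field, and sort_by_affinity are hypothetical names made up for this example, not existing Swift code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node record; the deployer assigns the region number.
    ip: str
    region: int


def sort_by_affinity(nodes, local_region):
    """Return nodes in the proxy's own region first, then everything "far".

    sorted() is stable, so the original order is kept inside each group.
    """
    # The key is False (sorts first) for local nodes and True for far ones.
    return sorted(nodes, key=lambda node: node.region != local_region)


nodes = [
    Node("10.0.2.1", region=2),
    Node("10.0.1.1", region=1),
    Node("10.0.2.2", region=2),
    Node("10.0.1.2", region=1),
]
ordered = sort_by_affinity(nodes, local_region=1)
```

The only input is a region number that the deployer already knows, so no measurement of nearness is needed for a first iteration.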

Similarly, I don't want to see proposals to replace the rsync transport mechanism with a plugin system that lets you choose anything you want (replication over token ring! replication over BitTorrent!). If we need to improve the transport over what rsync offers, let's actually make something better for Swift rather than some plugin system that fragments Swift deployments. We don't need yet-another-config-option.

--John




On Jan 28, 2013, at 11:53 PM, YUZAWA Takahiko <yuzawataka at intellilink.co.jp> wrote:

> Hi, Caitlin-san
> 
> > * Is each region supposed to be self-sufficient, in that get requests can be fulfilled by a copy within that
> >      region even if the links to other regions are temporarily down?
> 
> The object-replicator would keep retrying replication to the other region. The newest version of an object that was PUT in the other region can't be reached until the connection recovers.
> 
> 
> > * What is the tolerance for "eventual consistency" when dealing with continental distances and TBs of new
> >      content potentially being created each day?
> 
> We think it depends on the performance of the object-replicator, so we consider the object-replicator to play the most important role in a geo-distributed Swift cluster.
> We are considering improvements to the object-replicator. For example, we would like to modify it so that the transfer program is pluggable and dynamically switchable. The aim is to allow rsync to be replaced with another program (because rsync may be slow over WAN connections such as a Long Fat Pipe).
> 
> 
> > * What happens if the same object is updated concurrently in two different regions?
> 
> Each object on disk has a timestamp in its file name, like '1342555642.83577.data'. The object-replicator synchronizes the 'partition' directories containing these objects across nodes, and each object is replicated with its file name preserved. Files for the same object that were updated concurrently might exist together in a directory on a storage node, but their file names would differ (because Python's time.time() is unlikely to return exactly the same time for both).
> The newest object (the one with the largest timestamp in its name in the directory) always has priority when requested by GET.
> So the newest object that was PUT has priority across the whole Swift cluster, as long as the object-replicators are working and keeping things consistent.
> 
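
The "largest timestamp wins" rule described above can be sketched roughly as follows. This is an illustration only, not the object server's actual code: newest_data_file is a hypothetical helper, and it assumes the directory listing contains names of the form '<timestamp>.data' and ignores anything else.

```python
from decimal import Decimal

DATA_SUFFIX = ".data"


def newest_data_file(filenames):
    """Return the .data file name with the largest timestamp, or None.

    Decimal is used instead of float so that long timestamps compare
    exactly as written in the file name.
    """
    data_files = [name for name in filenames if name.endswith(DATA_SUFFIX)]
    if not data_files:
        return None
    return max(data_files, key=lambda name: Decimal(name[: -len(DATA_SUFFIX)]))


# Two concurrent PUTs of the same object, replicated into one directory.
listing = ["1342555642.83577.data", "1342555650.10000.data"]
winner = newest_data_file(listing)
```

Because every node applies the same comparison to the same file names, all replicas converge on the same winner once replication has caught up.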
> --
> Best regards,
> YUZAWA Takahiko
> NTTDATA INTELLILINK
> 
> 
> (2013/01/26 3:12), Caitlin Bestler wrote:
>> These blueprints and documents are focused almost entirely on how the Swift Proxy creates objects.
>> 
>> I think the more critical issue for Swift objects is how Objects are replicated in a multi-region environment
>> when a copy becomes unavailable.
>> 
>> The cold hard fact here is that inter-region replication is considerably more expensive than intra-region
>> replication. If you're doing a multi-region cloud obviously you have to do both, but I am skeptical that
>> a single algorithm can support both with nothing more than a "distance" metric.
>> 
>> Some serious questions to apply to any design proposal:
>> 
>> * Is each region supposed to be self-sufficient, in that get requests can be fulfilled by a copy within that
>>     region even if the links to other regions are temporarily down?
>> * What is the tolerance for "eventual consistency" when dealing with continental distances and TBs of new
>>     content potentially being created each day?
>> * What happens if the same object is updated concurrently in two different regions?
>> 
>> 
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 
> 
