[Openstack] Swift questions

John Dickinson me at not.mn
Wed Aug 27 18:07:10 UTC 2014

Yup, this is a totally possible failure scenario, and Swift will merge the data (using last-write-wins for overwrites) automatically when the partition is restored. But you'll still have full durability on writes, even with a partitioned global cluster.
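Marcus's quoted question asks whether byte counts come out right once a partition heals. The sketch below shows why last-write-wins merging is enough in a simplified model. It assumes each container replica keeps one row per object name, carrying a timestamp, a size, and a deleted flag, and that totals are derived from the merged rows rather than kept as independent counters. `ObjectRow`, `merge_listings`, and `container_stats` are hypothetical names, not Swift's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectRow:
    """One row in a hypothetical container listing replica."""
    timestamp: float      # when the PUT (or DELETE) was accepted
    size: int             # object size in bytes
    deleted: bool = False

def merge_listings(local: dict, remote: dict) -> dict:
    """Merge two replicas of a listing: per object name, the newer timestamp wins."""
    merged = dict(local)
    for name, row in remote.items():
        mine = merged.get(name)
        if mine is None or row.timestamp > mine.timestamp:
            merged[name] = row
    return merged

def container_stats(listing: dict) -> tuple:
    """Derive (object_count, bytes_used) from the rows, so a merge fixes the totals too."""
    live = [row for row in listing.values() if not row.deleted]
    return len(live), sum(row.size for row in live)

# Regions A and B cannot reach each other, but both accept PUTs.
region_a = {"cat.jpg": ObjectRow(timestamp=100.0, size=1000)}
region_b = {"dog.jpg": ObjectRow(timestamp=101.0, size=500),
            "cat.jpg": ObjectRow(timestamp=102.0, size=1200)}  # later overwrite
healed = merge_listings(region_a, region_b)
print(container_stats(healed))  # (2, 1700)
```

Merging in either direction gives the same result here, so it does not matter which side's replicator runs first.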


On Aug 27, 2014, at 10:49 AM, Marcus White <roastedseaweed.k at gmail.com> wrote:

> Yup, thanks for the great explanation:)
> Another question, though related: If there are three regions, and two
> get "split", there is now a partition. Both the split ones can talk to
> the third, but not to each other.
> A PUT comes into one region, and it gets written to the local site.
> Container information presumably gets updated here, including byte
> count.
> Same thing happens on another site, where a PUT comes in and the
> container information is updated with the byte count.
> When the sites get back together, do the container servers make sure
> it's all correct in the end?
> If this is not a possible scenario, is there any case where the
> container metadata can be different between two zones or regions
> because of a partition, independent PUTs can happen, and the data
> has to be merged? Is that all done by the respective servers (container
> or account)?
> I will be looking at the sources soon.
> Thanks again
> MW.
> On Wed, Aug 27, 2014 at 8:13 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
>> Marcus-
>> Not sure how much nitty-gritty detail you care to know; some of these answers get into code specifics that you're better off exploring on your own, so you aren't relying on an explanation of mine that may be dated. At a high level, though, the proxy uses the rings to look up the nodes responsible for storing an object and its container. It passes that info to the storage nodes when it makes the PUT request, so when a storage node goes to update the container it has already been told "and here are the nodes to send the container update to". It will send the updates to all of them. Similarly, once the container server has updated its database, it goes on to update the appropriate account databases.
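The flow Paul describes, where the proxy does both ring lookups and hands the container nodes to each object server, can be sketched as follows. This is a toy under stated assumptions: `toy_ring_nodes` is a hypothetical stand-in for a ring (hash the path to a partition, take consecutive devices) and is not how Swift's ring actually assigns devices, and the dict returned by the hypothetical `build_object_puts` stands in for whatever the proxy really sends with the PUT.

```python
import hashlib

def toy_ring_nodes(path: str, devices: list, replicas: int, part_power: int = 8) -> list:
    """Toy ring lookup (NOT Swift's real ring): hash the path to a partition,
    then take `replicas` consecutive devices starting there."""
    digest = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16)
    partition = digest >> (128 - part_power)  # top part_power bits of the MD5
    return [devices[(partition + i) % len(devices)] for i in range(replicas)]

def build_object_puts(account, container, obj, object_devices, container_devices, replicas=3):
    """What the proxy hands each object server in this sketch: where to store the
    object, plus which container servers to notify once it is on disk."""
    object_nodes = toy_ring_nodes(f"/{account}/{container}/{obj}", object_devices, replicas)
    # The container lookup hashes only /account/container, so every object in
    # the same container maps to the same container nodes.
    container_nodes = toy_ring_nodes(f"/{account}/{container}", container_devices, replicas)
    return [{"object_node": node, "container_updates": list(container_nodes)}
            for node in object_nodes]

requests = build_object_puts("AUTH_test", "photos", "cat.jpg",
                             object_devices=[f"obj{i}" for i in range(6)],
                             container_devices=[f"cont{i}" for i in range(4)])
for req in requests:
    print(req["object_node"], "->", req["container_updates"])
```

The object server never consults a container ring itself in this model; it only follows the list it was given, which answers the "how does it know which container server" question. The container server would notify account nodes the same way.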
>> Make sense?
>> Thx
>> Paul
>> -----Original Message-----
>> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
>> Sent: Wednesday, August 27, 2014 7:04 AM
>> To: Luse, Paul E
>> Cc: openstack
>> Subject: Re: [Openstack] Swift questions
>> Thanks Paul:)
>> For the container part, you mentioned that the node (meaning the object
>> server?) contacts the container server. Since you can have multiple container servers, how does the object server know which container server to contact? How and where the container gets updated is a bit confusing. With container rings and account rings being separate and used by the proxy, I am not sure I understand how that path works.
>> MW
>> On Wed, Aug 27, 2014 at 6:15 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
>>> Hi Marcus,
>>> See answers below.  Feel free to ask follow-ups, others may have more to add as well.
>>> Thx
>>> Paul
>>> -----Original Message-----
>>> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
>>> Sent: Wednesday, August 27, 2014 5:04 AM
>>> To: openstack
>>> Subject: [Openstack] Swift questions
>>> Hello,
>>> Some questions on new and old features of Swift. Any help would be
>>> great:) Some are very basic, sorry!
>>> 1. Does Swift write two copies and then return back to the client in the 3 replica case, with third in the background?
>>> PL>  Depends on the number of replicas. The formula for what we call a quorum is n/2 + 1 (integer division), which is the number of success responses we need from the back-end storage nodes before telling the client that all is good. So, yes, with 3 replicas you need 2 good responses before returning OK.
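A minimal sketch of the quorum rule as stated in this answer. `quorum_size` and `put_status` are hypothetical names, and returning 503 when the quorum is missed is an assumption of this sketch, not something the thread specifies.

```python
def quorum_size(replicas: int) -> int:
    """Successful backend responses needed: n/2 + 1 with integer division."""
    return replicas // 2 + 1

def put_status(backend_statuses: list, replicas: int) -> int:
    """201 if a quorum of backend PUTs returned 2xx; otherwise 503 (assumed)."""
    successes = sum(1 for status in backend_statuses if 200 <= status < 300)
    return 201 if successes >= quorum_size(replicas) else 503

print([quorum_size(n) for n in (1, 2, 3, 4, 5)])  # [1, 2, 2, 3, 3]
print(put_status([201, 201, 500], replicas=3))    # 201
print(put_status([201, 500, 500], replicas=3))    # 503
```

With 3 replicas, one slow or failed node does not block the client, while a majority still has the data when the client hears OK.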
>>> 2. This again is a stupid question, but "eventually consistent" is a bit confusing for an object unless it is updated. If it is created, it is either there or not, and you cannot update the data within the object. Maybe a POST can change the metadata? Or the container listing shows it's there but the actual object never got there? Those are the only cases I can think of.
>>> PL> No, it's a good question because it's asked a lot. The most common scenario that we talk about for eventual consistency is the consistency between the existence of an object and its presence in the container listing, so your thinking is pretty close. When an object PUT is complete on a storage node (fully committed to disk), that node then sends a message to the appropriate container server to update the listing. It attempts to do this synchronously, but if it can't, the update may be delayed without any indication to the client. This is by design, and it means you can get a successful PUT and then GET the object without any problem, even though it may not yet show up in the container listing. There are other scenarios that demonstrate the eventually consistent nature of Swift; this is just a common and easy-to-explain one.
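The scenario in this answer (PUT succeeds, GET works, listing lags) can be modeled in a few lines. This is a simplified sketch loosely following the description above: the object is committed first, the container update is tried inline, and if the container server is unreachable the update is saved and retried later by a background pass. `ToyStorageNode` and its methods are hypothetical names, not Swift code.

```python
from collections import deque

class ToyStorageNode:
    """Hypothetical model of the object-PUT / container-update flow."""

    def __init__(self):
        self.objects = {}                # objects committed to disk
        self.container_listing = set()   # what a container GET would show
        self.container_reachable = True
        self.pending_updates = deque()   # container updates not yet delivered

    def put(self, name: str, data: bytes) -> int:
        self.objects[name] = data        # 1. commit the object first
        if self.container_reachable:     # 2. try the container update inline
            self.container_listing.add(name)
        else:                            # 3. otherwise save it for later, silently
            self.pending_updates.append(name)
        return 201                       # the client sees success either way

    def get(self, name: str):
        return self.objects.get(name)

    def run_updater(self) -> None:
        """Background pass that retries saved container updates."""
        while self.pending_updates and self.container_reachable:
            self.container_listing.add(self.pending_updates.popleft())

node = ToyStorageNode()
node.container_reachable = False
print(node.put("cat.jpg", b"meow"))         # 201
print(node.get("cat.jpg") is not None)      # True: the object is readable
print("cat.jpg" in node.container_listing)  # False: the listing lags behind
node.container_reachable = True
node.run_updater()
print("cat.jpg" in node.container_listing)  # True: eventually consistent
```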
>>> 3. Once an object has been written, when and how are the container
>>> listing, byte count, account listing (if a new container was created),
>>> etc. updated? Is something done in the path of the PUT to indicate
>>> that this object belongs to a particular container, with the byte
>>> count etc. updated in the background? A little clarification would
>>> help:)
>>> PL>  Covered as part of the last question.
>>> 4. For the global clusters, is the object ring across regions or is it the same with containers and accounts also?
>>> PL>  Check out the SwiftStack blog if you haven't already, at https://swiftstack.com/blog/2013/07/02/swift-1-9-0-release/, and there's also some other material (including a demo from the last summit) that you can find by googling around a bit. The 'Region Tier' element described in the blog addresses the makeup of a ring, so it can be applied to the container and account rings as well. I personally didn't work on this feature, so I'll leave it to one of the other guys to comment more in this area.
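To make "the region tier is part of a ring's makeup" concrete, here is a toy placement function that spreads replicas across regions before reusing one. It is a hypothetical illustration only, not Swift's ring-builder algorithm (which also accounts for device weights and the other failure-domain tiers); because it only sees (region, device) pairs, the same function could place object, container, or account replicas.

```python
def place_replicas(devices: list, replicas: int) -> list:
    """Toy 'as unique as possible' placement: devices is a list of
    (region, device_name); round-robin over regions so each replica lands
    in a new region until every region has one."""
    by_region = {}
    for region, name in devices:
        by_region.setdefault(region, []).append(name)
    placement = []
    while len(placement) < replicas:
        progressed = False
        for region in sorted(by_region):
            if by_region[region] and len(placement) < replicas:
                placement.append((region, by_region[region].pop(0)))
                progressed = True
        if not progressed:
            raise ValueError("not enough devices for the requested replica count")
    return placement

devices = [(1, "r1-d1"), (1, "r1-d2"), (2, "r2-d1"), (2, "r2-d2")]
print(place_replicas(devices, 3))  # [(1, 'r1-d1'), (2, 'r2-d1'), (1, 'r1-d2')]
```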
>>> 5. For containers in global clusters, if a client queries the
>>> container metadata from another site, is there a chance of it getting
>>> the old metadata? With respect to the object itself, the eventually
>>> consistent part is a bit confusing for me:)
>>> PL> There's always a chance of getting old "something", whether it's metadata or data; that's part of eventual consistency. In the face of a network partition (the P in the CAP theorem), Swift will always favor availability, which may mean older data or older metadata (object or container listing) depending on the specific scenario. If deployed correctly, I don't believe use of global clusters increases the odds of this happening, though (again, I'll count on someone else to say more). It's worth emphasizing that getting "old stuff" happens in the face of some sort of failure (or heavy network congestion), so you shouldn't think of an eventually consistent system as one where you "get whatever you get". You'll get the latest, greatest available information.
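A small model of "the latest, greatest available information": a read returns the newest copy it can reach, which is current when all replicas are reachable and stale only when a failure hides the newer copies. The assumption of this sketch is that the reader checks every reachable replica and picks the highest timestamp; `Replica` and `read_newest_available` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    timestamp: float
    body: bytes
    reachable: bool = True

def read_newest_available(replicas: list):
    """Return the body of the newest replica that can be reached, or None."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.timestamp).body

replicas = [Replica(1.0, b"v1"), Replica(2.0, b"v2"), Replica(2.0, b"v2")]
print(read_newest_available(replicas))  # b'v2'
# A partition cuts off both replicas that hold v2:
replicas[1].reachable = replicas[2].reachable = False
print(read_newest_available(replicas))  # b'v1': available, but stale
```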
>>> MW
>>> _______________________________________________
>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>> Post to     : openstack at lists.openstack.org
>>> Unsubscribe :
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
