[Openstack] Swift questions
me at not.mn
Wed Aug 27 16:43:51 UTC 2014
Just to clarify the first answer: everything Paul said is correct. With 3 replicas, Swift requires that 2 (a quorum) succeed before success can be returned to the client. But assuming the cluster has no current issues (i.e., it's healthy), all replicas will be written before success is returned to the client. Swift doesn't spool the object for later storage; the data is written to disk (for every replica) as it is streamed to the cluster. The quorum check simply allows for better availability if a drive fails during the write. In most cases, a 2xx response to a PUT means that the data is fully replicated and durably flushed to disk for every replica.
Background replication is only done to handle failures (e.g., a drive failing). When a client gets a success response (2xx code) from Swift for a write request, the client can assume that Swift has durably stored the data and is responsible for keeping it. If the client gets a 5xx response, the client should retry the request because Swift is not offering any guarantees about the durability of the data stored.
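
As a rough illustration of the rules above (not Swift's actual code), the quorum check and the client-side retry decision could be sketched like this in Python. The function names are made up for this example, and the quorum formula is the n/2 + 1 quoted later in the thread, taken with integer division.

```python
from typing import Sequence


def quorum_size(replicas: int) -> int:
    """Quorum as described in this thread: n/2 + 1, using integer division."""
    return replicas // 2 + 1


def put_succeeded(backend_statuses: Sequence[int], replicas: int) -> bool:
    """Return True when enough backend writes returned 2xx to report success.

    backend_statuses holds one HTTP status per storage node that answered.
    """
    successes = sum(1 for status in backend_statuses if 200 <= status < 300)
    return successes >= quorum_size(replicas)


def client_should_retry(status: int) -> bool:
    """A 5xx response carries no durability guarantee, so the client retries."""
    return 500 <= status < 600
```

For example, with 3 replicas, two 201 responses and one 503 still count as a successful PUT, while one 201 and two 503s do not. In a healthy cluster all three writes normally succeed before the client sees the 2xx.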
On Aug 27, 2014, at 7:43 AM, Luse, Paul E <paul.e.luse at intel.com> wrote:
> Not sure how much nitty-gritty detail you care to know, as some of these answers will get into code specifics that you're better off exploring on your own so my explanation isn't potentially dated. At a high level, though, the proxy looks up the nodes that are responsible for storing an object and its container via the rings. It passes that info to the storage nodes when it does the PUT request, so when the storage node goes to update the container it's been told "and here are the nodes to send the container update to". It will send the updates to all of them. Similarly, once the container server has updated its database, it goes and updates the appropriate account databases.
> Make sense?
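
A toy Python sketch of the lookup described above, under loud assumptions: `ToyRing` is a hypothetical stand-in and not Swift's ring implementation, the partition math is simplified, and the header names are my recollection of what the proxy sends to object servers, so check them against the Swift source before relying on them.

```python
import hashlib


class ToyRing:
    """Hypothetical stand-in for a ring: maps a path to a partition and nodes."""

    def __init__(self, devices, part_count=8, replicas=3):
        self.devices = devices
        self.part_count = part_count
        self.replicas = replicas

    def get_nodes(self, path):
        # Hash the path to pick a partition, then take consecutive devices
        # as the replica set for that partition.
        digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
        part = int(digest, 16) % self.part_count
        nodes = [
            self.devices[(part + i) % len(self.devices)]
            for i in range(self.replicas)
        ]
        return part, nodes


def build_container_update_headers(container_ring, account, container):
    """Build one header set per container node for the proxy's object PUTs.

    Each object server learns from these headers where to send its
    container update once the object is committed to disk.
    """
    part, nodes = container_ring.get_nodes(f"/{account}/{container}")
    return [
        {
            "X-Container-Host": f"{node['ip']}:{node['port']}",  # assumed name
            "X-Container-Partition": str(part),                  # assumed name
            "X-Container-Device": node["device"],                # assumed name
        }
        for node in nodes
    ]


devices = [
    {"ip": "10.0.0.1", "port": 6001, "device": "sda"},
    {"ip": "10.0.0.2", "port": 6001, "device": "sdb"},
    {"ip": "10.0.0.3", "port": 6001, "device": "sdc"},
    {"ip": "10.0.0.4", "port": 6001, "device": "sdd"},
]
headers = build_container_update_headers(ToyRing(devices), "AUTH_test", "photos")
```

The point of the sketch is only that the object server does not consult the container ring itself: the proxy does the lookup and hands the result down with the PUT.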
> -----Original Message-----
> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
> Sent: Wednesday, August 27, 2014 7:04 AM
> To: Luse, Paul E
> Cc: openstack
> Subject: Re: [Openstack] Swift questions
> Thanks Paul:)
> For the container part, you mentioned that the node (meaning the object server?) contacts the container server. Since you can have multiple container servers, how does the object server know which container server to contact? How and where the container gets updated is a bit confusing. With the container and account rings being separate and used in the proxy, I am not sure I understand how that path works.
> On Wed, Aug 27, 2014 at 6:15 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
>> Hi Marcus,
>> See answers below. Feel free to ask follow-ups, others may have more to add as well.
>> -----Original Message-----
>> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
>> Sent: Wednesday, August 27, 2014 5:04 AM
>> To: openstack
>> Subject: [Openstack] Swift questions
>> Some questions on new and old features of Swift. Any help would be
>> great:) Some are very basic, sorry!
>> 1. Does Swift write two copies and then return back to the client in the 3 replica case, with third in the background?
>> PL> It depends on the number of replicas. The formula for what we call a quorum is n/2 + 1 (integer division), which is the number of success responses we need from the back-end storage nodes before telling the client that all is good. So yes, with 3 replicas you need 2 good responses before returning OK.
>> 2. This again is a stupid question, but eventually consistent for an object is a bit confusing, unless it is updated. If it is created, it is either there or not, and you cannot update the data within the object. Maybe a POST can change the metadata? Or the container listing shows it's there but the actual object never got there? Those are the only cases I can think of.
>> PL> No, it's a good question because it's asked a lot. The most common scenario that we talk about for eventually consistent is the consistency between the existence of an object and its presence in the container listing, so your thinking is pretty close. When an object PUT is complete on a storage node (fully committed to disk), that node will then send a message to the appropriate container server to update the listing. It will attempt to do this synchronously, but if it can't, the update may be delayed without any indication to the client. This is by design, and it means that it's possible to get a successful PUT and be able to GET the object without any problem while the object does not yet show up in the container listing. There are other scenarios that demonstrate the eventually consistent nature of Swift; this is just a common and easy-to-explain one.
>> 3. Once an object has been written, when and how is the container
>> listing, number of bytes, account listing (if new container created)
>> etc updated? Is there something done in the path of the PUT to
>> indicate this object belongs to a particular container and the number
>> of bytes etc is done in the background? A little clarification would
>> PL> Covered as part of last question.
>> 4. For the global clusters, is the object ring across regions or is it the same with containers and accounts also?
>> PL> Check out the SwiftStack blog if you haven't already at https://swiftstack.com/blog/2013/07/02/swift-1-9-0-release/ and there's also some other stuff (including a demo from the last summit) that you can find by googling around a bit. The 'Region Tier' element described in the blog addresses the makeup of a ring, so it can be applied to both container and account rings as well. I personally didn't work on this feature, so I will leave it to one of the other guys to comment more in this area.
>> 5. For containers in global clusters, if a client queries the
>> container metadata from another site, is there a chance of it getting
>> the old metadata? With respect to the object itself, the eventually
>> consistent part is a bit confusing for me:)
>> PL> There's always a chance of getting old "something", whether it's metadata or data; that's part of eventually consistent. In the face of an outage (the P in the CAP theorem) Swift will always favor availability, which may mean older data or older metadata (object or container listing) depending on the specific scenario. If deployed correctly, I don't believe use of global clusters increases the odds of this happening, though (again, I will count on someone else to say more), and it's worth emphasizing that getting "old stuff" happens in the face of some sort of failure (or heavy network congestion), so you shouldn't think of eventually consistent as being a system where you "get whatever you get". You'll get the latest, greatest available information.
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack at lists.openstack.org
>> Unsubscribe :