[Openstack] Swift questions

Marcus White roastedseaweed.k at gmail.com
Mon Sep 1 08:19:23 UTC 2014


Hello John
Thanks a lot for all your answers:)

Some more follow-up questions:

You mentioned you see both kinds of deployment.

1. For the case where there are different IP addresses for different
regions, I am assuming a separate Keystone deployment for each site,
with clients knowing the auth endpoints of each site and getting the
Swift endpoints from each auth endpoint's service catalog. The clients
know, or figure out themselves, which one is local.
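The "clients figure out which one is local" step could look something like the following minimal Python sketch. It assumes a simplified, hypothetical catalog shaped loosely like a Keystone v3 service catalog (services with a `type`, endpoints with `interface`, `region` and `url`); the region names, URLs and the `swift_endpoint` helper are illustrative only, not part of any real client library.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified service catalog modeled loosely on Keystone v3:
# each service has a "type" and a list of endpoints with interface/region/url.
CATALOG: List[Dict] = [
    {"type": "identity", "endpoints": [
        {"interface": "public", "region": "east",
         "url": "https://keystone-east.example.com/v3"},
    ]},
    {"type": "object-store", "endpoints": [
        {"interface": "public", "region": "east",
         "url": "https://swift-east.example.com/v1/AUTH_demo"},
        {"interface": "public", "region": "west",
         "url": "https://swift-west.example.com/v1/AUTH_demo"},
        {"interface": "internal", "region": "west",
         "url": "http://10.0.0.5:8080/v1/AUTH_demo"},
    ]},
]


def swift_endpoint(catalog: List[Dict], preferred_region: str,
                   interface: str = "public") -> Optional[str]:
    """Pick the object-store URL in preferred_region; otherwise fall back
    to the first object-store endpoint with the same interface."""
    candidates = [
        ep
        for service in catalog
        if service.get("type") == "object-store"
        for ep in service.get("endpoints", [])
        if ep.get("interface") == interface
    ]
    for ep in candidates:
        if ep.get("region") == preferred_region:
            return ep["url"]
    return candidates[0]["url"] if candidates else None
```

Falling back to the first listed endpoint is just one possible policy; a client could instead probe each candidate and keep the lowest-latency one.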

2. For the multi-geo case with multiple IPs, I wanted to ask again
about the answer you mentioned, that the Keystone database is synced
across sites. For the multi-site deployments with multiple IP addresses
that you mentioned seeing, how do you set up Keystone so that the user
information is the same across all sites? This is probably related to
question 1.

MW

On Sun, Aug 31, 2014 at 11:41 PM, John Dickinson <me at not.mn> wrote:
>
> On Aug 31, 2014, at 5:25 AM, Marcus White <roastedseaweed.k at gmail.com> wrote:
>
>> One last follow up question, hopefully!
>>
>> I agree, each proxy server can handle any request. I wanted to see
>> what the typical deployment is for the multi-geo case.
>>
>> For multi region global clusters, for example, each region would have
>> a separate IP address? Behind each regions IP address you can have
>> multiple proxy servers in each site, and a customer can connect to any
>> site?
>
> I've seen both. Really, it depends on the goals of the global cluster and the existing infrastructure available. If the goal is a DR story, then DNS is often pointed at one region, and when the "R" of DR is needed, the DNS entry is updated to point to the other region. Other times, if the goal is to allow for lower-latency reads and writes, then all regions are used for access all the time. The client either relies on some sort of geo-DNS/anycast system to get to the closest endpoint, or the client chooses the preferred region from a list of available ones.
>
> Yes, I'd normally expect each region to have a separate public IP address with multiple proxies load balanced behind it.
>
>>
>> In a single site case, keystone would give you one end point, which is
>> the external load balancer IP. For the multi-site case, what happens
>> with keystone? Does it provide multiple end points?
>>
>> Another question is, does the keystone server itself have a different
>> external IP for each site?
>
> My understanding is that Keystone doesn't really work across wide geographic areas (I'd be happy to be wrong here--maybe a Keystone dev can jump in). I think the limitation is around the backing store Keystone uses to manage tokens and the service catalog.
>
> There are a few things to consider when using a central auth system with a globally distributed Swift cluster. First, note that if the auth token (identity info) isn't cached in the local region, Swift will need to ask the Keystone service for the information. Swift will cache it if possible, but know that the auth lookup will add latency (noticeable latency if the auth system is across an ocean). Second, make sure memcache and Swift's cache middleware are configured properly. For example, you don't want a global memcache pool that sends 50% of the internal "fetch from cache" requests across the WAN. You'd instead want a separate memcache pool for each region.
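To make the per-region memcache advice concrete, here is a toy Python model (a hypothetical `RegionTokenCache` class, with no TTLs and no real memcache client) in which each region keeps its own token cache in front of a shared, possibly remote auth service. Counting `remote_calls` shows that a token cached in one region still costs one auth lookup the first time it is seen in another region, but repeated requests in a region are then served locally instead of crossing the WAN.

```python
from typing import Callable, Dict, Optional


class RegionTokenCache:
    """Hypothetical stand-in for one region's memcache pool sitting in
    front of a (possibly distant) auth service. No expiry is modeled."""

    def __init__(self, region: str,
                 validate_remote: Callable[[str], Optional[dict]]) -> None:
        self.region = region
        self._validate_remote = validate_remote
        self._cache: Dict[str, dict] = {}
        self.remote_calls = 0  # how many lookups had to go to the auth service

    def identity_for(self, token: str) -> Optional[dict]:
        cached = self._cache.get(token)
        if cached is not None:
            return cached  # local hit: no WAN round trip
        self.remote_calls += 1  # miss: ask the auth service, maybe across an ocean
        info = self._validate_remote(token)
        if info is not None:
            self._cache[token] = info  # only valid tokens are cached
        return info
```

A real deployment would also expire entries to match token lifetimes; the point here is only that each region's cache is local to that region.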
>
> As a side note, a centralized auth system actually makes the caching strategies easier, since there's less complexity to deal with when trying to fetch, cache, and validate auth credentials and tokens.
>
>
>>
>> Just confused a bit on the deployment basics.
>
> Good luck, and feel free to ask more as needed.
>
>>
>> MW.
>>
>>
>> On Fri, Aug 29, 2014 at 7:51 PM, John Dickinson <me at not.mn> wrote:
>>>
>>> On Aug 29, 2014, at 2:43 AM, Marcus White <roastedseaweed.k at gmail.com> wrote:
>>>
>>>> Thanks John:)
>>>>
>>>> Some more additional basic questions..
>>>>
>>>> 1. For the multi site case, does Swift present a single end point, or
>>>> is it two separate regions with two separate IP addresses, and the
>>>> client can talk to either to get the data? I think it is the latter?
>>>
>>> Every proxy server in the cluster will be able to handle requests. There isn't a single "control node" that brokers everything. Often, all of the proxy servers are put behind a load balancer, geo-DNS, or something similar to present a single domain for the cluster.
>>>
>>>
>>>>
>>>> 2. Are Keystone databases synced across sites? I didn't know of an
>>>> option in Keystone to do that, hence asking. If we need to
>>>> authenticate against Keystone on different sites to access the same
>>>> object, does it have to be that way?
>>>
>>> That is also my understanding, but I'm not an expert on Keystone options or deployment patterns.
>>>
>>>>
>>>> TIA,
>>>>
>>>> MW
>>>>
>>>>
>>>> On Wed, Aug 27, 2014 at 11:37 PM, John Dickinson <me at not.mn> wrote:
>>>>> Yup, this is a totally possible failure scenario, and Swift will merge the data (using last-write-wins for overwrites) automatically when the partition is restored. But you'll still have full durability on writes, even with a partitioned global cluster.
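The last-write-wins merge John describes can be illustrated with a toy model of container listings in which every record carries a timestamp. The `Row` type, the `merge` and `bytes_used` helpers, and the floating-point timestamps are assumptions for illustration, not Swift's actual replicator code. Because each object name keeps whichever record is newest (including deletes), derived values such as the byte count come out the same regardless of which region's listing is merged first, as long as the timestamps differ.

```python
from typing import Dict, NamedTuple


class Row(NamedTuple):
    timestamp: float      # time of the PUT/DELETE that produced this record
    size: int             # object size in bytes
    deleted: bool = False


Listing = Dict[str, Row]  # object name -> newest known record


def merge(a: Listing, b: Listing) -> Listing:
    """Last-write-wins per object name; on an exact tie, a's row is kept."""
    merged = dict(a)
    for name, row in b.items():
        current = merged.get(name)
        if current is None or row.timestamp > current.timestamp:
            merged[name] = row
    return merged


def bytes_used(listing: Listing) -> int:
    """Container byte count derived from the surviving, non-deleted rows."""
    return sum(row.size for row in listing.values() if not row.deleted)
```

For example, if one side of the split saw `a` (10 bytes) and `b` (20 bytes) and the other later overwrote `b` with 30 bytes and added `c` (5 bytes), both merge orders yield 45 bytes used.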
>>>>>
>>>>> --John
>>>>>
>>>>>
>>>>>
>>>>> On Aug 27, 2014, at 10:49 AM, Marcus White <roastedseaweed.k at gmail.com> wrote:
>>>>>
>>>>>> Yup, thanks for the great explanation:)
>>>>>>
>>>>>> Another question, though related: If there are three regions, and two
>>>>>> get "split", there is now a partition. Both the split ones can talk to
>>>>>> the third, but not to each other.
>>>>>>
>>>>>> A PUT comes into one region, and it gets written to the local site.
>>>>>> Container information presumably gets updated here, including byte
>>>>>> count.
>>>>>>
>>>>>> Same thing happens on another site, where a PUT comes in and the
>>>>>> container information is updated with the byte count.
>>>>>> When the sites get back together, do the container servers make sure
>>>>>> it's all correct in the end?
>>>>>>
>>>>>> If this is not a possible scenario, is there any case where the
>>>>>> container metadata can be different between two zones or regions
>>>>>> because of a partition, and independent PUTs can happen, and the data
>>>>>> has to be merged? Is that all done by the respective servers (container
>>>>>> or account)?
>>>>>>
>>>>>> I will be looking at the sources soon.
>>>>>>
>>>>>> Thanks again
>>>>>> MW.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 27, 2014 at 8:13 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
>>>>>>> Marcus-
>>>>>>>
>>>>>>> Not sure how much nitty-gritty detail you care to know, as some of these answers get into code specifics that you're better off exploring on your own rather than relying on my explanation, which may become dated. At a high level, though, the proxy looks up the nodes that are responsible for storing an object and its container via the rings. It passes that info to the storage nodes when it does the PUT request, so when a storage node goes to update the container it's been told "and here are the nodes to send the container update to". It will send the updates to all of them. Similarly, once the container server has updated its database, it goes and updates the appropriate account databases.
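A toy Python version of the flow Paul describes: the proxy hashes the container path and the object path through two separate rings, then tells each object server where the container records live. `ToyRing`, `partition`, and `plan_object_put` are hypothetical; the partition calculation (top bits of an MD5 of the path) follows the general idea of Swift's rings, but real rings also salt the hash, use far more partitions, and balance devices by weight. How real Swift divides the container nodes among the object servers is not modeled; here each object server is simply handed the full list.

```python
import hashlib
import struct
from typing import Dict, List

PART_POWER = 4  # toy ring: 2**4 = 16 partitions


def partition(path: str, part_power: int = PART_POWER) -> int:
    """Top `part_power` bits of the MD5 digest of the path (unsalted toy version)."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return struct.unpack_from(">I", digest)[0] >> (32 - part_power)


class ToyRing:
    """Maps each partition to `replicas` devices; needs len(devices) >= replicas
    for the replicas to land on distinct devices."""

    def __init__(self, devices: List[str], replicas: int = 3,
                 part_power: int = PART_POWER) -> None:
        self.part_power = part_power
        # Replica r of partition p lives on devices[(p + r) % len(devices)].
        self.table = [[devices[(p + r) % len(devices)]
                       for p in range(2 ** part_power)]
                      for r in range(replicas)]

    def get_nodes(self, path: str) -> List[str]:
        p = partition(path, self.part_power)
        return [row[p] for row in self.table]


def plan_object_put(account: str, container: str, obj: str,
                    object_ring: ToyRing,
                    container_ring: ToyRing) -> List[Dict[str, object]]:
    """What the proxy hands each object server: where to store the object
    and which container servers to notify once it is on disk."""
    container_nodes = container_ring.get_nodes(f"/{account}/{container}")
    object_nodes = object_ring.get_nodes(f"/{account}/{container}/{obj}")
    return [{"object_node": node, "container_updates": list(container_nodes)}
            for node in object_nodes]


plan = plan_object_put("AUTH_demo", "photos", "cat.jpg",
                       ToyRing(["obj1", "obj2", "obj3"]),
                       ToyRing(["cont1", "cont2", "cont3"]))
for step in plan:
    print(step)
```

The key point the sketch tries to show is that the object server never consults a ring itself: the proxy has already resolved both rings and passes the container locations along with the PUT.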
>>>>>>>
>>>>>>> Make sense?
>>>>>>>
>>>>>>> Thx
>>>>>>> Paul
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
>>>>>>> Sent: Wednesday, August 27, 2014 7:04 AM
>>>>>>> To: Luse, Paul E
>>>>>>> Cc: openstack
>>>>>>> Subject: Re: [Openstack] Swift questions
>>>>>>>
>>>>>>> Thanks Paul:)
>>>>>>>
>>>>>>> For the container part, you mentioned that the node (meaning the object
>>>>>>> server?) contacts the container server. Since you can have multiple container servers, how does the object server know which container server to contact? How and where the container gets updated is a bit confusing. With the container and account rings being separate and used by the proxy, I am not sure I understand how that path works.
>>>>>>>
>>>>>>> MW
>>>>>>>
>>>>>>> On Wed, Aug 27, 2014 at 6:15 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
>>>>>>>> Hi Marcus,
>>>>>>>>
>>>>>>>> See answers below.  Feel free to ask follow-ups, others may have more to add as well.
>>>>>>>>
>>>>>>>> Thx
>>>>>>>> Paul
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Marcus White [mailto:roastedseaweed.k at gmail.com]
>>>>>>>> Sent: Wednesday, August 27, 2014 5:04 AM
>>>>>>>> To: openstack
>>>>>>>> Subject: [Openstack] Swift questions
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Some questions on new and old features of Swift. Any help would be
>>>>>>>> great:) Some are very basic, sorry!
>>>>>>>>
>>>>>>>> 1. Does Swift write two copies and then return back to the client in the 3 replica case, with third in the background?
>>>>>>>>
>>>>>>>> PL>  It depends on the number of replicas. The formula for what we call a quorum is n/2 + 1 (with integer division), which is the number of success responses we need from the back-end storage nodes before telling the client that all is good. So, yes, with 3 replicas you need 2 good responses before returning OK.
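The quorum rule from Paul's answer, as a small Python sketch (the function names are illustrative, not Swift's own):

```python
def quorum_size(replicas: int) -> int:
    """n/2 + 1 with integer division: a strict majority of the replicas."""
    return replicas // 2 + 1


def put_succeeded(success_responses: int, replicas: int) -> bool:
    """True once enough back-end storage nodes have acknowledged the write."""
    return success_responses >= quorum_size(replicas)


for n in (1, 2, 3, 4, 5):
    print(n, "replicas -> quorum of", quorum_size(n))
```

With 3 replicas the proxy can report success after 2 acknowledgements; with 4 replicas it needs 3.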
>>>>>>>>
>>>>>>>> 2. This again is a stupid question, but eventual consistency for an object is a bit confusing unless the object is updated. Once it is created, it is either there or not, and you cannot update the data within the object. Maybe a POST can change the metadata? Or the container listing shows it's there but the actual object never got there? Those are the only cases I can think of.
>>>>>>>>
>>>>>>>> PL> No, it's a good question because it's asked a lot. The most common scenario that we talk about for eventual consistency is the consistency between the existence of an object and its presence in the container listing, so your thinking is pretty close. When an object PUT is complete on a storage node (fully committed to disk), that node will then send a message to the appropriate container server to update the listing. It will attempt to do this synchronously, but if it can't, the update may be delayed w/o any indication to the client. This is by design, and it means it's possible to get a successful PUT and be able to GET the object w/o any problem even though it may not yet show up in the container listing. There are other scenarios that demonstrate the eventually consistent nature of Swift; this is just a common and easy-to-explain one.
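A minimal sketch of why a successful PUT can be missing from the listing for a while: the object server tries the container update inline and, if that fails, queues it for later without telling the client. The class and queue here are hypothetical; Swift's real mechanism persists failed updates on disk (the `async_pending` directory) for the object-updater daemon to retry, and this in-memory deque only stands in for that.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# (container path, object name, size in bytes); a hypothetical update record.
Update = Tuple[str, str, int]


class ObjectServerSketch:
    """Toy object server: inline container update with a deferred retry queue."""

    def __init__(self, send_update: Callable[[Update], bool]) -> None:
        self.send_update = send_update          # True if the container server accepted it
        self.pending: Deque[Update] = deque()   # stands in for on-disk async_pending files

    def put(self, container: str, name: str, size: int) -> int:
        # Assume the object data has already been durably written at this point.
        update = (container, name, size)
        if not self.send_update(update):
            self.pending.append(update)  # defer silently; the client still gets success
        return 201

    def run_updater(self) -> int:
        """One pass over the queue, like a background updater; returns how many were delivered."""
        delivered = 0
        for _ in range(len(self.pending)):
            update = self.pending.popleft()
            if self.send_update(update):
                delivered += 1
            else:
                self.pending.append(update)  # still unreachable; keep for the next pass
        return delivered
```

If the container server is down during `put`, the client still sees 201 and can GET the object, but the name only appears in the listing after a later `run_updater` pass succeeds.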
>>>>>>>>
>>>>>>>> 3. Once an object has been written, when and how is the container
>>>>>>>> listing, number of bytes, account listing (if new container created)
>>>>>>>> etc updated? Is there something done in the path of the PUT to
>>>>>>>> indicate this object belongs to a particular container and the number
>>>>>>>> of bytes etc is done in the background? A little clarification would
>>>>>>>> help:)
>>>>>>>>
>>>>>>>> PL>  Covered as part of last question.
>>>>>>>>
>>>>>>>> 4. For the global clusters, is the object ring across regions or is it the same with containers and accounts also?
>>>>>>>>
>>>>>>>> PL>  Check out the SwiftStack blog if you haven't already at https://swiftstack.com/blog/2013/07/02/swift-1-9-0-release/ and there's also some other stuff (including a demo from the last summit) that you can find googling around a bit too.  The 'Region Tier' element described in the blog addresses the makeup of a ring so can be applied to both container and account rings also - I personally didn't work on this feature so will leave it to one of the other guys to comment more in this area.
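Roughly, the region tier means replica placement tries to spread copies across regions first, then across zones within a region. The greedy Python sketch below illustrates only that "as far apart as possible" principle; the `Device` type and `place_replicas` function are hypothetical, and Swift's actual ring builder also accounts for device weights, servers, and partition balancing.

```python
from collections import Counter
from typing import List, NamedTuple


class Device(NamedTuple):
    region: int
    zone: int
    name: str


def place_replicas(devices: List[Device], replicas: int) -> List[Device]:
    """Greedy toy placement: prefer the least-used region, then the least-used
    zone, then the device name as a tie-breaker. Needs replicas <= len(devices)."""
    chosen: List[Device] = []
    for _ in range(replicas):
        regions = Counter(d.region for d in chosen)
        zones = Counter((d.region, d.zone) for d in chosen)
        candidates = [d for d in devices if d not in chosen]
        best = min(candidates,
                   key=lambda d: (regions[d.region], zones[(d.region, d.zone)], d.name))
        chosen.append(best)
    return chosen


devices = [Device(1, 1, "a"), Device(1, 2, "b"), Device(2, 1, "c"), Device(2, 2, "d")]
print([d.name for d in place_replicas(devices, 3)])
```

With two regions and three replicas, this places the first two copies in different regions and the third in a different zone of the first region, so losing either whole region still leaves at least one copy.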
>>>>>>>>
>>>>>>>> 5. For containers in global clusters, if a client queries the
>>>>>>>> container metadata from another site, is there a chance of it getting
>>>>>>>> the old metadata? With respect to the object itself, the eventually
>>>>>>>> consistent part is a bit confusing for me:)
>>>>>>>>
>>>>>>>> PL> There's always a chance of getting old "something", whether it's metadata or data; that's part of being eventually consistent. In the face of an outage (the P in the CAP theorem), Swift will always favor availability, which may mean older data or older metadata (object or container listing) depending on the specific scenario. If deployed correctly, I don't believe use of global clusters increases the odds of this happening (again, I'll count on someone else to say more), and it's worth emphasizing that getting "old stuff" happens in the face of some sort of failure (or big network congestion), so you shouldn't think of eventually consistent as a system where you "get whatever you get". You'll get the latest, greatest available information.
>>>>>>>>
>>>>>>>> MW
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>> Post to     : openstack at lists.openstack.org
>>>>>>>> Unsubscribe :
>>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>
>>>>>
>>>
>



