[openstack-dev] [Openstack-operators] [nova][glance] Who needs multiple api_servers?

Mike Dorman mdorman at godaddy.com
Fri Apr 28 15:46:51 UTC 2017


Maybe we are talking about two different things here?  I’m a bit confused.

Our Glance config in nova.conf on the hypervisors (HVs) looks like this:

[glance]
api_servers=http://glance1:9292,http://glance2:9292,http://glance3:9292,http://glance4:9292
glance_api_insecure=True
glance_num_retries=4
glance_protocol=http

So we do provide the full URLs, and there is SSL support.  Right?  I am fairly certain we tested this to ensure that if one URL fails, nova goes on to retry the next one.  That failure does not get bubbled up to the user (which is ultimately the goal.)
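
As a minimal sketch of the failover behaviour described above (this is not nova's actual implementation; `call_with_failover`, `request`, and `AllServersFailed` are hypothetical names, and the assumption is that an unreachable server surfaces as a `ConnectionError`):

```python
import itertools
import random


class AllServersFailed(Exception):
    """Raised when every attempted Glance URL failed."""


def call_with_failover(api_servers, request, num_retries=4):
    """Call request(url) against the servers in random order.

    `request` is a hypothetical callable that raises ConnectionError
    when the server behind `url` cannot be reached.
    """
    servers = list(api_servers)
    if not servers:
        raise ValueError("api_servers must not be empty")
    random.shuffle(servers)           # random starting order spreads the load
    urls = itertools.cycle(servers)   # wrap around if retries exceed the list
    errors = []
    for _ in range(num_retries + 1):  # one initial attempt plus the retries
        url = next(urls)
        try:
            return request(url)
        except ConnectionError as exc:
            errors.append((url, exc))  # remember the failure, try the next URL
    raise AllServersFailed(errors)
```

With four URLs and `num_retries=4`, a single dead server costs at most one failed attempt before the call succeeds elsewhere, so the error only reaches the caller if every attempt fails.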

I don’t disagree with you that the client side choose-a-server-at-random is not a great load balancer.  (But isn’t this roughly the same thing that oslo-messaging does when we give it a list of RMQ servers?)  For us it’s more about the failure handling if one is down than it is about actually equally distributing the load.

In my mind options One and Two are the same, since today we are already providing full URLs and not only server names.  At the end of the day, I don’t feel like there is a compelling argument here to remove this functionality (that people are actively making use of.)

To be clear, I, and I think others, are fine with nova by default getting the Glance endpoint from Keystone.  And that in Keystone there should exist only one Glance endpoint.  What I’d like to see remain is the ability to override that for nova-compute and to target more than one Glance URL for purposes of fail over.
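
A rough sketch of that selection rule, assuming a hypothetical helper `glance_endpoints` (not an existing nova or keystoneauth function): use the single catalog endpoint by default, and let an operator-supplied list of full URLs replace it.

```python
from urllib.parse import urlsplit


def glance_endpoints(catalog_endpoint, override_urls=None):
    """Return the candidate Glance URLs, preferring the operator override.

    `catalog_endpoint` stands for the one endpoint Keystone returns;
    `override_urls` stands for an optional list set in nova.conf.
    """
    candidates = list(override_urls) if override_urls else [catalog_endpoint]
    for url in candidates:
        parts = urlsplit(url)
        # Require full URLs so that http vs. https and the port are explicit.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError("not a full URL: %r" % (url,))
    return candidates
```

The returned list could then be handed to whatever failover logic walks the URLs; with no override it is simply a one-element list holding the catalog endpoint.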

Thanks,
Mike




On 4/28/17, 8:20 AM, "Monty Taylor" <mordred at inaugust.com> wrote:

    Thank you both for your feedback - that's really helpful.
    
    Let me say a few more words about what we're trying to accomplish here 
    overall so that maybe we can figure out what the right way forward is. 
    (it may be keeping the glance api servers setting, but let me at least 
    make the case real quick)
    
    From a 10,000 foot view, the thing we're trying to do is to get nova's 
    consumption of all of the OpenStack services it uses to be less special.
    
    The clouds have catalogs which list information about the services - 
    public, admin and internal endpoints and whatnot - and then we're asking 
    admins to not only register that information with the catalog, but to 
    also put it into the nova.conf. That means that any updating of that 
    info needs to be an API call to keystone and also a change to nova.conf. 
    If we, on the other hand, use the catalog, then nova can pick up changes 
    in real time as they're rolled out to the cloud - and there is hopefully 
    a sane set of defaults we could choose (based on operator feedback like 
    what you've given) so that in most cases you don't have to tell nova 
    where to find glance _at_all_ because the cloud already knows where it 
    is. (nova would know to look in the catalog for the internal interface of 
    the image service - for instance - there's no need to ask an operator to 
    add to the config "what is the service_type of the image service we 
    should talk to" :) )
    
    Now - glance, and the thing you like that we don't - is especially hairy 
    because of the api_servers list. The list, as you know, is just a list 
    of servers, not even of URLs. This means it's not possible to configure 
    nova to talk to glance over SSL (which I know you said works for you, 
    but we'd like for people to be able to choose to SSL all their things). 
    We could add that, but it would be an additional pile of special config. 
    Because of all of that, we also have to attempt to make working URLs 
    from what is usually a list of IP addresses. This is also clunky and 
    prone to failure.
    
    The implementation on the underside of the api_servers code is the 
    world's dumbest load balancer. It picks a server from the list at 
    random and uses it. There is no facility for dealing with a server in 
    the list that stops working or for allowing rolling upgrades like there 
    would be with a real load-balancer across the set. If one of the API 
    servers goes away, we have no context to know that, so just some of your 
    internal calls to glance fail.
    
    Those are the issues - basically:
    - current config is special and fragile
    - impossible to SSL
    - inflexible/underpowered de-facto software loadbalancer
    
    Now - as is often the case - it turns out the combo of those things is 
    working very well for you - so we need to adjust our thinking on the 
    topic a bit. Let me toss out some alternatives and see what you think:
    
    Alternative One - Do Both things
    
    We add the new "consume from catalog" behavior and make it the default 
    (consuming the internal interface by default). We have to do 
    that in parallel with the current glance api_servers setting anyway, 
    because of deprecation periods, so the code to support both approaches 
    will exist. Instead of then deprecating the api_servers list, we keep 
    it - but add a big doc warning listing the gotchas and limitations - so 
    for those folks for whom they are not an issue, you've got an out.
    
    Alternative Two - Hybrid Approach - optional list of URLs
    
    We go ahead and move to service config being the standard way one lists 
    how to consume a service from the catalog. One of the standard options 
    for consuming services is "endpoint_override" - which is a way an API 
    user can say "hi, please to ignore the catalog and use this endpoint 
    I've given you instead". The endpoint in question is a full URL, so 
    https/http and ports and whatnot are all handled properly.
    
    We add, in addition, an option "endpoint_override_list" which 
    allows you to provide a list of URLs (not API servers) and if you 
    provide that option, we'll keep the logic of choosing one at random at 
    API call time. It's still a poor load balancer, and we'll still put 
    warnings in the docs about it not being a featureful load balancing 
    solution, but again would be available if needed.
    
    Alternative Three - We ignore you and give you docs
    
    I'm only including this in the name of completeness. But we 
    could write a bunch of docs about a recommended way of putting your 
    internal endpoints in a load balancer and registering that with the 
    internal endpoint in keystone. (I would prefer to make the operators 
    happy, so let's say whatever vote I have is not for this option)
    
    Alternative Four - We update client libs to understand multiple values 
    from keystone for endpoints
    
    I _really_ don't like this one - as I think us doing dumb software 
    loadbalancing client side is prone to a ton of failures. BUT - right now 
    the assumption when consuming endpoints from the catalog is that one and 
    only one endpoint will be returned for a given 
    service_type/service_name/interface. Rather than special-casing the
    url roundrobin in nova, we could move that round-robin to be in the base 
    client library, update api consumption docs with round-robin 
    recommendations and then have you register the list of endpoints with 
    keystone.
    
    I know the keystone team has long been _very_ against using keystone as 
    a list of all the endpoints, and I agree with them. Putting it here for 
    sake of argument.
    
    Alternative Five - We update keystone to round-robin lists of endpoints
    
    Potentially even worse than four and even more unlikely given the 
    keystone team's feelings, but we could have keystone continue to only 
    return one endpoint, but have it do the round-robin selection at catalog 
    generation time.
    
    
    Sorry - you caught me in early morning brainstorm mode.
    
    I am neither nova core nor keystone core. BUT:
    
    I think honestly if adding a load balancer in front of your internal 
    endpoints is an undue burden and/or the usefulness of the lists 
    outweighs the limitations they have, we should go with One or Two. (I 
    think three through five are all terrible)
    
    My personal preference would be for Two - the round-robin code winds up 
    being the same logic in both cases, but at least in Two folks who want 
    to SSL all the way _can_, and it shouldn't be an undue extra burden on 
    those of you using the api_servers now. We also don't have to do the 
    funky things we currently have to do to turn the api_servers list into 
    workable URLs.
    
    
    On 04/27/2017 11:50 PM, Blair Bethwaite wrote:
    > We at Nectar are in the same boat as Mike. Our use-case is a little
    > bit more about geo-distributed operations though - our Cells are in
    > different States around the country, so the local glance-apis are
    > particularly important for caching popular images close to the
    > nova-computes. We consider these glance-apis as part of the underlying
    > cloud infra rather than user-facing, so I think we'd prefer not to see
    > them in the service-catalog returned to users either... is there going
    > to be a (standard) way to hide them?
    >
    > On 28 April 2017 at 09:15, Mike Dorman <mdorman at godaddy.com> wrote:
    >> We make extensive use of the [glance]/api_servers list.  We configure that on hypervisors to direct them to Glance servers which are more “local” network-wise (in order to reduce network traffic across security zones/firewalls/etc.)  This way nova-compute can fail over in case one of the Glance servers in the list is down, without putting them behind a load balancer.  We also don’t run https for these “internal” Glance calls, to save the overhead when transferring images.
    >>
    >> End-user calls to Glance DO go through a real load balancer and then are distributed out to the Glance servers on the backend.  From the end-user’s perspective, I totally agree there should be one, and only one URL.
    >>
    >> However, we would be disappointed to see the change you’re suggesting implemented.  We would lose the redundancy we get now by providing a list.  Or we would have to shunt all the calls through the user-facing endpoint, which would generate a lot of extra traffic (in places where we don’t want it) for image transfers.
    >>
    >> Thanks,
    >> Mike
    >>
    >>
    >>
    >> On 4/27/17, 4:02 PM, "Matt Riedemann" <mriedemos at gmail.com> wrote:
    >>
    >>     On 4/27/2017 4:52 PM, Eric Fried wrote:
    >>     > Y'all-
    >>     >
    >>     >   TL;DR: Does glance ever really need/use multiple endpoint URLs?
    >>     >
    >>     >   I'm working on bp use-service-catalog-for-endpoints[1], which intends
    >>     > to deprecate disparate conf options in various groups, and centralize
    >>     > acquisition of service endpoint URLs.  The idea is to introduce
    >>     > nova.utils.get_service_url(group) -- note singular 'url'.
    >>     >
    >>     >   One affected conf option is [glance]api_servers[2], which currently
    >>     > accepts a *list* of endpoint URLs.  The new API will only ever return *one*.
    >>     >
    >>     >   Thus, as planned, this blueprint will have the side effect of
    >>     > deprecating support for multiple glance endpoint URLs in Pike, and
    >>     > removing said support in Queens.
    >>     >
    >>     >   Some have asserted that there should only ever be one endpoint URL for
    >>     > a given service_type/interface combo[3].  I'm fine with that - it
    >>     > simplifies things quite a bit for the bp impl - but wanted to make sure
    >>     > there were no loudly-dissenting opinions before we get too far down this
    >>     > path.
    >>     >
    >>     > [1]
    >>     > https://blueprints.launchpad.net/nova/+spec/use-service-catalog-for-endpoints
    >>     > [2]
    >>     > https://github.com/openstack/nova/blob/7e7bdb198ed6412273e22dea72e37a6371fce8bd/nova/conf/glance.py#L27-L37
    >>     > [3]
    >>     > http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2017-04-27.log.html#t2017-04-27T20:38:29
    >>     >
    >>     > Thanks,
    >>     > Eric Fried (efried)
    >>     > .
    >>     >
    >>     > __________________________________________________________________________
    >>     > OpenStack Development Mailing List (not for usage questions)
    >>     > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
    >>     > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
    >>     >
    >>
    >>     +openstack-operators
    >>
    >>     --
    >>
    >>     Thanks,
    >>
    >>     Matt
    >>
    >>
    >>
    >> _______________________________________________
    >> OpenStack-operators mailing list
    >> OpenStack-operators at lists.openstack.org
    >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
    >
    >
    >
    
    
    


