[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

Jamie Lennox jamielennox at redhat.com
Fri Sep 12 00:43:22 UTC 2014



----- Original Message -----
> From: "Steven Hardy" <shardy at redhat.com>
> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Friday, 12 September, 2014 12:21:52 AM
> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility
> 
> On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> > 
> > ----- Original Message -----
> > > From: "Steven Hardy" <shardy at redhat.com>
> > > To: "OpenStack Development Mailing List (not for usage questions)"
> > > <openstack-dev at lists.openstack.org>
> > > Sent: Thursday, September 11, 2014 1:55:49 AM
> > > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> > > tokens leads to overall OpenStack fragility
> > > 
> > > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > > Going through the untriaged Nova bugs, and there are a few on a similar
> > > > pattern:
> > > > 
> > > > Nova operation in progress.... takes a while
> > > > Crosses keystone token expiration time
> > > > Timeout thrown
> > > > Operation fails
> > > > Terrible 500 error sent back to user
> > > 
> > > We actually have this exact problem in Heat, which I'm currently trying to
> > > solve:
> > > 
> > > https://bugs.launchpad.net/heat/+bug/1306294
> > > 
> > > Can you clarify, is the issue either:
> > > 
> > > 1. Create novaclient object with username/password
> > > 2. Do series of operations via the client object which eventually fail
> > > after $n operations due to token expiry
> > > 
> > > or:
> > > 
> > > 1. Create novaclient object with username/password
> > > 2. Some really long operation which means token expires in the course of
> > > the service handling the request, blowing up and 500-ing
> > > 
> > > If the former, then it does sound like a client, or usage-of-client bug,
> > > although note if you pass a *token* vs username/password (as is currently
> > > done for glance and heat in tempest, because we lack the code to get the
> > > token outside of the shell.py code..), there's nothing the client can do,
> > > because you can't request a new token with longer expiry with a token...
> > > 
> > > However if the latter, then it seems like not really a client problem to
> > > solve, as it's hard to know what action to take if a request failed
> > > part-way through and thus things are in an unknown state.
> > > 
> > > This issue is a hard problem, which can possibly be solved by
> > > switching to a trust scoped token (service impersonates the user), but then
> > > you're effectively bypassing token expiry via delegation which sits
> > > uncomfortably with me (despite the fact that we may have to do this in heat
> > > to solve the aforementioned bug)
> > > 
> > > > It seems like we should have a standard pattern that on token expiration
> > > > the underlying code at least gives one retry to try to establish a new
> > > > token to complete the flow, however as far as I can tell *no* clients do
> > > > this.
> > > 
> > > As has been mentioned, using sessions may be one solution to this, and
> > > AFAIK session support (where it doesn't already exist) is getting into
> > > various clients via the work being carried out to add support for v3
> > > keystone by David Hu:
> > > 
> > > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > > 
> > > I see patches for Heat (currently gating), Nova and Ironic.
> > > 
> > > > I know we had to add that into Tempest because tempest runs can exceed 1
> > > > hr, and we want to avoid random fails just because we cross a token
> > > > expiration boundary.
> > > 
> > > I can't claim great experience with sessions yet, but AIUI you could do
> > > something like:
> > > 
> > > from keystoneclient.auth.identity import v3
> > > from keystoneclient import session
> > > from keystoneclient.v3 import client
> > > 
> > > auth = v3.Password(auth_url=OS_AUTH_URL,
> > >                    username=USERNAME,
> > >                    password=PASSWORD,
> > >                    project_id=PROJECT,
> > >                    user_domain_name='default')
> > > sess = session.Session(auth=auth)
> > > ks = client.Client(session=sess)
> > > 
> > > And if you can pass the same session into the various clients tempest
> > > creates then the Password auth-plugin code takes care of reauthenticating
> > > if the token cached in the auth plugin object is expired, or nearly
> > > expired:
> > > 
> > > https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> > > 
> > > So in the tempest case, it seems like it may be a case of migrating the
> > > code creating the clients to use sessions instead of passing a token or
> > > username/password into the client object?
> > > 
> > > That's my understanding of it atm anyway, hopefully jamielennox will be along
> > > soon with more details :)
> > > 
> > > Steve
> > 
> > 
> > By clients here are you referring to the CLIs or the python libraries?
> > Implementation is at different points with each.
> 
> I think for both heat and tempest we're talking about the python libraries
> (Client objects).
> 
> > Sessions will handle automatically reauthenticating and retrying a request,
> > however it relies on the service throwing a 401 Unauthenticated error. If
> > a service is returning a 500 (or a timeout?) then there isn't much that a
> > client can/should do for that because we can't assume that trying again
> > with a new token will solve anything.
> 
> Hmm, I was hoping it would reauthenticate based on the auth_ref
> will_expire_soon, as it would fit better with our current usage of the
> auth_ref in heat.

We do that as well, though the window is currently set to 1 second and is not configurable at __init__ time:
https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L37

A patch to change that to whatever number of seconds you think is appropriate would be welcome.
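
To make the two behaviours concrete, here is a rough sketch of what I mean: refresh the token proactively when it falls inside a configurable expiry window, and retry exactly once when the service answers 401. This is not the keystoneclient code. FakeAuth, stale_duration, lifetime and request() are names made up for illustration, and the "authentication" just mints a fake token string instead of calling keystone.

```python
import datetime


class FakeAuth:
    """Illustrative stand-in for an auth plugin (NOT the keystoneclient class)."""

    def __init__(self, lifetime=3600, stale_duration=30):
        self.lifetime = lifetime              # token validity, in seconds
        self.stale_duration = stale_duration  # refresh window, in seconds
        self.token = None
        self.expires_at = None
        self.auth_count = 0

    def _authenticate(self, now):
        # A real plugin would talk to keystone here; we mint a fake token.
        self.auth_count += 1
        self.token = "token-%d" % self.auth_count
        self.expires_at = now + datetime.timedelta(seconds=self.lifetime)

    def will_expire_soon(self, now):
        # True when there is no token, or it expires inside the window.
        if self.expires_at is None:
            return True
        soon = now + datetime.timedelta(seconds=self.stale_duration)
        return self.expires_at < soon

    def get_token(self, now):
        # Proactive refresh: reauthenticate before the token actually expires.
        if self.will_expire_soon(now):
            self._authenticate(now)
        return self.token

    def invalidate(self):
        # Drop the cached token so the next get_token() reauthenticates.
        self.token = None
        self.expires_at = None


def request(auth, send, now):
    """Send once; on a 401, get a fresh token and retry exactly once.

    `send` is a hypothetical callable taking a token and returning an HTTP
    status code. Any other status (500, for example) is returned untouched,
    because a new token cannot be assumed to fix it.
    """
    status = send(auth.get_token(now))
    if status == 401:
        auth.invalidate()
        status = send(auth.get_token(now))
    return status
```

With stale_duration exposed at __init__ time like this, a long-running caller such as heat could pick a window that matches how long its requests typically take, rather than living with a fixed 1 second.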

> > 
> > At the moment we have keystoneclient, novaclient, cinderclient,
> > neutronclient and then a number of the smaller projects with support for
> > sessions. That obviously doesn't mean that existing users of that code
> > have transitioned to the newer way though. David Hu has been working on
> > using this code within the existing CLIs. I have prototypes for at least
> > nova to talk to neutron and cinder which I'm waiting for Kilo to push.
> > From there it should be easier to do this for other services.
> 
> Interesting, I guess we need to prioritize migrating Heat to the session
> model too, once all the clients support it.

I'd love to talk to you guys about this. In my mind heat should be one of the big winners from this; I just figured we needed more or less complete client support first.

> > 
> > For service to service communication there are two types.
> > 1) using the user's token like nova->cinder. If this token expires there is
> > really nothing that nova can do except raise 401 and make the client do it
> > again.
> > 2) using a service user like nova->neutron. This should allow automatic
> > reauthentication and will be fixed/standardized by sessions.
> 
> (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
> least) there seem to be two solutions, neither of which I particularly
> like:
> 
> - Require username/password to be passed into the service (something we've
>   been trying to banish via migrating to trusts for deferred
>   authentication)
> - Create a trust, and impersonate the user for the duration of the request,
>   or after the token expires until it is completed, using the service user
>   credentials and the trust_id.
> 
> It's the second one which I'm deliberating over - technically it will work,
> and we create the trust anyway (e.g for later use to do autoscaling etc),
> but can anyone from the keystone team comment on the legitimacy of the
> approach?
> 
> Intuitively it seems wrong, but I can't see any other way if we want to
> support token-only auth and cope with folks doing stuff which takes 2 hours
> with a 1 hour token expiry?
> 
> The current workaround, as mentioned by sdague, has been just to increase
> the token expiry to several hours.
> 
> Thoughts appreciated!

Right, so passing username/password is not acceptable. 

I just replied to Sean further up in this thread (different branch? stupid zimbra) about the Service-Token that we are trying to add to the middleware. The intention was to restrict what can be done with service tokens, but it *might* give us some benefits here. I'm not sure of the security implications yet, or of how we would write policy to allow only certain interactions.

I know what you mean about it feeling wrong for a service to create a trust per call; I already see issues with the security model of services being allowed to create trusts at will for users. Is it reasonable to define a way to pass a trust_id as a header, so that for certain long-running calls nova would know to use that trust to do the long-running work? I realize that it's difficult for clients to create trusts for the correct users now, but could that work?

> Steve
> 
> 


