[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

Jamie Lennox jamielennox at redhat.com
Fri Sep 12 00:06:04 UTC 2014

----- Original Message -----
> From: "Sean Dague" <sean at dague.net>
> To: openstack-dev at lists.openstack.org
> Sent: Thursday, 11 September, 2014 9:44:43 PM
> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility
> 
> On 09/10/2014 08:46 PM, Jamie Lennox wrote:
> > 
> > ----- Original Message -----
> >> From: "Steven Hardy" <shardy at redhat.com>
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> <openstack-dev at lists.openstack.org>
> >> Sent: Thursday, September 11, 2014 1:55:49 AM
> >> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> >> tokens leads to overall OpenStack fragility
> >>
> >> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> >>> Going through the untriaged Nova bugs, there are a few that follow a similar
> >>> pattern:
> >>>
> >>> Nova operation in progress.... takes a while
> >>> Crosses keystone token expiration time
> >>> Timeout thrown
> >>> Operation fails
> >>> Terrible 500 error sent back to user
> >>
> >> We actually have this exact problem in Heat, which I'm currently trying to
> >> solve:
> >>
> >> https://bugs.launchpad.net/heat/+bug/1306294
> >>
> >> Can you clarify, is the issue either:
> >>
> >> 1. Create novaclient object with username/password
> >> 2. Do series of operations via the client object which eventually fail
> >> after $n operations due to token expiry
> >>
> >> or:
> >>
> >> 1. Create novaclient object with username/password
> >> 2. Some really long operation which means token expires in the course of
> >> the service handling the request, blowing up and 500-ing
> >>
> >> If the former, then it does sound like a client, or usage-of-client bug,
> >> although note if you pass a *token* vs username/password (as is currently
> >> done for glance and heat in tempest, because we lack the code to get the
> >> token outside of the shell.py code..), there's nothing the client can do,
> >> because you can't request a new token with longer expiry with a token...
> >>
> >> However if the latter, then it seems like not really a client problem to
> >> solve, as it's hard to know what action to take if a request failed
> >> part-way through and thus things are in an unknown state.
> >>
> >> This issue is a hard problem, which can possibly be solved by
> >> switching to a trust scoped token (service impersonates the user), but
> >> then
> >> you're effectively bypassing token expiry via delegation which sits
> >> uncomfortably with me (despite the fact that we may have to do this in
> >> heat
> >> to solve the aforementioned bug)
> >>
> >>> It seems like we should have a standard pattern that on token expiration
> >>> the underlying code at least gives one retry to try to establish a new
> >>> token to complete the flow, however as far as I can tell *no* clients do
> >>> this.
> >>
> >> As has been mentioned, using sessions may be one solution to this, and
> >> AFAIK session support (where it doesn't already exist) is getting into
> >> various clients via the work being carried out to add support for v3
> >> keystone by David Hu:
> >>
> >> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> >>
> >> I see patches for Heat (currently gating), Nova and Ironic.
> >>
> >>> I know we had to add that into Tempest because tempest runs can exceed 1
> >>> hr, and we want to avoid random fails just because we cross a token
> >>> expiration boundary.
> >>
> >> I can't claim great experience with sessions yet, but AIUI you could do
> >> something like:
> >>
> >> from keystoneclient.auth.identity import v3
> >> from keystoneclient import session
> >> from keystoneclient.v3 import client
> >>
> >> auth = v3.Password(auth_url=OS_AUTH_URL,
> >>                    username=USERNAME,
> >>                    password=PASSWORD,
> >>                    project_id=PROJECT,
> >>                    user_domain_name='default')
> >> sess = session.Session(auth=auth)
> >> ks = client.Client(session=sess)
> >>
> >> And if you can pass the same session into the various clients tempest
> >> creates then the Password auth-plugin code takes care of reauthenticating
> >> if the token cached in the auth plugin object is expired, or nearly
> >> expired:
> >>
> >> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> >>
> >> So in the tempest case, it seems like it may be a case of migrating the
> >> code creating the clients to use sessions instead of passing a token or
> >> username/password into the client object?
> >>
> >> That's my understanding of it atm anyway, hopefully jamielennox will be
> >> along
> >> soon with more details :)
> >>
> >> Steve
> > 
> > 
> > By clients here are you referring to the CLIs or the python libraries?
> > Implementation is at different points with each.
> > 
> > Sessions will handle reauthenticating and retrying a request automatically;
> > however, this relies on the service returning a 401 Unauthorized error. If
> > a service is returning a 500 (or a timeout?) then there isn't much that a
> > client can/should do, because we can't assume that trying again
> > with a new token will solve anything.
> > 
> > At the moment we have keystoneclient, novaclient, cinderclient,
> > neutronclient and a number of the smaller projects with support for
> > sessions. That obviously doesn't mean that existing users of that code
> > have transitioned to the newer way yet, though. David Hu has been working on
> > using this code within the existing CLIs. I have prototypes for at least
> > nova talking to neutron and cinder which I'm waiting for Kilo to push.
> > From there it should be easier to do this for other services.
> > 
> > For service-to-service communication there are two types:
> > 1) using the user's token, like nova->cinder. If this token expires there is
> > really nothing that nova can do except raise 401 and make the client do it
> > again.
> 
> In this case it would be really good to do at least 1 retry, because
> it's completely silly for us to fail an action based on a token timeout.
> The workaround ops are applying is to change their token expiration back to
> some really large number.

I think, though, that the correct response from nova here is still a 401. Because nova is using the user's token internally, it has no option to re-authenticate (we don't have the user's credentials), so the retry logic has to happen on the client side. An improvement here might be the service token work we are looking at in keystonemiddleware: https://review.openstack.org/#/c/108384/
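
For a caller that actually holds credentials (rather than a bare token), the retry itself is simple once you're on sessions. A minimal sketch of the explicit pattern, reusing the sess from Steve's example above; do_operation is a hypothetical zero-argument callable standing in for whatever request failed, and the exact exception class will vary per client (keystoneclient's is used here purely for illustration):

from keystoneclient import exceptions

def retry_once_on_401(do_operation):
    # do_operation is assumed to issue a single request through a client
    # that was constructed with session=sess.
    try:
        return do_operation()
    except exceptions.Unauthorized:
        # Drop the cached token so the Password plugin re-authenticates
        # on the next request, then try exactly once more.
        sess.invalidate()
        return do_operation()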

Basically, when nova talks to neutron/cinder or any of the other services, it would submit the user's token as it does now, plus an X-Service-Token, which is that service's (nova's) own token. The intention here is to start limiting, by policy, which actions are allowed to be done on behalf of another service.
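
As a rough illustration of the shape of such a request (the header names are as in that review; the endpoint, path, token variables and the direct use of requests are purely for the sketch):

import requests

# user_token: the token the user originally sent to nova (X-Auth-Token today).
# service_token: a token nova obtained with its own service credentials.
# Both variables are placeholders for whatever nova actually holds.
resp = requests.get(
    NEUTRON_ENDPOINT + '/v2.0/ports',
    headers={
        'X-Auth-Token': user_token,        # act on behalf of the user
        'X-Service-Token': service_token,  # identify the calling service
    })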

I don't know exactly how this would work, but it may be acceptable that when both tokens are given we rely on the validity of the service token and allow some window on the auth token. That combination would indicate the user's token was valid when it was submitted to nova, and nova is vouching for this by passing its own token. Possibly that just shuffles the problem onto the service token's expiration, but at least nova could do something about that.
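
Purely to illustrate the rule I mean (this is not how auth_token middleware behaves today; the token objects and the grace window are made up for the sketch):

from datetime import timedelta

ALLOWED_EXPIRY_WINDOW = timedelta(minutes=5)  # hypothetical grace period

def accept_request(user_token, service_token, now):
    # Both tokens are assumed to expose an expires_at datetime.
    if service_token is None:
        # No service token: the user's token must be strictly valid.
        return user_token.expires_at > now
    # A valid service token vouches that the user's token was valid when
    # the request entered the calling service, so allow a small window
    # past the user token's expiry.
    return (service_token.expires_at > now and
            user_token.expires_at + ALLOWED_EXPIRY_WINDOW > now)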

Again, we are only just at the point of allowing this in auth_token middleware; I'm not sure of the wider security implications at this point.

> > 2) using a service user, like nova->neutron. This should allow automatic
> > reauthentication and will be fixed/standardized by sessions.
> 
> Ok, glanceclient should be a high target here, because that's often
> involved in long running things (snapshot manip is slow).

All the clients are special, but glance is extra special. I'm not sure how to reconcile what the session is capable of with the SSL compression mangling that glanceclient does, but it really is the next goal client.
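
For the clients that do have session support today (keystoneclient, novaclient, cinderclient, neutronclient as above), the pattern we're aiming for, and that glanceclient should eventually fit into, is one shared session so that re-authentication happens in a single place. A hedged sketch, assuming the session= keyword those clients expose (exact constructor details vary by client and version):

from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client as ks_client
from novaclient import client as nova_client

# One auth plugin and one session shared by every client, so a token
# that expires mid-run is refreshed in one place for all of them.
auth = v3.Password(auth_url=OS_AUTH_URL,
                   username=USERNAME,
                   password=PASSWORD,
                   project_id=PROJECT,
                   user_domain_name='default')
sess = session.Session(auth=auth)

keystone = ks_client.Client(session=sess)
nova = nova_client.Client('2', session=sess)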
 
> 	-Sean
> 
> --
> Sean Dague
> http://dague.net
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
