[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

Steven Hardy shardy at redhat.com
Thu Sep 11 14:21:52 UTC 2014


On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> 
> ----- Original Message -----
> > From: "Steven Hardy" <shardy at redhat.com>
> > To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> > Sent: Thursday, September 11, 2014 1:55:49 AM
> > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility
> > 
> > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > Going through the untriaged Nova bugs, and there are a few on a similar
> > > pattern:
> > > 
> > > Nova operation in progress.... takes a while
> > > Crosses keystone token expiration time
> > > Timeout thrown
> > > Operation fails
> > > Terrible 500 error sent back to user
> > 
> > We actually have this exact problem in Heat, which I'm currently trying to
> > solve:
> > 
> > https://bugs.launchpad.net/heat/+bug/1306294
> > 
> > Can you clarify, is the issue either:
> > 
> > 1. Create novaclient object with username/password
> > 2. Do series of operations via the client object which eventually fail
> > after $n operations due to token expiry
> > 
> > or:
> > 
> > 1. Create novaclient object with username/password
> > 2. Some really long operation which means token expires in the course of
> > the service handling the request, blowing up and 500-ing
> > 
> > If the former, then it does sound like a client, or usage-of-client bug,
> > although note if you pass a *token* vs username/password (as is currently
> > done for glance and heat in tempest, because we lack the code to get the
> > token outside of the shell.py code..), there's nothing the client can do,
> > because you can't request a new token with longer expiry with a token...
> > 
> > However if the latter, then it seems like not really a client problem to
> > solve, as it's hard to know what action to take if a request failed
> > part-way through and thus things are in an unknown state.
> > 
> > This issue is a hard problem, which can possibly be solved by
> > switching to a trust scoped token (service impersonates the user), but then
> > you're effectively bypassing token expiry via delegation which sits
> > uncomfortably with me (despite the fact that we may have to do this in heat
> > to solve the aforementioned bug)
> > 
> > > It seems like we should have a standard pattern that on token expiration
> > > the underlying code at least gives one retry to try to establish a new
> > > token to complete the flow, however as far as I can tell *no* clients do
> > > this.
> > 
> > As has been mentioned, using sessions may be one solution to this, and
> > AFAIK session support (where it doesn't already exist) is getting into
> > various clients via the work being carried out to add support for v3
> > keystone by David Hu:
> > 
> > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > 
> > I see patches for Heat (currently gating), Nova and Ironic.
> > 
> > > I know we had to add that into Tempest because tempest runs can exceed 1
> > > hr, and we want to avoid random fails just because we cross a token
> > > expiration boundary.
> > 
> > I can't claim great experience with sessions yet, but AIUI you could do
> > something like:
> > 
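> > from keystoneclient.auth.identity import v3
> > from keystoneclient import session
> > from keystoneclient.v3 import client
> > 
> > # OS_AUTH_URL, USERNAME, PASSWORD and PROJECT are placeholders for the
> > # values of your deployment.
> > auth = v3.Password(auth_url=OS_AUTH_URL,
> >                    username=USERNAME,
> >                    password=PASSWORD,
> >                    project_id=PROJECT,
> >                    user_domain_name='default')
> > # The session owns the auth plugin, so it can be shared between clients.
> > sess = session.Session(auth=auth)
> > ks = client.Client(session=sess)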
> > 
> > And if you can pass the same session into the various clients tempest
> > creates then the Password auth-plugin code takes care of reauthenticating
> > if the token cached in the auth plugin object is expired, or nearly
> > expired:
> > 
> > https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> > 
> > So in the tempest case, it seems like it may be a case of migrating the
> > code creating the clients to use sessions instead of passing a token or
> > username/password into the client object?
> > 
> > That's my understanding of it atm anyway, hopefully jamielennox will be along
> > soon with more details :)
> > 
> > Steve
> 
> 
> By clients here are you referring to the CLIs or the python libraries? Implementation is at different points with each. 

I think for both heat and tempest we're talking about the python libraries
(Client objects).

> Sessions will automatically reauthenticate and retry a request; however, this relies on the service returning a 401 Unauthorized error. If a service returns a 500 (or a timeout?), there isn't much that a client can or should do, because we can't assume that trying again with a new token will solve anything.
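
To make the behaviour described above concrete, here is a minimal,
self-contained sketch of "reauthenticate and retry once on a 401". It is
not the keystoneclient implementation: Unauthorized, RetryingClient and the
two callables passed to it are hypothetical names used only for
illustration.

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 response from a service."""


class RetryingClient(object):
    """Illustrative only: retries a request once after a 401."""

    def __init__(self, authenticate, send):
        self._authenticate = authenticate  # callable returning a fresh token
        self._send = send                  # callable(token, request) -> response
        self._token = None

    def request(self, request):
        if self._token is None:
            self._token = self._authenticate()
        try:
            return self._send(self._token, request)
        except Unauthorized:
            # The token was rejected: fetch a new one and retry exactly once.
            # A second 401, a 500 or a timeout is left to propagate, since a
            # new token cannot be assumed to fix those.
            self._token = self._authenticate()
            return self._send(self._token, request)
```

Note that this only helps when the caller holds credentials that can obtain
a new token; with a bare token there is nothing to reauthenticate with.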

Hmm, I was hoping it would reauthenticate based on the auth_ref
will_expire_soon, as it would fit better with our current usage of the
auth_ref in heat.
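
As a rough sketch of what I mean by expiry-based reauthentication (as
opposed to waiting for a 401), something like the following would refresh
the token shortly before it expires. CachedToken, ProactiveAuth and the
30 second window are assumptions made up for this example; only the
will_expire_soon name mirrors the auth_ref method mentioned above.

```python
import time


class CachedToken(object):
    """Hypothetical stand-in for an auth_ref: a token id plus its expiry."""

    def __init__(self, token_id, expires_at):
        self.token_id = token_id
        self.expires_at = expires_at  # seconds since the epoch

    def will_expire_soon(self, now, stale_duration=30):
        # True when the token expires within stale_duration seconds of now.
        return now + stale_duration >= self.expires_at


class ProactiveAuth(object):
    """Illustrative only: reauthenticates before the cached token expires."""

    def __init__(self, authenticate, clock=time.time):
        self._authenticate = authenticate  # callable returning a CachedToken
        self._clock = clock
        self._cached = None

    def get_token(self):
        now = self._clock()
        if self._cached is None or self._cached.will_expire_soon(now):
            self._cached = self._authenticate()
        return self._cached.token_id
```

The point is that every request asks for a token via get_token(), so a
long-lived client object never sends a token it already knows is stale.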

> 
> At the moment we have keystoneclient, novaclient, cinderclient, neutronclient and then a number of the smaller projects with support for sessions. That obviously doesn't mean that existing users of that code have transitioned to the newer way though. David Hu has been working on using this code within the existing CLIs. I have prototypes for at least nova to talk to neutron and cinder, which I'm waiting for Kilo to push. From there it should be easier to do this for other services.

Interesting, I guess we need to prioritize migrating Heat to the session
model too, once all the clients support it.

> 
> For service to service communication there are two types.
> 1) using the user's token like nova->cinder. If this token expires there is really nothing that nova can do except raise 401 and make the client do it again. 
> 2) using a service user like nova->neutron. This should allow automatic reauthentication and will be fixed/standardised by sessions.

(1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
least) there seem to be two solutions, neither of which I particularly
like:

- Require username/password to be passed into the service (something we've
  been trying to banish via migrating to trusts for deferred
  authentication)
- Create a trust, and impersonate the user for the duration of the request,
  or after the token expires until it is completed, using the service user
  credentials and the trust_id.
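
The "after the token expires" variant of the second option could be
sketched roughly as below. This is only an illustration of the decision,
not Heat code: pick_credentials, its arguments, the 60 second margin and
the returned dict are all hypothetical.

```python
def pick_credentials(now, user_token, user_token_expires_at, trust_id,
                     margin=60):
    """Illustrative only: choose how the service authenticates a request.

    Times are seconds since the epoch. While the user's token is still
    valid (with a safety margin) it is used as-is; once it has expired or
    is about to, fall back to impersonating the user via the trust, using
    the service user's own credentials plus trust_id.
    """
    if now + margin < user_token_expires_at:
        return {"method": "token", "token": user_token}
    return {"method": "trust", "trust_id": trust_id}
```

The fallback branch is exactly the part that bypasses token expiry via
delegation, which is what the question below is about.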

It's the second one which I'm deliberating over - technically it will work,
and we create the trust anyway (e.g. for later use to do autoscaling etc),
but can anyone from the keystone team comment on the legitimacy of the
approach?

Intuitively it seems wrong, but I can't see any other way if we want to
support token-only auth and cope with folks doing stuff which takes 2 hours
with a 1 hour token expiry?

The current workaround, as mentioned by sdague, has been just to increase
the token expiry to several hours.

Thoughts appreciated!

Steve


