[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

Steven Hardy shardy at redhat.com
Wed Sep 10 15:55:49 UTC 2014


On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
> 
> Nova operation in progress.... takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user

We actually have this exact problem in Heat, which I'm currently trying to
solve:

https://bugs.launchpad.net/heat/+bug/1306294

Can you clarify, is the issue either:

1. Create novaclient object with username/password
2. Do series of operations via the client object which eventually fail
after $n operations due to token expiry

or:

1. Create novaclient object with username/password
2. Some really long operation which means token expires in the course of
the service handling the request, blowing up and 500-ing

If the former, then it does sound like a client, or usage-of-client, bug.
Note, though, that if you pass a *token* rather than username/password (as
is currently done for glance and heat in tempest, because we lack the code
to get the token outside of the shell.py code), there's nothing the client
can do: you can't use a token to request a new token with a longer expiry.

However if the latter, then it seems like not really a client problem to
solve, as it's hard to know what action to take if a request failed
part-way through and thus things are in an unknown state.

This is a hard problem. It can possibly be solved by switching to a
trust-scoped token (the service impersonates the user), but then you're
effectively bypassing token expiry via delegation, which sits uncomfortably
with me (despite the fact that we may have to do this in heat to solve the
aforementioned bug).

> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.
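
To make the "one retry" pattern concrete, here is a minimal sketch. All the
names in it (TokenExpired, RetryingClient, the authenticate callable) are
hypothetical stand-ins, not any real client's API. It only helps with the
first case above (the client holds credentials and the request can safely be
re-sent); it does nothing for a request that fails part-way through inside
the service.

```python
class TokenExpired(Exception):
    """Hypothetical stand-in for whatever a client raises on an HTTP 401."""


class RetryingClient:
    """Caches a token and retries a request once if the token has expired."""

    def __init__(self, authenticate):
        # authenticate: callable returning a fresh token. It needs real
        # credentials behind it; a bare token cannot be used to get a new one.
        self._authenticate = authenticate
        self._token = None

    def call(self, request):
        # request: callable taking a token and performing the operation.
        if self._token is None:
            self._token = self._authenticate()
        try:
            return request(self._token)
        except TokenExpired:
            # Exactly one retry with a new token; a second failure propagates.
            self._token = self._authenticate()
            return request(self._token)
```

The retry is deliberately limited to one attempt, so a genuinely rejected
credential still surfaces as an error instead of looping.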

As has been mentioned, using sessions may be one solution to this, and
AFAIK session support (where it doesn't already exist) is getting into
various clients via the work being carried out to add support for v3
keystone by David Hu:

https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

I see patches for Heat (currently gating), Nova and Ironic.

> I know we had to add that into Tempest because tempest runs can exceed 1
> hr, and we want to avoid random fails just because we cross a token
> expiration boundary.

I can't claim great experience with sessions yet, but AIUI you could do
something like:

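# OS_AUTH_URL, USERNAME, PASSWORD and PROJECT are placeholders for your cloud.
from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

# The Password plugin holds the credentials, so it can fetch a new token later.
auth = v3.Password(auth_url=OS_AUTH_URL,
                   username=USERNAME,
                   password=PASSWORD,
                   project_id=PROJECT,
                   user_domain_name='default')
# The session wraps the auth plugin and is what gets handed to each client.
sess = session.Session(auth=auth)
ks = client.Client(session=sess)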

And if you can pass the same session into the various clients tempest
creates then the Password auth-plugin code takes care of reauthenticating
if the token cached in the auth plugin object is expired, or nearly
expired:

https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
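
Roughly, the idea is "reuse the cached token unless it is missing or about to
expire". The sketch below illustrates that idea only; it is not the
keystoneclient code, and the class name, the 30-second margin and the fetch
callable are all assumptions made for the example.

```python
import time


class CachingAuth:
    """Illustrative token cache that refreshes shortly before expiry."""

    def __init__(self, fetch, stale_window=30.0, clock=time.time):
        # fetch: hypothetical callable returning (token, expires_at), where
        # expires_at is in epoch seconds. stale_window is an assumed margin.
        self._fetch = fetch
        self._stale_window = stale_window
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def _needs_refresh(self):
        # Treat a token as unusable once it is within stale_window of expiry,
        # so a request does not start with a token that dies mid-flight.
        if self._token is None:
            return True
        return self._clock() + self._stale_window >= self._expires_at

    def get_token(self):
        if self._needs_refresh():
            self._token, self._expires_at = self._fetch()
        return self._token
```

Every client sharing one such object sees the same refreshed token, which is
the property that makes passing a single session around attractive.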

So in the tempest case, it seems like it may be a case of migrating the
code creating the clients to use sessions instead of passing a token or
username/password into the client object?

That's my understanding of it atm anyway, hopefully jamielennox will be along
soon with more details :)

Steve


