[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility
Steven Hardy
shardy at redhat.com
Fri Sep 12 16:00:46 UTC 2014
On Thu, Sep 11, 2014 at 08:43:22PM -0400, Jamie Lennox wrote:
>
>
> ----- Original Message -----
> > From: "Steven Hardy" <shardy at redhat.com>
> > To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> > Sent: Friday, 12 September, 2014 12:21:52 AM
> > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility
> >
> > On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> > >
> > > ----- Original Message -----
> > > > From: "Steven Hardy" <shardy at redhat.com>
> > > > To: "OpenStack Development Mailing List (not for usage questions)"
> > > > <openstack-dev at lists.openstack.org>
> > > > Sent: Thursday, September 11, 2014 1:55:49 AM
> > > > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> > > > tokens leads to overall OpenStack fragility
> > > >
> > > > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > > > Going through the untriaged Nova bugs, and there are a few on a similar
> > > > > pattern:
> > > > >
> > > > > Nova operation in progress.... takes a while
> > > > > Crosses keystone token expiration time
> > > > > Timeout thrown
> > > > > Operation fails
> > > > > Terrible 500 error sent back to user
> > > >
> > > > We actually have this exact problem in Heat, which I'm currently trying
> > > > to
> > > > solve:
> > > >
> > > > https://bugs.launchpad.net/heat/+bug/1306294
> > > >
> > > > Can you clarify, is the issue either:
> > > >
> > > > 1. Create novaclient object with username/password
> > > > 2. Do series of operations via the client object which eventually fail
> > > > after $n operations due to token expiry
> > > >
> > > > or:
> > > >
> > > > 1. Create novaclient object with username/password
> > > > 2. Some really long operation which means token expires in the course of
> > > > the service handling the request, blowing up and 500-ing
> > > >
> > > > If the former, then it does sound like a client, or usage-of-client bug,
> > > > although note if you pass a *token* vs username/password (as is currently
> > > > done for glance and heat in tempest, because we lack the code to get the
> > > > token outside of the shell.py code..), there's nothing the client can do,
> > > > because you can't request a new token with longer expiry with a token...
> > > >
> > > > However if the latter, then it seems like not really a client problem to
> > > > solve, as it's hard to know what action to take if a request failed
> > > > part-way through and thus things are in an unknown state.
> > > >
> > > > This issue is a hard problem, which can possibly be solved by
> > > > switching to a trust scoped token (service impersonates the user), but
> > > > then
> > > > you're effectively bypassing token expiry via delegation which sits
> > > > uncomfortably with me (despite the fact that we may have to do this in
> > > > heat
> > > > to solve the aforementioned bug)
> > > >
> > > > > It seems like we should have a standard pattern that on token
> > > > > expiration
> > > > > the underlying code at least gives one retry to try to establish a new
> > > > > token to complete the flow, however as far as I can tell *no* clients
> > > > > do
> > > > > this.
> > > >
> > > > As has been mentioned, using sessions may be one solution to this, and
> > > > AFAIK session support (where it doesn't already exist) is getting into
> > > > various clients via the work being carried out to add support for v3
> > > > keystone by David Hu:
> > > >
> > > > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > > >
> > > > I see patches for Heat (currently gating), Nova and Ironic.
> > > >
> > > > > I know we had to add that into Tempest because tempest runs can exceed
> > > > > 1
> > > > > hr, and we want to avoid random fails just because we cross a token
> > > > > expiration boundary.
> > > >
> > > > I can't claim great experience with sessions yet, but AIUI you could do
> > > > something like:
> > > >
> > > > from keystoneclient.auth.identity import v3
> > > > from keystoneclient import session
> > > > from keystoneclient.v3 import client
> > > >
> > > > auth = v3.Password(auth_url=OS_AUTH_URL,
> > > >                    username=USERNAME,
> > > >                    password=PASSWORD,
> > > >                    project_id=PROJECT,
> > > >                    user_domain_name='default')
> > > > sess = session.Session(auth=auth)
> > > > ks = client.Client(session=sess)
> > > >
> > > > And if you can pass the same session into the various clients tempest
> > > > creates then the Password auth-plugin code takes care of reauthenticating
> > > > if the token cached in the auth plugin object is expired, or nearly
> > > > expired:
> > > >
> > > > https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> > > >
> > > > So in the tempest case, it seems like it may be a case of migrating the
> > > > code creating the clients to use sessions instead of passing a token or
> > > > username/password into the client object?
> > > >
> > > > That's my understanding of it atm anyway, hopefully jamielennox will be
> > > > along
> > > > soon with more details :)
> > > >
> > > > Steve
> > >
> > >
> > > By clients here are you referring to the CLIs or the python libraries?
> > > Implementation is at different points with each.
> >
> > I think for both heat and tempest we're talking about the python libraries
> > (Client objects).
> >
> > > Sessions will handle automatically reauthenticating and retrying a request,
> > > however it relies on the service throwing a 401 Unauthenticated error. If
> > > a service is returning a 500 (or a timeout?) then there isn't much that a
> > > client can/should do for that because we can't assume that trying again
> > > with a new token will solve anything.
> >
> > Hmm, I was hoping it would reauthenticate based on the auth_ref
> > will_expire_soon, as it would fit better with our current usage of the
> > auth_ref in heat.
>
> We do that as well, though currently this window is set to 1 second and not configurable at __init__ time:
> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L37
>
> A patch to change that to whatever number of seconds you think is appropriate would be welcomed.
Thanks for the clarification, I discovered similar while testing yesterday,
I'll look into sending a patch.
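For illustration, here is a minimal sketch of what a configurable window
might look like. The class and parameter names are hypothetical and this is
not the actual keystoneclient code; it only shows the idea of fetching a
new token when the cached one is inside a caller-supplied stale window,
with the 1 second default mentioned above:

```python
import datetime

UTC = datetime.timezone.utc


class CachedToken:
    """Hypothetical stand-in for a cached auth_ref (not keystoneclient code)."""

    def __init__(self, token_id, expires_at):
        self.token_id = token_id
        self.expires_at = expires_at

    def will_expire_soon(self, stale_duration, now=None):
        """Return True if the token expires within stale_duration seconds."""
        now = now or datetime.datetime.now(UTC)
        window = datetime.timedelta(seconds=stale_duration)
        return self.expires_at - now <= window


class ReauthenticatingPlugin:
    """Fetches a fresh token when the cached one is inside the stale window."""

    def __init__(self, fetch_token, stale_duration=1):
        # fetch_token is an assumed callable that authenticates and
        # returns a CachedToken; stale_duration is in seconds.
        self._fetch_token = fetch_token
        self._stale_duration = stale_duration
        self._cached = None

    def get_token(self, now=None):
        stale = (self._cached is None or
                 self._cached.will_expire_soon(self._stale_duration, now))
        if stale:
            self._cached = self._fetch_token()
        return self._cached.token_id
```

A caller that knows its requests can take several minutes would then pass
a larger stale_duration, so the token is refreshed before the long request
starts rather than expiring part-way through it.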
>
> > >
> > > At the moment we have keystoneclient, novaclient, cinderclient
> > > neutronclient and then a number of the smaller projects with support for
> > > sessions. That obviously doesn't mean that existing users of that code
> > > have transitioned to the newer way though. David Hu has been working on
> > > using this code within the existing CLIs. I have prototypes for at least
> > > nova to talk to neutron and cinder which I'm waiting for Kilo to push.
> > > From there it should be easier to do this for other services.
> >
> > Interesting, I guess we need to prioritize migrating Heat to the session
> > model too, once all the clients support it.
>
> I'd love to talk to you guys about this. In my mind heat should be one of the big winners from this, I just figured we need more or less complete client support first.
Agreed, I've already started looking into it.
> > > For service to service communication there are two types.
> > > 1) using the user's token like nova->cinder. If this token expires there is
> > > really nothing that nova can do except raise 401 and make the client do it
> > > again.
> > > 2) using a service user like nova->neutron. This should allow automatic
> > > reauthentication and will be fixed/standardized by sessions.
> >
> > (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
> > least) there seem to be two solutions, neither of which I particularly
> > like:
> >
> > - Require username/password to be passed into the service (something we've
> > been trying to banish via migrating to trusts for deferred
> > authentication)
> > - Create a trust, and impersonate the user for the duration of the request,
> > or after the token expires until it is completed, using the service user
> > credentials and the trust_id.
> >
> > It's the second one which I'm deliberating over - technically it will work,
> > and we create the trust anyway (e.g for later use to do autoscaling etc),
> > but can anyone from the keystone team comment on the legitimacy of the
> > approach?
> >
> > Intuitively it seems wrong, but I can't see any other way if we want to
> > support token-only auth and cope with folks doing stuff which takes 2 hours
> > with a 1 hour token expiry?
> >
> > The current workaround, as mentioned by sdague, has been just to increase
> > the token expiry to several hours.
> >
> > Thoughts appreciated!
>
> Right, so passing username/password is not acceptable.
>
> So I just posted a reply to Sean further up in this thread (different branch? stupid zimbra) about the Service-Token that we are trying to add to middleware. The intention was to restrict what can be done with service tokens, but it *might* give us some benefits here. I'm not sure of the security implications yet, or of how we would write policy to allow only certain interactions.
Hmm, atm I don't see this as a solution for heat - because we do everything
using the user token (or with a token impersonating the user).
So unless the service token approach includes impersonation I don't think
it will work for us (because e.g we don't want to create a bunch of
resources owned by the service user).
> I know what you mean about it feeling wrong for a service to create a trust per call; I already see issues with the security model of services being allowed to create trusts at will for users. Is it reasonable to define a way to pass a trust_id as a header, such that for certain long running calls nova would know to use that trust to do the long running work? I realize that it's difficult for clients to create trusts for the correct users now, but could that work?
Yeah, well we're already some way down that path as we create a trust per
stack, but I agree there are possible problems with the model which would
be solved by allowing the user to determine ahead of time which services
they trust to do deferred (or long running) operations.
We have already discussed the idea of an X-Auth-Trust header:
https://blueprints.launchpad.net/heat/+spec/x-auth-trust
There are a few problems unfortunately:
- The user has no way of knowing the trustee user ID, and last time I
checked a non-admin user can't create a trust with a trustee username.
- You'd require either chained delegation (yes, I was supposed to do that
for Juno..) or some way to specify multiple trustees, or you just move
the problem out by one layer (pass the trust into heat, but then the
trust-scoped token you use to do a backup in glance can expire)
- Every service (or auth_token I guess) needs validation that the trust
trustor matches the user_id in the request, or the trust_id becomes as
valuable a secret as the token.
That said, I do think it's something worth figuring out, as it would enable
some interesting use-cases (e.g Solum needs it so it can do deferred
adjustments to heat stacks via a trust).
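To make the trustor validation point from the list above concrete, a rough
sketch of the check each service (or auth_token) would need might look
like this. The function and exception names are hypothetical, and the trust
is assumed to be a plain dict carrying the trustor_user_id field from the
keystone trust; none of this is existing middleware code:

```python
class TrustValidationError(Exception):
    """Raised when a supplied trust does not belong to the requesting user."""


def validate_trust(trust, request_user_id):
    """Return the trust ID only if the trustor is the user making the request.

    Without this check, anyone who learns a trust_id could present it,
    which would make the trust_id as sensitive as a token.
    """
    if trust["trustor_user_id"] != request_user_id:
        raise TrustValidationError(
            "trust %s was not created by user %s"
            % (trust["id"], request_user_id))
    return trust["id"]
```

The service would look the trust up in keystone using the ID from the
header, run a check like this against the user_id from the validated
token, and only then use the trust for the deferred work.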
Sorry, this has veered a little OT, but I appreciate the discussion :)
Steve