[openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking
Sean Dague
sean at dague.net
Sun May 14 11:04:03 UTC 2017
One of the things that came up in a logging Forum session is how much
effort operators are having to put into reconstructing flows for things
like server boot when they go wrong, because every time we cross a
service boundary the request-id is reset to something new. The back and
forth between Nova / Neutron and Nova / Glance would definitely be well
served by a request-id that survives those hops, especially if it is
something that's easy to query in Elasticsearch.
The last time this came up, some people were concerned about trusting a
request-id on the wire, because it's coming from random users. We're
going to assume that's still a concern for some.
However, since the last time that came up, we've introduced the concept
of "service users", which are a set of higher privilege services that we
are using to wrap user requests between services so that long running
request chains (like image snapshot) can complete. We trust these
service users enough to keep on trucking even after the user token has
expired for these long running operations. We could use this same trust
path for request-id chaining.
So, the basic idea is, services will optionally take an inbound
X-OpenStack-Request-ID which will be strongly validated against the
format (req-$uuid). They will continue to always generate one as well.
When the context is built (which is typically about 3 more steps down
the paste pipeline), we'll check that the service user was involved, and
if not, reset the request_id to the locally generated one. We'll log
both the global and local request ids. All of these changes happen in
oslo.middleware, oslo.context, and oslo.log, so most projects won't need
to do anything to get this infrastructure.
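
A minimal sketch of the rule described above, in plain Python rather
than the actual oslo code. The function names and the is_service_user
flag are hypothetical; in practice that decision would come from the
auth middleware, not from a boolean passed in by hand.

```python
import re
import uuid

# The format described above: "req-" followed by a UUID.
_REQ_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def generate_request_id():
    """Every service always generates its own local request id."""
    return "req-" + str(uuid.uuid4())


def is_valid_request_id(value):
    """Strict check: the whole string must match req-$uuid."""
    return isinstance(value, str) and _REQ_ID_RE.fullmatch(value) is not None


def resolve_request_ids(headers, is_service_user):
    """Return (global_id, local_id) for one inbound request.

    The inbound id is only kept as the global id when it is well formed
    AND the request came in via a service user (hypothetical flag).
    Otherwise the global id is reset to the locally generated one.
    """
    local_id = generate_request_id()
    inbound = headers.get("X-OpenStack-Request-ID")
    if is_service_user and is_valid_request_id(inbound):
        return inbound, local_id
    return local_id, local_id
```

Both values returned here would then be written to the log line for
the request.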
The python clients, and callers, will then need to be augmented to pass
the request-id in on requests. Servers will effectively decide when they
want to opt into calling other services this way.
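
On the calling side, the change amounts to carrying the id forward as a
header on the outbound request. A rough sketch, assuming a hypothetical
helper name and a plain dict of headers rather than any particular
client library's API:

```python
def outbound_headers(global_request_id, base_headers=None):
    """Build headers for a call to another service.

    global_request_id is the id this service is logging as its global
    id; passing it along lets the next service chain onto it. This is a
    hypothetical helper, not an existing python client function.
    """
    headers = dict(base_headers or {})
    headers["X-OpenStack-Request-ID"] = global_request_id
    return headers
```

The receiving service would only honor the header if the call also
arrived through the service user trust path.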
This only ends up logging the top line global request id as well as the
last leaf for each call. This does mean that full tree construction will
take more work if you are bouncing through 3 or more servers, but it's a
step which I think can be completed this cycle.
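
To illustrate what that gives an operator, here is a small sketch that
groups log records by global request id. The record layout (service,
global_request_id, request_id keys) is an assumption made for the
example, not the actual oslo.log output format.

```python
from collections import defaultdict


def group_by_global_id(records):
    """Collect (service, local request id) pairs per global request id.

    The result is a flat list per flow, in log order. It does not say
    which service called which, so rebuilding the full call tree would
    need extra information.
    """
    flows = defaultdict(list)
    for rec in records:
        flows[rec["global_request_id"]].append(
            (rec["service"], rec["request_id"])
        )
    return dict(flows)
```

With this, a single query on the global id finds every service that
took part in a server boot, even though parent/child links between the
intermediate hops are not recorded.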
I've got some more detailed notes, but before going through the process
of putting this into an oslo spec I wanted more general feedback on it
so that any objections we didn't think about yet can be raised before
going through the detailed design.
-Sean
--
Sean Dague
http://dague.net