[openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

Sean Dague sean at dague.net
Sun May 14 11:04:03 UTC 2017


One of the things that came up in a logging Forum session is how much
effort operators have to put into reconstructing flows for things
like server boot when they go wrong, because every time we jump a
service boundary the request-id is reset to something new. The back and
forth between Nova / Neutron and Nova / Glance would definitely be well
served by a request-id that carries across services, especially if it
is easy to query in Elasticsearch.

The last time this came up, some people were concerned about trusting
a request-id on the wire, because it's coming from random users. We're
going to assume that's still a concern for some.
However, since the last time that came up, we've introduced the concept
of "service users", which are a set of higher priv services that we are
using to wrap user requests between services so that long running
request chains (like image snapshot) can complete. We trust these
service users enough to keep on trucking even after the user token has
expired for these long running operations. We could use this same trust
path for request-id chaining.

So, the basic idea is, services will optionally take an inbound 
X-OpenStack-Request-ID which will be strongly validated to the format 
(req-$uuid). They will continue to always generate one as well. When the 
context is built (which is typically about 3 more steps down the paste 
pipeline), we'll check that the service user was involved, and if not, 
reset the request_id to the locally generated one. We'll log both the
global and local request ids. All of these changes happen in
oslo.middleware, oslo.context, and oslo.log, so most projects won't need
to change anything to get this infrastructure.
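
A rough sketch of the validate-then-trust logic described above, in
Python. The function names and the is_service_user flag are made up for
illustration and are not the oslo API. It also assumes "req-$uuid" means
"req-" followed by a lowercase, hyphenated UUID.

```python
import re
import uuid

# Assumption: "req-$uuid" means "req-" plus a lowercase hyphenated UUID.
_REQ_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def generate_request_id():
    """Generate a local request id; every service always does this."""
    return "req-" + str(uuid.uuid4())


def is_valid_request_id(value):
    """Strictly validate an inbound id against the req-$uuid format."""
    return isinstance(value, str) and _REQ_ID_RE.fullmatch(value) is not None


def resolve_request_ids(inbound_id, is_service_user):
    """Return (global_request_id, local_request_id).

    The inbound id is kept as the global id only if it is well formed
    and the request came through a service user. Otherwise the global
    id is reset to the locally generated one.
    """
    local_id = generate_request_id()
    if is_service_user and is_valid_request_id(inbound_id):
        return inbound_id, local_id
    return local_id, local_id
```

Both ids returned here are the ones that would be logged. For a request
coming straight from an end user, the two are the same.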

The python clients, and callers, will then need to be augmented to pass 
the request-id in on requests. Servers will effectively decide when they 
want to opt into calling other services this way.
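
On the calling side, the opt-in could be as small as adding one header
to the outbound request. A minimal sketch, where outbound_headers is a
hypothetical helper and not part of any existing python client:

```python
REQUEST_ID_HEADER = "X-OpenStack-Request-ID"


def outbound_headers(global_request_id=None, base_headers=None):
    """Build headers for a call to another service.

    The request id header is only added when the caller opts in by
    passing a global request id; otherwise nothing changes.
    """
    headers = dict(base_headers or {})  # copy, don't mutate the caller's dict
    if global_request_id:
        headers[REQUEST_ID_HEADER] = global_request_id
    return headers
```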

This only ends up logging the top line global request id as well as the 
last leaf for each call. This does mean that full tree construction will 
take more work if you are bouncing through 3 or more servers, but it's a 
step which I think can be completed this cycle.
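
To make the flat logging concrete, here is a rough sketch of grouping
log records by global request id. The record fields (service,
request_id, global_request_id) are made up for illustration and are not
an actual oslo.log format.

```python
from collections import defaultdict


def group_by_global_id(records):
    """Map each global request id to the (service, local id) pairs seen."""
    flows = defaultdict(list)
    for rec in records:
        # A record with no global id is treated as its own top-level flow.
        key = rec.get("global_request_id") or rec["request_id"]
        pair = (rec["service"], rec["request_id"])
        if pair not in flows[key]:
            flows[key].append(pair)
    return dict(flows)
```

This gives a flat list of leaves per global id. Which leaf called which
is not recoverable from these two ids alone, which is the extra work
mentioned above once 3 or more servers are involved.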

I've got some more detailed notes, but before going through the process 
of putting this into an oslo spec I wanted more general feedback on it 
so that any objections we didn't think about yet can be raised before 
going through the detailed design.

	-Sean

-- 
Sean Dague
http://dague.net


