[openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking
Tim.Bell at cern.ch
Sun May 14 17:08:13 UTC 2017
> On 14 May 2017, at 13:04, Sean Dague <sean at dague.net> wrote:
> One of the things that came up in a logging Forum session is how much effort operators have to put into reconstructing flows for things like server boot when they go wrong, because every time we cross a service boundary the request-id is reset to something new. The back and forth between Nova / Neutron and Nova / Glance would definitely be well served by this, especially if it is something that's easy to query in Elasticsearch.
> The last time this came up, some people were concerned about trusting a request-id on the wire, because it's coming from random users. We're going to assume that's still a concern for some. However, since then we've introduced the concept of "service users", which are a set of higher-privilege services that we use to wrap user requests between services so that long-running request chains (like image snapshot) can complete. We trust these service users enough to keep on trucking even after the user token has expired for these long-running operations. We could use this same trust path for request-id chaining.
> So, the basic idea is that services will optionally take an inbound X-OpenStack-Request-ID, which will be strictly validated against the format (req-$uuid). They will continue to always generate one as well. When the context is built (which is typically about 3 more steps down the paste pipeline), we'll check that the service user was involved, and if not, reset the request_id to the locally generated one. We'll log both the global and local request ids. All of these changes happen in oslo.middleware, oslo.context, and oslo.log, and most projects won't need to do anything to get this infrastructure.
> The Python clients, and their callers, will then need to be augmented to pass the request-id in on requests. Servers will effectively decide when they want to opt into calling other services this way.
> This only ends up logging the top-level global request id as well as the last leaf for each call. This does mean that full tree construction will take more work if you are bouncing through 3 or more servers, but it's a step which I think can be completed this cycle.
> I've got some more detailed notes, but before going through the process of putting this into an oslo spec I wanted more general feedback on it so that any objections we didn't think about yet can be raised before going through the detailed design.
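To make the inbound handling described above concrete, here is a rough Python sketch. It is not the oslo.middleware or oslo.context code: the function names, the plain dict of headers, and the caller_is_service_user flag are all hypothetical, and restricting the UUID to lowercase hex is my assumption. It only illustrates the rule as proposed: always generate a local id, and keep the inbound id as the global one only if it matches req-$uuid and a service user was involved.

```python
import re
import uuid

# Format named in the proposal: "req-" followed by a UUID.
# Accepting only lowercase hex is an assumption of this sketch.
_REQUEST_ID_RE = re.compile(
    r"^req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def generate_request_id():
    """Return a locally generated id in the req-$uuid format."""
    return "req-" + str(uuid.uuid4())


def is_valid_request_id(value):
    """Check a candidate id against the req-$uuid format."""
    return isinstance(value, str) and _REQUEST_ID_RE.match(value) is not None


def resolve_request_ids(headers, caller_is_service_user):
    """Return (global_id, local_id) for one inbound request.

    `headers` is a plain dict and `caller_is_service_user` a boolean;
    both are hypothetical stand-ins for what the real middleware and
    context would provide.
    """
    # A local id is always generated, whatever arrives on the wire.
    local_id = generate_request_id()
    inbound = headers.get("X-OpenStack-Request-ID")
    if caller_is_service_user and is_valid_request_id(inbound):
        # Trusted caller and well-formed id: chain onto the inbound id.
        global_id = inbound
    else:
        # Untrusted caller, or a missing or malformed id: reset to local.
        global_id = local_id
    return global_id, local_id
```

Both ids would then be logged, so a request from an ordinary user simply ends up with the global and local ids being the same value.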
This is very consistent with what I had understood during the Forum session. Having a single request id across multiple services as the end-user operation is performed would be a great help in operations, where we often use a solution like Elasticsearch/Kibana to show logs and interactively query the timing and results of a given request id. It would also improve traceability during investigations where we are trying to determine who the initial requesting user was.
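As an illustration of the kind of query this would enable, here is a minimal Python sketch that groups already-parsed log records by global request id and orders each group by time. The record layout (dicts with "global_request_id", "request_id", "service" and "timestamp" keys) is purely hypothetical; the real field names would depend on what oslo.log ends up emitting and how the logs are indexed.

```python
from collections import defaultdict


def group_by_global_id(records):
    """Group log records into per-operation flows.

    Each record is assumed (hypothetically) to be a dict carrying a
    "global_request_id" and a sortable "timestamp".
    """
    flows = defaultdict(list)
    for rec in records:
        flows[rec["global_request_id"]].append(rec)
    # Order each flow by time so the path across services can be read off.
    for recs in flows.values():
        recs.sort(key=lambda r: r["timestamp"])
    return dict(flows)


def services_touched(flow):
    """List the services seen in one flow, in order of first appearance."""
    seen = []
    for rec in flow:
        if rec["service"] not in seen:
            seen.append(rec["service"])
    return seen
```

With the ids reset at every service boundary, as they are today, each service's records would land in a separate group and the flow would have to be stitched together by hand.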
> Sean Dague