[openstack-dev] [all] cross project communication: Return request-id to caller

Kekane, Abhishek Abhishek.Kekane at nttdata.com
Wed May 27 07:06:56 UTC 2015

Hi Devs,

Each OpenStack service sends a request ID header with its HTTP responses. This request ID is useful for tracking down problems in the logs. However, when an operation crosses service boundaries, tracking becomes difficult because each service has its own request ID, and the called service's request ID is not returned to the caller. This is especially problematic when requests arrive in parallel. For example, glance calls cinder when creating an image, but that cinder instance may be handling several other requests at the same time. With the request IDs tied together in the logs, a user can easily find the cinder request that corresponds to a given glance request in the g-api log. This will help operators and developers analyse logs effectively.

To address this issue, we have come up with the following solutions:

Solution 1: Return a tuple containing headers and body from the respective clients (also favoured by Joe Gordon)
Reference: https://review.openstack.org/#/c/156508/6/specs/log-request-id-mappings.rst

Pros:
1. Maintains backward compatibility
2. Effective debugging and analysis of problems, as both the calling service's request-id and the called service's request-id are logged in the same log message
3. Makes it possible to build a full call graph
4. End users will be able to learn the request-id of their request and can approach the service provider to find the cause of failure of that particular request

Cons:
1. The changes first need to be made in cross-projects before changing the clients
2. Applications that use python-*clients need to make the required changes (check the return type of the response)
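To make Solution 1 more concrete, here is a minimal sketch of a client call that can hand back both headers and body. Everything in it is hypothetical: 'get_volume', the opt-in 'return_headers' flag, 'FakeResponse' and the fake transport are illustration only and are not taken from any python-*client or from the spec.

```python
# Hypothetical sketch: none of these names come from a real python-*client.
REQUEST_ID_HEADER = "x-openstack-request-id"


class FakeResponse:
    """Stand-in for an HTTP response object (assumption for illustration)."""

    def __init__(self, headers, body):
        self.headers = headers
        self.body = body


def get_volume(http_get, volume_id, return_headers=False):
    """Return the body alone, or (headers, body) when the caller opts in."""
    resp = http_get("/volumes/%s" % volume_id)
    if return_headers:
        return resp.headers, resp.body
    return resp.body


def fake_http_get(path):
    # Simulates the called service answering with its own request ID.
    volume_id = path.rsplit("/", 1)[-1]
    return FakeResponse({REQUEST_ID_HEADER: "req-cinder-123"}, {"id": volume_id})


def log_mapping(caller_request_id, headers):
    """Build one log line mapping the caller's ID to the callee's ID."""
    return "%s -> %s" % (caller_request_id, headers.get(REQUEST_ID_HEADER))


headers, body = get_volume(fake_http_get, "vol-1", return_headers=True)
mapping_line = log_mapping("req-glance-abc", headers)
```

The opt-in flag is only one possible way to keep existing callers working; a caller that wants the mapping has to unpack the tuple, which is the "check the return type" cost listed in the cons.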

Solution 2: Use thread-local storage to store the 'x-openstack-request-id' returned in the response headers (suggested by Doug Hellmann)
Reference: https://review.openstack.org/#/c/156508/9/specs/log-request-id-mappings.rst

Add a new method 'get_openstack_request_id' to return this request-id to the caller.

Pros:
1. Doesn't break compatibility
2. Minimal changes are required in the client
3. Makes it possible to build a full call graph

Cons:
1. A malicious user can send a long request-id to fill up the disk space, resulting in a potential DoS
2. Changes need to be made in all python-*clients
3. The last request-id must be flushed on each subsequent call; otherwise the wrong request-id will be returned to the caller
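A rough sketch of how Solution 2 could look, using only the standard library. 'get_openstack_request_id' is the method name proposed above; '_record_response', 'list_images' and the fake transport are hypothetical names for illustration and do not exist in any client.

```python
import threading

REQUEST_ID_HEADER = "x-openstack-request-id"
_local = threading.local()  # each thread sees its own 'request_id' attribute


def _record_response(headers):
    # Overwrite the stored ID on every call (None if the header is missing),
    # so a stale value from an earlier request is never returned.
    _local.request_id = headers.get(REQUEST_ID_HEADER)


def get_openstack_request_id():
    """Return the request ID from this thread's most recent response."""
    return getattr(_local, "request_id", None)


def list_images(http_get):
    """Hypothetical client call; its return type stays unchanged."""
    headers, body = http_get("/images")
    _record_response(headers)
    return body


def fake_http_get(path):
    # Simulates a service response carrying a request ID header.
    return {REQUEST_ID_HEADER: "req-glance-42"}, ["image-a"]


images = list_images(fake_http_get)
last_id = get_openstack_request_id()
```

Existing callers keep getting only the body, and a caller that wants the ID asks for it separately. The stored value only makes sense if it is read on the same thread right after the call, before another request overwrites it.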

Solution 3: Unique request-id across OpenStack Services (suggested by Jamie Lennox)
Reference: https://review.openstack.org/#/c/156508/10/specs/log-request-id-mappings.rst

Get 'x-openstack-request-id' from the auth plugin and add it to the request headers. If the 'x-openstack-request-id' key is present in the request header, the same ID is reused for the rest of the request; otherwise a new one is generated.
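The reuse-or-generate behaviour can be sketched as a plain WSGI wrapper. This is not the keystonemiddleware or oslo_middleware code; 'request_id_middleware', 'demo_app', 'call' and the 'req-' prefix on generated IDs are assumptions made for illustration.

```python
import uuid

# WSGI environ form of the 'x-openstack-request-id' request header.
ENV_KEY = "HTTP_X_OPENSTACK_REQUEST_ID"


def request_id_middleware(app):
    """Reuse an inbound request ID if present, otherwise generate one."""

    def wrapper(environ, start_response):
        req_id = environ.get(ENV_KEY) or "req-" + str(uuid.uuid4())
        environ[ENV_KEY] = req_id  # visible to the wrapped application

        def _start_response(status, headers, exc_info=None):
            # Echo the ID back to the caller in the response headers.
            headers = list(headers) + [("x-openstack-request-id", req_id)]
            return start_response(status, headers, exc_info)

        return app(environ, _start_response)

    return wrapper


def demo_app(environ, start_response):
    # Hypothetical service: replies with the ID it was given.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ[ENV_KEY].encode("utf-8")]


def call(app, environ):
    """Drive a WSGI app once; return (response header ID, body text)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    body = b"".join(app(dict(environ), start_response))
    return captured["headers"]["x-openstack-request-id"], body.decode("utf-8")


wrapped = request_id_middleware(demo_app)
reused = call(wrapped, {ENV_KEY: "req-from-nova"})
generated = call(wrapped, {})
```

Note that the wrapper accepts whatever ID arrives in the header without validating it or checking that it is unique.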

https://review.openstack.org/#/c/164582/ - Include request-id in auth plugin and add it to request headers
https://review.openstack.org/#/c/166063/ - Add session-object for glance client
Add 'UserAuthPlugin' and '_ContextAuthPlugin' to cinder and neutron, the same as in nova

Pros:
1. Using the same request-id for a request that crosses multiple service boundaries will help operators and developers identify the problem quickly
2. Changes are required only in the keystonemiddleware and oslo_middleware libraries. No changes are required in the python client bindings or the OpenStack core services

Cons:
1. Because 'x-openstack-request-id' in the request header is visible to the user, the user can send the same request-id for multiple requests. The request_id middleware does not check uniqueness within the scope of the running OpenStack service, so this could make troubleshooting the cause of a failure harder.
2. Having the same request-id in all services for a single user API call means you cannot generate a full call graph. For example, if a single user's nova API call produces two calls to glance, you want to be able to tell those two calls apart.

During the Liberty design summit, I had a chance to discuss these designs with some of the core members, such as Doug, Joe Gordon and Jamie Lennox. However, we were not able to reach a conclusion on the final design or to learn which way the community wants to go in using this request-id effectively.

However, IMO solution 1 sounds the most useful: whoever is debugging will be able to build the full call graph, which helps in analysing gate failures effectively, and end users will be able to learn their request-id and track their request.

I request all community members to go through these solutions and let us know which is the most appropriate way to improve the logs by logging the request-id.

Thanks & Regards,

Abhishek Kekane

