[openstack-dev] [nova] [glance] [cinder] [neutron] [keystone] - RFC cross project request id tracking

John Dickinson me at not.mn
Tue May 16 17:28:54 UTC 2017



On 14 May 2017, at 4:04, Sean Dague wrote:

> One of the things that came up in a logging Forum session is how much effort operators are having to put into reconstructing flows for things like server boot when they go wrong, as every time we jump a service barrier the request-id is reset to something new. The back and forth between Nova / Neutron and Nova / Glance would be definitely well served by this. Especially if this is something that's easy to query in elastic search.
>
> The last time this came up, some people were concerned that trusting request-id on the wire was concerning to them because it's coming from random users. We're going to assume that's still a concern by some. However, since the last time that came up, we've introduced the concept of "service users", which are a set of higher priv services that we are using to wrap user requests between services so that long running request chains (like image snapshot). We trust these service users enough to keep on trucking even after the user token has expired for this long run operations. We could use this same trust path for request-id chaining.
>
> So, the basic idea is, services will optionally take an inbound X-OpenStack-Request-ID which will be strongly validated to the format (req-$uuid). They will continue to always generate one as well. When the context is built (which is typically about 3 more steps down the paste pipeline), we'll check that the service user was involved, and if not, reset the request_id to the local generated one. We'll log both the global and local request ids. All of these changes happen in oslo.middleware, oslo.context, oslo.log, and most projects won't need anything to get this infrastructure.
>
> The python clients, and callers, will then need to be augmented to pass the request-id in on requests. Servers will effectively decide when they want to opt into calling other services this way.
>
> This only ends up logging the top line global request id as well as the last leaf for each call. This does mean that full tree construction will take more work if you are bouncing through 3 or more servers, but it's a step which I think can be completed this cycle.
>
> I've got some more detailed notes, but before going through the process of putting this into an oslo spec I wanted more general feedback on it so that any objections we didn't think about yet can be raised before going through the detailed design.
>
> 	-Sean
>
> -- 
> Sean Dague
> http://dague.net
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


I'm not sure the best place to respond (mailing list or gerrit), so
I'll write this up and post it to both places.

I think the idea behind this proposal is great. It has the potential
to bring a lot of benefit to users who are tracing a request across
many different services, in part by making it easy to search in an
indexing system like ELK.

The current proposal has some elements that won't work with the way
Swift currently solves this problem. This is mostly due to the
proposed uuid-ish check for validation. However, the Swift solution
has a few aspects that I believe would be very helpful for the entire
community.

NB: Swift returns both an `X-OpenStack-Request-ID` and an `X-Trans-ID`
header in every response. The `X-Trans-ID` was implemented before the
OpenStack request ID was proposed, and so we've kept the `X-Trans-ID` so
as not to break existing clients. The value of `X-OpenStack-Request-ID`
in any response from Swift is simply a mirror of the `X-Trans-ID` value.

The request id in Swift is made up of a few parts:

    X-Openstack-Request-Id: txbea0071df2b0465082501-00591b3077saio-extraextra


In the code, this in generated from:

    'tx%s-%010x%s' % (uuid.uuid4().hex[:21], time.time(), quote(trans_id_suffix))

...meaning that there are three parts to the request id. Let's take
each in turn.

The first part always starts with 'tx' (originally from the
"transaction id") and then is the first 21 hex characters of a uuid4.
The truncation is to limit the overall length of the value.

The second part is the hex value of the current time, padded to 10
characters.

Finally, the third part is the quoted suffix, and it defaults to the
empty string. The suffix itself can be made of two parts. The first is
configured in the Swift proxy server itself (ie the service that does
the logging) via the `trans_id_suffix` config. This allows an operator
to set a different suffix for each API endpoint or each region or each
cluster in order to help distinguish them in logs. For example, if a
deployment with multiple clusters uses centralized log aggregation, a
different trans_id_suffix value for each cluster makes it very easy to
distinguish between the clusters' logs.

The last part of the suffix is settable via the end-user (ie the one
calling the API). When the "X-Trans-ID-Extra" header in a request,
it's value is quoted and appended to the final transaction id value.

Here's a curl example that shows this all put together. You can see
that I have my Swift-All-In-One dev environment configured to use the
"saio" value for the `trans_id_suffix` value:

    $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" http://saio:8080/v1/AUTH_test/c/o --data-binary 1234 -XPUT -H "x-trans-id-extra: extraextra"
    HTTP/1.1 201 Created
    Last-Modified: Tue, 16 May 2017 17:04:17 GMT
    Content-Length: 0
    Etag: 81dc9bdb52d04dc20036dbd8313ed055
    Content-Type: text/html; charset=UTF-8
    X-Trans-Id: txf766cc02859c450eb4aef-00591b3110saio-extraextra
    X-Openstack-Request-Id: txf766cc02859c450eb4aef-00591b3110saio-extraextra
    Date: Tue, 16 May 2017 17:04:16 GMT

    $
    $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" http://saio:8080/v1/AUTH_test/c/o -H "x-trans-id-extra: moredifferentextra"
    HTTP/1.1 200 OK
    Content-Length: 4
    Accept-Ranges: bytes
    Last-Modified: Tue, 16 May 2017 17:04:17 GMT
    Etag: 81dc9bdb52d04dc20036dbd8313ed055
    X-Timestamp: 1494954256.25977
    Content-Type: application/x-www-form-urlencoded
    X-Trans-Id: tx4013173098b348b6b7952-00591b34d2saio-moredifferentextra
    X-Openstack-Request-Id: tx4013173098b348b6b7952-00591b34d2saio-moredifferentextra
    Date: Tue, 16 May 2017 17:20:18 GMT

    1234
    $ curl -i -H "X-Auth-Token: AUTH_tk1bab51ce5e1d4e2bb6c54bccf59433ee" http://saio:8080/v1/AUTH_test/c/o
    HTTP/1.1 200 OK
    Content-Length: 4
    Accept-Ranges: bytes
    Last-Modified: Tue, 16 May 2017 17:04:17 GMT
    Etag: 81dc9bdb52d04dc20036dbd8313ed055
    X-Timestamp: 1494954256.25977
    Content-Type: application/x-www-form-urlencoded
    X-Trans-Id: txf66856a06d7547c4ad79d-00591b3527saio
    X-Openstack-Request-Id: txf66856a06d7547c4ad79d-00591b3527saio
    Date: Tue, 16 May 2017 17:21:43 GMT

    1234

The `X-Trans-ID-Extra` header, specifically, sounds very similar to
what is being proposed to solve the cross-project reuqest IDs. To
quote from the commit in Swift that added this:

    The value of the X-Trans-Id-Extra header on the request (if any) will
    now be appended to the transaction ID. This lets users put their own
    information into transaction IDs.

    For example, Glance folks upload images as large objects, so they'd
    like to be able to tie together all the segment PUTs and the manifest
    PUT with some operation ID in the logs. This would let them pass in
    that operation ID as X-Trans-Id-Extra, and then when things went
    wrong, it'd be much easier to find all the requests in Swift's logs.

    Also, this aids debuggability when requests fail to receive
    responses. If a user is sending in their own X-Trans-Id-Extra strings,
    then that gives operators something to search for in the logs. The
    normal txid won't work since that's in the response, but the client
    didn't receive one.

    Swift will only use the first 32 characters of X-Trans-Id-Extra so
    that its log lines stay a manageable length. Also, it's URL-quoted so
    that users cannot inject double quotes into X-Trans-Id-Extra and screw
    up log parsers.

I hope this information helps move the conversation around cross-
project reuqest IDs forward. I believe that Swift's current solution
and pattern would work very well for other OpenStack projects.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20170516/5210873b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20170516/5210873b/attachment.sig>


More information about the OpenStack-dev mailing list