[osprofiler] Distributed tracing in OpenStack
Monty Taylor
mordred at inaugust.com
Fri Apr 12 14:48:55 UTC 2019
On 4/12/19 2:34 PM, Monty Taylor wrote:
>
>
> On 4/11/19 9:42 PM, Ilya Shakhat wrote:
>> Hi,
>>
>> Distributed tracing is one of the must-have features when one wants
>> to track the full path of a request going through different services
>> and APIs. This makes it similar to a shared request-id, but with nice
>> visualization at the end [1]. In OpenStack, tracing can be achieved
>> via the osprofiler library. The library was introduced 5 years ago,
>> and back then there was no standard approach to tracing, which is why
>> it stands apart from what has since become mainstream. There is still
>> no single standard, but the major players are the OpenTracing and
>> OpenCensus communities. OpenTracing is represented by Uber's Jaeger,
>> which is the default tracer in the k8s world.
>>
>> Issues and limitations to be fixed:
>> 1. Compatibility. While the osprofiler library supports many
>> different storage drivers, it has only one way of transferring the
>> trace context over the wire. Ideally the library should be compatible
>> with other third-party tracers and allow traces to start in front of
>> OpenStack APIs (e.g. in user apps) and continue after them (e.g. in
>> storage systems or network management tools). [2]
>> 2. Operation mode. With osprofiler, tracing is initiated by the user
>> request, while in industrial solutions tracing can be managed
>> centrally via dynamic sampling policies.
>> 3. In-process trace propagation. Depending on the execution model
>> (threaded, async), the way the current trace context is stored
>> differs. OSProfiler supports the thread-local model, which recently
>> got broken by the new async implementation in openstacksdk [3].
>
> FWIW - we should have re-fixed that issue in SDK for all cases other
> than parallel uploading of Large Object segments to swift. The
> parallelism support now relies on the calling context's parallelism.
> The large-object segment uploader is the one piece we should look at
> carefully to make sure we're not losing those interactions.
>
> That said - if we move forward with this plan - let's make sure it
> works in openstacksdk - and that we're testing it so that we don't
> break it.
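For reference on the in-process propagation piece - this is roughly
what the calling-context approach looks like. Just a sketch, not actual
sdk or osprofiler code, and the variable names are made up: keep the
trace context in a contextvars.ContextVar, which follows asyncio tasks
automatically and can be carried into worker threads explicitly.

import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical holder for the current trace context.
_trace_ctx = contextvars.ContextVar("trace_ctx", default=None)


async def traced_request(name):
    # asyncio copies the current context when a task is created, so the
    # value set by the caller is visible here without thread-locals.
    print(name, "sees trace:", _trace_ctx.get())


def upload_segment(segment):
    # Worker threads do NOT inherit the context automatically - this is
    # the large-object segment upload case.
    print("segment", segment, "sees trace:", _trace_ctx.get())


async def main():
    _trace_ctx.set({"trace_id": "abc123"})

    # async path: the context follows each task.
    await asyncio.gather(traced_request("GET /servers"),
                         traced_request("GET /images"))

    # threaded path: carry the context over explicitly by running the
    # callable inside a copy of the caller's context.
    ctx = contextvars.copy_context()
    with ThreadPoolExecutor() as pool:
        pool.submit(ctx.run, upload_segment, 1).result()


asyncio.run(main())
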
Do we need to wrap logical operations that may make more than one remote
call in a single span?

I ask because in the cloud layer of openstacksdk there are methods, like
"create_image" or "get_server", which can wind up making multiple calls
to multiple services but are a single logical operation to the user. I
don't know enough about the opentracing best practices - do we care
about such aggregations? Or is simply wrapping the http call at the ksa
layer enough?
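
To make that concrete, something like this is what I have in mind -
just a sketch using the opentracing-python API (create_image and the
nested calls here are stand-ins, not the real sdk code):

import opentracing

tracer = opentracing.global_tracer()


def _rest_call(method, url):
    # ksa layer: one span per actual HTTP request. With the scope
    # manager, start_active_span() parents this under whatever span is
    # currently active, so it nests under the logical-operation span
    # automatically.
    with tracer.start_active_span("%s %s" % (method, url)):
        pass  # ... perform the request ...


def create_image(name):
    # cloud layer: one span for the logical operation the user asked
    # for, wrapping however many REST calls it ends up making.
    with tracer.start_active_span("create_image"):
        _rest_call("POST", "/v2/images")
        _rest_call("PUT", "/v2/images/{id}/file")
        _rest_call("GET", "/v2/images/{id}")

Either way the ksa-level span per HTTP call stays the same; the question
is just whether we also add the outer one.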
>> With OpenTracing it is possible to select the appropriate model
>> along with the tracer configuration.
>>
>> What's the plan:
>> Switching to OpenTracing could be a good option to gain compatibility
>> with 3rd-party solutions. The actual change should go into the
>> osprofiler library, but it indirectly affects all OpenStack projects
>> (should it be a global team goal then?). I'm going to make a PoC of
>> the proposed change, so reviews would be highly appreciated.
>>
>> Comments, suggestions?
>
> Generally supportive. I have specific impl feedback - but I'll leave
> that on the patches.
>
>> Thanks,
>> Ilya
>>
>> [1] e.g.
>> http://logs.openstack.org/15/650915/4/check/tempest-smoke-py3-osprofiler-redis/7c6c14e/osprofiler-traces/trace-3e5cc660-8815-4079-86b9-778af8469d79.html.gz
>>
>> [2] https://bugs.launchpad.net/osprofiler/+bug/1798565
>> [3] https://bugs.launchpad.net/osprofiler/+bug/1818493
>
>
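One more note on the compatibility point (1): with the opentracing API
the wire format is hidden behind inject/extract on the tracer, so the
HTTP layer only needs something roughly like the sketch below. Again,
just an illustration - the header names come from whichever tracer is
configured, and this isn't what ksa or osprofiler do today.

import opentracing
from opentracing.propagation import Format

tracer = opentracing.global_tracer()


def send_request(method, url, headers):
    # Client side: serialize the active span context into the outgoing
    # headers. The configured tracer (Jaeger, etc.) decides the actual
    # header names and encoding.
    with tracer.start_active_span("%s %s" % (method, url)) as scope:
        tracer.inject(scope.span.context, Format.HTTP_HEADERS, headers)
        # ... actually send the request with these headers ...


def handle_request(headers):
    # Server side: pick the context back up and continue the same
    # trace, whether it started in a user app, another service, or a
    # third-party tool in front of the API.
    parent = tracer.extract(Format.HTTP_HEADERS, headers)
    with tracer.start_active_span("handle_request", child_of=parent):
        pass  # ... handle the request ...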