[openstack-dev] OS tracing??

Joshua Harlow harlowja at yahoo-inc.com
Wed Sep 5 00:02:42 UTC 2012

Another interesting possibility,

If we had defined the state transitions (FSM like) for each API it would
be very neat to be able to do something like the following.

$ Describe machine 'new_instance'
Auth -> Quota check -> Run -> Download image -> Start

$ Enable trace 'new_instance' 'Auth->Run'

Then that would start tracing the different states of that 'state
machine', outputting to statsd or what not (ideally not requiring lots of
code annotations but making an attempt to use Python's built-in goodies for
this). I could see this as a natural fit into an orchestration layer
(since the state machine is handled by the orchestration layer) so maybe
those 2 could go hand in hand (has there been any orchestration work for
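
A minimal sketch of the idea above, assuming a purely linear machine and an in-memory list standing in for statsd. The class and method names (TracedMachine, enable_trace, and so on) are hypothetical and not part of nova or any existing orchestration code:

```python
import time


class TracedMachine:
    """A linear state machine whose transitions can be traced on demand."""

    def __init__(self, name, states):
        self.name = name
        self.states = list(states)
        self.trace_range = None   # (first_index, last_index), or None when off
        self.samples = []         # (state, seconds) pairs; stand-in for statsd

    def describe(self):
        return " -> ".join(self.states)

    def enable_trace(self, start, end):
        # Trace every state from `start` through `end`, inclusive.
        self.trace_range = (self.states.index(start), self.states.index(end))

    def disable_trace(self):
        self.trace_range = None

    def run(self, handlers):
        # `handlers` maps a state name to a zero-argument callable.
        for index, state in enumerate(self.states):
            handler = handlers.get(state, lambda: None)
            traced = (self.trace_range is not None
                      and self.trace_range[0] <= index <= self.trace_range[1])
            if not traced:
                handler()
                continue
            began = time.perf_counter()
            handler()
            self.samples.append((state, time.perf_counter() - began))


machine = TracedMachine(
    "new_instance",
    ["Auth", "Quota check", "Run", "Download image", "Start"],
)
machine.enable_trace("Auth", "Run")
machine.run({})  # no real handlers here; each state is a no-op
print(machine.describe())
print([state for state, _ in machine.samples])
```

Because the timing lives in the machine's run loop, the handlers themselves need no annotations, which is the non-intrusive property asked for earlier in the thread.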

On 9/4/12 4:37 PM, "Joshua Harlow" <harlowja at yahoo-inc.com> wrote:

>All good info, RPC should definitely cover most, as far as the other ones,
>a paste would be awesome. I'm hoping that this info can start to find
>spots where issues will pop up, and we can fix them early. As for
>eventlet, did a little digging, they have http://tinyurl.com/cc2uwlc which
>seems to take into account the eventlet switching. Might be useful.
>I'll also look into the trace stuff, it'd be cool if we could hook into
>that to automatically pick up certain modules, and start actively tracing
>them, then be able to turn this on/off remotely (possibly via the eventlet
>backdoor server?). Then you could have some pretty gnarly debug
>capabilities (when needed) as well as being able to track exactly what
>your server is doing (without having to keep the 'tracing' always on,
>which it seems like tach requires?) Of course at some point this might
>have to be more intrusive, as u start wanting to know context and the
>On 9/4/12 4:27 PM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>Actually if you look at the default configs, you'll see we hook into the
>>RPC dispatcher. All incoming/outgoing calls are tracked on all services,
>>which is the majority of what's important. I have some specific ones for
>>compute.run_instance, but it's optional. I'll dig it out and send a
>>Never thought about hooking into python trace, but you'd likely spend
>>more time telling it what *not* to report. Have to think about that a
>>little more.
>>Eventlet and RPC in-queue time are definitely concerns. That's what
>>Inflight is meant to monitor.
>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>Sent: Tuesday, September 04, 2012 8:14 PM
>>To: Sandy Walsh; openstack-dev at yahoo-inc.com; OpenStack Development
>>Mailing List
>>Subject: Re: OS tracing??
>>Does this mean there is a massive set of functions which u guys have
>>wrapped this around?
>>Is there any way that full config can be distributed? I wonder if it is
>>possible to hook into the profiling/trace functions that python provides
>>to automatically get this information (filter for certain
>>namespaces/modules to avoid all the other 'garbage'?) Then no config would
>>be needed at all, and this could become even more agnostic to what
>>functions/classes/methods to wrap.
>>I wonder if the interactions with eventlet though cause some issues....
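
A rough sketch of the hook suggested in the message above, using the standard library's sys.setprofile with a module-prefix filter. The ModuleProfiler class is hypothetical; it is neither Tach nor nova code, and it makes no attempt to handle eventlet:

```python
import collections
import sys
import time


class ModuleProfiler:
    """Count and time Python calls made in modules matching given prefixes."""

    def __init__(self, prefixes):
        self.prefixes = tuple(prefixes)
        self.calls = collections.Counter()
        self.seconds = collections.defaultdict(float)
        self._started = {}
        # Code objects of our own methods, so the profiler skips itself.
        self._own_code = {ModuleProfiler.enable.__code__,
                          ModuleProfiler.disable.__code__}

    def _hook(self, frame, event, arg):
        # Ignore c_call/c_return/c_exception; only Python frames are tracked.
        if event not in ("call", "return"):
            return
        if frame.f_code in self._own_code:
            return
        module = frame.f_globals.get("__name__", "")
        if not module.startswith(self.prefixes):
            return  # filters out all the other 'garbage'
        key = "%s.%s" % (module, frame.f_code.co_name)
        if event == "call":
            self.calls[key] += 1
            self._started[id(frame)] = time.perf_counter()
        else:
            began = self._started.pop(id(frame), None)
            if began is not None:
                self.seconds[key] += time.perf_counter() - began

    def enable(self):
        sys.setprofile(self._hook)

    def disable(self):
        sys.setprofile(None)


def work(n):
    return sum(range(n))


profiler = ModuleProfiler([__name__])
profiler.enable()
for _ in range(3):
    work(1000)
profiler.disable()
work_key = "%s.work" % __name__
print(work_key, profiler.calls[work_key])
```

Note that sys.setprofile installs the hook for the calling thread only, and the elapsed time here is wall-clock time, so under eventlet a figure would presumably include time spent in other greenlets whenever the function yields. That is the interaction worried about above.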
>>On 9/4/12 4:03 PM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>>Yes, Tach can be used against any python program, but the sample configs
>>>are for nova services.
>>>You would call your program like this:
>>>tach tach.conf nova_foo nova_foo.conf
>>>This will load tach, load your program and monkeypatch the
>>>functions/methods defined. The measurements go to statsd (timings,
>>>counts, etc)
>>>We drive all this via puppet, so when we update our tach puppet
>>>the services all update automatically.
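
To illustrate the monkeypatching approach described in the message above, here is a small sketch. It is not Tach's actual code or config format; instrument, statsd_timing and the Compute class are hypothetical names, and the measurements are collected in a list rather than sent to a statsd daemon:

```python
import functools
import time


def instrument(owner, attr, record):
    """Replace owner.attr with a timing wrapper; return a restore function."""
    original = getattr(owner, attr)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        began = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Report the duration even if the wrapped call raises.
            record(attr, time.perf_counter() - began)

    setattr(owner, attr, wrapper)

    def restore():
        setattr(owner, attr, original)

    return restore


def statsd_timing(name, seconds):
    # statsd timer line format: <bucket>:<milliseconds>|ms
    return "%s:%d|ms" % (name, round(seconds * 1000))


class Compute:
    # Stand-in for a service class; not the real nova compute manager.
    def run_instance(self, name):
        return "started %s" % name


timings = []
restore = instrument(
    Compute, "run_instance",
    lambda name, secs: timings.append(statsd_timing(name, secs)),
)
result = Compute().run_instance("vm-1")   # timed
restore()
Compute().run_instance("vm-2")            # no longer timed
print(result, timings)
```

Returning a restore function is one way to get the remote enable/disable behaviour discussed in this thread: whatever admin entry point exists would only need to call instrument or restore. This sketch handles plain functions and instance methods; classmethods and staticmethods would need extra care.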
>>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>>Sent: Tuesday, September 04, 2012 4:29 PM
>>>To: openstack-dev at yahoo-inc.com; Sandy Walsh; OpenStack Development
>>>Mailing List
>>>Subject: Re: OS tracing??
>>>Thanks much,
>>>Almost forgot about tach, it seems like it can be hooked into arbitrary
>>>functions, which is great. It'd be cool if that type of functionality was
>>>included with say nova, and it could be remotely enabled/disabled as
>>>needed (say a weird production issue u want to find more info about, so
>>>send a special command that says start monitoring this function, or even
>>>better, integrate it into eventlet so that it can start reporting
>>>automatically on 'hot' functions).
>>>Is tach monkey patching the functions that it is asked to instrument?
>>>On 9/4/12 10:46 AM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>>>We've been using Tach to orchestrate OpenStack services and report to
>>>>statsd/graphite. https://github.com/ohthree/tach ... works great
>>>>I've been trying to land this Inflight Service branch to measure RPC
>>>>greenlet overhead
>>>>BP: https://blueprints.launchpad.net/nova/+spec/monitoring-service
>>>>Hope it helps,
>>>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>>>Sent: Tuesday, September 04, 2012 2:35 PM
>>>>To: OpenStack Development Mailing List
>>>>Cc: openstack-dev
>>>>Subject: [openstack-dev] OS tracing??
>>>>Has anyone had any luck with trying out some tracing/coverage with the
>>>>openstack projects to see where the bottlenecks are (outside of test
>>>>I was thinking about possible ways to do this (there seems to be a lot of
>>>>different libraries that might help) but was wondering if anyone else has
>>>>figured out the best one to use yet.
>>>>Ideally it should have the following properties (in my mind):
>>>>- Non-intrusive (shouldn't require sprinkling of timing/trace logic all
>>>>  over)
>>>>- Works with eventlet/greenlet (eventlet is going to switch things in
>>>>  and out, so that has to be taken account of)
>>>>- Probably does this via sampling (?)
>>>>- Writes out some standard format (valgrind like?) for analysis...
>>>>- Can be turned on and off remotely (nice to have, it'd be cool to
>>>>  have an API/entrypoint/... that says enable tracing which can be used
>>>>  on a live system, that system will become slower but it'd be neat)
>>>>This could be some special 'admin' entry point (restricted to certain
>>>>users of course) that could also do stuff like 'reload-configs' or
>>>>'enable-tracing' or 'adjust-log-level' or similar administrative actions
>>>>that would be useful during those crazy debug
>>>>sessions (think a simple admin telnet entrypoint to view stats, similar
>>>>to what memcache/redis provide via their 'stats' commands...)
>>>>Anyone have any ideas on this :-)
