[openstack-dev] OS tracing??
Sandy Walsh
sandy.walsh at RACKSPACE.COM
Wed Sep 5 17:43:38 UTC 2012
Here's the Tach config paste for the additional methods within compute.run_instance()
http://paste.openstack.org/show/20676/
It captures the time for the four main parts of run_instance()
- [nova.compute.manager._check_image_size]
- [nova.compute.manager._allocate_network]
- [nova.compute.manager._setup_block_device_mapping]
- [nova.compute.manager._spawn]
which is handy for spotting Quantum, Glance and hypervisor delays.
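
For anyone who wants to see the idea without Tach, per-method timing like this can be sketched in plain Python. This is only an illustration of monkeypatching a timing wrapper, not Tach's actual code or config format: FakeComputeManager, instrument() and the timings list are hypothetical names, and a real setup would presumably send the numbers to statsd instead of a list.

```python
import functools
import time


class FakeComputeManager:
    """Hypothetical stand-in for the real compute manager class."""

    def _check_image_size(self):
        return "image ok"

    def _spawn(self):
        return "spawned"


# (method name, seconds) pairs; assumption: a real setup would emit to statsd.
timings = []


def instrument(cls, method_name, clock=time.perf_counter):
    """Replace cls.method_name with a wrapper that records its duration."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the timing even if the wrapped method raises.
            timings.append((method_name, clock() - start))

    setattr(cls, method_name, wrapper)


for name in ("_check_image_size", "_spawn"):
    instrument(FakeComputeManager, name)

manager = FakeComputeManager()
manager._check_image_size()
manager._spawn()
```

After the two calls, timings holds one entry per instrumented method, in call order. Patching the class rather than the instance means every manager object is covered without touching the service code.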
-S
________________________________________
From: Joshua Harlow [harlowja at yahoo-inc.com]
Sent: Tuesday, September 04, 2012 11:08 PM
To: Sandy Walsh; openstack-dev at yahoo-inc.com; OpenStack Development Mailing List
Subject: Re: OS tracing??
Ya for that. I'd be willing to help out here; it's about time for this to
happen.
And it's a logical step for as many of these monitoring pieces as we can
have (with granularity at the state transition level - to start...). I
know it's probably a lot of refactoring to get this to happen, but someone
has to do it, or baby steps need to get started and then it can just
continually get better (I'd be up for that also, either one giant leap for
OpenStack or baby leaps).
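
To make "granularity at the state transition level" concrete, here is a rough sketch of a state machine that records how long each state lasted. The state names come from the 'new_instance' example quoted further down; TracedMachine and its methods are hypothetical and do not correspond to any existing OpenStack or StackTach code.

```python
import time

# States from the quoted 'new_instance' example.
NEW_INSTANCE = ["Auth", "Quota check", "Run", "Download image", "Start"]


class TracedMachine:
    """Hypothetical linear state machine that times each transition."""

    def __init__(self, states, clock=time.perf_counter):
        self.states = states
        self.clock = clock
        self.index = 0
        self.entered = clock()  # when the current state was entered
        self.trace = []  # (from_state, to_state, seconds spent in from_state)

    @property
    def state(self):
        return self.states[self.index]

    def advance(self):
        """Move to the next state and record time spent in the previous one."""
        if self.index + 1 >= len(self.states):
            raise RuntimeError("already in final state %r" % self.state)
        now = self.clock()
        previous = self.state
        self.index += 1
        self.trace.append((previous, self.state, now - self.entered))
        self.entered = now
        return self.state


machine = TracedMachine(NEW_INSTANCE)
while machine.state != "Start":
    machine.advance()
```

The trace list could then be flushed to statsd or similar. Taking the clock as a parameter is just a convenience so the timings can be faked in tests.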
On 9/4/12 5:32 PM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>That was the feature I was suggesting in a previous email about using the
>StackTach worker.py to monitor these state transitions. It could be done
>in-memory with rolling state machines pretty easily.
>
>I need to hunt down the latest version of StackTach since it has some
>good performance/stability improvements.
>
>It could certainly be a precursor to a dedicated orchestration layer.
>
>-S
>
>
>________________________________________
>From: Joshua Harlow [harlowja at yahoo-inc.com]
>Sent: Tuesday, September 04, 2012 9:02 PM
>To: openstack-dev at yahoo-inc.com; Joshua Harlow; Sandy Walsh; OpenStack
>Development Mailing List
>Subject: Re: OS tracing??
>
>Another interesting possibility,
>
>If we had defined the state transitions (FSM like) for each API it would
>be very neat to be able to do something like the following.
>
>$ Describe machine 'new_instance'
>
>Auth -> Quota check -> Run -> Download image -> Start
>
>$ Enable trace 'new_instance' 'Auth->Run'
>
>Then that would start tracing the different states of that 'state
>machine', outputting to statsd or what not (ideally not requiring lots of
>code annotations but making an attempt to use Python's built-in goodies for
>this). I could see this as a natural fit into an orchestration layer
>(since the state machine is handled by the orchestration layer) so maybe
>those 2 could go hand in hand (has there been any orchestration work for
>folsom?).
>
>On 9/4/12 4:37 PM, "Joshua Harlow" <harlowja at yahoo-inc.com> wrote:
>
>>Thx,
>>
>>All good info. RPC should definitely cover most; as for the other ones,
>>a paste would be awesome. I'm hoping that this info can start to find
>>spots where issues will pop up, so we can fix them early. As for
>>eventlet, I did a little digging; they have http://tinyurl.com/cc2uwlc
>>which seems to take into account the eventlet switching. Might be useful.
>>
>>I'll also look into the trace stuff. It'd be cool if we could hook into
>>that to automatically pick up certain modules and start actively tracing
>>them, then be able to turn this on/off remotely (possibly via the
>>eventlet backdoor server?). Then you could have some pretty gnarly debug
>>capabilities (when needed) as well as being able to track exactly what
>>your server is doing (without having to keep the 'tracing' always on,
>>which it seems like tach requires?). Of course at some point this might
>>have to be more intrusive, as you start wanting to know context and the
>>like.
>>
>>-Josh
>>
>>On 9/4/12 4:27 PM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>
>>>Actually if you look at the default configs, you'll see we hook into the
>>>RPC dispatcher. All incoming/outgoing calls are tracked on all services,
>>>which is the majority of what's important. I have some specific ones for
>>>compute.run_instance, but it's optional. I'll dig it out and send a
>>>paste.
>>>
>>>Never thought about hooking into python trace, but you'd likely spend
>>>more time telling it what *not* to report. Have to think about that a
>>>little more.
>>>
>>>Eventlet and RPC in-queue time are definitely concerns. That's what
>>>Inflight is meant to monitor.
>>>
>>>-S
>>>
>>>
>>>________________________________________
>>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>>Sent: Tuesday, September 04, 2012 8:14 PM
>>>To: Sandy Walsh; openstack-dev at yahoo-inc.com; OpenStack Development
>>>Mailing List
>>>Subject: Re: OS tracing??
>>>
>>>Does this mean there is a massive set of functions which you guys have
>>>wrapped this around?
>>>
>>>Is there any way that full config can be distributed? I wonder if it is
>>>possible to hook into the profiling/trace functions that Python provides
>>>to automatically get this information (filter for certain
>>>namespaces/modules to avoid all the other 'garbage'?). Then no config
>>>would be needed at all, and this could become even more agnostic to what
>>>functions/classes/methods to wrap.
>>>
>>>Thoughts?
>>>I wonder if the interactions with eventlet though cause some issues....
>>>
>>>On 9/4/12 4:03 PM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>>
>>>>Yes, Tach can be used against any Python program, but the sample configs
>>>>are for nova services.
>>>>
>>>>You would call your program like this:
>>>>
>>>>tach tach.conf nova_foo nova_foo.conf
>>>>
>>>>This will load tach, load your program and monkeypatch the
>>>>functions/methods defined. The measurements (timings, counts, etc.)
>>>>go to statsd.
>>>>
>>>>We drive all this via puppet, so when we update our tach puppet
>>>>variables the services all update automatically.
>>>>
>>>>-S
>>>>
>>>>________________________________________
>>>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>>>Sent: Tuesday, September 04, 2012 4:29 PM
>>>>To: openstack-dev at yahoo-inc.com; Sandy Walsh; OpenStack Development
>>>>Mailing List
>>>>Subject: Re: OS tracing??
>>>>
>>>>Thanks much,
>>>>
>>>>Almost forgot about tach. It seems like it can be hooked into arbitrary
>>>>functions, which is great. It'd be cool if that type of functionality
>>>>was included with, say, nova, and it could be remotely enabled/disabled
>>>>as needed (say there's a weird production issue you want to find more
>>>>info about, so you send a special command that says start monitoring
>>>>this function, or even better, integrate it into eventlet so that it
>>>>can start reporting automatically on 'hot' functions).
>>>>
>>>>Is tach monkey patching the functions that it is asked to instrument?
>>>>
>>>>-Josh
>>>>
>>>>On 9/4/12 10:46 AM, "Sandy Walsh" <sandy.walsh at rackspace.com> wrote:
>>>>
>>>>>We've been using Tach to orchestrate OpenStack services and report to
>>>>>statsd/graphite. https://github.com/ohthree/tach ... works great
>>>>>
>>>>>I've been trying to land this Inflight Service branch to measure RPC and
>>>>>greenlet overhead
>>>>>https://review.openstack.org/#/c/11179/
>>>>>BP: https://blueprints.launchpad.net/nova/+spec/monitoring-service
>>>>>
>>>>>Hope it helps,
>>>>>-S
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>From: Joshua Harlow [harlowja at yahoo-inc.com]
>>>>>Sent: Tuesday, September 04, 2012 2:35 PM
>>>>>To: OpenStack Development Mailing List
>>>>>Cc: openstack-dev
>>>>>Subject: [openstack-dev] OS tracing??
>>>>>
>>>>>Has anyone had any luck with trying out some tracing/coverage with the
>>>>>openstack projects to see where the bottlenecks are (outside of test
>>>>>coverage)?
>>>>>
>>>>>I was thinking about possible ways to do this (there seem to be a lot
>>>>>of different libraries that might help) but was wondering if anyone
>>>>>else has figured out the best one to use yet.
>>>>>
>>>>>Ideally it should have the following properties (in my mind):
>>>>>
>>>>>- Non-intrusive (shouldn't require sprinkling of timing/trace logic all
>>>>>  over)
>>>>>- Works with eventlet/greenlet (eventlet is going to switch things in
>>>>>  and out, so that has to be taken account of)
>>>>>- Probably does this via sampling (?)
>>>>>- Writes out some standard format (valgrind like?) for analysis...
>>>>>- Can be turned on and off remotely (nice to have, it'd be cool to
>>>>>  have an API/entrypoint/... that says enable tracing which can be used
>>>>>  on a live system, that system will become slower but it'd be neat)
>>>>>
>>>>>This could be some special 'admin' entry point (restricted to certain
>>>>>users of course) that could also do stuff like 'reload-configs' or
>>>>>'enable-tracing' or 'adjust-log-level' or similar administrative
>>>>>actions that would be useful during those crazy debug sessions (think
>>>>>a simple admin telnet entrypoint to view stats, similar to what
>>>>>memcache/redis provide via their 'stats' commands...)
>>>>>
>>>>>
>>>>>Anyone have any ideas on this :-)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>-Josh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>