[openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

Sean Dague sean at dague.net
Sat Mar 22 10:27:38 UTC 2014


On 03/21/2014 05:11 PM, Joe Gordon wrote:
> 
> 
> 
> On Fri, Mar 21, 2014 at 4:04 AM, Sean Dague <sean at dague.net> wrote:
> 
>     On 03/20/2014 06:18 PM, Joe Gordon wrote:
>     >
>     >
>     >
>     > On Thu, Mar 20, 2014 at 3:03 PM, Alexei Kornienko
>     > <alexei.kornienko at gmail.com> wrote:
>     >
>     >     Hello,
>     >
>     >     We've done some profiling and the results are quite
>     >     interesting: during 1.5 hours Ceilometer inserted 59755 events
>     >     (59755 calls to record_metering_data); these calls resulted in
>     >     a total of 2591573 SQL queries.
>     >
>     >     And the most interesting part is that 291569 of those were
>     >     ROLLBACK queries.
>     >     We do around 5 rollbacks to record a single event!
>     >
>     >     I guess it means that the MySQL backend is currently totally
>     >     unusable in a production environment.
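
(As an aside: that kind of per-call accounting is straightforward to
reproduce with SQLAlchemy's engine events. A minimal sketch -- plain
SQLAlchemy against an in-memory SQLite engine, so the URL and the
workload here are placeholders for Ceilometer's actual MySQL backend:)

    from sqlalchemy import create_engine, event

    engine = create_engine('sqlite://')  # placeholder; the test above ran MySQL
    counts = {'statements': 0, 'rollbacks': 0}

    @event.listens_for(engine, 'before_cursor_execute')
    def count_statement(conn, cursor, statement, params, context, many):
        counts['statements'] += 1

    @event.listens_for(engine, 'rollback')
    def count_rollback(conn):
        counts['rollbacks'] += 1

    # Drive the workload here -- e.g. N calls to record_metering_data --
    # then divide the counts by N to get queries and rollbacks per call.
    engine.execute('select 1')
    print(counts)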
>     >
>     >
>     > It should be noted that SQLAlchemy is horrible for performance; in
>     > nova we usually see SQLAlchemy overheads of well over 10x (the time
>     > a nova.db.api call takes vs. the time MySQL measures when the slow
>     > log is recording everything).
> 
>     That's not really a fair assessment. Python object inflation takes
>     time. I do get that there is SQLA overhead here, but even if you
>     trimmed it out you would not get down to the raw MySQL query time.
> 
> 
> To give an example from nova:
> 
> doing a nova list with no servers:
> 
> stack@devstack:~/devstack$ nova --timing list
> 
> | GET http://10.0.0.16:8774/v2/a82ededa9a934b93a7184d06f302d745/servers/detail | 0.0817470550537 |
> 
> So the nova command takes 0.0817470550537 seconds.
> 
> Inside the nova logs (when putting a timer around all nova.db.api calls
> [1]), nova.db.api.instance_get_all_by_filters takes 0.06 seconds:
> 
>     2014-03-21 20:58:46.760 DEBUG nova.db.api
> [req-91879f86-7665-4943-8953-41c92c42c030 demo demo]
> 'instance_get_all_by_filters' 0.06 seconds timed
> /mnt/stack/nova/nova/db/api.py:1940
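
(The wrapper in [1] is essentially a timing decorator around every
nova.db.api function. A minimal sketch of the idea -- illustrative, not
the exact commit:)

    import functools
    import time

    def timed(func):
        # Log the wall-clock time of each DB API call.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                print("%r took %.2f seconds" % (func.__name__,
                                                time.time() - start))
        return wrapper

    @timed
    def instance_get_all_by_filters(*args, **kwargs):
        pass  # stand-in for the real DB API call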
> 
> But the SQL slow log reports that the same query takes only 0.001006
> seconds, with a lock_time of 0.000269, for a total of 0.00127 seconds:
> 
>     # Query_time: 0.001006  Lock_time: 0.000269  Rows_sent: 0  Rows_examined: 0
> 
> 
> So in this case only 2% of the time that
> nova.db.api.instance_get_all_by_filters takes is spent inside of
> MySQL. Or, to put it differently, nova.db.api.instance_get_all_by_filters
> is 47 times slower than the raw DB call underneath.
> 
> Yes, I agree that turning raw SQL data into Python objects should take
> time, but I just don't think it should take 98% of the time.
> 
> [1] https://github.com/jogo/nova/commit/7743ee366bbf8746f1c0f634f29ebf73bff16ea1
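
(For reference, the 2% and 47x figures fall straight out of those two
measurements:)

    api_time = 0.06                  # nova.db.api wrapper time, from the log
    db_time = 0.001006 + 0.000269    # Query_time + Lock_time, from the slow log
    print(db_time / api_time)        # ~0.021 -> ~2% of the time is in MySQL
    print(api_time / db_time)        # ~47x slower than the raw query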
> 
>     That being said, having Ceilometer's write path be highly tuned, not
>     use SQLA, and be written natively for every back end is probably
>     appropriate.
> 
> 
> While I like this idea, they lose free PostgreSQL support by dropping
> SQLA. But that is a solvable problem.

Joe, you're just trolling now, right? :)

I mean, you picked the most pathological case possible: an empty table
with no data ever returned. So no actual work was done anywhere, and
this just measures side effects, which are in no way commensurate with
the actual read/write profile of a real system.

I 100% agree that SQLA adds overhead. However, removing SQLA is the
last in a series of optimizations that you do on a system, because
taking it out doesn't solve bad data usage (fetching more data than you
need), a bad schema, or bad queries. I would expect substantial gains
could be made by tackling those first.
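
To make "bad data usage" concrete: the classic case is inflating full
ORM objects when the caller only needs a column or two. A generic
SQLAlchemy sketch (hypothetical Meter model, not Ceilometer's real
schema):

    from sqlalchemy import Column, Integer, String, Text, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Meter(Base):
        __tablename__ = 'meters'
        id = Column(Integer, primary_key=True)
        name = Column(String(255))
        payload = Column(Text)  # a wide column most callers never read

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # Bad data usage: every column is fetched and inflated into a full
    # Python object per row, used or not.
    meters = session.query(Meter).all()

    # Better: fetch only what the caller uses; rows come back as
    # lightweight tuples instead of ORM instances.
    names = session.query(Meter.name).all()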

If, after that, fast path drivers still sound like a good idea, go for it.

But realize that a fast path driver is more work to write and maintain.
And as the energy hasn't gone into optimizing things yet, I think a
proposal to put even more work on the team to write a new set of
harder-to-maintain drivers is just a non-starter.

All I'm saying is that we need profiling. Ceilometer is supposed to be
high-performance / low-overhead metrics collection. We have some
indication, based on our gate runs, that it's not meeting that goal.
Which means we can reproduce it. Which is great, because reproducing
means things are fixable, and we can easily know whether we fixed it.
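
And the profiling itself doesn't need anything exotic. A minimal sketch
with cProfile, record_metering_data standing in for whatever the gate
run exercises:

    import cProfile
    import pstats

    def record_metering_data(sample):
        pass  # stand-in for the real call under test

    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(1000):
        record_metering_data({'counter-volume': i})
    profiler.disable()
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)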

Optimizing is hard, but I think it's the right time to do it. Not just
with elasticity, but with old-fashioned analysis.

	-Sean

-- 
Sean Dague
Samsung Research America
sean at dague.net / sean.dague at samsung.com
http://dague.net
