[openstack-dev] [Ceilometer] [QA] Slow Ceilometer resource_list CLI command

Neal, Phil phil.neal at hp.com
Tue Mar 18 15:17:14 UTC 2014


> -----Original Message-----
> From: Tim Bell [mailto:Tim.Bell at cern.ch]
> Sent: Monday, March 17, 2014 2:04 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer
> resource_list CLI command
> 
> 
> At CERN, we've had similar issues when enabling telemetry. Our resource-list
> times out after 10 minutes when the proxies for HA assume there is no
> answer coming back. Keystone instances per cell have helped the situation a
> little so we can collect the data but there was a significant increase in load on
> the API endpoints.
> 
> I feel that some reference for production scale validation would be beneficial
> as part of TC approval to leave incubation in case there are issues such as this
> to be addressed.
> 
> Tim
> 
> > -----Original Message-----
> > From: Jay Pipes [mailto:jaypipes at gmail.com]
> > Sent: 17 March 2014 20:25
> > To: openstack-dev at lists.openstack.org
> > Subject: Re: [openstack-dev] [Ceilometer] [QA] Slow Ceilometer
> > resource_list CLI command
> >
> ...
> >
> > Yep. At AT&T, we had to disable calls to GET /resources without any filters
> > on it. The call would return hundreds of thousands of records, all being
> > JSON-ified at the Ceilometer API endpoint, and the result would take minutes
> > to return. There was no default limit on the query, which meant every single
> > record in the database was returned, and on even a semi-busy system, that
> > meant horrendous performance.
> >
> > Besides the problem that the SQLAlchemy driver doesn't yet support
> > pagination [1], the main problem with the get_resources() call is that the
> > underlying database schema for the Sample model is wacky, and forces the use
> > of a dependent subquery in the WHERE clause [2], which completely kills
> > performance of the query to get resources.
> >
> > [1] https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L436
> > [2] https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L503
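To illustrate the general pattern behind the dependent-subquery complaint above, here is a minimal sketch using Python's standard sqlite3 module. The `sample` table and its columns are a hypothetical, simplified stand-in, not Ceilometer's actual Sample model, and the queries are not the ones at [2]. It only shows that a correlated subquery in the WHERE clause can often be rewritten as a join against a derived table that is computed once.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only;
# this is NOT Ceilometer's actual Sample model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (id INTEGER PRIMARY KEY, resource_id TEXT, "
    "timestamp INTEGER, volume REAL)"
)
conn.executemany(
    "INSERT INTO sample (resource_id, timestamp, volume) VALUES (?, ?, ?)",
    [("vm-1", 1, 0.5), ("vm-1", 2, 0.7), ("vm-2", 1, 1.0), ("vm-2", 3, 1.2)],
)

# Dependent (correlated) subquery: the inner SELECT refers to the outer row,
# so conceptually it is evaluated once per row of the outer query.
correlated = """
    SELECT s.resource_id, s.timestamp, s.volume
    FROM sample AS s
    WHERE s.timestamp = (
        SELECT MAX(s2.timestamp) FROM sample AS s2
        WHERE s2.resource_id = s.resource_id
    )
    ORDER BY s.resource_id
"""

# Equivalent result via a join against a derived table computed once.
joined = """
    SELECT s.resource_id, s.timestamp, s.volume
    FROM sample AS s
    JOIN (
        SELECT resource_id, MAX(timestamp) AS latest
        FROM sample
        GROUP BY resource_id
    ) AS m ON m.resource_id = s.resource_id AND m.latest = s.timestamp
    ORDER BY s.resource_id
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_joined = conn.execute(joined).fetchall()
```

Both queries return the latest sample per resource. Whether the rewrite actually helps depends on the database engine and its optimizer, so it would need to be measured against the real schema.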
> >
> > > The cli tests are supposed to be quick read-only sanity checks of the
> > > cli functionality and really shouldn't ever be on the list of slowest
> > > tests for a gate run.
> >
> > Oh, the test is read-only, all right. ;) It's just that it's reading hundreds
> > of thousands of records.
> >
> > > I think there was possibly a performance regression recently in
> > > ceilometer because from what I can tell this test used to normally take ~60 sec.
> > > (which honestly is probably too slow for a cli test too) but it is
> > > currently much slower than that.
> > >
> > > From logstash it seems there are still some cases when the resource
> > > list takes as long to execute as it used to, but the majority of runs
> > > take a long time:
> > > http://goo.gl/smJPB9
> > >
> > > In the short term I've pushed out a patch that will remove this test
> > > from gate runs: https://review.openstack.org/#/c/81036
> > > But I thought it would be good to bring this up on the ML to try and
> > > figure out what changed or why this is so slow.
> >
> > I agree with removing the test from the gate in the short term. Medium to
> > long term, the root causes of the problem (that GET /resources has no
> > support for pagination on the query, there is no default for limiting
> > results based on a since timestamp, and that the underlying database schema
> > is non-optimal) should be addressed.
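As a rough sketch of the first two root causes listed above (pagination and a default bound on results), the following uses Python's standard sqlite3 module. The `resource` table, the `list_resources` function, and its `limit`, `marker`, and `since` parameters are all hypothetical names chosen for illustration; this is not the Ceilometer API or storage driver.

```python
import sqlite3

DEFAULT_LIMIT = 100  # hypothetical default page size


def list_resources(conn, limit=DEFAULT_LIMIT, marker=None, since=None):
    """Return one page of resource ids, ordered by id (marker-based paging)."""
    clauses, params = [], []
    if marker is not None:
        # Resume strictly after the last id of the previous page.
        clauses.append("resource_id > ?")
        params.append(marker)
    if since is not None:
        # Only resources with a sample at or after the given timestamp.
        clauses.append("last_sample_timestamp >= ?")
        params.append(since)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT resource_id FROM resource {where} ORDER BY resource_id LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical table and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource (resource_id TEXT PRIMARY KEY, "
    "last_sample_timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?)",
    [(f"res-{i:04d}", i) for i in range(250)],
)

# Walk all pages; an unfiltered call never returns more than DEFAULT_LIMIT rows.
pages = []
marker = None
while True:
    page = list_resources(conn, marker=marker)
    if not page:
        break
    pages.append(page)
    marker = page[-1]
```

With a default limit in place, a bare call returns a bounded page instead of every record, and the client opts in to reading further by passing the last id it saw as the marker.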

Gordon has introduced a blueprint (https://blueprints.launchpad.net/ceilometer/+spec/big-data-sql) with some fixes for individual queries, but +1 to looking at re-architecting the schema as an approach to fixing performance. We've also seen some gains here at HP from batch writes, but have temporarily tabled that work in favor of getting a better-performing schema in place.
- Phil
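
For readers unfamiliar with the batch-write idea mentioned above, here is a minimal sketch using Python's standard sqlite3 module. The `BatchWriter` class, the `sample` table, and the batch size are hypothetical and only illustrate the general technique of buffering rows and writing each batch in a single transaction; this is not the work done at HP.

```python
import sqlite3


class BatchWriter:
    """Buffer rows and write each full batch in a single transaction."""

    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0  # number of batches actually written

    def add(self, resource_id, timestamp, volume):
        self.buffer.append((resource_id, timestamp, volume))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO sample (resource_id, timestamp, volume) "
                "VALUES (?, ?, ?)",
                self.buffer,
            )
        self.buffer = []
        self.flushes += 1


# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (resource_id TEXT, timestamp INTEGER, volume REAL)"
)

writer = BatchWriter(conn, batch_size=500)
for i in range(1200):
    writer.add(f"vm-{i % 10}", i, 1.0)
writer.flush()  # write the final partial batch

row_count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
```

The trade-off is that buffered rows are lost if the process dies before a flush, so a real collector would need to decide how much unflushed data it can tolerate.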

> >
> > Best,
> > -jay
> >
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 