[openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

Jay Pipes jaypipes at gmail.com
Thu Mar 20 22:15:37 UTC 2014


On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
> Hello,
> 
> We've done some profiling and the results are quite interesting:
> during 1.5 hours ceilometer inserted 59755 events (59755 calls to
> record_metering_data).
> These calls resulted in a total of 2591573 SQL queries.

Yes, this matches my own experience with Ceilo+MySQL. But do not assume
that there are 2591573/59755 or around 43 queries per record meter
event. That is misleading. In fact, the number of queries per record
meter event increases over time, as the number of retries climbs due to
contention between readers and writers.
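
To put a number on that for yourself, something like the sketch below
is what I have in mind -- just SQLAlchemy event hooks, not actual
Ceilometer code, and the connection URL is made up:

    import collections

    from sqlalchemy import create_engine, event

    engine = create_engine(
        "mysql://ceilometer:secret@localhost/ceilometer")
    stmt_counts = collections.Counter()

    @event.listens_for(engine, "after_cursor_execute")
    def count_statement(conn, cursor, statement, parameters,
                        context, executemany):
        # Bucket by the leading SQL keyword (SELECT, INSERT, ...).
        stmt_counts[statement.split(None, 1)[0].upper()] += 1

    @event.listens_for(engine, "rollback")
    def count_rollback(conn):
        # Transaction-level ROLLBACKs go through the DBAPI connection,
        # not a cursor, so they need their own listener.
        stmt_counts["ROLLBACK"] += 1

    # ... drive record_metering_data() through the storage driver here,
    # printing and clearing stmt_counts after each call ...

Printing the counter after every call (rather than once at the end) is
what makes the per-call query count visibly creep upward as the retries
kick in.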

> And the most interesting part is that 291569 queries were ROLLBACK
> queries.

Yep, I noted that as well. But, this is not unique to Ceilometer by any
means. Just take a look at any database serving Nova, Cinder, Glance, or
anything that uses the common SQLAlchemy code. You will see a huge
percentage of the total number of queries taken up by ROLLBACK
statements. The problem in Ceilometer is just that the write:read ratio
is much higher than in any of the other projects.
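
If anyone wants to reproduce that comparison, tallying statement verbs
out of the MySQL general query log is enough. A rough sketch (the log
path is passed on the command line, the line format is assumed, and
nothing here is tied to any particular project):

    import collections
    import re
    import sys

    # General-log query lines end in "<thread id> Query\t<SQL>";
    # Connect, Quit and admin commands are skipped.
    QUERY_RE = re.compile(r"\d+\s+Query\s+(\S+)")

    def tally(log_path):
        counts = collections.Counter()
        with open(log_path) as log:
            for line in log:
                match = QUERY_RE.search(line)
                if match:
                    counts[match.group(1).upper()] += 1
        return counts

    if __name__ == "__main__":
        counts = tally(sys.argv[1])
        total = sum(counts.values())
        for verb, count in counts.most_common(10):
            print("%-10s %8d  (%.1f%%)" % (
                verb, count, 100.0 * count / total))

Run it against the logs for the Nova, Cinder and Glance databases and
you should see the same fat ROLLBACK slice; what sets Ceilometer apart
is how many of the remaining statements are writes.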

I had a suspicion that the rollbacks have to do with the way that the
oslo.db retry logic works, but I never had a chance to investigate it
further. I'd be really interested to see similar stats against
PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
it is).
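
To spell out what I mean by the retry logic: the pattern in the common
db code is, roughly, a wrapper along these lines (an illustrative
sketch, not the actual oslo code), and every retried attempt adds at
least one extra ROLLBACK on top of the INSERT that eventually succeeds:

    import time

    from sqlalchemy.exc import OperationalError

    def retry_on_deadlock(func, retries=5, delay=0.5):
        def wrapper(session, *args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(session, *args, **kwargs)
                except OperationalError:
                    # Deadlocks and lock-wait timeouts surface as
                    # OperationalError (real code would check the MySQL
                    # error code). The failed transaction has to be
                    # rolled back before the statement can be retried,
                    # so each retry shows up as another ROLLBACK.
                    session.rollback()
                    if attempt == retries - 1:
                        raise
                    time.sleep(delay * (attempt + 1))
        return wrapper

The point of the PostgreSQL comparison is that its MVCC behavior should
trigger far fewer of those contention-driven retries, so if the
ROLLBACK share stays just as high there, the cause is somewhere else in
the common code.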

Best,
-jay

> We do around 5 rollbacks to record a single event!
> 
> I guess it means that the MySQL backend is currently totally unusable
> in a production environment.
> 
> Please find a full profiling graph attached.
> 
> Regards,
> 
> On 03/20/2014 10:31 PM, Sean Dague wrote:
> 
> > On 03/20/2014 01:01 PM, David Kranz wrote:
> > > On 03/20/2014 12:31 PM, Sean Dague wrote:
> > > > On 03/20/2014 11:35 AM, David Kranz wrote:
> > > > > On 03/20/2014 06:15 AM, Sean Dague wrote:
> > > > > > On 03/20/2014 05:49 AM, Nadya Privalova wrote:
> > > > > > > Hi all,
> > > > > > > First of all, thanks for your suggestions!
> > > > > > > 
> > > > > > > To summarize the discussions here:
> > > > > > > 1. We are not going to install Mongo (because "it's wrong"?)
> > > > > > We are not going to install Mongo "not from the base distribution",
> > > > > > because we don't do that for things that aren't Python. Our
> > > > > > assumption is that dependent services come from the base OS.
> > > > > > 
> > > > > > That being said, being an integrated project means you have to be able
> > > > > > to function, sanely, on an sqla backend, as that will always be part of
> > > > > > your gate.
> > > > > This is a claim I think needs a bit more scrutiny if by "sanely" you
> > > > > mean "performant". It seems we have an integrated project that no one
> > > > > would deploy using the sql db driver we have in the gate. Is anyone
> > > > > doing that? Is having a scalable sql backend a goal of ceilometer?
> > > > > 
> > > > > More generally, if there is functionality that is of great importance to
> > > > > any cloud deployment (and we would not integrate it if we didn't think
> > > > > it was) that cannot be deployed at scale using sqla, are we really going
> > > > > to say it should not be a part of OpenStack because we refuse, for
> > > > > whatever reason, to run it in our gate using a driver that would
> > > > > actually be used? And if we do demand an sqla backend, how much time
> > > > > should we spend trying to optimize it if no one will really use it?
> > > > > Though the slow heat job is a little different because the slowness
> > > > > comes directly from running real use cases, perhaps we should just set
> > > > > up a "slow ceilometer" job if the sql version is too slow for its budget
> > > > > in the main job.
> > > > > 
> > > > > It seems like there is a similar thread, at least in part, about this
> > > > > around marconi.
> > > > We required a non-mongo backend to graduate ceilometer. So I don't think
> > > > it's too much to ask that it actually works.
> > > > 
> > > > If the answer is that it will never work and it was a checkbox with no
> > > > intent to make it work, then it should be deprecated and removed from
> > > > the tree in Juno, with a big WARNING that you shouldn't ever use that
> > > > backend. Like Nova now does with all the virt drivers that aren't tested
> > > > upstream.
> > > > 
> > > > Shipping in tree code that you don't want people to use is bad for
> > > > users. Either commit to making it work, or deprecate it and remove it.
> > > > 
> > > > I don't see this as the same issue as the slow heat job. Heat,
> > > > architecturally, is going to be slow. It spins up real OSes and does
> > > > real things to them. There is no way that's ever going to be fast, and
> > > > the dedicated job was a recognition that to support this level of
> > > > services in OpenStack we need to give them more breathing room.
> > > Peace. I specifically noted that difference in my original comment. And
> > > for that reason the heat slow job may not be temporary.
> > > > Architecturally Ceilometer should not be this expensive. We've got some
> > > > data showing it to be aberrant from where we believe it should be. We
> > > > should fix that.
> > > There are plenty of cases where we have had code that passes gate tests
> > > with acceptable performance but falls over in real deployments. I'm just
> > > saying that having a driver that works OK in the gate but does not work
> > > for real deployments is of no more value than not having it at all.
> > > Maybe less value.
> > > How do you propose to solve the problem of getting more ceilometer tests
> > > into the gate in the short run? As a practical measure I don't see why
> > > it is so bad to have a separate job until the complex issue of whether
> > > it is possible to have a real-world performant sqla backend is resolved.
> > > Or did I miss something and it has already been determined that sqla
> > > could be used for large-scale deployments if we just fixed our code?
> > I think right now the ball is back in the ceilometer court to do some
> > performance profiling, and let's see what comes of that. I don't think
> > we're getting more tests before the release in any real way.
> > 
> > > > Once we get a base OS in the gate that lets us install mongo directly
> > > > from base packages, we can also do that. Or someone can 3rd party it today.
> > > > Then we'll even have comparative results to understand the differences.
> > > Yes. Do you know which base OSes are candidates for that?
> > Ubuntu 14.04 will have a sufficiently recent version of Mongo, so some
> > time in the Juno cycle we should have it in the gate.
> > 
> > 	-Sean
> > 
> > 
> > 
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




