[openstack-dev] [Ceilometer][QA][Tempest][Infra] Ceilometer tempest testing in gate

Alexei Kornienko alexei.kornienko at gmail.com
Thu Mar 20 23:02:37 UTC 2014


On 03/21/2014 12:53 AM, Jay Pipes wrote:
> On Fri, 2014-03-21 at 00:32 +0200, Alexei Kornienko wrote:
>> On 03/21/2014 12:15 AM, Jay Pipes wrote:
>>> On Fri, 2014-03-21 at 00:03 +0200, Alexei Kornienko wrote:
>>>> Hello,
>>>>
>>>> We've done some profiling and the results are quite interesting:
>>>> during 1.5 hours ceilometer inserted 59755 events (59755 calls to
>>>> record_metering_data);
>>>> these calls resulted in a total of 2591573 SQL queries.
>>> Yes, this matches my own experience with Ceilo+MySQL. But do not assume
>>> that there are 2591573/59755 or around 43 queries per record meter
>>> event. That is misleading. In fact, the number of queries per record
>>> meter event increases over time, as the number of retries climbs due to
>>> contention between readers and writers.
>>>
>>>> And the most interesting part is that 291569 queries were ROLLBACK
>>>> queries.
>>> Yep, I noted that as well. But, this is not unique to Ceilometer by any
>>> means. Just take a look at any database serving Nova, Cinder, Glance, or
>>> anything that uses the common SQLAlchemy code. You will see a huge
>>> percentage of the total number of queries taken up by ROLLBACK statements.
>>> The problem in Ceilometer is just that the write:read ratio is much
>>> higher than any of the other projects.
>>>
>>> I had a suspicion that the rollbacks have to do with the way that the
>>> oslo.db retry logic works, but I never had a chance to investigate it
>>> further. Would be really interested to see similar stats against
>>> PostgreSQL and see if the rollback issue is isolated to MySQL (I suspect
>>> it is).
>> Rollbacks are caused not by retry logic but by create_or_update logic:
>> we first try to do an INSERT in a sub-transaction; when it fails we roll
>> back this transaction and do an update instead.
> No, that isn't correct, AFAIK. We first do a SELECT into the table and
> then if no result, try an insert:
>
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L286-L292
>
> The problem, IMO, is twofold. First, there is no need for nested
> transactional containers around these create_or_update lookups -- i.e.
> the lookups can be done outside of the main transaction that begins here:
>
> https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_sqlalchemy.py#L335
I'm afraid you are wrong here:

nested = session.connection().dialect.name != 'sqlite'  # always True for MySQL
if not nested and session.query(model_class).get(str(_id)):  # always False

Short-circuit evaluation kicks in, so no SELECT is ever performed on MySQL.
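
To make that concrete, here is a rough sketch of the create-or-lookup pattern
being discussed (not the exact upstream code; the function and attribute names
are illustrative). Every INSERT runs inside a SAVEPOINT, and when the row
already exists the savepoint is rolled back before we fall back to fetching
the existing row -- which is presumably where the flood of ROLLBACK statements
comes from:

from sqlalchemy.exc import IntegrityError

def create_or_get(session, model_class, _id, **kwargs):
    # Sketch only: mirrors the shape of the create-or-update helper,
    # not the actual Ceilometer implementation.
    try:
        # begin_nested() emits SAVEPOINT; if the INSERT below hits a
        # duplicate key, the savepoint is rolled back, which MySQL logs
        # as a ROLLBACK (TO SAVEPOINT) statement.
        with session.begin_nested():
            obj = model_class(id=str(_id), **kwargs)
            session.add(obj)
            session.flush()
        return obj
    except IntegrityError:
        # Row already exists: fall back to fetching the existing one.
        return session.query(model_class).get(str(_id))

So, in this sketch, every sample that references an already-known resource or
meter pays for a failed INSERT plus a rollback before the lookup even starts.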
>
> Secondly, given the volume of inserts (that also generate selects), a
> simple memcache lookup cache would be highly beneficial in cutting down
> on writer/reader contention in MySQL.
You are right, but I'm afraid that adding memcache will make deployment
more complicated.
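
For what it's worth, a lookup cache along those lines could be fairly small.
A sketch only (using python-memcached; the memcached address, key scheme and
TTL below are made up for illustration, nothing like this exists in the tree
today):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def resource_exists(session, model_class, _id, ttl=600):
    # Answer "does this row already exist?" without hitting MySQL for
    # resources/meters we have seen recently.
    key = 'ceilometer:%s:%s' % (model_class.__name__, _id)
    if mc.get(key):
        return True
    obj = session.query(model_class).get(str(_id))
    if obj is not None:
        # Remember the hit so the next sample for the same resource
        # skips the SELECT (and the failed INSERT) entirely.
        mc.set(key, 1, time=ttl)
    return obj is not None

Still, it is one more service to deploy and keep running, which is the extra
complexity I'm worried about.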
>
> These are things that can be done without changing the schema (which has
> other issues that can be looked at of course).
>
> Best,
> -jay
>
>> This is caused by a poorly designed schema that requires such hacks.
>> Because of this, I suspect that we'll have similar results for PostgreSQL.
>>
>> Tomorrow we'll do the same tests with PostgreSQL and MongoDB to see if
>> there is any difference.
>>
>>> Best,
>>> -jay
>>>
>>>> We do around 5 rollbacks to record a single event!
>>>>
>>>> I guess it means that the MySQL backend is currently totally unusable in
>>>> a production environment.
>>>>
>>>> Please find a full profiling graph attached.
>>>>
>>>> Regards,
>>>>
>>>> On 03/20/2014 10:31 PM, Sean Dague wrote:
>>>>
>>>>> On 03/20/2014 01:01 PM, David Kranz wrote:
>>>>>> On 03/20/2014 12:31 PM, Sean Dague wrote:
>>>>>>> On 03/20/2014 11:35 AM, David Kranz wrote:
>>>>>>>> On 03/20/2014 06:15 AM, Sean Dague wrote:
>>>>>>>>> On 03/20/2014 05:49 AM, Nadya Privalova wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>> First of all, thanks for your suggestions!
>>>>>>>>>>
>>>>>>>>>> To summarize the discussions here:
>>>>>>>>>> 1. We are not going to install Mongo (because "it's wrong"?)
>>>>>>>>> We are not going to install Mongo "not from base distribution", because
>>>>>>>>> we don't do that for things that aren't Python. Our assumption is that
>>>>>>>>> dependent services come from the base OS.
>>>>>>>>>
>>>>>>>>> That being said, being an integrated project means you have to be able
>>>>>>>>> to function, sanely, on an sqla backend, as that will always be part of
>>>>>>>>> your gate.
>>>>>>>> This is a claim I think needs a bit more scrutiny if by "sanely" you
>>>>>>>> mean "performant". It seems we have an integrated project that no one
>>>>>>>> would deploy using the sql db driver we have in the gate. Is anyone
>>>>>>>> doing that? Is having a scalable sql back end a goal of ceilometer?
>>>>>>>>
>>>>>>>> More generally, if there is functionality that is of great importance to
>>>>>>>> any cloud deployment (and we would not integrate it if we didn't think
>>>>>>>> it was) that cannot be deployed at scale using sqla, are we really going
>>>>>>>> to say it should not be a part of OpenStack because we refuse, for
>>>>>>>> whatever reason, to run it in our gate using a driver that would
>>>>>>>> actually be used? And if we do demand an sqla backend, how much time
>>>>>>>> should we spend trying to optimize it if no one will really use it?
>>>>>>>> Though the slow heat job is a little different because the slowness
>>>>>>>> comes directly from running real use cases, perhaps we should just set
>>>>>>>> up a "slow ceilometer" job if the sql version is too slow for its budget
>>>>>>>> in the main job.
>>>>>>>>
>>>>>>>> It seems like there is a similar thread, at least in part, about this
>>>>>>>> around marconi.
>>>>>>> We required a non-mongo backend to graduate ceilometer. So I don't think
>>>>>>> it's too much to ask that it actually works.
>>>>>>>
>>>>>>> If the answer is that it will never work and it was a checkbox with no
>>>>>>> intent to make it work, then it should be deprecated and removed from
>>>>>>> the tree in Juno, with a big WARNING that you shouldn't ever use that
>>>>>>> backend. Like Nova now does with all the virt drivers that aren't tested
>>>>>>> upstream.
>>>>>>>
>>>>>>> Shipping in tree code that you don't want people to use is bad for
>>>>>>> users. Either commit to making it work, or deprecate it and remove it.
>>>>>>>
>>>>>>> I don't see this as the same issue as the slow heat job. Heat,
>>>>>>> architecturally, is going to be slow. It spins up real OSes and does
>>>>>>> real things to them. There is no way that's ever going to be fast, and
>>>>>>> the dedicated job was a recognition that to support this level of
>>>>>>> services in OpenStack we need to give them more breathing room.
>>>>>> Peace. I specifically noted that difference in my original comment. And
>>>>>> for that reason the heat slow job may not be temporary.
>>>>>>> Architecturally Ceilometer should not be this expensive. We've got some
>>>>>>> data showing it to be aberrant from where we believe it should be. We
>>>>>>> should fix that.
>>>>>> There are plenty of cases where we have had code that passes gate tests
>>>>>> with acceptable performance but falls over in real deployment. I'm just
>>>>>> saying that having a driver that works ok in the gate but does not work
>>>>>> for real deployments is of no more value than not having it at all.
>>>>>> Maybe less value.
>>>>>> How do you propose to solve the problem of getting more ceilometer tests
>>>>>> into the gate in the short run? As a practical measure I don't see why
>>>>>> it is so bad to have a separate job until the complex issue of whether
>>>>>> it is possible to have a real-world performant sqla backend is resolved.
>>>>>> Or did I miss something and it has already been determined that sqla
>>>>>> could be used for large-scale deployments if we just fixed our code?
>>>>> I think right now the ball is back in the ceilometer court to do some
>>>>> performance profiling, and let's see what comes of that. I don't think
>>>>> we're getting more tests before the release in any real way.
>>>>>
>>>>>>> Once we get a base OS in the gate that lets us install mongo directly from
>>>>>>> base packages, we can also do that. Or someone can 3rd party it today.
>>>>>>> Then we'll even have comparative results to understand the differences.
>>>>>> Yes. Do you know which base OS's are candidates for that?
>>>>> Ubuntu 14.04 will have a sufficient level of Mongo, so some time in the
>>>>> Juno cycle we should have it in the gate.
>>>>>
>>>>> 	-Sean
>>>>>
>>>>>
>>>>>