[openstack-dev] [Fuel][Nailgun] Random failures in unit tests

Igor Kalnitsky ikalnitsky at mirantis.com
Wed Mar 16 14:28:11 UTC 2016


Hey Vitaly,

Thanks for your feedback; it's an important point. However, I think
you misunderstood the problem, so let me explain it again.

You see, Nailgun unit tests are failing because of races and deadlocks
between two transactions, the test transaction and the fake thread
transaction, and we must face that and fix it. This problem has nothing
to do with the one you're encountering in the UI tests. Besides,
removing fake threads from the unit tests doesn't mean removing them
from the Nailgun code base.
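To make the "two transactions" point concrete, here is a minimal, stdlib-only Python sketch. It does not use SQLAlchemy. Instead it assumes a thread-local registry that behaves like scoped_session's default (one session per thread) and a toy session that buffers writes until commit(). FakeSession, ThreadLocalRegistry, COMMITTED and the "task_status" key are hypothetical names for illustration only. Under these assumptions the test thread and the fake thread get different session objects, and the test sees nothing the fake thread wrote until that thread commits. Events force the unlucky ordering so the output is deterministic.

```python
import threading

# Hypothetical stand-ins: a committed "database" state and a toy session that
# buffers writes until commit(). This models visibility only, not real SQL.
COMMITTED = {}
_db_lock = threading.Lock()


class FakeSession:
    """Buffers writes like a transaction; reads see committed data plus own writes."""

    def __init__(self):
        self.pending = {}

    def add(self, key, value):
        self.pending[key] = value

    def get(self, key):
        if key in self.pending:
            return self.pending[key]
        with _db_lock:
            return COMMITTED.get(key)

    def commit(self):
        with _db_lock:
            COMMITTED.update(self.pending)
        self.pending.clear()


class ThreadLocalRegistry:
    """Hands out one session per thread, similar in spirit to scoped_session."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        if not hasattr(self._local, "session"):
            self._local.session = self._factory()
        return self._local.session


db = ThreadLocalRegistry(FakeSession)
seen = {}
written = threading.Event()
checked = threading.Event()


def fake_astute_thread():
    # The fake thread writes a task status in its own session...
    db().add("task_status", "ready")
    seen["fake_session"] = db()
    written.set()
    checked.wait()  # ...and has not committed yet when the test looks.
    db().commit()


worker = threading.Thread(target=fake_astute_thread)
worker.start()
written.wait()
seen["test_session"] = db()
seen["before_commit"] = db().get("task_status")  # not visible: uncommitted
checked.set()
worker.join()
seen["after_commit"] = db().get("task_status")  # visible after commit

print(seen["test_session"] is seen["fake_session"],
      seen["before_commit"], seen["after_commit"])
# prints: False None ready
```

In a real test nothing forces this ordering, which is exactly why the outcome depends on thread scheduling.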

So your problem must be addressed too, but that's a different story.

Thanks,
Igor

On Wed, Mar 16, 2016 at 4:21 PM, Vitaly Kramskikh
<vkramskikh at mirantis.com> wrote:
> Igor,
>
> We have UI and CLI integration tests which use the fake mode of Nailgun, and
> we can't avoid using fake threads for them. So I think we need to figure out
> how to fix fake threads instead. There is a critical bug which is the main
> cause of the randomly failing UI tests. To fix it, we need to fix the fake
> threads' behaviour.
>
> 2016-03-16 17:06 GMT+03:00 Igor Kalnitsky <ikalnitsky at mirantis.com>:
>>
>> Hey Fuelers,
>>
>> As you might know, recently we've been encountering a lot of random
>> test failures on CI, and they are still there (though probably a bit
>> less frequent). These failures aren't actually random: they happen
>> because of the so-called fake threads.
>>
>> Fake threads, actually, aren't fake at all. They are native OS threads
>> designed to emulate Astute's behaviour (i.e. catch an RPC call and
>> respond with the appropriate message). Since they are native threads
>> and we use SQLAlchemy's scoped_session, each fake thread uses a separate
>> database session and hence a separate transaction. That leads to the
>> following issues:
>>
>> * Races. We don't know when threads are switched, and therefore we
>> don't know what has been committed and what hasn't. Some Nailgun tests
>> send something via RPC (caught by a fake thread) and immediately check
>> the result. The issue is that we can't guarantee the fake thread has
>> already committed the result it produced. That could be avoided by
>> waiting for the 'ready' status of the created Nailgun task; however,
>> it's better not to use fake threads in that case at all and simply
>> call the appropriate Nailgun receiver method directly in the test.
>>
>> * Deadlocks. It's incredibly hard to ensure the same order of database
>> locks in the test and business code on the one hand and in the fake
>> thread code on the other. That's why we can (and do) encounter
>> deadlocks on CI, when a test case waits for a lock acquired by a fake
>> thread, and the fake thread waits for a lock acquired by the test case.
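Both failure modes can be reproduced outside Nailgun with plain threads. The sketch below is hypothetical: HypotheticalReceiver.deploy_resp stands in for "the appropriate Nailgun receiver method" and is not Nailgun's actual API, and the two threading.Lock objects stand in for database row locks. The first function shows the suggested fix for races (apply the response synchronously in the test, so there is no second thread or transaction to wait for). The second acquires two locks in opposite order from two threads, using acquire(timeout=...) so the demo reports a timeout instead of hanging the way a real deadlock would.

```python
import threading


class HypotheticalReceiver:
    """Stand-in for a Nailgun receiver; the name and method are illustrative only."""

    def __init__(self, store):
        self.store = store

    def deploy_resp(self, task_uuid, status):
        self.store[task_uuid] = status


def test_status_via_direct_call():
    # Deterministic: the response is applied before the assertion runs,
    # with no second thread and no second transaction.
    store = {"t1": "running"}
    HypotheticalReceiver(store).deploy_resp("t1", "ready")
    assert store["t1"] == "ready"
    return store["t1"]


def lock_in_order(first, second, barrier, results, name):
    # Grab the first lock, wait until the other side holds its first lock,
    # then try the second lock with a timeout so the demo cannot hang.
    with first:
        barrier.wait()
        got_second = second.acquire(timeout=0.5)
        if got_second:
            second.release()
        results[name] = got_second


def demo_opposite_lock_order():
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    # "test" takes A then B; "fake" takes B then A: the classic lock-order inversion.
    test_side = threading.Thread(
        target=lock_in_order, args=(lock_a, lock_b, barrier, results, "test"))
    fake_side = threading.Thread(
        target=lock_in_order, args=(lock_b, lock_a, barrier, results, "fake"))
    test_side.start()
    fake_side.start()
    test_side.join()
    fake_side.join()
    return results
```

With the opposite order, at least one side always times out waiting for its second lock, because neither can release its first lock until its own wait ends. With a consistent order on both sides, or with no second thread at all, that situation cannot arise in this demo.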
>>
>> Fake threads have become a bottleneck for landing patches to master on
>> time, and we can't ignore it anymore. We have ~190 tests that use fake
>> threads, and fixing them all at once is tedious routine work. So I
>> kindly ask Nailgun contributors to fix them as we run into them. Let's
>> file a bug for each failure on CI and quickly prepare a separate patch
>> that removes the fake thread from the failed test.
>>
>> Thanks in advance,
>> Igor
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> --
> Vitaly Kramskikh,
> Fuel UI Tech Lead,
> Mirantis, Inc.
>


