[openstack-dev] [Fuel][Nailgun] Random failures in unit tests

Mike Scherbakov mscherbakov at mirantis.com
Thu Mar 24 05:34:24 UTC 2016


I finally got it passing all the tests, including performance:
https://review.openstack.org/#/c/294976/. I'd appreciate it if you could
review and land it sooner rather than later: the patch touches many tests,
and it would be beneficial for everyone to be based on the updated code.

Thanks,

On Mon, Mar 21, 2016 at 12:22 AM Mike Scherbakov <mscherbakov at mirantis.com>
wrote:

> FakeUI, which is based on fake threads, is obviously needed for
> development purposes.
> Ideally we need to refactor our integration tests, so that we don't run
> the whole pipeline in every test. To start, I suggest that we switch from
> threads to synchronous runs of test cases (while keeping threads for
> FakeUI).
> Please take a look & comment in this draft:
> https://review.openstack.org/#/c/294976/
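
A minimal sketch of what such a switch could look like, assuming a single
flag that selects synchronous or threaded delivery of the fake response.
FakeResponder, cast, mark_ready and the statuses dict are hypothetical names
used for illustration only; they are not taken from the linked review or from
Nailgun.

```python
import threading


class FakeResponder:
    """Hypothetical stand-in for the fake Astute responder (not Nailgun code)."""

    def __init__(self, receiver, synchronous):
        self.receiver = receiver        # callable that records the response
        self.synchronous = synchronous  # True for test cases, False for fake UI

    def cast(self, task_id):
        if self.synchronous:
            # The response is applied in the caller's thread before cast() returns.
            self.receiver(task_id)
            return None
        # Threaded mode: the response arrives at some later, unknown moment.
        thread = threading.Thread(target=self.receiver, args=(task_id,))
        thread.start()
        return thread


statuses = {}


def mark_ready(task_id):
    statuses[task_id] = "ready"


# Synchronous mode: the result can be checked right after the call.
FakeResponder(mark_ready, synchronous=True).cast(1)
sync_status = statuses.get(1)

# Threaded mode: the result is only guaranteed after an explicit join.
worker = FakeResponder(mark_ready, synchronous=False).cast(2)
worker.join()
threaded_status = statuses.get(2)
```

In synchronous mode a test can assert on the outcome immediately, while the
threaded path stays available for the fake UI.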
>
> Thanks,
>
> On Wed, Mar 16, 2016 at 7:30 AM Igor Kalnitsky <ikalnitsky at mirantis.com>
> wrote:
>
>> Hey Vitaly,
>>
>> Thanks for your feedback, it's an important point. However, I think
>> you didn't quite get the problem, so let me explain it again.
>>
>> You see, Nailgun unit tests are failing due to races or deadlocks
>> caused by two transactions: the test transaction and the fake thread
>> transaction, and we must face it and fix it. This problem has nothing
>> to do with the problem you're encountering in UI tests. Besides,
>> removing fake threads from tests doesn't mean removing them from the
>> Nailgun code base.
>>
>> So your problem must be addressed, but it's kinda another story.
>>
>> Thanks,
>> Igor
>>
>> On Wed, Mar 16, 2016 at 4:21 PM, Vitaly Kramskikh
>> <vkramskikh at mirantis.com> wrote:
>> > Igor,
>> >
>> > We have UI and CLI integration tests which use fake mode of Nailgun, and
>> > we can't avoid using fake threads for them. So I think we need to think
>> > how to fix fake threads instead. There is a critical bug which is the main
>> > reason of randomly failing UI tests. To fix it, we need to fix fake threads
>> > behaviour.
>> >
>> > 2016-03-16 17:06 GMT+03:00 Igor Kalnitsky <ikalnitsky at mirantis.com>:
>> >>
>> >> Hey Fuelers,
>> >>
>> >> As you might know, we have recently been encountering a lot of random
>> >> test failures on CI, and they are still there (likely with a bit lower
>> >> probability). These failures are not actually random: they happen
>> >> because of so-called fake threads.
>> >>
>> >> Fake threads, actually, ain't fake at all. They are native OS threads
>> >> that are designed to emulate Astute behaviour (i.e. catch an RPC call
>> >> and respond with an appropriate message). Since they are native threads
>> >> and we use SQLAlchemy's scoped_session, fake threads use a separate
>> >> database session and hence a separate transaction. That leads to the
>> >> following issues:
>> >>
>> >> * Races. We don't know when threads are switched, therefore we don't
>> >> know what's committed and what's not. Some Nailgun tests send
>> >> something via RPC (caught by fake threads) and immediately check
>> >> something. The issue is, we can't guarantee that the fake thread has
>> >> already committed the produced result. That could be avoided by waiting
>> >> for the 'ready' status of the created Nailgun task; however, it's better
>> >> to simply not use fake threads in that case and call the appropriate
>> >> Nailgun receiver's method directly in the test.
>> >>
>> >> * Deadlocks. It's incredibly hard to ensure the same order of database
>> >> locks in test + business code on one hand and fake thread code on
>> >> the other. That's why we can (and we do) encounter deadlocks on CI,
>> >> when a test case waits for a lock acquired by a fake thread, and the fake
>> >> thread waits for a lock acquired by the test case.
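
The lock-ordering problem described above can be sketched with the standard
library alone. This is an illustration under assumptions, not Nailgun code:
two threading.Lock objects stand in for database row locks, the names
"test_case" and "fake_thread" are only labels, and a timeout is used so that
the sketch reports the mutual wait instead of hanging.

```python
import threading


def opposite_order_attempt(timeout=0.1):
    """Each thread holds one lock and tries to take the other one."""
    lock_x, lock_y = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            acquired = second.acquire(timeout=timeout)
            if acquired:
                second.release()
            got_second[name] = acquired
            barrier.wait()  # keep holding `first` until both attempts are over

    threads = [
        threading.Thread(target=worker, args=("test_case", lock_x, lock_y)),
        threading.Thread(target=worker, args=("fake_thread", lock_y, lock_x)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return got_second


def same_order_attempt():
    """Both threads take the locks in the same order, so neither gets stuck."""
    lock_x, lock_y = threading.Lock(), threading.Lock()
    got_both = {}

    def worker(name):
        with lock_x:
            with lock_y:
                got_both[name] = True

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name in ("test_case", "fake_thread")
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return got_both
```

With opposite ordering neither thread obtains its second lock; with a single
agreed order both finish. Keeping that order consistent across test code,
business code and fake thread code is the hard part.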
>> >>
>> >> Fake threads have become a bottleneck for landing patches to master in
>> >> time, and we can't ignore it anymore. We have ~190 tests that use fake
>> >> threads, and fixing them all at once is a boring routine. So I kindly
>> >> ask Nailgun contributors to fix them as soon as we face them. Let's
>> >> file a bug on each file in CI, and quickly prepare a separate patch
>> >> that removes the fake thread from the failed test.
>> >>
>> >> Thanks in advance,
>> >> Igor
>> >>
>> >>
>> >> __________________________________________________________________________
>> >> OpenStack Development Mailing List (not for usage questions)
>> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>> >
>> >
>> >
>> > --
>> > Vitaly Kramskikh,
>> > Fuel UI Tech Lead,
>> > Mirantis, Inc.
>> >
>>
> --
> Mike Scherbakov
> #mihgen
>
-- 
Mike Scherbakov
#mihgen