<div dir="ltr"><div>Igor,<br><br></div>We have UI and CLI integration tests that use Nailgun's fake mode, and we can't avoid using fake threads for them. So I think we need to work out how to fix fake threads instead. There is <a href="https://bugs.launchpad.net/fuel/+bug/1549750">a critical bug</a> that is the main cause of randomly failing UI tests. To fix it, we need to fix the behaviour of fake threads.<br></div><div class="gmail_extra"><br><div class="gmail_quote">2016-03-16 17:06 GMT+03:00 Igor Kalnitsky <span dir="ltr">&lt;<a href="mailto:ikalnitsky@mirantis.com" target="_blank">ikalnitsky@mirantis.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hey Fuelers,<br>

<br>

As you might know, we have recently encountered a lot of random test failures<br>

on CI, and they are still there (likely with a somewhat lower probability).<br>

These failures are not actually random: they are<br>

caused by so-called fake threads.<br>

<br>

Fake threads, actually, ain't fake at all. They are native OS threads<br>

that are designed to emulate Astute behaviour (i.e. catch an RPC call and<br>

respond with an appropriate message). Since they are native threads and<br>

we use SQLAlchemy's scoped_session, fake threads use a separate<br>

database session and hence a separate transaction. That leads to the following<br>

issues:<br>

<br>

* Races. We don't know when threads are switched, and therefore we don't<br>

know what's committed and what's not. Some Nailgun tests send<br>

something via RPC (caught by fake threads) and immediately check<br>

the result. The issue is that we can't guarantee the fake thread has already<br>

committed its result. That could be avoided by waiting for the<br>

'ready' status of the created Nailgun task; however, it's better<br>

not to use fake threads in that case and simply call the appropriate<br>

Nailgun receiver's method directly in the test.<br>

<br>

* Deadlocks. It's incredibly hard to ensure the same order of database<br>

locks in test + business code on the one hand and fake thread code on<br>

the other. That's why we can (and we do) encounter deadlocks on CI,<br>

when a test case waits for a lock acquired by a fake thread, and the fake thread<br>

waits for a lock acquired by the test case.<br>
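
<br>

To illustrate the race described above, here is a minimal sketch that uses only the Python standard library. It is not Nailgun code: the tasks dict, receiver_deploy_resp, fake_thread_style and direct_style are hypothetical names invented for this example. The dict stands in for the tasks table and the function stands in for a receiver method.<br>

```python
import threading

# Hypothetical in-memory stand-in for the tasks table (not Nailgun code).
tasks = {"task-1": {"status": "running"}}


def receiver_deploy_resp(task_uuid, status):
    """Hypothetical receiver method: store the result reported by Astute."""
    tasks[task_uuid]["status"] = status


def fake_thread_style(task_uuid):
    """Answer from a separate OS thread, the way a fake thread would."""
    worker = threading.Thread(
        target=receiver_deploy_resp, args=(task_uuid, "ready")
    )
    worker.start()
    # The caller cannot tell when the update lands unless it waits
    # (worker.join() here, or polling for the 'ready' status).
    return worker


def direct_style(task_uuid):
    """Call the receiver in the test's own thread: no waiting is needed."""
    receiver_deploy_resp(task_uuid, "ready")


direct_style("task-1")
assert tasks["task-1"]["status"] == "ready"
```

With fake_thread_style, an assertion placed right after the call may run before the worker has stored its result, so the test has to join or poll first. With direct_style, the result is in place as soon as the call returns.<br>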

<br>
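
The lock-ordering problem behind such deadlocks can be sketched with plain threading locks. This is only an illustration under assumptions: lock_a and lock_b are hypothetical stand-ins for two database row locks, and a timeout is used so that the demo reports the conflict instead of hanging the way a real deadlock would.<br>

```python
import threading

# Hypothetical stand-ins for two database row locks.
lock_a = threading.Lock()
lock_b = threading.Lock()


def opposite_order(first, second, both_hold, both_tried, results, name):
    """Take `first`, wait until the peer holds its lock, then try `second`."""
    with first:
        both_hold.wait()  # each thread now holds exactly one lock
        got = second.acquire(timeout=0.2)  # a real deadlock would block here
        results[name] = got
        if got:
            second.release()
        both_tried.wait()  # keep `first` until the peer has tried too


def demo_opposite_order():
    """The test takes a then b, the fake thread takes b then a."""
    both_hold = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    results = {}
    workers = [
        threading.Thread(
            target=opposite_order,
            args=(lock_a, lock_b, both_hold, both_tried, results, "test"),
        ),
        threading.Thread(
            target=opposite_order,
            args=(lock_b, lock_a, both_hold, both_tried, results, "fake_thread"),
        ),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


def same_order(results, name):
    """Both sides take a then b, so one simply waits for the other."""
    with lock_a:
        with lock_b:
            results[name] = True


def demo_same_order():
    results = {}
    workers = [
        threading.Thread(target=same_order, args=(results, name))
        for name in ("test", "fake_thread")
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

In demo_opposite_order neither side can obtain its second lock while the other still holds it, which is the situation described above. In demo_same_order both sides finish because the locks are always taken in one agreed order.<br>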

Fake threads have become a bottleneck for landing patches to master on<br>

time, and we can't ignore it anymore. We have ~190 tests that use fake<br>

threads, and fixing them all at once is a tedious routine. So I kindly<br>

ask Nailgun contributors to fix them as soon as we face them. Let's<br>

file a bug for each failure in CI, and quickly prepare a separate patch<br>

that removes the fake thread from the failed test.<br>

<br>

Thanks in advance,<br>

Igor<br>

<br>


</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr">Vitaly Kramskikh,<br>Fuel UI Tech Lead,<br>Mirantis, Inc.</div></div></div></div>

</div>