[openstack-dev] [neutron] [nova] non-deterministic gate failures due to unclosed eventlet Timeouts

Matt Riedemann mriedem at linux.vnet.ibm.com
Sun Sep 7 14:43:14 UTC 2014



On 9/7/2014 8:39 AM, John Schwarz wrote:
> Hi,
>
> Long story short: for future reference, if you initialize an eventlet
> Timeout, make sure you cancel it (either with a context manager or simply
> timeout.cancel()), and be extra careful when writing tests using
> eventlet Timeouts, because these timeouts are not cancelled implicitly
> when they go out of scope and will cause unexpected behaviour (see [1])
> such as gate failures. In our case this caused non-deterministic
> failures in the dsvm-functional test suite.
>
>
> Late last week, a bug was found ([2]) in which an eventlet Timeout
> object was initialized but never cancelled. The instance stayed
> registered inside eventlet's internals and triggered non-deterministic
> "Timeout: 10 seconds" errors and failures in dsvm-functional tests.
>
> As mentioned earlier, initializing a new eventlet.timeout.Timeout
> instance also registers it with mechanisms internal to the library,
> and the reference remains there until it is explicitly removed (and
> not merely until execution leaves the function block, as some might
> expect). Thus, the old code (simply creating an instance without
> assigning it to a variable) left no way to cancel the timeout object.
> This reference remains throughout the "life" of a worker, so this can
> (and did) affect other tests and procedures using eventlet under the
> same process. Obviously this could easily affect production-grade
> systems under very high load.
>
> For future reference:
>   1) If you run into a "Timeout: %d seconds" exception whose traceback
> includes "hub.switch()" and "self.greenlet.switch()" calls, there might
> be a latent Timeout somewhere in the code, and a search for all
> eventlet.timeout.Timeout instances will probably produce the culprit.
>
>   2) The setup used to reproduce this error for debugging purposes is a
> baremetal machine running a VM with devstack. On the baremetal machine I
> ran about six "dd if=/dev/zero of=/dev/null" processes to simulate high
> CPU load (full command can be found at [3]), and in the VM I ran the
> dsvm-functional suite. Simulating the high CPU load inside the VM alone
> does not reproduce the failure.
>
> [1]
> http://eventlet.net/doc/modules/timeout.html#eventlet.timeout.eventlet.timeout.Timeout.Timeout.cancel
> [2] https://review.openstack.org/#/c/119001/
> [3]
> http://stackoverflow.com/questions/2925606/how-to-create-a-cpu-spike-with-a-bash-command
>
>
> --
> John Schwarz,
> Software Engineer, Red Hat.

Thanks, that might be what's causing this timeout/gate failure in the 
nova unit tests. [1]

[1] https://bugs.launchpad.net/nova/+bug/1357578
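
For anyone hunting for the same pattern elsewhere, here is a rough sketch
of the search John suggests in item 1. It is only a heuristic and makes
some assumptions: it uses the standard-library ast module to flag a bare
Timeout(...) call whose result is thrown away, it matches on the name
"Timeout" alone, and it will not catch a Timeout that is assigned but
never cancelled. The names find_discarded_timeouts, _is_timeout_call,
SAMPLE and do_work are made up for illustration. The sample source is
only parsed, never executed, so eventlet does not need to be installed;
it also shows the two safe forms (context manager, explicit cancel()).

```python
import ast

# Illustrative source only: parsed below, never executed.
SAMPLE = '''
import eventlet
from eventlet import timeout

def leaky():
    # Bad: the Timeout is armed, but no reference is kept to cancel it.
    eventlet.timeout.Timeout(10)
    do_work()

def scoped():
    # Good: the context manager cancels the timeout on exit.
    with eventlet.timeout.Timeout(10):
        do_work()

def manual():
    # Good: keep a reference and cancel it explicitly.
    t = timeout.Timeout(10)
    try:
        do_work()
    finally:
        t.cancel()
'''


def _is_timeout_call(node):
    """True if node is a call to something named Timeout (by name only)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Attribute):
        name = func.attr
    else:
        name = getattr(func, "id", None)
    return name == "Timeout"


def find_discarded_timeouts(source):
    """Return line numbers of bare Timeout(...) expression statements."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Expr) and _is_timeout_call(node.value)
    ]


if __name__ == "__main__":
    for lineno in find_discarded_timeouts(SAMPLE):
        print("possible leaked Timeout at line %d" % lineno)
```

On the sample above this flags only the call inside leaky(). Running it
over real files would just mean reading each file and passing its text to
find_discarded_timeouts().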

-- 

Thanks,

Matt Riedemann



