[openstack-dev] [neutron] non-deterministic gate failures due to unclosed eventlet Timeouts

Doug Hellmann doug at doughellmann.com
Mon Sep 8 15:31:21 UTC 2014


On Sep 7, 2014, at 9:39 AM, John Schwarz <jschwarz at redhat.com> wrote:

> Hi,
> 
> Long story short: for future reference, if you initialize an eventlet
> Timeout, make sure you cancel it (either by using it as a context
> manager or by calling timeout.cancel() explicitly), and be extra
> careful when writing tests that use eventlet Timeouts, because these
> timeouts are not cancelled implicitly and will cause unexpected
> behaviours (see [1]) like gate failures. In our case this caused
> non-deterministic failures on the dsvm-functional test suite.

It would be good to have a fixture class in oslotest to set up eventlet timeouts properly.

Doug

> 
> 
> Late last week, a bug was found ([2]) in which an eventlet Timeout
> object was initialized but never cancelled. The instance stayed
> registered inside eventlet's internals and triggered non-deterministic
> "Timeout: 10 seconds" errors and failures in dsvm-functional tests.
> 
> As mentioned earlier, initializing a new eventlet.timeout.Timeout
> instance also registers it with mechanisms internal to the library,
> and that reference remains until it is explicitly removed (not merely
> until execution leaves the function block, as some might expect).
> Thus, the old code (which simply created an instance without
> assigning it to a variable) left no way to cancel the timeout object.
> The reference persists for the "life" of a worker, so it can (and
> did) affect other tests and procedures using eventlet in the same
> process. This could just as easily affect production-grade systems
> under very high load.
> 
> For future reference:
> 1) If you run into a "Timeout: %d seconds" exception whose traceback
> includes "hub.switch()" and "self.greenlet.switch()" calls, there might
> be a latent Timeout somewhere in the code, and a search for all
> eventlet.timeout.Timeout instances will probably produce the culprit.
> 
> 2) The setup used to reproduce this error for debugging purposes was a
> bare-metal machine running a VM with devstack. On the bare-metal
> machine I ran about six "dd if=/dev/zero of=/dev/null" processes to
> simulate high CPU load (the full command can be found at [3]), and in
> the VM I ran the dsvm-functional suite. Using only a VM, with a
> similar high-CPU simulation, failed to reproduce the failure.
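
For the search suggested in point 1, a plain grep for "Timeout(" works, but a small AST scan can narrow the hits to calls whose result is thrown away, which is the pattern behind this bug. This is only a rough sketch: find_discarded_timeouts and SAMPLE are hypothetical names, it matches on the callable's name alone, and it will not catch a Timeout that is assigned to a variable but never cancelled.

```python
import ast

# Hypothetical input used only to exercise the scanner below.
SAMPLE = '''
import eventlet

def bad():
    eventlet.timeout.Timeout(10)
    do_work()

def good():
    with eventlet.timeout.Timeout(10):
        do_work()
'''


def find_discarded_timeouts(source, filename="<string>"):
    """Return line numbers of Timeout(...) calls whose result is discarded."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        # An expression statement wrapping a call means the return value
        # is neither assigned nor used as a context manager.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            # Handles both Timeout(...) and eventlet.timeout.Timeout(...).
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name == "Timeout":
                hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    print(find_discarded_timeouts(SAMPLE))  # prints [5]
```

To scan a real tree, the same function could be fed the contents of each .py file in turn; anything it reports still needs a human look.
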
> 
> [1]
> http://eventlet.net/doc/modules/timeout.html#eventlet.timeout.eventlet.timeout.Timeout.Timeout.cancel
> [2] https://review.openstack.org/#/c/119001/
> [3]
> http://stackoverflow.com/questions/2925606/how-to-create-a-cpu-spike-with-a-bash-command
> 
> 
> --
> John Schwarz,
> Software Engineer, Red Hat.
> 
> 