[openstack-dev] Tripleo, Ironic, the SSH power driver, paramiko and eventlet fun.
jang at ioctl.org
Tue Jun 24 11:15:30 UTC 2014
There's a bug on this:
https://bugs.launchpad.net/ironic/+bug/1321787?comments=all
It seems like it's been well-known for a long time that paramiko
parallelism doesn't work well with eventlet. Ironic's aggressive use of
the ssh power driver seems to hit this hard.
The sign that you're hitting a problem with this is the "multiple
simultaneous readers" warning, which is spurious (but a sign of trouble).
I've started to follow up this problem with the eventletdev mailing list,
since, after trawling back through tickets, it looks like we've seen
other issues arising from this in various places, going back at least 18
months. They're not all paramiko-related; nor are they necessarily
"caused" by the thing mentioned in the TRACE lines - that's just the point
at which eventlet can detect the problem. I've seen the glanceclient, at the least,
also trigger this - as well as Ironic's use of utils.execute to launch
(again, parallel) qemu-img calls.
I'm just trying a tripleo run with a much reduced workers_pool_size to see
if I can at least forcibly get a run to complete successfully.
As to where the problem lies: it seems eventlet has a registered listener
backed by one FD. That FD gets recycled by another thread, which attempts
to read or write on it. That's why eventlet is carping. Paramiko seems to
trigger this quite reliably because it uses a worker thread to manage its
ssh communication.
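
To make the FD-recycling point concrete, here is a minimal sketch using only
the Python standard library. It is not eventlet's code: the `registered` dict
is a hypothetical stand-in for a hub's table of listeners keyed by FD number,
and the function name is made up for illustration. It only shows that, once a
descriptor is closed behind the hub's back, the OS hands the same number to
the next open, so the old registration now points at an unrelated file.

```python
import os


def demonstrate_fd_recycling():
    """Show that a closed FD number is handed out again by the OS."""
    # Create a pipe and pretend a listener is registered on its read end.
    read_fd, write_fd = os.pipe()

    # Hypothetical stand-in for an event hub's listener table (an assumption
    # for illustration, not eventlet's real data structure).
    registered = {read_fd: "listener for the original pipe"}

    # Some other code closes the descriptor without telling the "hub".
    os.close(read_fd)

    # The OS allocates the lowest free descriptor number, which is the one
    # that was just closed, so an unrelated file now owns that number.
    new_fd = os.open(os.devnull, os.O_RDONLY)

    # The stale registration now appears to cover the unrelated file.
    stale_match = new_fd in registered

    os.close(new_fd)
    os.close(write_fd)
    return read_fd, new_fd, stale_match


if __name__ == "__main__":
    old_fd, new_fd, stale_match = demonstrate_fd_recycling()
    print("old fd:", old_fd, "new fd:", new_fd, "stale match:", stale_match)
```

In the real case the close and the re-open would happen on different threads
(for instance paramiko's worker thread), which is what makes the collision
hard to predict.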
Fixes to eventlet might be quite tricky - the bug above has a link to some
quick sketches in github - although it's just struck me that there may be
a simpler approach to investigate, which I'll pursue after sending this.
It'd be good to get some eyeballs on this eventlet problem - it's been
hitting us for quite a while, but before Ironic it didn't occur at a
sufficiently high rate to cause huge amounts of pain.
Cheers,
jan
PS. A longer, rambling braindump went to the eventletdev mailing list,
which can be found here:
https://lists.secondlife.com/pipermail/eventletdev/2014-June/thread.html
...I think I've a better handle on the problem now, but I still don't have
a satisfactory from-first-principles explanation of the exact sequence of
events that causes paramiko to trigger this.
--
jang at ioctl.org http://ioctl.org/jan/
stty intr ^m