[openstack-dev] [openstack-ansible] Random ssh errors in gate check jobs

Jesse Pretorius jesse.pretorius at gmail.com
Mon Nov 23 12:32:57 UTC 2015


On 23 November 2015 at 01:36, Major Hayden <major at mhtx.net> wrote:

> Hey folks,
>
> Some of my recent reviews have been frequent fliers in the land of CI gate
> jobs and I've spent a fair amount of time diagnosing random ssh failures to
> containers in AIO builds.  The error I get most often is this:
>
>     SSH Error: data could not be sent to the remote host. Make sure this host can be reached over ssh
>
> After digging into the Ansible code for a bit, I found the error in the ssh
> connection plugin[1].  It looks like the ssh connection is actually open,
> but data cannot be sent to the ssh subprocess.
>
> I messed around heavily with multiplexing, keys, GSSAPI, and more, but the
> errors still appear at random.  I've proposed a review[2] that switches gate
> jobs only to the paramiko transport, and it has run four times without ssh
> errors (although two builds timed out because the repo build took too
> long).
>
> The fifth build is running now and it seems to be moving along fairly
> quickly.
>
> [1]
> https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/connection/ssh.py#L245-L260
> [2] https://review.openstack.org/#/c/248361/
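
The failure mode Major describes (the process is still there, but writing to it fails) can be illustrated with a simplified Python sketch. This is not Ansible's code: `start`, `send_payload`, and `ConnectionFailure` are hypothetical names, and only the error text is copied from the message above. It assumes one way to hit the error: nothing is reading the other end of the ssh client's stdin pipe any more, so the write (or the flush done by `close()`) raises an `OSError` such as `BrokenPipeError`.

```python
import subprocess
import sys


class ConnectionFailure(Exception):
    """Hypothetical stand-in for the exception Ansible raises."""


# Error text copied from the message quoted above.
SSH_SEND_ERROR = ("SSH Error: data could not be sent to the remote host. "
                  "Make sure this host can be reached over ssh")


def start(cmd):
    """Spawn a child (standing in for the ssh client) with a pipe on stdin."""
    return subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)


def send_payload(proc, in_data):
    """Write in_data to the child's stdin, then wait for it to exit.

    If nothing reads the other end of the pipe any more, the write (or the
    flush performed by close()) raises an OSError such as BrokenPipeError,
    which is translated into the ssh-style error message.
    """
    try:
        proc.stdin.write(in_data)
        proc.stdin.close()
    except OSError as exc:
        try:
            proc.stdin.close()  # release the pipe; the flush may fail again
        except OSError:
            pass
        proc.kill()  # no-op if the child has already exited and been reaped
        proc.wait()
        raise ConnectionFailure(SSH_SEND_ERROR) from exc
    return proc.wait()


if __name__ == "__main__":
    dead = start([sys.executable, "-c", "pass"])
    dead.wait()  # the child is gone, but we still hold the write end
    try:
        send_payload(dead, b"module payload")
    except ConnectionFailure as err:
        print(err)
```

The real plugin does much more (timeouts, reading stdout/stderr, multiplexing), so this only shows why the message can appear even though the connection object itself still exists.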


Thanks for digging into this, Major. It is a royal pain and will likely be
resolved with the release of Ansible 2, but for now we have to work around
the issue with what we have.

I wonder, is there a difference in results or performance between using
paramiko and turning ssh pipelining off?
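
To make that comparison concrete, the two gate-job variants could be expressed as ansible.cfg settings: `transport = paramiko` under `[defaults]`, versus keeping the `ssh` transport with `pipelining = False` under `[ssh_connection]`. The Python sketch below only renders both files with the standard-library `configparser`. The variant names and `render_ansible_cfg` are hypothetical, and review [2] may configure the transport differently.

```python
import configparser
import io

# Two hypothetical gate-job variants. "transport" ([defaults]) and
# "pipelining" ([ssh_connection]) are standard ansible.cfg options;
# the variant labels themselves are made up for this sketch.
VARIANTS = {
    "paramiko": {"defaults": {"transport": "paramiko"}},
    "ssh-no-pipelining": {
        "defaults": {"transport": "ssh"},
        "ssh_connection": {"pipelining": "False"},
    },
}


def render_ansible_cfg(variant):
    """Return the ansible.cfg text for one variant."""
    cfg = configparser.ConfigParser()
    cfg.read_dict(VARIANTS[variant])
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    for name in VARIANTS:
        print("# variant:", name)
        print(render_ansible_cfg(name))
```

Running the gate a few times with each file and comparing ssh failure counts and job durations would be one way to answer the question.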

