Retrying the ssh connection on all ssh exceptions may help. It is possible that the ssh server causes this type of exception while the key or the ssh service is still being configured by cloud-init. The retry can also hide a temporary network black hole issue. These are not scientifically proven things, but see https://review.openstack.org/#/c/73186/.

NOTE: We have been using the same ssh code to make connections in the nova network jobs for a long time.

The other mentioned changes probably do not have an impact on stability; they mainly improve the logging of failures. Commit 9f756a081533b55f212221ea5de8ed968acea273 and the patches following it might decrease the load on the l3 agent, but they would be more difficult to backport.

I do not remember anything else in tempest that may help to make the stable/havana neutron jobs more stable.

Best Regards,
Attila

----- Original Message -----
From: "Alan Pevec" <apevec@gmail.com>
To: "Gary Kotton" <gkotton@vmware.com>, "Attila Fazekas" <afazekas@redhat.com>, "Joe Gordon" <joe.gordon0@gmail.com>, "David Kranz" <dkranz@redhat.com>, mtreinish@kortar.org, "Sean Dague" <sean@dague.net>
Cc: "openstack-stable-maint" <openstack-stable-maint@lists.openstack.org>
Sent: Wednesday, February 12, 2014 11:44:58 PM
Subject: Re: [Openstack-stable-maint] 2013.2.2 exception requests
Copying the authors of the tempest patches referenced below, plus a few Tempest core members who might be interested.
https://review.openstack.org/#/c/72754/
That's a good candidate for an exception, and I see Neutron stable-maint members have already approved it, but it's failing the *-isolated gate jobs. I'll try throwing the dice a few more times, but could someone familiar with those jobs have a look? What are they doing?
Ihar commented in the review:
"I suspect tempest lacks some of those ssh.py fixes from master:
c3128c085c2635d82c4909d1be5d016df4978632
ad7ef7d1bdd98045639ee4045144c8fe52853e76
31a91a605a25f578b51a7bed2df8fde5c5f49ffc
I'm not sure this would be enough to stabilize gate though."
Gary, Attila, Joe - would you like to backport your patches to stable/havana Tempest? Do you agree they should improve gate stability and is there anything else to be backported to stabilize *-isolated gate jobs?
Thanks,
Alan
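For reference, the "retry the ssh connection on all ssh exceptions" idea from the reply at the top of this thread could look roughly like the sketch below. This is not the tempest ssh.py code and not any of the patches referenced above. The names `connect_with_retry` and `SSHError` and every parameter are hypothetical; `SSHError` only stands in for the SSH library's base exception class (paramiko.SSHException would be the usual candidate), and the backoff values are arbitrary.

```python
import time


class SSHError(Exception):
    """Hypothetical stand-in for the SSH library's base exception class."""


def connect_with_retry(connect, attempts=5, delay=1.0, backoff=2.0,
                       retry_on=(SSHError, OSError), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts are used up.

    connect  -- zero-argument callable that opens the connection
    retry_on -- exception types treated as possibly transient, e.g. the
                guest sshd still being configured by cloud-init, or a
                short network outage (OSError covers socket errors)
    sleep    -- injectable so the wait can be skipped in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                # Out of attempts: surface the last failure to the caller.
                raise
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
```

Catching the whole exception family rather than one specific error is the point of the suggestion. The trade-off is the one already noted in the reply: a retry like this can also mask a real, if temporary, network problem.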