[Openstack-stable-maint] 2013.2.2 exception requests

Ihar Hrachyshka ihrachys at redhat.com
Thu Feb 13 12:09:16 UTC 2014


Hi, see below.

----- Original Message -----
> Retrying the ssh connection on all ssh exceptions may help.
> 
> It is possible that the ssh server causes this type of exception
> while the key or the ssh service is still being configured by cloud-init.
> 

First, tests don't use cloud-init based images to start new nova instances. Cirros images use a similar but different service to set the instance up. See: http://bazaar.launchpad.net/~smoser/cirros/trunk/view/head:/src/sbin/cirros-ds

The fix in question is for neutron-metadata-agent, and the agent received no requests from the new instance created by tempest, meaning that either the instance failed to run or its network connection was not properly established. The nova-api log shows that the new instance's state is polled for some time (~6 min), but its port stays in DOWN state the whole time.

> It can also hide a temporary network black-hole issue.
> 

The instance is created at ~00:59:??, the test fails at ~01:06:??, so it's hardly temporary.

> These are not scientifically proven things, but see
> https://review.openstack.org/#/c/73186/.
> 
> NOTE: We have been using the same ssh code to make connections
> in nova-network jobs for a long time.
> 

This review catches an additional exception type (SSHException). Does that mean that, if this were our issue, we would see SSHException tracebacks in the tempest log? There are none there.
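For illustration only, here is a minimal Python sketch of the retry-on-exception idea. It assumes a generic retry loop and is not tempest's actual ssh.py code. The `connect` callable, the `connect_with_retry` helper and its parameters are hypothetical, and `SSHException` is a local stand-in for paramiko's class. Because the loop logs every exception it catches, it shows why SSHException entries would be expected in the log if that exception were the cause.

```python
import logging
import socket
import time

LOG = logging.getLogger("ssh_retry_sketch")


class SSHException(Exception):
    """Hypothetical stand-in for paramiko.SSHException so the sketch runs without paramiko."""


def connect_with_retry(connect, timeout=60.0, interval=1.0,
                       retry_on=(socket.error, SSHException)):
    """Call the hypothetical connect() until it succeeds or `timeout` seconds elapse.

    Only exception types listed in `retry_on` are retried; anything else
    propagates immediately. Every retried failure is logged, so a failure
    caused by SSHException would leave a trace in the log.
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except retry_on as exc:
            LOG.warning("ssh attempt %d failed: %r", attempt, exc)
            # Give up (re-raising the last error) if another wait would pass the deadline.
            if time.monotonic() + interval >= deadline:
                raise
            time.sleep(interval)
```

The design choice here is to retry only an explicit tuple of exception types, so that broadening the tuple (as the review above does with SSHException) is a visible, one-line change.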

> The other mentioned changes probably do not affect stability;
> they mainly improve the logging of the failures.
> 
> Commit 9f756a081533b55f212221ea5de8ed968acea273 and the patches following it
> might decrease the load on the l3 agent,
> but they would be more difficult to backport.
> 
> I do not remember anything else in tempest that may help
> make the stable/havana neutron jobs more stable.
> 

There was also a bug in file injection into new instances in the gate that made ssh sessions fail. It was related to guestfs, but I don't know all the details. Adding Russel to Cc since he may have more info on this.

> Best Regards,
> Attila
> 
> ----- Original Message -----
> > From: "Alan Pevec" <apevec at gmail.com>
> > To: "Gary Kotton" <gkotton at vmware.com>, "Attila Fazekas"
> > <afazekas at redhat.com>, "Joe Gordon" <joe.gordon0 at gmail.com>,
> > "David Kranz" <dkranz at redhat.com>, mtreinish at kortar.org, "Sean Dague"
> > <sean at dague.net>
> > Cc: "openstack-stable-maint" <openstack-stable-maint at lists.openstack.org>
> > Sent: Wednesday, February 12, 2014 11:44:58 PM
> > Subject: Re: [Openstack-stable-maint] 2013.2.2 exception requests
> > 
> > Copying authors of tempest patches referenced below + a few Tempest core
> > members who might be interested.
> > 
> > >> https://review.openstack.org/#/c/72754/
> > > That's a good candidate for an exception, and I see Neutron stable-maint
> > > members have already approved it, but it's failing the *-isolated gate jobs.
> > > I'll try rolling the dice a few more times, but could someone familiar
> > > with them have a look?
> > > What are those jobs doing?
> > 
> > Ihar commented in the review: " I suspect tempest lacks some of those
> > ssh.py fixes from master:
> > c3128c085c2635d82c4909d1be5d016df4978632
> > ad7ef7d1bdd98045639ee4045144c8fe52853e76
> > 31a91a605a25f578b51a7bed2df8fde5c5f49ffc
> > I'm not sure this would be enough to stabilize the gate though."
> > 
> > Gary, Attila, Joe - would you like to backport your patches to
> > stable/havana Tempest?
> > Do you agree they should improve gate stability and is there anything
> > else to be backported to stabilize *-isolated gate jobs?
> > 
> > 
> > Thanks,
> > Alan
> > 
> 
> _______________________________________________
> Openstack-stable-maint mailing list
> Openstack-stable-maint at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-stable-maint
> 


