[Openstack-stable-maint] 2013.2.2 exception requests

Attila Fazekas afazekas at redhat.com
Thu Feb 13 16:24:32 UTC 2014


I agree, in this specific case it will not help.

It may help in perhaps 10 to 100 ppm of the failure cases.
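
For illustration, the idea discussed below (retrying the ssh connection on
any ssh exception) could be sketched roughly as follows. This is only a
minimal sketch and not the code in tempest's ssh.py: connect_with_retry and
its parameters are hypothetical names, and the connect callable stands in
for whatever actually opens the ssh session. With paramiko, one might pass
paramiko.SSHException in retry_on.

```python
import time


def connect_with_retry(connect, retry_on=(Exception,), attempts=5,
                       delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- hypothetical zero-argument callable that opens the session
    retry_on -- tuple of exception types that should trigger a retry
    attempts -- maximum number of calls to connect()
    delay    -- seconds to wait between attempts
    sleep    -- injectable sleep function, so tests need not wait
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            # Remember the failure and retry unless this was the last try.
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last exception to the caller.
    raise last_error
```

Note that a retry like this only papers over short-lived failures. If the
instance never becomes reachable, every attempt fails and the last
exception is raised anyway.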

----- Original Message -----
> From: "Ihar Hrachyshka" <ihrachys at redhat.com>
> To: "Attila Fazekas" <afazekas at redhat.com>
> Cc: "Alan Pevec" <apevec at gmail.com>, "Joe Gordon" <joe.gordon0 at gmail.com>, "openstack-stable-maint"
> <openstack-stable-maint at lists.openstack.org>, "Sean Dague" <sean at dague.net>, "Russell Bryant" <rbryant at redhat.com>
> Sent: Thursday, February 13, 2014 1:09:16 PM
> Subject: Re: [Openstack-stable-maint] 2013.2.2 exception requests
> 
> Hi, see below.
> 
> ----- Original Message -----
> > Retrying the ssh connection on all ssh exceptions may help.
> > 
> > It is possible that the ssh server causes this type of exception
> > while the key or the ssh service is being configured by cloud-init.
> > 
> 
> First, tests don't use cloud-init based images to start new nova instances.
> Cirros images use a similar but different service to set the instance up. See:
> http://bazaar.launchpad.net/~smoser/cirros/trunk/view/head:/src/sbin/cirros-ds
> 
> The fix in question is for neutron-metadata-agent, and it was not hit by any
> requests from the new instance created by tempest, meaning the instance
> either failed to run, or the network connection was not properly established.
> The nova-api log shows that the new nova instance's state is polled for some
> time (~6 mins), but its port is always in the DOWN state.
> 
> > It can also hide a temporary network black hole issue.
> > 
> 
> The instance is created at ~00:59:??, the test fails at ~01:06:??, so it's
> hardly temporary.
> 
> > These are not scientifically proven things, but see
> > https://review.openstack.org/#/c/73186/.
> > 
> > NOTE: We have been using the same ssh code to make connections
> > in nova network jobs for a long time.
> > 
> 
> This review catches another exception type (SSHException). Does it mean that
> if that were our issue, we would see SSHException tracebacks in the tempest
> log? There's no such thing there.
> 
> > The other mentioned changes probably do not have an impact on stability;
> > they mainly improve the logging of the failures.
> > 
> > The 9f756a081533b55f212221ea5de8ed968acea273 and the following patches
> > might decrease the load on the l3 agent,
> > but they would be more difficult to backport.
> > 
> > I do not remember anything else in tempest that may help to
> > make the stable/havana neutron jobs more stable.
> > 
> 
> There was also a bug in file injection to a new instance in the gate that
> made ssh sessions fail. Something related to guestfs, but I don't know all
> the details. Adding Russell to Cc since he may have more info on this.
> 
> > Best Regards,
> > Attila
> > 
> > ----- Original Message -----
> > > From: "Alan Pevec" <apevec at gmail.com>
> > > To: "Gary Kotton" <gkotton at vmware.com>, "Attila Fazekas"
> > > <afazekas at redhat.com>, "Joe Gordon" <joe.gordon0 at gmail.com>,
> > > "David Kranz" <dkranz at redhat.com>, mtreinish at kortar.org, "Sean Dague"
> > > <sean at dague.net>
> > > Cc: "openstack-stable-maint" <openstack-stable-maint at lists.openstack.org>
> > > Sent: Wednesday, February 12, 2014 11:44:58 PM
> > > Subject: Re: [Openstack-stable-maint] 2013.2.2 exception requests
> > > 
> > > Copying authors of the tempest patches referenced below, plus a few
> > > Tempest core members who might be interested.
> > > 
> > > >> https://review.openstack.org/#/c/72754/
> > > > That's a good candidate for exception, and I see Neutron stable-maint
> > > > members already approved but it's failing *-isolated gate jobs.
> > > > I'll try throwing dice a few more times, but could someone familiar
> > > > with these jobs have a look?
> > > > What are those jobs doing?
> > > 
> > > Ihar commented in the review: " I suspect tempest lacks some of those
> > > ssh.py fixes from master:
> > > c3128c085c2635d82c4909d1be5d016df4978632
> > > ad7ef7d1bdd98045639ee4045144c8fe52853e76
> > > 31a91a605a25f578b51a7bed2df8fde5c5f49ffc
> > > I'm not sure this would be enough to stabilize gate though."
> > > 
> > > Gary, Attila, Joe - would you like to backport your patches to
> > > stable/havana Tempest?
> > > Do you agree they should improve gate stability and is there anything
> > > else to be backported to stabilize *-isolated gate jobs?
> > > 
> > > 
> > > Thanks,
> > > Alan
> > > 
> > 
> > _______________________________________________
> > Openstack-stable-maint mailing list
> > Openstack-stable-maint at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-stable-maint
> > 
> 
