[openstack-dev] [neutron] [gate] 35% failure rate for neutron tempest jobs

Armando M. armamig at gmail.com
Tue Nov 10 19:43:10 UTC 2015


On 10 November 2015 at 11:11, Sean Dague <sean at dague.net> wrote:

> On 11/10/2015 01:37 PM, Armando M. wrote:
> >
> >
> > On 10 November 2015 at 09:49, Sean Dague <sean at dague.net> wrote:
> >
> >     The neutron tempest jobs are now at a 35% failure rate:
> >     http://tinyurl.com/ne3ex4v (note, 35% is basically the worst possible
> >     fail rate, because it's just passing enough to land patches that cause
> >     that kind of fail on two test runs check/gate with a coin flip).
> >
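
For a rough sense of the numbers: assuming each run fails independently at
the quoted 35% rate, and that a patch needs two passing runs (check, then
gate), the chance of landing is about 42%, which is close to a coin flip. A
minimal Python sketch of that arithmetic follows. The function name is
illustrative, and the independence assumption is a simplification, not
something the job data confirms.

```python
def landing_probability(fail_rate: float, runs: int = 2) -> float:
    """Chance that `runs` independent test runs all pass."""
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    return (1.0 - fail_rate) ** runs


# 35% failure per run, two runs (check + gate), assumed independent.
print("%.4f" % landing_probability(0.35))
```
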
> >
> > Sean, thanks for the heads-up.
> >
> >
> >
> >     The failure is currently seen here -
> >
> >     http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22
> >
> >     That is a new assert that was added in Tempest. However it was added in
> >     a path that expects there should be an IPv4 address. The fact that port
> >     is sometimes not returning one is problematic.
> >     https://review.openstack.org/#/c/241800/
> >
> >     The server via nova is returning an address here -
> >
> >     http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_465
> >
> >     But then when the port is polled here:
> >
> >     http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_527
> >     it comes back with {"ports": []}
> >
> >
> >     This can be contrasted with a working path where we do the similar
> >     action on the Server is active here -
> >
> >     http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_193
> >
> >     Then we verify the port -
> >
> >     http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_230
> >
> >     Which returns:
> >
> >       Body: {"ports": [{"status": "ACTIVE", "binding:host_id":
> >     "devstack-trusty-rax-dfw-5784820", "allowed_address_pairs": [],
> >     "extra_dhcp_opts": [], "dns_assignment": [{"hostname":
> >     "host-10-100-0-3", "ip_address": "10.100.0.3", "fqdn":
> >     "host-10-100-0-3.openstacklocal."}], "device_owner": "compute:None",
> >     "port_security_enabled": true, "binding:profile": {}, "fixed_ips":
> >     [{"subnet_id": "147b1e65-3463-4965-8461-11b76a00dd99", "ip_address":
> >     "10.100.0.3"}], "id": "65c11c76-42fc-4010-bbb8-58996911803e",
> >     "security_groups": ["f2d48dcf-ea8d-4a7c-bf09-da37d3c2ee37"],
> >     "device_id": "b03bec85-fe69-4c0d-94e8-51753a8bebd5", "name": "",
> >     "admin_state_up": true, "network_id":
> >     "eb72d3af-f1a0-410b-8085-76cbe19ace90", "dns_name": "",
> >     "binding:vif_details": {"port_filter": true, "ovs_hybrid_plug": true},
> >     "binding:vnic_type": "normal", "binding:vif_type": "ovs", "tenant_id":
> >     "eab50a3d331c4db3a68f71d1ebdc41bf", "mac_address":
> >     "fa:16:3e:02:e4:ee"}]}
> >
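
To illustrate what the two responses mean for a caller, here is a minimal
Python sketch that pulls the IPv4 addresses out of a port-list response
body. It is not the Tempest code. The function name is hypothetical, and the
sample body is trimmed from the response quoted above to the fields the
sketch reads.

```python
import ipaddress
import json
from typing import List


def ipv4_addresses(body: str) -> List[str]:
    """Return every IPv4 fixed IP found in a port-list response body."""
    addresses = []
    for port in json.loads(body)["ports"]:
        for fixed_ip in port.get("fixed_ips", []):
            addr = fixed_ip["ip_address"]
            # Skip IPv6 entries; only IPv4 addresses matter here.
            if ipaddress.ip_address(addr).version == 4:
                addresses.append(addr)
    return addresses


# Trimmed copy of the working response quoted above.
WORKING_BODY = json.dumps({"ports": [{
    "status": "ACTIVE",
    "id": "65c11c76-42fc-4010-bbb8-58996911803e",
    "fixed_ips": [{"subnet_id": "147b1e65-3463-4965-8461-11b76a00dd99",
                   "ip_address": "10.100.0.3"}],
}]})

# The failing response carried no ports at all.
FAILING_BODY = '{"ports": []}'

print(ipv4_addresses(WORKING_BODY))
print(ipv4_addresses(FAILING_BODY))
```

With the failing body the result is an empty list, which is the condition
the assertions discussed in this thread trip on.
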
> >
> >     HenryG suggested this might be related to the ERROR of "No more IP
> >     addresses available on network". However that ERROR is thrown a lot in
> >     neutron, and 60% of the times the tempest run is successful.
> >
> >
> >     This issue is currently stuck and needs neutron folks to engage to get
> >     us somewhere. Reverting the tempest patch which does the early
> >     verification might make this class of fail go away, but I think what
> >     it's done is surface a more fundamental bit where ports aren't active
> >     when the server is active, which may explain deeper races we've had over
> >     the years. So actually getting folks to dive in here would be really
> >     great.
> >
> >
> > We'll dig into this more deeply. AFAIK, Nova servers won't go ACTIVE if
> > the port isn't, so we might have a regression. That said, it's been on
> > our radar to better synchronize actions that need to happen on port
> > setup. Right now for instance, DHCP and L2 setup is uncoordinated and
> > Kevin Benton has been looking into it.
> >
> > That said, I wonder if reverting the tempest patch is the best course of
> > action: we can then use Depends-on to test a Neutron fix and the revert
> > of the revert together without causing the gate too much grief.
> >
> > Thoughts?
>
> So, I just stared at the Tempest patch again, and honestly, reverting it
> isn't going to help anything.
>
> https://github.com/openstack/tempest/blob/a1edb75d7901a9e338ab397d208a40c99c5fd9a1/tempest/scenario/manager.py#L760-L765
>
> Because a revert just removes the assert len != 0
>
> The next line is an assert len == 1 (which has been there for a long time)
>
> So a len 0 will fail there as well. Which probably points to this being
> a neutron regression entirely, we'd still be failing with the empty
> ports list, it would just be an incredibly cryptic error of "Found
> multiple IPv4 addresses: []" (which actually means found 0 port addresses).
>
> The reason the change was pushed in Tempest was to make the fail
> condition more clear.
>
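
To make the point about the revert concrete, here is a hedged Python sketch
of the two checks as described above. It follows the description in this
thread, not the exact Tempest source: the function names are made up for
illustration, and the messages are shortened to the strings quoted in this
thread.

```python
def check_without_new_assert(addresses):
    """Long-standing check: exactly one IPv4 address is expected."""
    if len(addresses) != 1:
        raise AssertionError(
            "Found multiple IPv4 addresses: %s" % (addresses,))


def check_with_new_assert(addresses):
    """Newer check: report an empty result clearly, then run the old check."""
    if len(addresses) == 0:
        raise AssertionError(
            "No IPv4 addresses found in: %s" % (addresses,))
    check_without_new_assert(addresses)


# An empty address list fails either way; only the message differs.
for check in (check_with_new_assert, check_without_new_assert):
    try:
        check([])
    except AssertionError as exc:
        print("%s: %s" % (check.__name__, exc))
```

Both variants reject the empty list, so removing the first check would only
change which message shows up in the logs.
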

OK, I'll look into it. I assume we're tracking this issue in [1], and that
we'll elaborate on the bug report going forward?

[1] https://bugs.launchpad.net/neutron/+bug/1514935


>
>         -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>