[openstack-dev] [neutron] [gate] 35% failure rate for neutron tempest jobs

Sean Dague sean at dague.net
Tue Nov 10 17:49:47 UTC 2015


The neutron tempest jobs are now at a 35% failure rate:
http://tinyurl.com/ne3ex4v (note, 35% is basically the worst possible
fail rate: it is just low enough that a patch causing it can still pass
both required test runs, check and gate, on roughly a coin flip, and land).
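The coin-flip arithmetic can be made concrete with a small sketch. It assumes each run fails independently with the same probability and that a patch needs one clean check run and one clean gate run; rechecks and other real-world details are ignored. The function name is hypothetical.

```python
def chance_to_land(fail_rate: float, runs: int = 2) -> float:
    """Probability that a patch passes every required run (check + gate),
    assuming each run fails independently with probability fail_rate."""
    return (1.0 - fail_rate) ** runs


# Compare a few per-run failure rates; at 35% a patch lands about 42% of the time.
for rate in (0.10, 0.35, 0.50):
    print(f"fail rate {rate:.0%}: lands {chance_to_land(rate):.1%} of the time")
```

Under these assumptions a 35% per-run failure rate still lets the offending patch through about 0.65 * 0.65 = 0.42 of the time, which is why a bug of that size can slip into the gate.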

The failure is currently seen here -
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22

That is a new assert that was added in Tempest. However, it was added in
a path that expects there to be an IPv4 address, so the fact that the port
sometimes does not return one is problematic.
https://review.openstack.org/#/c/241800/
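To illustrate what that kind of check does, here is a minimal sketch. It is not the actual Tempest code from the review above; the helper names are hypothetical, and only the failure message text is taken from the logstash query. It collects the IPv4 fixed IPs from a port-list response and fails when there are none.

```python
import ipaddress


def ipv4_fixed_ips(ports):
    """Return (port_id, ip) pairs for every IPv4 fixed IP on the given ports.

    `ports` is the list under the "ports" key of a port-list response.
    """
    pairs = []
    for port in ports:
        for fixed_ip in port.get("fixed_ips", []):
            addr = ipaddress.ip_address(fixed_ip["ip_address"])
            if addr.version == 4:  # skip IPv6 addresses on dual-stack ports
                pairs.append((port["id"], fixed_ip["ip_address"]))
    return pairs


def assert_has_ipv4(ports):
    """Raise with the same message seen in the gate when no IPv4 address exists."""
    pairs = ipv4_fixed_ips(ports)
    if not pairs:
        raise AssertionError("No IPv4 addresses found in: %s" % ports)
    return pairs
```

Fed an empty port list such as the `{"ports": []}` response below, `assert_has_ipv4([])` raises `No IPv4 addresses found in: []`, matching the logstash signature.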

Nova returns an address for the server here -
http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_465

But then when the port is polled here:
http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_527
it comes back with {"ports": []}


This can be contrasted with a working path where we do a similar
action. The server is active here -
http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_193

Then we verify the port -
http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_230

Which returns:

  Body: {"ports": [{"status": "ACTIVE", "binding:host_id":
"devstack-trusty-rax-dfw-5784820", "allowed_address_pairs": [],
"extra_dhcp_opts": [], "dns_assignment": [{"hostname":
"host-10-100-0-3", "ip_address": "10.100.0.3", "fqdn":
"host-10-100-0-3.openstacklocal."}], "device_owner": "compute:None",
"port_security_enabled": true, "binding:profile": {}, "fixed_ips":
[{"subnet_id": "147b1e65-3463-4965-8461-11b76a00dd99", "ip_address":
"10.100.0.3"}], "id": "65c11c76-42fc-4010-bbb8-58996911803e",
"security_groups": ["f2d48dcf-ea8d-4a7c-bf09-da37d3c2ee37"],
"device_id": "b03bec85-fe69-4c0d-94e8-51753a8bebd5", "name": "",
"admin_state_up": true, "network_id":
"eb72d3af-f1a0-410b-8085-76cbe19ace90", "dns_name": "",
"binding:vif_details": {"port_filter": true, "ovs_hybrid_plug": true},
"binding:vnic_type": "normal", "binding:vif_type": "ovs", "tenant_id":
"eab50a3d331c4db3a68f71d1ebdc41bf", "mac_address": "fa:16:3e:02:e4:ee"}]}
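The difference between the two paths can be expressed as a simple consistency check: the server is ACTIVE, but is there an ACTIVE port with a fixed IP behind it? The sketch below is hypothetical (the function name is made up, and the working body is trimmed to just the fields it inspects); it flags the failing `{"ports": []}` case and passes the working one.

```python
import json


def server_active_without_port(server_status: str, ports_body: str) -> bool:
    """True when the server is ACTIVE but the port list has no ACTIVE port
    with at least one fixed IP, i.e. the inconsistency seen in the gate."""
    if server_status != "ACTIVE":
        return False  # server not active yet, nothing to compare
    ports = json.loads(ports_body).get("ports", [])
    has_usable_port = any(
        port.get("status") == "ACTIVE" and port.get("fixed_ips")
        for port in ports
    )
    return not has_usable_port


# Trimmed copy of the working response quoted above (only the checked fields).
WORKING = json.dumps({"ports": [{
    "status": "ACTIVE",
    "fixed_ips": [{"subnet_id": "147b1e65-3463-4965-8461-11b76a00dd99",
                   "ip_address": "10.100.0.3"}],
}]})
FAILING = '{"ports": []}'

print(server_active_without_port("ACTIVE", WORKING))  # False: port is ready
print(server_active_without_port("ACTIVE", FAILING))  # True: server active, no port
```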


HenryG suggested this might be related to the ERROR "No more IP
addresses available on network". However, that ERROR is thrown a lot in
neutron, and 60% of the time it shows up the tempest run is successful.


This issue is currently stuck and needs neutron folks to engage to get
us somewhere. Reverting the tempest patch that does the early
verification might make this class of failure go away, but I think what
it has done is surface a more fundamental problem where ports aren't active
when the server is active, which may explain deeper races we've had over
the years. So actually getting folks to dive in here would be really great.


	-Sean

-- 
Sean Dague
http://dague.net


