<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 10 November 2015 at 11:11, Sean Dague <span dir="ltr"><<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On 11/10/2015 01:37 PM, Armando M. wrote:<br>
><br>
><br>
> On 10 November 2015 at 09:49, Sean Dague <<a href="mailto:sean@dague.net">sean@dague.net</a>> wrote:<br>
</span><div><div class="h5">
><br>
>     The neutron tempest jobs are now at a 35% failure rate:<br>
>     <a href="http://tinyurl.com/ne3ex4v" rel="noreferrer" target="_blank">http://tinyurl.com/ne3ex4v</a> (note: 35% is basically the worst possible<br>
>     failure rate, because jobs pass just often enough that a patch causing<br>
>     this kind of failure can still land by winning something close to a coin flip on both test runs, check and gate).<br>
><br>
><br>
> Sean, thanks for the heads-up.<br>
><br>
><br>
><br>
>     The failure is currently seen here -<br>
>     <a href="http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22" rel="noreferrer" target="_blank">http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22</a><br>
><br>
>     That is a new assert that was added in Tempest. However it was added in<br>
>     a path that expects there should be an IPv4 address. The fact that port<br>
>     is sometimes not returning one is problematic.<br>
>     <a href="https://review.openstack.org/#/c/241800/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/241800/</a><br>
><br>
>     The server via nova is returning an address here -<br>
>     <a href="http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_465" rel="noreferrer" target="_blank">http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_465</a><br>
><br>
>     But then when the port is polled here:<br>
>     <a href="http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_527" rel="noreferrer" target="_blank">http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_14_35_527</a><br>
>     it comes back with {"ports": []}<br>
><br>
><br>
>     This can be contrasted with a working path, where we do a similar<br>
>     action once the server is active here -<br>
>     <a href="http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_193" rel="noreferrer" target="_blank">http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_193</a><br>
><br>
>     Then we verify the port -<br>
>     <a href="http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_230" rel="noreferrer" target="_blank">http://logs.openstack.org/76/243676/1/check/gate-tempest-dsvm-neutron-full/291e1d7/logs/tempest.txt.gz#_2015-11-10_17_13_48_230</a><br>
><br>
>     Which returns:<br>
><br>
>       Body: {"ports": [{"status": "ACTIVE", "binding:host_id":<br>
>     "devstack-trusty-rax-dfw-5784820", "allowed_address_pairs": [],<br>
>     "extra_dhcp_opts": [], "dns_assignment": [{"hostname":<br>
>     "host-10-100-0-3", "ip_address": "10.100.0.3", "fqdn":<br>
>     "host-10-100-0-3.openstacklocal."}], "device_owner": "compute:None",<br>
>     "port_security_enabled": true, "binding:profile": {}, "fixed_ips":<br>
>     [{"subnet_id": "147b1e65-3463-4965-8461-11b76a00dd99", "ip_address":<br>
>     "10.100.0.3"}], "id": "65c11c76-42fc-4010-bbb8-58996911803e",<br>
>     "security_groups": ["f2d48dcf-ea8d-4a7c-bf09-da37d3c2ee37"],<br>
>     "device_id": "b03bec85-fe69-4c0d-94e8-51753a8bebd5", "name": "",<br>
>     "admin_state_up": true, "network_id":<br>
>     "eb72d3af-f1a0-410b-8085-76cbe19ace90", "dns_name": "",<br>
>     "binding:vif_details": {"port_filter": true, "ovs_hybrid_plug": true},<br>
>     "binding:vnic_type": "normal", "binding:vif_type": "ovs", "tenant_id":<br>
>     "eab50a3d331c4db3a68f71d1ebdc41bf", "mac_address":<br>
>     "fa:16:3e:02:e4:ee"}]}<br>
><br>
><br>
>     HenryG suggested this might be related to the ERROR of "No more IP<br>
>     addresses available on network". However, that ERROR is thrown a lot in<br>
>     neutron, and 60% of the time the tempest run is successful anyway.<br>
><br>
><br>
>     This issue is currently stuck and needs neutron folks to engage to get<br>
>     us somewhere. Reverting the tempest patch which does the early<br>
>     verification might make this class of fail go away, but I think what<br>
>     it's done is surface a more fundamental bit where ports aren't active<br>
>     when the server is active, which may explain deeper races we've had over<br>
>     the years. So actually getting folks to dive in here would be really<br>
>     great.<br>
><br>
><br>
> We'll dig into this more deeply. AFAIK, Nova servers won't go ACTIVE if<br>
> the port isn't, so we might have a regression. That said, it's been on<br>
> our radar to better synchronize actions that need to happen on port<br>
> setup. Right now, for instance, DHCP and L2 setup are uncoordinated, and<br>
> Kevin Benton has been looking into it.<br>
><br>
> That said, I wonder if reverting the tempest patch is the best course of<br>
> action: we can then use Depends-on to test a Neutron fix and the revert<br>
> of the revert together without causing the gate too much grief.<br>
><br>
> Thoughts?<br>
<br>
</div></div>So, I just stared at the Tempest patch again, and honestly, reverting it<br>
isn't going to help anything.<br>
<a href="https://github.com/openstack/tempest/blob/a1edb75d7901a9e338ab397d208a40c99c5fd9a1/tempest/scenario/manager.py#L760-L765" rel="noreferrer" target="_blank">https://github.com/openstack/tempest/blob/a1edb75d7901a9e338ab397d208a40c99c5fd9a1/tempest/scenario/manager.py#L760-L765</a><br>
<br>
Because a revert just removes the assert len != 0<br>
<br>
The next line is an assert len == 1 (which has been there for a long time)<br>
<br>
So a len 0 will fail there as well, which probably points to this being<br>
entirely a neutron regression. We'd still be failing with the empty<br>
ports list; it would just be an incredibly cryptic error of "Found<br>
multiple IPv4 addresses: []" (which actually means 0 port addresses were found).<br>
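<br>
A minimal sketch of the two checks described above, assuming a port list shaped like the "ports" body quoted earlier in this thread. The function names ipv4_port_map and pick_ipv4_port are hypothetical, and the standard-library ipaddress module is used here for illustration; this is not the Tempest code itself, only the two messages are taken from this thread:<br>
<pre>
```python
import ipaddress


def ipv4_port_map(ports):
    """Collect (port id, address) pairs for every IPv4 fixed IP in a port list."""
    pairs = []
    for port in ports:
        for fixed_ip in port["fixed_ips"]:
            address = fixed_ip["ip_address"]
            if ipaddress.ip_address(address).version == 4:
                pairs.append((port["id"], address))
    return pairs


def pick_ipv4_port(ports):
    """Hypothetical stand-in for the check discussed above, not the Tempest code."""
    port_map = ipv4_port_map(ports)
    # Newer check: an empty result gets its own, accurate message.
    assert len(port_map) != 0, "No IPv4 addresses found in: %s" % ports
    # Older check: without the line above, an empty list would fail here
    # instead, with the misleading "multiple" wording.
    assert len(port_map) == 1, "Found multiple IPv4 addresses: %s" % port_map
    return port_map[0]
```
</pre>
With an empty port list the first assert fires, so removing it would only move the failure to the second assert rather than make it go away.<br>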
<br>
The reason the change was pushed in Tempest was to make the fail<br>
condition more clear.<br></blockquote><div><br></div><div>Ok, I'll look into it. I assume we're tracking this issue with [1]? We'll elaborate on the bug report going forward?</div><div><br></div><div>[1] <a href="https://bugs.launchpad.net/neutron/+bug/1514935">https://bugs.launchpad.net/neutron/+bug/1514935</a> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class=""><div class="h5"><br>
        -Sean<br>
<br>
--<br>
Sean Dague<br>
<a href="http://dague.net" rel="noreferrer" target="_blank">http://dague.net</a><br>
<br>
</div></div></blockquote></div><br></div></div>