[Openstack-operators] Mitaka to Newton networking issues

Grant Morley grant at absolutedevops.io
Thu Dec 8 09:50:39 UTC 2016


Hi Kevin,

Sorry for the late reply. We tried doing that and were still seeing the 
same issues. I don't think the bug is quite the same as what we were 
seeing.

Unfortunately we have had to roll back to Mitaka, as we had a tight 
deadline and not being able to create networks or have HA was fairly 
critical. Interestingly, now that we are back on Mitaka, everything is 
working fine.

I will try to get a testing environment set up to see whether I get the 
same results we saw when we upgraded from Mitaka to Newton. I am not 
sure if it is something to do with our specific setup, but we have 
followed the OSA guidelines, and as everything was working on Liberty 
and Mitaka I assume we have it all set up correctly.

I will keep you posted on our findings, as we may be onto another bug.

Regards,


On 06/12/16 14:07, Kevin Benton wrote:
> There was a bug, with fixes that only recently merged, where removing 
> a router on the L3 agent was done in the wrong order, which caused 
> problems cleaning up the interfaces with Linux Bridge + L3HA. 
> https://bugs.launchpad.net/neutron/+bug/1629159
>
> It could be the case that there is an orphaned veth pair in a deleted 
> namespace from the same router when it was removed from the L3 agent.
>
> For each L3 agent, can you shut down the L3 agent, run the netns 
> cleanup script, ensure all keepalived processes are dead, and then 
> start the agent again?
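
A rough sketch of that sequence for a single L3 agent host, written as a dry-run plan so the commands can be reviewed before anything is executed. The service name "neutron-l3-agent", the use of systemctl and the two config file paths are assumptions about the deployment and are not taken from this thread (an OSA agents container may use something different). neutron-netns-cleanup is the cleanup script that ships with neutron.

```python
import shlex
import subprocess

# Assumptions: systemd unit name and config paths; adjust for your deployment.
L3_SERVICE = "neutron-l3-agent"
CONFIG_FILES = ["/etc/neutron/neutron.conf", "/etc/neutron/l3_agent.ini"]


def cleanup_plan(service=L3_SERVICE, config_files=CONFIG_FILES):
    """Return the ordered commands: stop the agent, clean up namespaces,
    look for leftover keepalived processes, start the agent again."""
    cleanup = ["neutron-netns-cleanup"]
    for path in config_files:
        cleanup += ["--config-file", path]
    return [
        ["systemctl", "stop", service],
        cleanup,
        # pgrep exits non-zero when nothing matches, which is the good
        # case here: no keepalived process is left behind.
        ["pgrep", "keepalived"],
        ["systemctl", "start", service],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd, check=False)
        if cmd[0] == "pgrep" and result.returncode == 0:
            # Stop before restarting the agent so the stale processes
            # can be dealt with by hand first.
            raise RuntimeError("keepalived is still running")


if __name__ == "__main__":
    run_plan(cleanup_plan())
```

Run as-is it only prints the four commands in order. Passing dry_run=False executes them and stops before the restart if any keepalived process is still alive.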
>
> On Tue, Dec 6, 2016 at 4:59 AM, Grant Morley <grant at absolutedevops.io> wrote:
>
>     They both appear to be "ACTIVE" which is what I would expect:
>
>     root@management-1-utility-container-f1222d05:~# neutron port-show 8cd027f1-9f8c-4077-9c8a-92abc62fadd4
>     +-----------------------+--------------------------------------------------------------------------------------+
>     | Field                 | Value |
>     +-----------------------+--------------------------------------------------------------------------------------+
>     | admin_state_up        | True |
>     | allowed_address_pairs | |
>     | binding:host_id       | network-1-neutron-agents-container-11d47568 |
>     | binding:profile       | {} |
>     | binding:vif_details   | {"port_filter": true} |
>     | binding:vif_type      | bridge |
>     | binding:vnic_type     | normal |
>     | created_at            | 2016-12-05T10:58:01Z |
>     | description           | |
>     | device_id             | a8a10308-d62f-420f-99cf-f3727ef2b784 |
>     | device_owner          | network:router_ha_interface |
>     | extra_dhcp_opts       | |
>     | fixed_ips             | {"subnet_id": "6495d542-4b78-40df-84af-31500aaa0bf8", "ip_address": "169.254.192.5"} |
>     | id                    | 8cd027f1-9f8c-4077-9c8a-92abc62fadd4 |
>     | mac_address           | fa:16:3e:58:a1:a4 |
>     | name                  | HA port tenant e0ffdeb1e910469d9e625b95f2fa6c54 |
>     | network_id            | 2b04fc3a-5c0d-4f55-996f-8888d8bd1e1d |
>     | port_security_enabled | False |
>     | project_id            | |
>     | revision_number       | 23 |
>     | security_groups       | |
>     | status                | ACTIVE |
>     | tenant_id             | |
>     | updated_at            | 2016-12-06T10:18:00Z |
>     +-----------------------+--------------------------------------------------------------------------------------+
>     root@management-1-utility-container-f1222d05:~# neutron port-show bda1f324-3178-46e5-8638-0f454ba09cab
>     +-----------------------+--------------------------------------------------------------------------------------+
>     | Field                 | Value |
>     +-----------------------+--------------------------------------------------------------------------------------+
>     | admin_state_up        | True |
>     | allowed_address_pairs | |
>     | binding:host_id       | network-2-neutron-agents-container-40906bfc |
>     | binding:profile       | {} |
>     | binding:vif_details   | {"port_filter": true} |
>     | binding:vif_type      | bridge |
>     | binding:vnic_type     | normal |
>     | created_at            | 2016-12-05T10:58:01Z |
>     | description           | |
>     | device_id             | a8a10308-d62f-420f-99cf-f3727ef2b784 |
>     | device_owner          | network:router_ha_interface |
>     | extra_dhcp_opts       | |
>     | fixed_ips             | {"subnet_id": "6495d542-4b78-40df-84af-31500aaa0bf8", "ip_address": "169.254.192.1"} |
>     | id                    | bda1f324-3178-46e5-8638-0f454ba09cab |
>     | mac_address           | fa:16:3e:c3:8a:14 |
>     | name                  | HA port tenant e0ffdeb1e910469d9e625b95f2fa6c54 |
>     | network_id            | 2b04fc3a-5c0d-4f55-996f-8888d8bd1e1d |
>     | port_security_enabled | False |
>     | project_id            | |
>     | revision_number       | 15 |
>     | security_groups       | |
>     | status                | ACTIVE |
>     | tenant_id             | |
>     | updated_at            | 2016-12-05T14:35:16Z |
>     +-----------------------+--------------------------------------------------------------------------------------+
>
>
>
>     On 06/12/16 12:53, Kevin Benton wrote:
>>     Can you do a 'neutron port-show' for both of those HA ports to
>>     check their status field?
>>
>>     On Tue, Dec 6, 2016 at 2:29 AM, Grant Morley
>>     <grant at absolutedevops.io> wrote:
>>
>>         Hi Kevin & Neil,
>>
>>         Many thanks for the reply. I have attached a screenshot
>>         showing that we cannot ping between the L3 HA nodes in the
>>         router namespaces. This was previously working fine with
>>         Mitaka, and has only stopped working since the upgrade to Newton.
>>
>>         From the packet captures and tcpdump output, the traffic
>>         doesn't even seem to be leaving the namespace.
>>
>>         In the attachment, the left-hand side shows the state of
>>         keepalived, with both HA agents as master, and the right-hand
>>         side is the ping attempt.
>>
>>         Regards,
>>
>>         On 06/12/16 10:14, Kevin Benton wrote:
>>>         Yes, that is a misleading warning. What is happening is that
>>>         it tries to load the interface driver as an alias first,
>>>         which produces the stevedore warning you see, and then it
>>>         falls back to loading it by the class path, which is what
>>>         you have configured. We will need to see if there is a way
>>>         to suppress that warning when the call to load by an alias
>>>         fails.
>>>
>>>         If you switch your interface driver setting to just
>>>         'linuxbridge', that should get rid of the warning.
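
To illustrate the alias-then-class-path behaviour described above, here is a minimal standard-library sketch. It is not neutron's or stevedore's actual code: the ALIASES table, the function names and the logger name are hypothetical, and a plain dictionary stands in for the registered driver aliases.

```python
import importlib
import logging

LOG = logging.getLogger("loader_sketch")

# Hypothetical alias table standing in for the registered driver aliases.
ALIASES = {"ordered": "collections.OrderedDict"}


def import_class(path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def load_by_alias_or_classname(name, aliases=ALIASES):
    """Try `name` as an alias first, then fall back to a full class path."""
    if name in aliases:
        return import_class(aliases[name])
    # This is the step that logs the misleading warning: the alias
    # lookup failed, but the class-path fallback below can still succeed.
    LOG.warning("Could not load %s", name)
    return import_class(name)
```

Calling load_by_alias_or_classname("collections.OrderedDict") logs the warning and still returns the class, while the short alias "ordered" loads silently. That mirrors why a full class path in the interface_driver option works but is noisy, and why the short 'linuxbridge' form avoids the warning.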
>>>
>>>
>>>         For both L3 HA nodes becoming master, we need a little more
>>>         info to figure out the root cause. Can you try switching
>>>         into the router namespace on one of the L3 HA nodes and see
>>>         if you can ping the other router instance across the L3 HA
>>>         network for that router?
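
A small sketch of that check, assuming the usual "qrouter-<router id>" namespace naming and that it is run as root on the L3 agent host. The function names are illustrative only.

```python
import subprocess


def ha_ping_command(router_id, peer_ip, count=3):
    """Build the command that pings peer_ip from inside the router namespace."""
    return ["ip", "netns", "exec", f"qrouter-{router_id}",
            "ping", "-c", str(count), peer_ip]


def peer_reachable(router_id, peer_ip):
    """Run the ping; True if the peer answered (ping exited with status 0)."""
    result = subprocess.run(ha_ping_command(router_id, peer_ip), check=False)
    return result.returncode == 0
```

With the values that appear elsewhere in this thread, the router would be a8a10308-d62f-420f-99cf-f3727ef2b784 and the two HA port addresses 169.254.192.5 and 169.254.192.1, so on the first node the check is peer_reachable("a8a10308-d62f-420f-99cf-f3727ef2b784", "169.254.192.1").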
>>>
>>>         On Mon, Dec 5, 2016 at 7:54 AM, Neil Jerram <neil at tigera.io> wrote:
>>>
>>>             I have also recently been seeing 'Could not load
>>>             <whatever>InterfaceDriver' warnings from the DHCP agent,
>>>             and haven't yet understood that - although I'm pretty
>>>             sure that my interface driver is being loaded really -
>>>             or else none of my networking function would work at all.
>>>
>>>             So it's possible that that part of your report is
>>>             benign, and just a misleading warning.  That said, I am
>>>             still worried about it too, and would like to understand
>>>             it properly.
>>>
>>>             I'm not aware of seeing the other symptoms you mentioned.
>>>
>>>                  Neil
>>>
>>>
>>>             On Mon, Dec 5, 2016 at 3:14 PM Grant Morley
>>>             <grant at absolutedevops.io> wrote:
>>>
>>>                 Hi All,
>>>
>>>                 We have just upgraded from Mitaka to Newton. We are
>>>                 running OSA and we seem to have come across some
>>>                 weird networking issues since the upgrade. Basically
>>>                 network access to instances is very intermittent and
>>>                 seems to randomly stop working.
>>>
>>>                 We are running neutron in HA and it appears that
>>>                 both of the neutron nodes are now trying to be
>>>                 master and are both trying to bring up the gateway
>>>                 IP addresses, which would cause conflicts.
>>>
>>>                 We are also seeing a lot of the following in the
>>>                 "neutron-dhcp-agent" log files:
>>>
>>>                 2016-12-05 14:42:24.837 2020 WARNING stevedore.named
>>>                 [req-1955d0a1-1453-4c65-a93a-54e8ea39b230
>>>                 1ac995c0729142289f7237222f335806
>>>                 3cc95dbe91c84e3e8ebbb9893ee54d20 - - -] Could not
>>>                 load neutron.agent.linux.interface.BridgeInterfaceDriver
>>>                 2016-12-05 14:42:42.803 2020 INFO
>>>                 neutron.agent.dhcp.agent
>>>                 [req-fad7d2bb-9d3c-4192-868a-0164b382aecf
>>>                 1ac995c0729142289f7237222f335806
>>>                 3cc95dbe91c84e3e8ebbb9893ee54d20 - - -] Trigger
>>>                 reload_allocations for port admin_state_up=True,
>>>                 allowed_address_pairs=[], binding:host_id=,
>>>                 binding:profile=, binding:vif_details=,
>>>                 binding:vif_type=unbound, binding:vnic_type=normal,
>>>                 created_at=2016-12-05T14:42:42Z, description=,
>>>                 device_id=8752effa-2ff2-4ce1-be70-e9f2243612cb,
>>>                 device_owner=network:floatingip, extra_dhcp_opts=[],
>>>                 fixed_ips=[{u'subnet_id':
>>>                 u'4ca7db2d-544a-4a97-b5a4-3cbf2467a4b7',
>>>                 u'ip_address': u'XXX.XXX.XXX.XXX'}],
>>>                 id=b3cf223d-8e76-484a-a649-d8a7dd435124,
>>>                 mac_address=fa:16:3e:ff:0d:50, name=,
>>>                 network_id=af5db886-0178-4f8d-9189-f55f773b37fa,
>>>                 port_security_enabled=False, project_id=,
>>>                 revision_number=4, security_groups=[], status=N/A,
>>>                 tenant_id=, updated_at=2016-12-05T14:42:42Z
>>>
>>>                 I am a bit concerned about neutron not being able to
>>>                 load the Bridge interface driver.
>>>
>>>                 Has anyone else come across this at all or have any
>>>                 pointers? This was working fine in Mitaka; it is only
>>>                 since the upgrade to Newton that we have these issues.
>>>
>>>                 I am able to provide more logs if they are needed.
>>>
>>>                 Regards,
>>>
>>>                 -- 
>>>                 Grant Morley
>>>                 Cloud Lead
>>>                 Absolute DevOps Ltd
>>>                 Units H, J & K, Gateway 1000, Whittle Way,
>>>                 Stevenage, Herts, SG1 2FP
>>>                 www.absolutedevops.io <http://www.absolutedevops.io>
>>>                 grant at absolutedevops.io
>>>                 <mailto:grant at absolutedevops.io> 0845 874 0580
>>>                 _______________________________________________
>>>                 OpenStack-operators mailing list
>>>                 OpenStack-operators at lists.openstack.org
>>>                 <mailto:OpenStack-operators at lists.openstack.org>
>>>                 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>                 <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>
>>>
>>>
>>
>

-- 
Grant Morley
Cloud Lead
Absolute DevOps Ltd
Units H, J & K, Gateway 1000, Whittle Way, Stevenage, Herts, SG1 2FP
www.absolutedevops.io
grant at absolutedevops.io 0845 874 0580