[neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces
Brian Haley
haleyb.dev at gmail.com
Mon Jan 3 01:35:26 UTC 2022
Hi,
On 1/2/22 10:51 AM, Kamil Madáč wrote:
> Hello,
>
> In our small cloud environment, we started to see weird behavior during
> the last 2 months. DHCP namespaces started to disappear randomly, which
> caused VMs to lose connectivity once their DHCP lease expired.
> After investigating, I found the following issue/bug:
>
> 1. The ipv6 metadata address of the tap interface in some qdhcp-xxxx
> namespaces is stuck in "dadfailed tentative" state (I do not know why yet)
This issue was reported about a month ago:
https://bugs.launchpad.net/neutron/+bug/1953165
And Bence marked it a duplicate of:
https://bugs.launchpad.net/neutron/+bug/1930414
Based on the title, it seems to be a bug in a flow: "Traffic leaked from
dhcp port before vlan tag is applied".
I would follow up in that second bug.
Thanks,
-Brian
> 3. root@cloud01:~# ip netns exec qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>        valid_lft forever preferred_lft forever
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2585: tap1797d9b1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>     link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff
>     inet 169.254.169.254/32 brd 169.254.169.254 scope global tap1797d9b1-e1
>        valid_lft forever preferred_lft forever
>     inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1
>        valid_lft forever preferred_lft forever
>     inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative
>        valid_lft forever preferred_lft forever
>     inet6 fe80::f816:3eff:fe77:640d/64 scope link
>        valid_lft forever preferred_lft forever
> 4.
>
> 5. This prevented the DHCP agent from finishing the sync_state function, and
> NetworkCache was not updated with the subnets of such a neutron network
> 6. During creation of a VM assigned to such a network, the agent does not
> detect any subnets (see point 2), so it concludes
> (reload_allocations()) that no DHCP is needed and deletes the
> qdhcp-xxxx namespace, so neither DHCP nor metadata works on that
> network from that moment, and after 24h we see connectivity issues.
> 7. Restarting the DHCP agent recreates the missing qdhcp-xxxx namespaces, but
> NetworkCache in the DHCP agent is again empty, so creating a VM
> deletes the qdhcp-xxxx namespace again 🙁
>
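
To find which namespaces are affected by the state described in the list above, the output of "ip a" can be scanned for inet6 addresses carrying the dadfailed flag. The following is a minimal sketch, not part of neutron: the function name find_dadfailed and the SAMPLE text are illustrative assumptions, and the sample lines are copied from the output quoted above.

```python
import re

# Matches "inet6 <addr>/<prefix> <flags...>" lines from `ip addr` output.
INET6_RE = re.compile(r"^\s*inet6\s+(\S+)\s+(.*)$")


def find_dadfailed(ip_addr_output):
    """Return the inet6 addresses whose flags include 'dadfailed'."""
    failed = []
    for line in ip_addr_output.splitlines():
        match = INET6_RE.match(line)
        if match and "dadfailed" in match.group(2).split():
            failed.append(match.group(1))
    return failed


# Sample text taken from the tap interface output quoted in this thread.
SAMPLE = """\
2585: tap1797d9b1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe77:640d/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(find_dadfailed(SAMPLE))  # prints ['fe80::a9fe:a9fe/64']
```

On a real host the input would presumably come from running "ip netns exec <namespace> ip a" for each qdhcp namespace and passing the captured text to the function.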
> The workaround is to remove the DHCP agent from that network and add it
> again. Interestingly, sometimes I need to do it multiple times, because in
> a few cases the tap interface in the new qdhcp namespace again ends up in
> dadfailed tentative state. After a year in production we have 20 networks
> out of 60 in such a state.
>
> We are using a kolla-ansible deployment on Ubuntu 20.04, kernel
> 5.4.0-65-generic. The OpenStack version is Victoria and neutron is at
> version 17.2.2.dev70.
>
>
> Is that a bug in neutron, or is it a misconfiguration of the OS on our side?
>
> I'm locally testing a patch which disables ipv6 DAD in the qdhcp-xxxx
> namespace (net.ipv6.conf.default.accept_dad = 0), but I'm not sure it is
> a good solution when it comes to other neutron features?
>
>
> Kamil Madáč
> /Slovensko IT a.s./
>
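
For reference on the accept_dad sysctl mentioned above: in the Linux kernel's ip-sysctl documentation, 0 disables DAD, 1 enables it (the default), and 2 enables it and additionally disables IPv6 operation when a MAC-based duplicate link-local address is found. Below is a minimal sketch of building the command that would set it inside a namespace. The helper name accept_dad_command is hypothetical, and whether disabling DAD is safe for other neutron features is exactly the open question in this thread, so treat this as an illustration only.

```python
# Meanings of net.ipv6.conf.<scope>.accept_dad per the kernel ip-sysctl docs.
ACCEPT_DAD = {
    0: "DAD disabled",
    1: "DAD enabled (kernel default)",
    2: "DAD enabled; IPv6 operation disabled if a MAC-based "
       "duplicate link-local address is found",
}


def accept_dad_command(namespace, value, scope="default"):
    """Build (but do not run) the argv that sets accept_dad in a namespace.

    `scope` is the conf entry to change: "default", "all", or an
    interface name. The helper itself is a hypothetical illustration.
    """
    if value not in ACCEPT_DAD:
        raise ValueError(f"accept_dad must be one of {sorted(ACCEPT_DAD)}")
    return [
        "ip", "netns", "exec", namespace,
        "sysctl", "-w", f"net.ipv6.conf.{scope}.accept_dad={value}",
    ]


if __name__ == "__main__":
    ns = "qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1"
    print(" ".join(accept_dad_command(ns, 0)))
    print(ACCEPT_DAD[0])
```

Note that the "default" entry only provides the initial value for interfaces created afterwards, so as far as I know an already existing tap interface would need its own per-interface entry changed instead.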