[neutron] Dadfailed of ipv6 metadata IP in qdhcp namespace and disappearing dhcp namespaces

Brian Haley haleyb.dev at gmail.com
Mon Jan 3 01:35:26 UTC 2022


Hi,

On 1/2/22 10:51 AM, Kamil Madáč wrote:
> Hello,
> 
> In our small cloud environment, we started to see weird behavior during 
> the last 2 months. DHCP namespaces started to disappear randomly, which 
> caused VMs to lose connectivity once their DHCP lease expired.
> After investigating, I found the following issue/bug:
> 
>  1. The IPv6 metadata address of the tap interface in some qdhcp-xxxx
>     namespaces is stuck in the "dadfailed tentative" state (I do not
>     know why yet)

This issue was reported about a month ago:

https://bugs.launchpad.net/neutron/+bug/1953165

And Bence marked it a duplicate of:

https://bugs.launchpad.net/neutron/+bug/1930414

Based on its title, "Traffic leaked from dhcp port before vlan tag is 
applied", it seems to be a bug in a flow.

I would follow up in that second bug.

Thanks,

-Brian

>  3. root@cloud01:~# ip netns exec qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1 ip a
>     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>          inet 127.0.0.1/8 scope host lo
>             valid_lft forever preferred_lft forever
>          inet6 ::1/128 scope host
>             valid_lft forever preferred_lft forever
>     2585: tap1797d9b1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>          link/ether fa:16:3e:77:64:0d brd ff:ff:ff:ff:ff:ff
>          inet 169.254.169.254/32 brd 169.254.169.254 scope global tap1797d9b1-e1
>             valid_lft forever preferred_lft forever
>          inet 192.168.0.2/24 brd 192.168.0.255 scope global tap1797d9b1-e1
>             valid_lft forever preferred_lft forever
>          inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative
>             valid_lft forever preferred_lft forever
>          inet6 fe80::f816:3eff:fe77:640d/64 scope link
>             valid_lft forever preferred_lft forever
>  4.
> 
>  5. This prevented the DHCP agent from finishing the sync_state function,
>     so NetworkCache was not updated with the subnets of that neutron
>     network.
>  6. During creation of a VM attached to such a network, the agent does
>     not detect any subnets (see point 2), so it concludes (in
>     reload_allocations()) that no DHCP is needed and deletes the
>     qdhcp-xxxx namespace. From that moment neither DHCP nor metadata
>     works on that network, and after 24h we see connectivity issues.
>  7. Restarting the DHCP agent recreates the missing qdhcp-xxxx namespaces,
>     but NetworkCache in the DHCP agent is empty again, so creating a VM
>     deletes the qdhcp-xxxx namespace again 🙁
> 
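
To check how many namespaces are affected, the "dadfailed" flag from the
listing above can be picked out programmatically. The following is only a
minimal sketch, not anything from neutron: it assumes the one-line output
format of "ip -o -6 addr show", the function names find_dadfailed and
scan_namespace are hypothetical, and SAMPLE is the quoted output rewritten
by hand into that one-line format.

```python
import re
import subprocess

# One line of `ip -o -6 addr show`: "<index>: <dev>    inet6 <addr>/<len> <flags...>"
_LINE = re.compile(r"^\d+:\s+(?P<dev>\S+)\s+inet6\s+(?P<addr>\S+)\s+(?P<rest>.*)$")


def find_dadfailed(ip_output):
    """Return (device, address) pairs whose flags include 'dadfailed'."""
    hits = []
    for line in ip_output.splitlines():
        m = _LINE.match(line.strip())
        if m and "dadfailed" in m.group("rest").split():
            hits.append((m.group("dev"), m.group("addr")))
    return hits


def scan_namespace(namespace):
    """Run `ip -o -6 addr show` inside a namespace (needs root) and parse it."""
    out = subprocess.run(
        ["ip", "netns", "exec", namespace, "ip", "-o", "-6", "addr", "show"],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_dadfailed(out)


# Hand-made sample in one-line format, based on the quoted listing.
SAMPLE = r"""1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2585: tap1797d9b1-e1    inet6 fe80::a9fe:a9fe/64 scope link dadfailed tentative \       valid_lft forever preferred_lft forever
2585: tap1797d9b1-e1    inet6 fe80::f816:3eff:fe77:640d/64 scope link \       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    print(find_dadfailed(SAMPLE))
```

On a real host one would call scan_namespace() for each qdhcp-xxxx entry
reported by "ip netns list"; that part is left out because it requires
root and a live deployment.
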
> The workaround is to remove the DHCP agent from that network and add it 
> again. Interestingly, sometimes I need to do it multiple times, because 
> in a few cases the tap interface in the new qdhcp namespace ends up in 
> the dadfailed tentative state again. After a year in production, 20 of 
> our 60 networks are in this state.
> 
> We are using a kolla-ansible deployment on Ubuntu 20.04, kernel 
> 5.4.0-65-generic. The OpenStack release is Victoria and neutron is at 
> version 17.2.2.dev70.
> 
> 
> Is this a bug in neutron, or a misconfiguration of the OS on our side?
> 
> I'm locally testing a patch which disables IPv6 DAD in the qdhcp-xxxx 
> namespace (net.ipv6.conf.default.accept_dad = 0), but I'm not sure it is 
> a good solution with regard to other neutron features.
> 
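
For reference, the kernel documents accept_dad as 0 = DAD disabled,
1 = DAD enabled (the default), 2 = DAD enabled and IPv6 disabled on the
interface if a duplicate MAC-based link-local address is found. The
sketch below only builds the sysctl command for a given namespace; the
helper name accept_dad_command is hypothetical, and whether disabling
DAD is safe for other neutron features remains the open question from
the message above.

```python
import shlex


def accept_dad_command(namespace, value=0, interface="default"):
    """Build (but do not run) the command that sets accept_dad in a namespace.

    interface="default" only affects interfaces created afterwards; pass
    an existing interface name to change one that is already present.
    """
    if value not in (0, 1, 2):
        raise ValueError("accept_dad must be 0, 1 or 2")
    key = f"net.ipv6.conf.{interface}.accept_dad"
    return ["ip", "netns", "exec", namespace, "sysctl", "-w", f"{key}={value}"]


cmd = accept_dad_command("qdhcp-3094b264-829b-4381-9ca2-59b3a3fc1ea1")

if __name__ == "__main__":
    # Print the command for review; running it requires root.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the example safe to
run anywhere; on a test host the list could be passed to subprocess.run.
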
> 
> Kamil Madáč
> /Slovensko IT a.s./
> 
