[Openstack-operators] Floating IPs failing in dvr_snat mode with Mitaka

Jonathan Mills jonmills at gmail.com
Wed Aug 10 19:18:32 UTC 2016


Hi all,

I’m running Mitaka on CentOS 7.2 with Neutron in dvr_snat mode.

# uname -msr
Linux 3.10.0-327.22.2.el7.x86_64 x86_64

I’m using VLANs, not VXLANs, though I don’t think that matters either way.
I have one NIC, “eth2”, in VLAN trunk mode, and on the switch side every
Neutron-defined VLAN is trunked to it.  Whether it’s a tenant network VLAN
or an external VLAN for floating IPs, it all comes back to that same NIC.

So here’s a compute node “node1”.  It has a successfully booted VM, which
has fixed IP 10.97.8.103 and floating IP 10.96.8.107.  As seen from the
compute node:


# ip netns
fip-cbe55dc5-c4e4-4ec0-aa52-b4713f1279ee
qrouter-efc60192-97ad-49ef-bab7-cda42ca6bc29
snat-efc60192-97ad-49ef-bab7-cda42ca6bc29



# ip netns exec fip-cbe55dc5-c4e4-4ec0-aa52-b4713f1279ee ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-efc60192-9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 32:06:67:df:53:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.109.47/31 scope global fpr-efc60192-9
       valid_lft forever preferred_lft forever
    inet6 fe80::3006:67ff:fedf:53c6/64 scope link
       valid_lft forever preferred_lft forever
19: fg-152dc56a-c1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:40:9f:5b brd ff:ff:ff:ff:ff:ff
    inet 10.96.8.101/23 brd 10.96.9.255 scope global fg-152dc56a-c1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe40:9f5b/64 scope link
       valid_lft forever preferred_lft forever



# ip netns exec qrouter-efc60192-97ad-49ef-bab7-cda42ca6bc29 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: rfp-efc60192-9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 72:49:e7:78:48:5d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.109.46/31 scope global rfp-efc60192-9
       valid_lft forever preferred_lft forever
    inet 10.96.8.107/32 brd 10.96.8.107 scope global rfp-efc60192-9
       valid_lft forever preferred_lft forever
    inet6 fe80::7049:e7ff:fe78:485d/64 scope link
       valid_lft forever preferred_lft forever
17: qr-ffc302ba-82: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:8d:7c:62 brd ff:ff:ff:ff:ff:ff
    inet 10.97.8.1/23 brd 10.97.9.255 scope global qr-ffc302ba-82
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe8d:7c62/64 scope link
       valid_lft forever preferred_lft forever




So you can see that I have both the ‘fpr’ and ‘rfp’ interfaces, one in the
fip namespace and one in the qrouter namespace, which is a good indicator I
didn’t totally flub the dvr_snat Neutron config.  From within either
namespace, I can ping the floating IP 10.96.8.107, which makes sense.
However, for the floating IP to be useful, it needs to be reachable by any
other system on its designated VLAN, and that is not the case.  In my
real-world use case, I would carry the VLAN of this floating IP network
back to my bastion host, so that users can ssh into their VMs via the
floating IP.  But I can’t reach the floating IPs from anywhere outside the
namespaces on the compute node.
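
As a sanity check on the addressing above, here is a minimal sketch using
Python's standard ipaddress module.  The addresses are copied from the
`ip addr` output; the helper name `on_link` is just something I made up for
illustration.  It only confirms that the floating IP falls inside the subnet
configured on the fg- interface and that the fpr/rfp pair share one /31, so
the addressing itself looks consistent.  It says nothing about why outside
hosts get no reply.

```python
import ipaddress

# Addresses copied from the `ip addr` output above.
fg = ipaddress.ip_interface("10.96.8.101/23")      # fg-152dc56a-c1 (fip namespace)
fpr = ipaddress.ip_interface("169.254.109.47/31")  # fpr-efc60192-9 (fip namespace)
rfp = ipaddress.ip_interface("169.254.109.46/31")  # rfp-efc60192-9 (qrouter namespace)
floating_ip = ipaddress.ip_address("10.96.8.107")  # also on rfp-efc60192-9 as a /32


def on_link(addr, iface):
    """Return True if addr lies inside the subnet configured on iface."""
    return addr in iface.network


print(on_link(floating_ip, fg))    # True: the floating IP is inside 10.96.8.0/23
print(fpr.network == rfp.network)  # True: both ends share 169.254.109.46/31
print(floating_ip == fg.ip)        # False: the floating IP is not the fg- address
```

So any other host on that VLAN would treat 10.96.8.107 as directly attached
and ARP for it, yet the address is not assigned to any interface in the fip
namespace; it lives on rfp- in the qrouter namespace.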


One more clue, in the l3-agent log on the compute node in question:


2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 10.96.8.107 on fg-152dc56a-c1 in namespace fip-cbe55dc5-c4e4-4ec0-aa52-b4713f1279ee
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib Traceback (most recent call last):
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1040, in _arping
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib     ip_wrapper.netns.execute(arping_cmd, check_exit_code=True)
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 927, in execute
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib     log_fail_as_error=log_fail_as_error, **kwargs)
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib     raise RuntimeError(msg)
2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address
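
In case it helps anyone check their own l3-agent logs for the same failure,
here is a rough sketch that pulls the address, device, and namespace out of
that first error line.  The regular expression is my own guess based only on
the message text quoted above, not on anything in the Neutron source, and the
function name `parse_arp_failure` is hypothetical.

```python
import re

# Pattern modelled on the error text quoted above (an assumption: other
# Neutron releases may word this message differently).
ARP_FAILURE = re.compile(
    r"Failed sending gratuitous ARP to (?P<ip>\S+) on (?P<device>\S+) "
    r"in namespace (?P<namespace>\S+)"
)


def parse_arp_failure(text):
    """Return ip/device/namespace from a failed gratuitous ARP entry, or None."""
    # Collapse whitespace so entries wrapped across lines still match.
    match = ARP_FAILURE.search(" ".join(text.split()))
    return match.groupdict() if match else None


sample = """2016-08-03 11:14:09.665 6041 ERROR neutron.agent.linux.ip_lib [-] Failed
sending gratuitous ARP to 10.96.8.107 on fg-152dc56a-c1 in namespace
fip-cbe55dc5-c4e4-4ec0-aa52-b4713f1279ee"""

print(parse_arp_failure(sample))
```

For the entry above this yields the floating IP 10.96.8.107, the device
fg-152dc56a-c1, and the fip namespace.  That is the same combination noted
earlier: the arping runs in the fip namespace for an address that is not
configured on any interface there, which at least lines up with the
"bind: Cannot assign requested address" error.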


After a little Googling, I think I may be seeing the same behavior as this
user:

https://bugs.centos.org/view.php?id=11238

I’m reaching out to see if anyone else has witnessed this, or has any sage
advice for me.


Jonathan