[openstack-dev] odd behaviour with linuxbridge vs openvswitch in the tripleo dev/test environment

Robert Collins robertc at robertcollins.net
Wed Jun 26 00:05:00 UTC 2013


Hopefully I've gotten the attention of networking folk with that
subject. We have an odd routing problem in our devtest environment,
the cause of which I haven't figured out - but I'm wondering if anyone
has some insight before I start climbing through the linuxbridge
kernel code :)

In the interests of brevity I'm going to skip over *why* the setup is
the way it is, but I can enlarge if needed ;).

The devtest environment uses a test network - 192.0.2.0/24, broken
down into smaller subnets - on an isolated bridge, to emulate a
datacentre with baremetal machines. We then use 3+ VMs to simulate a
deployment story.
The 'seed' VM hosts a one-node baremetal nova with one registered
baremetal node.
The second VM - the 'undercloud' - is deployed by the 'seed' cloud and
is a one-node baremetal nova with all the remaining VMs registered as
baremetal nodes.
The third VM - the 'overcloud control plane' - is deployed by the
undercloud, and is a combined control node + neutron network node; if
we have only 3 VMs then it also runs nova-compute kvm.
Any additional VMs are scaled-out nova-compute kvm nodes.

The 'seed' VM has two devices - eth0 connected to virbr0, and eth1 to
br99. The other nodes are all connected to br99. This approximates a
datacentre network where there is no L2 connectivity between the
environment the OpenStack tools are run in and the undercloud.

On the host we have two bridges:
virbr0: ..
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
br99: ..
    inet6 fe80::ac1b:6ff:fee1:6440/64 scope link

Both are linuxbridge devices.
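(br99, for example, is just an ordinary bridge-utils bridge, created
with something like:

$ sudo brctl addbr br99
$ sudo ip link set br99 up

and the VM tap devices attached to it - 'brctl show br99' lists them.)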

We have routing into the test environment:
$ ip route
192.0.2.0/24 via 192.168.122.128 dev virbr0
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1

And naturally the seed VM has ip forwarding enabled.
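(Concretely, that's along the lines of:

$ sudo ip route add 192.0.2.0/24 via 192.168.122.128 dev virbr0   # host
$ sudo sysctl -w net.ipv4.ip_forward=1                            # seed VM

with 192.168.122.128 being the seed VM's address on virbr0.)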

We have masquerading on eth0 of the seed node to permit the other VMs
access to the internet via libvirt's NAT rules:
iptables -t nat -A POSTROUTING -s 192.0.2.0/24 ! -d 192.168.122.1/32 \
  -o eth0 -j MASQUERADE
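(Packet counters from something like

$ sudo iptables -t nat -L POSTROUTING -n -v

are a quick way to confirm that rule is actually matching.)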

The nodes booted by the seed node have the seed node as their default route:
$ ip route
default via 192.0.2.33 dev eth0
192.0.2.32/29 dev eth0  proto kernel  scope link  src 192.0.2.34

Now, with linux bridge, this doesn't work. With an ovs bridge for br99
it works fine.

The *way* in which it doesn't work is the mysterious thing.

Ping from an instance booted by the seed node - e.g. from the
undercloud VM - to 192.168.122.1 - the virbr0 address of the host -
works fine (and tcpdump shows bidirectional traffic on br99).
Ping from 192.168.122.1 to the undercloud VM - 192.0.2.34 - doesn't work.

My immediate reaction was 'this is a NAT problem'.

However, if it were an inbound NAT issue, the traffic wouldn't reach
br99: and it does. tcpdumping virbr0 shows the ICMP from
192.168.122.1->192.0.2.34 as expected. tcpdumping eth0 within the seed
node - ditto. tcpdumping eth1 within the seed node - ditto. tcpdumping
br99 from the host - ditto. And if it were a NAT problem I'd still
expect to see the incoming traffic on eth0 of the undercloud VM.

tcpdumping eth0 of the undercloud VM doesn't see the frames at all.
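(For concreteness, the captures were all of roughly this form - -n to
skip name resolution, and a plain icmp filter to cut the noise:

$ sudo tcpdump -ni virbr0 icmp    # on the host; likewise br99
$ sudo tcpdump -ni eth0 icmp      # in the seed VM; likewise eth1
$ sudo tcpdump -ni eth0 icmp      # in the undercloud VM
)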

I thought it might be a checksum issue, so tried adding a
checksum-fill rule to POSTROUTING on the seed node, but it had no
effect.
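(That is, something along the lines of:

$ sudo iptables -t mangle -A POSTROUTING -o eth1 -j CHECKSUM --checksum-fill

- the CHECKSUM target living in the mangle table.)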

So there is the puzzle: how can traffic reach eth0 of the undercloud
VM via the seed node when it is a reply to a session initiated by the
undercloud VM, but not when the session is initiated from the host -
even though it's visible on br99 in both cases?

I'm sure I've missed something simple, in my
getting-over-a-virus fog of thought.

Puzzledly-yrs,
Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services


