[Openstack] Quantum L3 Performance Problems
justin.brown at fandingo.org
Wed Aug 21 15:13:27 UTC 2013
I'm having some severe network performance problems with OpenStack Quantum.
I have a pretty normal Open vSwitch Quantum configuration using GRE
tunnels. One thing to note is that I have limited hardware at this
point. Rather than having dedicated controller and Quantum hosts, they
are running on one host, each as a separate libvirt VM.
Let me be clear: memory and CPU resources are not scarce at the host or
VM level for Quantum. The host has 8 CPUs and load is ~2.3 and never
spikes above 2.7. Quantum has 4 vCPUs and load is ~1.3 and doesn't
spike above 2.
The controller host has a single 1 Gbps NIC with trunked VLANs, and
the same goes for the compute hosts.
I have six systems for testing: controller host (CH1), Quantum server
VM (Q), compute node 1 (N1), compute node 2 (N2), instance 1 (IN1),
and instance 2 (IN2).
The instances are running on separate compute nodes.
Here are some iperf results.
CH1 <--> Q: 6.3 Gbps
This communication happens over a Linux Bridge.
CH1 <--> N1: 937Mbps
This happens over the 1Gbps physical ethernet network.
Q (GRE) <--> IN1: 451Mbps
I ran iperf on Q using the qrouter Linux network namespace to test
the performance impact of the GRE tunnel.
IN1 <--> IN2: 682Mbps
Again testing GRE tunneling. The discrepancy from the previous test is
interesting since it's the same basic test.
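
For anyone wanting to repeat the Q (GRE) <--> IN1 test, here is a rough sketch of how the iperf client could be launched from inside the router namespace. The exact command used for the numbers above isn't given, so this is an assumption: the namespace name is a placeholder (real ones can be listed with `ip netns list`), the helper `iperf_in_namespace` is hypothetical, and the target is IN1's private address. It only builds and prints the command; running it needs root on Q and an iperf server listening on the instance.

```python
import shlex

def iperf_in_namespace(namespace, target_ip, seconds=10):
    """Build the argv for an iperf client run inside a network namespace."""
    return [
        "ip", "netns", "exec", namespace,          # enter the namespace
        "iperf", "-c", target_ip, "-t", str(seconds),  # client mode, fixed duration
    ]

# Placeholder namespace name; substitute the real qrouter namespace on Q.
cmd = iperf_in_namespace("qrouter-ROUTER_UUID", "10.10.1.2")
print(shlex.join(cmd))
```

Running the same client once from the namespace and once from the host's default namespace keeps the GRE path as the only variable between the two measurements.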
The results above are not too bad. This is where things get interesting.
Quantum is configured with one external (192.168.27.0/24) and one
private network (10.10.1.0/24).
IN1 has address 10.10.1.2 and floating IP 192.168.27.11 (the first few
IPs are outside the allocation pool).
I connected my laptop (1 Gbps) directly to the switch and assigned IP
192.168.27.2, so there wouldn't be any routing from the physical
Laptop <--> N1: 935Mbps
Laptop <--> IN1: 26.7Mbps
That is not a typo. Traffic going through the L3 agent slows by almost
17x (from the Q GRE to IN1 result). I regularly see results below
I'm having a real tough time troubleshooting the last test. I ran
tcpdump from the host, CH1, and I don't see any errors causing TCP
retransmission or duplicate packets.
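
To put the numbers side by side, here is a quick check of the slowdown factors, using only the iperf figures quoted in this message. The dictionary keys and the `slowdown` helper are just labels chosen for this sketch.

```python
# Throughput in Mbps, copied from the iperf results in this message.
results_mbps = {
    "CH1 <--> N1": 937.0,
    "Q (GRE) <--> IN1": 451.0,
    "IN1 <--> IN2": 682.0,
    "Laptop <--> N1": 935.0,
    "Laptop <--> IN1": 26.7,
}

def slowdown(baseline, path):
    """How many times slower `path` is than `baseline`."""
    return results_mbps[baseline] / results_mbps[path]

# Compare every other path against the floating-IP path through the L3 agent.
for baseline in results_mbps:
    if baseline != "Laptop <--> IN1":
        factor = slowdown(baseline, "Laptop <--> IN1")
        print(f"{baseline}: {factor:.1f}x faster than Laptop <--> IN1")
```

Measured against the physical-network result (Laptop <--> N1) rather than the GRE result, the drop is roughly twice as large as the 17x figure.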
Both CH1 and the Quantum server have plenty of CPU available. It's like
the L3 iptables rules are massively decreasing performance, but I've
used iptables for years in other capacities and haven't seen this sort
The various Quantum logs don't indicate any problems.
Has anyone else seen large performance decreases when using the
Quantum L3 agent?
Any ideas on how to troubleshoot this?