[ops] Bandwidth problem on computes
Jahson Babel
jahson.babel at cc.in2p3.fr
Tue Mar 16 13:48:47 UTC 2021
Hello everyone,
I have a bandwidth problem between the compute nodes of an OpenStack
cluster.
The cluster runs the Rocky release with Open vSwitch.
To simplify, I'll just pick three servers: one controller and two
compute nodes, all connected to the same switch.
Every server is configured with two 10G links, bonded with LACP via
teaming.
From what I understand of teaming, with this configuration I should be
able to get 10 Gbps between any two of the three nodes.
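For reference, assuming the team interface is named team0 (adjust to
your naming), the runner and per-port LACP state can be checked with:

compute1 # sudo teamdctl team0 state
compute1 # ip -d link show team0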
But iperf shows we are way below that:
compute1 # sudo iperf3 -c compute2 -p 5201
Connecting to host compute2, port 5201
[ 4] local X.X.X.X port 44946 connected to X.X.X.X port 5201
[ ID] Interval         Transfer     Bandwidth       Retr  Cwnd
[ 4]    0.00-1.00 sec   342 MBytes  2.87 Gbits/sec  137    683 KBytes
[ 4]    1.00-2.00 sec   335 MBytes  2.81 Gbits/sec    8    501 KBytes
Moreover, the problem seems to affect only incoming traffic, which
means I can get almost the full 10 Gbps if I iperf from a compute to
the controller:
compute1 # sudo iperf3 -c controller -p 5201
Connecting to host controller, port 5201
[ 4] local X.X.X.X port 39008 connected to X.X.X.X port 5201
[ ID] Interval         Transfer     Bandwidth       Retr  Cwnd
[ 4]    0.00-1.00 sec  1.10 GBytes  9.41 Gbits/sec    0    691 KBytes
[ 4]    1.00-2.00 sec  1.09 GBytes  9.38 Gbits/sec    0    803 KBytes
If I run the test in the opposite direction (controller to compute), I
get the same degraded results as between the two computes.
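This direction dependence can also be reproduced from a single node
with iperf3's reverse mode (-R), which makes the server send and the
client receive:

compute1 # sudo iperf3 -c compute2 -p 5201 -R

If incoming traffic is the problem, this should show the same degraded
numbers as the first test.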
From the tests we've done, it seems related to OpenStack's services,
specifically Neutron or Open vSwitch: as soon as those services are
running, we can't get the full bandwidth.
Stopping the services doesn't fix the issue; in our case, removing the
packages and rebooting is the only way to restore full bandwidth
between computes.
I deliberately didn't mention VMs, to simplify the question, but of
course the same behavior can also be observed inside VMs.
Since we can achieve 10 Gbps once the packages are removed, the problem
doesn't seem related to the hardware or the OS; that's why we suspect
OpenStack's services.
But I couldn't find any evidence or misconfiguration that would
confirm it.
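If it helps, I can post the usual state dumps for comparison, e.g.
(with eno1 standing in for one of the bonded 10G interfaces):

compute1 # sudo ovs-vsctl show
compute1 # sudo ethtool -k eno1
compute1 # lsmod | grep openvswitch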
So if anyone has hints about this kind of setup and/or how to mitigate
the bandwidth decrease, I would appreciate it.
Let me know if you need more information.
Thanks in advance,
Jahson