[ops] Bandwidth problem on computes

Jahson Babel jahson.babel at cc.in2p3.fr
Tue Mar 16 13:48:47 UTC 2021


Hello everyone,
I have a bandwidth problem between the compute nodes of an OpenStack 
cluster.
The cluster runs the Rocky release with Open vSwitch.
To simplify, I'll pick three servers: one controller and two compute 
nodes, all connected to the same switch.
Every server is configured with two 10G links, bonded together with 
LACP (teaming).
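
For reference, here is how the bond state can be checked on one of the 
nodes (just a sketch, assuming teamd with a team device named team0; 
the device name is my assumption):

compute1 # teamdctl team0 state
compute1 # cat /proc/net/bonding/bond0

The first command should report the lacp runner with both ports up; the 
second is the equivalent check if kernel bonding is used instead of teamd.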

From what I understand of teaming and this configuration, I should be 
able to get 10 Gbps between all three nodes.
But an iperf3 run shows we are well below that:

compute1 # sudo iperf3 -c compute2 -p 5201
Connecting to host compute2, port 5201
[  4] local X.X.X.X port 44946 connected to X.X.X.X port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   342 MBytes  2.87 Gbits/sec  137    683 KBytes
[  4]   1.00-2.00   sec   335 MBytes  2.81 Gbits/sec    8    501 KBytes
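
Note that with LACP a single TCP flow is hashed onto one member link, 
so 10 Gbps is the expected per-flow ceiling here, not 20. To rule out 
an unlucky hash rather than a real bottleneck, a multi-stream run can 
be tried (a sketch; -P is a standard iperf3 option for parallel streams):

compute1 # sudo iperf3 -c compute2 -p 5201 -P 4

If the aggregate of several streams reaches line rate while a single 
stream does not, the hash distribution rather than the host would be 
the more likely culprit.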

The problem also seems to affect only incoming traffic, which means I 
can get almost the full 10 Gbps if I run iperf3 from a compute to the 
controller:

compute1 # sudo iperf3 -c controller -p 5201
Connecting to host controller, port 5201
[  4] local X.X.X.X port 39008 connected to X.X.X.X port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec    0    691 KBytes
[  4]   1.00-2.00   sec  1.09 GBytes  9.38 Gbits/sec    0    803 KBytes

If I run it in the opposite direction, I get the same degraded results 
I was getting between the two computes.
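
For completeness, both directions can be measured from the same host 
using iperf3's reverse mode (-R is a standard option that makes the 
server transmit and the client receive):

compute1 # sudo iperf3 -c controller -p 5201 -R

This should reproduce the degraded receive-side numbers on compute1 
without having to run the client on the controller.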
From the tests we've done, it seems related to OpenStack's services, 
specifically Neutron or Open vSwitch. As soon as those services are 
running, we can't get the full bandwidth.
Stopping the services doesn't fix the issue; in our case, removing the 
packages and rebooting is the only way to restore full bandwidth 
between the computes.
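
One thing we could still compare is the NIC offload state before and 
after the packages are installed, since receive-side throughput is 
sensitive to GRO/LRO and installing OVS/Neutron can end up toggling 
offloads. A sketch, assuming a bond member named eth0 (the interface 
name is hypothetical):

compute1 # ethtool -k eth0 | grep -E 'segmentation-offload|receive-offload'
compute1 # ovs-vsctl show

If generic-receive-offload flips to off once the services are running, 
that alone could explain a large penalty on incoming traffic.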

I deliberately left VMs out to simplify the question, but of course the 
same behavior can also be observed in VMs.

Since we can achieve 10 Gbps without those services, the problem 
doesn't seem related to the hardware or the OS; that's why we suspect 
OpenStack's services. But I couldn't find any evidence or 
misconfiguration to confirm that.
So if anyone has hints about this kind of setup and/or how to mitigate 
the bandwidth drop, I would appreciate it.
Let me know if you need more info.
Thanks in advance,

Jahson

