[ops] Bandwidth problem on computes

Fabian Zimmermann dev.faz at gmail.com
Fri Mar 19 10:45:18 UTC 2021


Hi,

can you repeat your tests with

* iperf from compute1 -> compute2
* iperf from compute2 -> compute1
* ip r (routing table) output of both nodes
* watching top while running iperf and reporting the process using the most CPU
* providing ethtool -k <nic> output for all NICs in compute1+2 (see the command sketch below)
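
For reference, roughly the commands I have in mind (interface names are
placeholders, adjust them to your team/bond setup):

  # on the receiving node
  iperf3 -s
  # on the sending node (then swap the roles for the other direction)
  iperf3 -c compute2 -p 5201
  # routing table
  ip r
  # offload settings, for the team/bond members and for the team itself
  ethtool -k eth0
  ethtool -k eth1
  ethtool -k team0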

 Fabian

On Tue, 16 Mar 2021 at 14:49, Jahson Babel
<jahson.babel at cc.in2p3.fr> wrote:
>
> Hello everyone,
> I have a bandwidth problem between the compute nodes of an OpenStack
> cluster.
> This cluster runs the Rocky release with OpenVSwitch.
> To simplify, I'll just pick 3 servers: one controller and two compute
> nodes, all connected to the same switch.
> Every server is configured with two 10G links. Those links are
> configured with LACP/teaming.
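>
> For completeness, the teaming state can be checked on each node with
> something like the following (the interface names are just placeholders):
>
>   teamdctl team0 state          # teamd-managed interface
>   cat /proc/net/bonding/bond0   # or, for a kernel bonding interface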
>
> From what I understand of teaming and this configuration, I should be
> able to get 10 Gbps between all three nodes.
> But if I run iperf we are way below this:
>
> compute1 # sudo iperf3 -c compute2 -p 5201
> Connecting to host compute2, port 5201
> [  4] local X.X.X.X port 44946 connected to X.X.X.X port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec   342 MBytes  2.87 Gbits/sec  137    683 KBytes
> [  4]   1.00-2.00   sec   335 MBytes  2.81 Gbits/sec    8    501 KBytes
>
> Plus, the problem seems to be present only with incoming traffic, which
> means I can almost get the full 10 Gbps if I iperf from a compute to the
> controller.
>
> compute1 # sudo iperf3 -c controller -p 5201
> Connecting to host controller, port 5201
> [  4] local X.X.X.X port 39008 connected to X.X.X.X port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec    0    691 KBytes
> [  4]   1.00-2.00   sec  1.09 GBytes  9.38 Gbits/sec    0    803 KBytes
>
> If I do the opposite (iperf from the controller to a compute), I get the
> same results I was getting between the two computes.
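>
> Side note: the same asymmetry can be reproduced from a single client with
> iperf3's reverse mode, where the server sends and the client receives:
>
>   iperf3 -c compute2 -p 5201 -R
>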
> From the tests we've done it seems related to the OpenStack services,
> specifically Neutron or OpenVSwitch: as soon as those services are
> running we can't get the full bandwidth.
> Stopping the services doesn't fix the issue; in our case, removing the
> packages and rebooting is the only way to get the full bandwidth back
> between computes.
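>
> In case it helps, one thing that could be checked while the packages are
> installed is whether the team/bond interface ends up attached to an OVS
> bridge, since that changes the datapath even for host-to-host traffic
> (the bridge name below is just a placeholder):
>
>   ovs-vsctl show
>   ovs-vsctl list-ports br-ex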
>
> I deliberately didn't mention VMs to simplify the question, but of course
> this behavior can also be observed in VMs.
>
> Knowing that we can achieve 10 Gbps, it doesn't seem related to the
> hardware or the OS. That's why we suspect OpenStack's services, but I
> couldn't find any evidence or misconfiguration that would confirm that.
> So if anyone has hints about this kind of setup and/or how to mitigate
> the bandwidth decrease, I would appreciate it.
> Let me know if you need more info.
> Thanks in advance,
>
> Jahson
>
>


