[ops] Bandwidth problem on computes
Fabian Zimmermann
dev.faz at gmail.com
Fri Mar 19 10:45:18 UTC 2021
Hi,
can you repeat your tests with
* iperf from compute1 -> compute2
* iperf from compute2 -> compute1
* ip r (routing table) output of both nodes
* watching top while running iperf and reporting the process that uses the most CPU
* providing ethtool -k <nic> for all NICs in compute1+2?
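
Something like this should do (interface names are just examples, adjust to
your setup):

compute1 # iperf3 -s
compute2 # iperf3 -c compute1 -p 5201    (then swap roles for the other direction)
compute2 # ip r
compute2 # ethtool -k eth0               (repeat for every NIC / team member on both computes)

and keep top open on both ends while iperf runs to see which process eats
the CPU.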
Fabian
On Tue, 16 Mar 2021 at 14:49, Jahson Babel
<jahson.babel at cc.in2p3.fr> wrote:
>
> Hello everyone,
> I have a bandwidth problem between the compute nodes of an OpenStack
> cluster.
> The cluster runs the Rocky release with Open vSwitch.
> To simplify, I'll just pick three servers: one controller and two compute
> nodes, all connected to the same switch.
> Every server has two 10G links, configured in an LACP team.
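>
> For reference, the team is set up along these lines (a simplified sketch,
> the device names are placeholders):
>
> compute1 # nmcli con add type team ifname team0 team.config '{"runner": {"name": "lacp"}}'
> compute1 # nmcli con add type team-slave ifname ens1f0 master team0
> compute1 # nmcli con add type team-slave ifname ens1f1 master team0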
>
> From what I understand of teaming, with this configuration I should be
> able to get 10 Gbps between all three nodes.
> But with iperf we are way below that:
>
> compute1 # sudo iperf3 -c compute2 -p 5201
> Connecting to host compute2, port 5201
> [  4] local X.X.X.X port 44946 connected to X.X.X.X port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec   342 MBytes  2.87 Gbits/sec  137    683 KBytes
> [  4]   1.00-2.00   sec   335 MBytes  2.81 Gbits/sec    8    501 KBytes
>
> The problem also seems to affect only incoming traffic, which means I can
> get almost the full 10 Gbps if I iperf from a compute to the controller.
>
> compute1 # sudo iperf3 -c controller -p 5201
> Connecting to host controller, port 5201
> [  4] local X.X.X.X port 39008 connected to X.X.X.X port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  1.10 GBytes  9.41 Gbits/sec    0    691 KBytes
> [  4]   1.00-2.00   sec  1.09 GBytes  9.38 Gbits/sec    0    803 KBytes
>
> If I do the opposite, I get the same results I was getting between the two
> computes.
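> (For completeness, the incoming direction can also be exercised from the
> same client with iperf3's reverse mode, e.g.
>
> compute1 # sudo iperf3 -c compute2 -p 5201 -R
>
> so the server pushes the data back towards the client.)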
> From the tests we've done it seems related to OpenStack's services,
> specifically Neutron or Open vSwitch: as long as those services are
> running we can't get the full bandwidth.
> Stopping the services doesn't fix the issue; in our case removing the
> packages and rebooting is the only way to get the full bandwidth back
> between computes.
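> Presumably something is left in place even with the services stopped,
> e.g. the openvswitch kernel module and its datapath, which can be
> inspected with:
>
> compute1 # lsmod | grep openvswitch
> compute1 # ovs-vsctl show
> compute1 # ovs-dpctl show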
>
> I deliberately left VMs out to simplify the question, but of course the
> same behavior can also be observed inside VMs.
>
> Since we can achieve 10 Gbps without them, it doesn't seem related to the
> hardware or the OS; that's why we suspect OpenStack's services.
> But I couldn't find any evidence or misconfiguration that would confirm
> that.
> So if anyone has hints about this kind of setup and/or how to mitigate the
> bandwidth loss, I would appreciate it.
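> For what it's worth, one thing that could be compared before and after
> removing the packages is the NIC offload settings, e.g.
>
> compute1 # ethtool -k eth0 | grep offload
> compute1 # ethtool -K eth0 gro off    (reversible, for testing only)
>
> with eth0 being only a placeholder here.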
> Let me know if you need more info.
> Thanks in advance,
>
> Jahson
>
>