[ops] Bandwidth problem on computes
Jahson Babel
jahson.babel at cc.in2p3.fr
Fri Mar 19 13:21:46 UTC 2021
Hi Fabian,
Thank you for taking the time to respond.
Here is everything you asked for:
- compute1 => compute2 iperf, route, top: https://pastebin.com/fZ54xx19
- compute2 => compute1 iperf, route, top: https://pastebin.com/EZ2FZPCq
- compute1 nic ethtool: https://pastebin.com/MVVFVuDj
- compute2 nic ethtool: https://pastebin.com/Tb9zQVaf
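For reference, the outputs above were gathered with commands along these
lines (hostnames are placeholders for our actual nodes):

  compute1 # sudo iperf3 -c compute2 -p 5201   # and the reverse from compute2
  compute1 # ip route                          # routing table of each node
  compute1 # top -b -n 1                       # captured while the iperf was running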
For the ethtool part, treat compute1 as the reference: I have already tried
playing a bit with GRO/GSO on compute2 without seeing any improvement so
far, which is why its settings differ.
The configuration on compute1 is the default on our hypervisors. Also, I did
not run ethtool on the VMs' interfaces; I picked the teaming interface, the
management interface and the tunnel interface. In my opinion the tunnel
interfaces should not matter, but I included them anyway.
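For completeness, this is roughly how the offload settings were checked,
and how GRO/GSO were toggled on compute2 (team0 stands in for our teaming
interface; the management and tunnel interfaces were checked the same way):

  compute1 # ethtool -k team0                    # show offload settings (GRO, GSO, TSO, ...)
  compute2 # ethtool -K team0 gro off gso off    # the GRO/GSO experiment on compute2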
Can the high load from ksoftirqd have such an impact on bandwidth?
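In case it helps, the softirq load can be watched during an iperf run with
something like this (mpstat comes from the sysstat package):

  compute1 # mpstat -P ALL 1                  # per-CPU %soft while iperf is running
  compute1 # watch -n1 cat /proc/softirqs     # see which counter (e.g. NET_RX) is climbing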
Let me know if you need something else.
Jahson
On 19/03/2021 11:45, Fabian Zimmermann wrote:
> Hi,
>
> can you repeat your tests with
>
> * iperf from compute1 -> compute2
> * iperf from compute2 -> compute1
> * ip -r output of both nodes
> * watching top while doing the iperf and reporting the process using most cpu?
> * provding ethtool -k <nic> for all nics in compute1+2
>
> Fabian
>
> Am Di., 16. März 2021 um 14:49 Uhr schrieb Jahson Babel
> <jahson.babel at cc.in2p3.fr>:
>> Hello everyone,
>> I have a bandwidth problem between the compute nodes of an OpenStack
>> cluster.
>> This cluster runs the Rocky release with Open vSwitch.
>> To simplify, I'll just pick three servers: one controller and two
>> compute nodes, all connected to the same switch.
>> Every server is configured with two 10G links. Those links are
>> aggregated with LACP/teaming.
>>
>> From what I understand of teaming and this configuration, I should be
>> able to get 10 Gbps between all three nodes.
>> But when I run iperf we are way below that:
>>
>> compute1 # sudo iperf3 -c compute2 -p 5201
>> Connecting to host compute2, port 5201
>> [ 4] local X.X.X.X port 44946 connected to X.X.X.X port 5201
>> [ ID] Interval Transfer Bandwidth Retr Cwnd
>> [ 4] 0.00-1.00 sec 342 MBytes 2.87 Gbits/sec 137 683 KBytes
>> [ 4] 1.00-2.00 sec 335 MBytes 2.81 Gbits/sec 8 501 KBytes
>>
>> Also, the problem seems to affect only incoming traffic, which means I
>> can get almost the full 10 Gbps if I iperf from a compute node to the
>> controller.
>>
>> compute1 # sudo iperf3 -c controller -p 5201
>> Connecting to host controller, port 5201
>> [ 4] local X.X.X.X port 39008 connected to X.X.X.X port 5201
>> [ ID] Interval Transfer Bandwidth Retr Cwnd
>> [ 4] 0.00-1.00 sec 1.10 GBytes 9.41 Gbits/sec 0 691 KBytes
>> [ 4] 1.00-2.00 sec 1.09 GBytes 9.38 Gbits/sec 0 803 KBytes
>>
>> If I do the opposite (controller to compute), I get the same results I
>> was getting between the two computes.
>> From the tests we've done it seems related to the OpenStack services,
>> specifically Neutron or Open vSwitch: as long as those services are
>> running we can't get the full bandwidth.
>> Stopping the services doesn't fix the issue; in our case, removing the
>> packages and rebooting is the only way to get the full bandwidth back
>> between computes.
>>
>> I deliberately left VMs out of the question to keep it simple, but of
>> course the same behavior can be observed inside VMs.
>>
>> Knowing that we can achieve 10 Gbps, it doesn't seem related to the
>> hardware or the OS, which is why we suspect the OpenStack services.
>> But I couldn't find any evidence or misconfiguration that would confirm
>> that.
>> So if anyone has hints about this kind of setup and/or how to mitigate
>> the bandwidth decrease, I would appreciate it.
>> Let me know if you need more info.
>> Thanks in advance,
>>
>> Jahson
>>
>>