OpenStack network tap interface random rxdrops for some VMs
    hai wu 
    haiwu.us at gmail.com
       
    Mon Jul 19 23:13:05 UTC 2021
    
    
  
Thanks. But this Red Hat KB article also suggests setting the tap
txqueuelen to 10000:
https://access.redhat.com/solutions/2785881
Also, it seems I might be hitting some known OpenStack bugs. Nova only
updates the relevant VM's libvirt XML with rx_queue_size=1024 and
consistently ignores tx_queue_size=1024, even though both are
configured in nova.conf and I have already run systemctl restart
nova-compute. Maybe I am hitting this known bug?
https://github.com/openstack/nova/commit/7ee4fefa2f6cc98dbd7b3d6636949498a6a23dd5
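One way to check which queue sizes actually reached a running instance is to read its libvirt domain XML, where these options appear as rx_queue_size and tx_queue_size attributes on the interface's <driver> element. The function below is a minimal sketch that extracts those attributes from XML on stdin; its name is hypothetical. Note that the nova configuration reference describes tx_queue_size as usable only for virtio-net devices with a vhost-user backend; if that applies to your release, it could explain why only rx_queue_size shows up for a tap-backed instance.

```shell
#!/bin/sh
# Print every rx_queue_size / tx_queue_size attribute found in libvirt domain
# XML read from stdin (for example, piped from virsh dumpxml).
# No output means the attribute is absent and the hypervisor default applies.
queue_sizes_from_xml() {
  # grep -o prints each attribute match on its own line; sed strips the quotes.
  grep -oE "(rx|tx)_queue_size=['\"][0-9]+['\"]" | sed -e "s/['\"]//g"
}
```

For example, on the compute node: virsh dumpxml <domain> | queue_sizes_from_xml, where <domain> is the libvirt name of the instance.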
On Mon, Jul 19, 2021 at 5:39 PM Sean Mooney <smooney at redhat.com> wrote:
>
> On Mon, 2021-07-19 at 13:54 -0500, hai wu wrote:
> > Hmm, for txqueuelen, I actually followed the recommendation from here:
> > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/ovs-dpdk_end_to_end_troubleshooting_guide/high_packet_loss_in_the_tx_queue_of_the_instance_s_tap_interface,
> > which suggests doing this:
> >
> > cat <<'EOF'>/etc/udev/rules.d/71-net-txqueuelen.rules
> > SUBSYSTEM=="net", ACTION=="add", KERNEL=="tap*", ATTR{tx_queue_len}="10000"
> > EOF
> >
> > or
> >
> > /sbin/ip link set tap<uuid> txqueuelen 10000
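Both commands above change only the host-side tap device. As a sketch for confirming what they actually set, the function below reads tx_queue_len from sysfs for every tap* interface. The function name and the optional sysfs-root argument (added so it can be pointed at a test directory) are assumptions, not part of any existing tool.

```shell
#!/bin/sh
# Print "<device> <tx_queue_len>" for every tap* interface. This is the
# host-side value that the udev rule or "ip link set ... txqueuelen" changes.
# $1: optional sysfs root (defaults to /sys/class/net), useful for testing.
tap_txqueuelen() {
  root=${1:-/sys/class/net}
  for dev in "$root"/tap*; do
    # With no tap devices the glob stays unexpanded; skip anything unreadable.
    [ -r "$dev/tx_queue_len" ] || continue
    printf '%s %s\n' "${dev##*/}" "$(cat "$dev/tx_queue_len")"
  done
}
```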
>
> Thanks for bringing that document to my attention. I will escalate it internally to ensure it's removed, as the content is incorrect.
> That document is part of the ovs-dpdk end-to-end troubleshooting guide, but we do not support the use of tap devices with ovs-dpdk.
>
> I have filed a downstream bug to correct this: https://bugzilla.redhat.com/show_bug.cgi?id=1983828
>
> Using a tap device with ovs-dpdk is highly inefficient, as the tap device is not DPDK-accelerated and is instead handled on the main thread of the ovs-vswitchd process.
> This severely limits performance, and under heavy traffic load it can cause issues with programming OpenFlow rules.
>
> >
> > I will try to bump up both libvirt/rx_queue_size and
> > libvirt/tx_queue_size to 1024. I'm just not sure about the difference
> > between the above and the corresponding libvirt one.
> >
> Setting aside that configuring parameters on tap devices via udev would be unsupported in vendor distributions of OpenStack, the main
> difference is that doing it correctly with the config option will work on vhost-user ports and any virtio backend that supports them, whereas
> the udev approach will only work with tap devices.
>
> The udev rule alters the parameters of the tap device:
>
> cat <<'EOF'>/etc/udev/rules.d/71-net-txqueuelen.rules
> SUBSYSTEM=="net", ACTION=="add", KERNEL=="tap*", ATTR{tx_queue_len}="10000"
> EOF
>
> but I'm not sure those changes will be visible in the guest, as it's not clear to me that they alter the virtio-net-pci device frontend created by QEMU,
> which is what is presented to the guest. Setting the values in nova.conf updates the contents of the libvirt XML and ensures that both are set correctly.
>
> A queue length of 10000 is not one of the lengths supported by QEMU, so I'm not sure it will actually help. At most, I suspect
> it will add additional buffering, but as far as I am aware the maximum queue length supported by QEMU is 1024.
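The nova configuration reference lists 256, 512 and 1024 as the accepted values for [libvirt]/rx_queue_size and tx_queue_size, which is consistent with 1024 being the maximum. A minimal sketch of a pre-flight check for a proposed size (the function name is hypothetical):

```shell
#!/bin/sh
# Return 0 if $1 is one of the virtio queue sizes nova accepts for
# [libvirt]/rx_queue_size and tx_queue_size (256, 512 or 1024), else 1.
valid_virtio_queue_size() {
  case "$1" in
    256|512|1024) return 0 ;;
    *) return 1 ;;
  esac
}
```

By this check, 1024 passes and the 10000 used in the udev rule does not.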
>
> > Also, it seems that by
> > default rx_queue_size and tx_queue_size are None, which means
> > bumping them up to 1024 should help with packet drops?
> >
> > On Mon, Jul 19, 2021 at 1:18 PM Sean Mooney <smooney at redhat.com> wrote:
> > >
> > > On Mon, 2021-07-19 at 19:16 +0100, Sean Mooney wrote:
> > > > On Mon, 2021-07-19 at 12:54 -0500, hai wu wrote:
> > > > > I already set txqueuelen for that VM's tap interface and enabled
> > > > > multi-queue for the VM, but its tap rxdrop count still kept increasing
> > > > > randomly, dropping one packet every 10, 20, or 60 seconds (which
> > > > > makes sense to me, since per my understanding those settings would only
> > > > > help with txdrops, not rxdrops).
> > > > >
> > > Multi-queue should help with both tx and rx drops, by the way.
> > >
> > > When you enable multi-queue, we allocate one rx and one tx queue per VM CPU,
> > > which should allow the network backend to process more packets from the VM in parallel.
> > > If the network backend is overloaded and cannot process any more packets, then adding more rx queues won't help,
> > > but provided your network backend is not the bottleneck, it will.
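To confirm how many queues a device ended up with, for example inside the guest after enabling multi-queue, Linux exposes one directory per queue under /sys/class/net/<dev>/queues (rx-0, rx-1 and so on, plus tx-0 and so on). The sketch below counts them; with multi-queue enabled you would expect the rx and tx counts to match the VM's vCPU count, per the description above. The function name and the optional sysfs-root argument are assumptions for illustration.

```shell
#!/bin/sh
# Print "rx=<n> tx=<n>": the number of rx-* and tx-* queue directories the
# kernel exposes for network device $1.
# $2: optional sysfs root (defaults to /sys/class/net), useful for testing.
count_queues() {
  dev=$1
  root=${2:-/sys/class/net}
  rx=0
  tx=0
  for q in "$root/$dev"/queues/rx-* "$root/$dev"/queues/tx-*; do
    [ -d "$q" ] || continue   # glob stays unexpanded when nothing matches
    case "${q##*/}" in
      rx-*) rx=$((rx + 1)) ;;
      tx-*) tx=$((tx + 1)) ;;
    esac
  done
  printf 'rx=%s tx=%s\n' "$rx" "$tx"
}
```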
> > > > > This is an idle VM. I migrated this idle VM to another idle
> > > > > OpenStack hypervisor, so it is the only VM running on a dedicated
> > > > > physical OpenStack hypervisor, and now it is dropping 0.03 rxdrops/s
> > > > > every 10 minutes.
> > > > >
> > > > > I am not aware of any way to configure rxqueuelen for a tap interface.
> > > > >
> > > > You configure the rx queue length the same way you configure the tx queue length:
> > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.rx_queue_size
> > > > and the tx queue length is configured by https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.tx_queue_size, which you presumably already have set.
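For reference, a minimal sketch that prints the nova.conf fragment for these two options, defaulting to 1024 for both as discussed in this thread. The function name is hypothetical, and where the fragment ends up (nova.conf itself or a separate config file) depends on your deployment. nova-compute needs a restart afterwards, and existing instances presumably only pick up the new values once their libvirt XML is regenerated.

```shell
#!/bin/sh
# Print a [libvirt] section setting the virtio rx/tx queue sizes.
# $1: rx_queue_size (default 1024), $2: tx_queue_size (default 1024).
nova_queue_size_fragment() {
  rx=${1:-1024}
  tx=${2:-1024}
  cat <<EOF
[libvirt]
rx_queue_size = $rx
tx_queue_size = $tx
EOF
}
```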
> > > >
> > > > > I assume tap RX -> Hypervisor RX -> VM TX, correct?
> > > > Rx on the taps is tx from the VM, yes.
> > > > Rx drops normally mean the packets were dropped by the network backend for some reason,
> > > > e.g. OVS or Linux bridge is discarding packets.
> > > > > How do I tune rxqueuelen for a tap interface? If I log into this idle Linux test VM,
> > > > > I am not seeing any drops, either in its RX or TX.
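Since rx drops on the tap are counted on the host, a simple way to quantify them is to sample the device's rx_dropped counter (/sys/class/net/<tap>/statistics/rx_dropped) twice and divide the difference by the interval; for example, 18 drops over 600 seconds is 0.03 drops/s. A minimal sketch, with hypothetical function names and an optional sysfs-root argument for testing:

```shell
#!/bin/sh
# Average drops per second between two counter readings taken $3 seconds apart.
# Fails (exit status 1) for a non-positive interval.
drop_rate() {
  awk -v s="$1" -v e="$2" -v t="$3" \
    'BEGIN { if (t <= 0) exit 1; printf "%.2f\n", (e - s) / t }'
}

# Sample rx_dropped for device $1 twice, $2 seconds apart (default 60),
# and print the rate. $3: optional sysfs root (defaults to /sys/class/net).
sample_rx_drop_rate() {
  dev=$1
  interval_s=${2:-60}
  root=${3:-/sys/class/net}
  first=$(cat "$root/$dev/statistics/rx_dropped")
  sleep "$interval_s"
  second=$(cat "$root/$dev/statistics/rx_dropped")
  drop_rate "$first" "$second" "$interval_s"
}
```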
> > > > >
> > > > > On Mon, Jul 19, 2021 at 12:03 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > > >
> > > > > > On Mon, 2021-07-19 at 11:39 -0500, hai wu wrote:
> > > > > > > There are random, very slow rxdrops on the network tap interfaces of
> > > > > > > certain OpenStack VMs. Please note that this is rxdrop, NOT txdrop.
> > > > > > >
> > > > > > > I know we can tune txqueuelen and multi-queue for the tap network
> > > > > > > interface txdrop issue, but is there any known way to tune for this
> > > > > > > tap network interface rxdrop issue?
> > > > > > I think an rx drop typically means the vswitch/kernel is dropping packets, so I think
> > > > > > any tuning you apply would have to be on the kernel side.
> > > > > >
> > > > > > With that said, you can configure the rx queue length, and enabling multi-queue will also result in
> > > > > > additional rx queues, so it may help. But no, I don't know of any one fix for this; you will have
> > > > > > to see what tuning works in your environment for your given traffic profile.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Hai
    
    
More information about the openstack-discuss
mailing list