OpenStack network tap interface random rxdrops for some VMs

hai wu haiwu.us at gmail.com
Tue Jul 20 16:15:34 UTC 2021


We use Train here, via the OpenStack Debian distribution. Do you know
why tx_queue_size is being ignored here? rx_queue_size is working
fine.
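
For reference, this is roughly how the two options are set on the
compute node (a sketch, assuming the stock /etc/nova/nova.conf
location):

    [libvirt]
    rx_queue_size = 1024
    tx_queue_size = 1024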

On Tue, Jul 20, 2021 at 7:11 AM Sean Mooney <smooney at redhat.com> wrote:
>
> On Mon, 2021-07-19 at 18:13 -0500, hai wu wrote:
> > Thanks. But this Red Hat KB article also suggests setting the tap
> > txqueuelen to 10000:
> > https://access.redhat.com/solutions/2785881
> >
> > Also, it seems I might be hitting a known OpenStack bug. nova only
> > updates the relevant VM libvirt XML with rx_queue_size=1024 and
> > consistently ignores tx_queue_size=1024, even though both are
> > configured in nova.conf and nova-compute has already been restarted
> > via systemctl. Maybe I am hitting this known bug?
> > https://github.com/openstack/nova/commit/7ee4fefa2f6cc98dbd7b3d6636949498a6a23dd5
> That was fixed a long time ago, and even before the fix it still worked if you set both the tx and rx queue sizes.
> The bug only manifested if you set just one of them, so if you have set both it should not be a factor.
>
> Can you confirm what version of OpenStack you are deploying: is it Train? Is it OSP or RDO, or something else?
> >
> > On Mon, Jul 19, 2021 at 5:39 PM Sean Mooney <smooney at redhat.com> wrote:
> > >
> > > On Mon, 2021-07-19 at 13:54 -0500, hai wu wrote:
> > > > Hmm, for txqueuelen I actually followed the recommendation from here:
> > > > https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/ovs-dpdk_end_to_end_troubleshooting_guide/high_packet_loss_in_the_tx_queue_of_the_instance_s_tap_interface,
> > > > where it suggests doing this:
> > > >
> > > > cat <<'EOF'>/etc/udev/rules.d/71-net-txqueuelen.rules
> > > > SUBSYSTEM=="net", ACTION=="add", KERNEL=="tap*", ATTR{tx_queue_len}="10000"
> > > > EOF
> > > >
> > > > or
> > > >
> > > > /sbin/ip link set tap<uuid> txqueuelen 10000
> > >
> > > Thanks for bringing that document to my attention; I will escalate it internally to ensure it is removed, as the content is incorrect.
> > > That document is part of the OVS-DPDK end-to-end troubleshooting guide, but we do not support the use of tap devices with OVS-DPDK.
> > >
> > > I have filed a downstream bug to correct this: https://bugzilla.redhat.com/show_bug.cgi?id=1983828
> > >
> > > The use of a tap device with OVS-DPDK is highly inefficient, as the tap device is not DPDK-accelerated and is instead handled on the main thread of the ovs-vswitchd process.
> > > This is severely limiting for performance, and under heavy traffic load it can cause issues with programming OpenFlow rules.
> > >
> > > >
> > > > I will try bumping both libvirt/rx_queue_size and
> > > > libvirt/tx_queue_size up to 1024; I am just not sure about the
> > > > difference between the udev approach above and the corresponding
> > > > libvirt options.
> > > >
> > > Leaving aside that setting parameters on tap devices via udev would be unsupported in vendor
> > > distributions of OpenStack, the main difference is that doing it correctly with the config options will
> > > work for vhost-user ports and any virtio backend that supports them, whereas the udev approach will
> > > only work with tap devices.
> > >
> > > The udev rule alters the parameters of the tap device:
> > >
> > > cat <<'EOF'>/etc/udev/rules.d/71-net-txqueuelen.rules
> > > SUBSYSTEM=="net", ACTION=="add", KERNEL=="tap*", ATTR{tx_queue_len}="10000"
> > > EOF
> > >
> > > But I am not sure those changes will be visible in the guest, as it is not clear to me that they will alter the virtio-net-pci device frontend created by QEMU,
> > > which is what is presented to the guest. Setting the values in nova.conf will update the contents of the libvirt XML, which ensures that both are set correctly.
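> > >
> > > For example, the resulting interface definition in the domain XML
> > > (virsh dumpxml <instance>) should contain something like this (a
> > > sketch; the exact type and attributes depend on the VIF backend,
> > > and note that the libvirt documentation says tx_queue_size only
> > > works for vhost-user interfaces, which may be why only
> > > rx_queue_size shows up on tap-backed ports):
> > >
> > >     <interface type='bridge'>
> > >       <model type='virtio'/>
> > >       <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024'/>
> > >     </interface>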
> > >
> > > A queue length of 10000 is not one of the lengths supported by QEMU, so I am not sure it will actually help. At most I suspect
> > > it will add additional buffering, but as far as I am aware the maximum queue length supported by QEMU is 1024.
> > >
> > > >  Also, it seems that by
> > > > default rx_queue_size and tx_queue_size are None, which means
> > > > bumping them up to 1024 should help with packet drops?
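> > > >
> > > > For what it's worth, the resulting ring sizes can be verified from
> > > > inside the guest (the interface name is a placeholder):
> > > >
> > > >     ethtool -g eth0
> > > >
> > > > This reports the preset maximums and the current settings for the
> > > > RX and TX rings.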
> > > >
> > > > On Mon, Jul 19, 2021 at 1:18 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > >
> > > > > On Mon, 2021-07-19 at 19:16 +0100, Sean Mooney wrote:
> > > > > > On Mon, 2021-07-19 at 12:54 -0500, hai wu wrote:
> > > > > > > I already set txqueuelen for that VM's tap interface and enabled
> > > > > > > multi-queue for the VM, but its tap rxdrop counter still randomly
> > > > > > > keeps increasing, dropping one packet every 10 or 20 or 60 seconds.
> > > > > > > (That makes sense to me, since those changes would only help with
> > > > > > > txdrops, not rxdrops, per my understanding.)
> > > > > > >
> > > > > Multi-queue should help with both tx and rx drops, by the way.
> > > > >
> > > > > When you enable multi-queue we allocate one rx and one tx queue per VM CPU,
> > > > > which should allow the network backend to process more packets from the VM in parallel.
> > > > > If the network backend is overloaded and cannot process any more packets, then adding more rx queues won't help,
> > > > > but provided the network backend is not the bottleneck, it will.
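> > > > >
> > > > > For reference, a sketch of enabling multi-queue via the image
> > > > > metadata property (the image name is a placeholder):
> > > > >
> > > > >     openstack image set --property hw_vif_multiqueue_enabled=true <image>
> > > > >
> > > > > Inside the guest the extra queues can then be brought into use
> > > > > with, e.g., ethtool -L eth0 combined 4.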
> > > > > > >  This is an idle VM. After migrating
> > > > > > > this idle VM to another idle OpenStack hypervisor, so that it is the
> > > > > > > only VM running on a dedicated physical OpenStack hypervisor, it is
> > > > > > > now dropping rx packets at about 0.03/s over each 10-minute window.
> > > > > > >
> > > > > > > I am not aware of any way to configure rxqueuelen for a tap interface.
> > > > > > >
> > > > > > You configure the rx queue length the same way you configure the tx
> > > > > > queue length, via
> > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.rx_queue_size
> > > > > > and the tx queue length is configured by
> > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.tx_queue_size
> > > > > > which you presumably already have set.
> > > > > >
> > > > > > >  I
> > > > > > > assume tap RX -> Hypervisor RX -> VM TX, correct? How to tune
> > > > > > RX on the tap is TX from the VM, yes.
> > > > > > RX drops normally mean the packets were dropped by the network backend for some reason,
> > > > > > e.g. OVS or the Linux bridge is discarding packets.
> > > > > > > rxqueuelen for a tap interface? When I log into this idle Linux test VM,
> > > > > > > I do not see any drops on either its RX or TX side.
> > > > > > >
> > > > > > > On Mon, Jul 19, 2021 at 12:03 PM Sean Mooney <smooney at redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 2021-07-19 at 11:39 -0500, hai wu wrote:
> > > > > > > > > There are random, very slow rxdrops on the network tap interfaces
> > > > > > > > > of certain OpenStack VMs. Please note that this is rxdrop, NOT txdrop.
> > > > > > > > >
> > > > > > > > > I know we can tune txqueuelen and multi-queue for the tap network
> > > > > > > > > interface txdrop issue, but is there any known way to tune for this
> > > > > > > > > tap network interface rxdrop issue?
> > > > > > > > I think an rx drop typically means the vswitch/kernel is dropping packets, so I think
> > > > > > > > any tuning you apply would have to be on the kernel side.
> > > > > > > >
> > > > > > > > With that said, you can configure the rx queue length, and enabling multi-queue will also result in
> > > > > > > > additional rx queues, so it may help. But no, I don't know of any one fix for this; you will have
> > > > > > > > to see what tuning works in your environment for your given traffic profile.
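> > > > > > > >
> > > > > > > > For example, to confirm whether the drop counters are still
> > > > > > > > increasing on the hypervisor side (the tap name is a placeholder):
> > > > > > > >
> > > > > > > >     watch -n 10 'ip -s link show tap<uuid>'
> > > > > > > >
> > > > > > > > The "dropped" columns in the RX and TX stats are the counters in
> > > > > > > > question.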
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Hai
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>


