[Openstack-operators] [Openstack] UDP Buffer Filling

Liping Mao (limao) limao at cisco.com
Fri Jul 28 02:48:19 UTC 2017


My message was automatically rejected by openstack-operators-owner at lists.openstack.org, so I am resending it.



Hi John,

Do you know where the packets are being dropped? On the physical interface, the tap device, the OVS port, or inside the VM?

We have hit UDP packet loss under high packets-per-second (pps) load. Here are a few things you may want to double-check:

1.      Double-check whether your physical interface is dropping packets. If the RX ring size or RX queue count is left at the default, it will usually start dropping UDP packets at around 200 kpps per CPU core (RSS distributes traffic across cores, but in my experience a single core starts dropping at roughly 200 kpps).

You can usually get statistics from ethtool -S <interface> to check whether packets are being lost because the RX queue is full, and use ethtool to increase the ring size. In my environment, increasing the ring size from 512 to 4096 doubled single-core throughput from 200 kpps to 400 kpps. This may help in some cases; a sketch follows.
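For reference, a rough sketch of these checks (eth0 is a placeholder interface name, and the exact drop counters depend on your NIC driver):

    # look for drops caused by full RX rings
    ethtool -S eth0 | grep -i -E 'drop|miss|fifo'
    # show current and maximum supported ring sizes
    ethtool -g eth0
    # increase the RX ring size (the maximum depends on the NIC)
    ethtool -G eth0 rx 4096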



2.      Double-check whether your TAP device is dropping packets. The default tx_queue_len is 500 or 1000; increasing it to 10000 may help in some cases (see the sketch below).
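A minimal sketch, assuming the instance's tap device is tap-xxxx (substitute the real device name from your compute node):

    # check TX drops on the tap device
    ip -s link show tap-xxxx
    # raise the transmit queue length
    ip link set tap-xxxx txqueuelen 10000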


3.      Double-check nf_conntrack_max on your compute and network nodes. The default value is 65535, and in our case the connection count usually reaches 500k-1M. We changed it as follows:
net.netfilter.nf_conntrack_max=10240000
net.nf_conntrack_max=10240000
If you see something like "nf_conntrack: table full, dropping packet" in /var/log/messages, it means you have hit this one (see the sketch below).
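A sketch of how to check the table usage and apply the new limit (standard procfs/sysctl paths; adjust the persistence step to your distribution):

    # current tracked connections vs. the configured maximum
    cat /proc/sys/net/netfilter/nf_conntrack_count
    cat /proc/sys/net/netfilter/nf_conntrack_max
    # apply at runtime
    sysctl -w net.netfilter.nf_conntrack_max=10240000
    # persist across reboots
    echo 'net.netfilter.nf_conntrack_max=10240000' >> /etc/sysctl.conf
    sysctl -p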


4.      You could also check whether the drops happen inside the VM; increasing the following parameters may help in some cases (see the sketch below):

net.core.rmem_max / net.core.rmem_default / net.core.wmem_max / net.core.wmem_default
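A minimal sketch inside the guest; the 16 MB values are only illustrative, not a recommendation from this post, so size the buffers for your own workload:

    # allow applications to use larger socket receive/send buffers
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.rmem_default=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.core.wmem_default=16777216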


5.      If you are using the default network driver (virtio-net), double-check whether the vhost thread of your VM is saturated with CPU soft IRQs. You can find it by the process named vhost-$PID_OF_YOUR_VM. If it is, you can try the following feature from the "L" (Liberty) release:

https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
      Multi-queue may help in some cases, but it will use more vhost threads and more CPU on your host (see the sketch below).
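A hedged sketch of enabling virtio-net multi-queue; the image name, NIC name (eth0), and queue count (4) are placeholders, and the guest still has to enable the extra queues itself:

    # request multi-queue virtio-net via the image property
    openstack image set --property hw_vif_multiqueue_enabled=true my-image
    # inside the guest, enable up to one queue per vCPU
    ethtool -L eth0 combined 4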


6.      Sometimes CPU/NUMA pinning can also help, but you need to reserve the cores and plan your CPU layout statically; a rough sketch follows.
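A rough sketch using Nova flavor extra specs (m1.voice is a placeholder flavor, and the compute nodes also need dedicated cores reserved, e.g. via vcpu_pin_set):

    openstack flavor set m1.voice --property hw:cpu_policy=dedicated
    openstack flavor set m1.voice --property hw:numa_nodes=1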

I think we should first figure out where the packets are being lost and which component is the bottleneck. Hope this helps, John.
Thanks.

Regards,
Liping Mao

From: John Petrini <jpetrini at coredial.com>
Date: Friday, July 28, 2017, 03:35
To: Pedro Sousa <pgsousa at gmail.com>, OpenStack Mailing List <openstack at lists.openstack.org>, "openstack-operators at lists.openstack.org" <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack] [Openstack-operators] UDP Buffer Filling

Hi Pedro,

Thank you for the suggestion. I will look into this.


John Petrini

Platforms Engineer   //   CoreDial, LLC   //   coredial.com <http://coredial.com/>   //   Twitter <https://twitter.com/coredial>   //   LinkedIn <http://www.linkedin.com/company/99631>   //   Google Plus <https://plus.google.com/104062177220750809525/posts>   //   Blog <http://success.coredial.com/blog>
751 Arbor Way, Hillcrest I, Suite 150, Blue Bell, PA 19422
P: 215.297.4400 x232   //   F: 215.297.4401   //   E: jpetrini at coredial.com

On Thu, Jul 27, 2017 at 12:25 PM, Pedro Sousa <pgsousa at gmail.com> wrote:
Hi,

Have you considered implementing a network acceleration technique such as OVS-DPDK or SR-IOV?

For these kinds of workloads (voice, video) with low-latency requirements, you might need something like DPDK to avoid these issues.

Regards

On Thu, Jul 27, 2017 at 4:49 PM, John Petrini <jpetrini at coredial.com> wrote:
Hi List,

We are running Mitaka with VLAN provider networking. We've recently encountered a problem where the UDP receive queue on instances is filling up and we begin dropping packets. Moving instances out of OpenStack onto bare metal resolves the issue completely.
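For anyone reproducing this, a minimal sketch of how the queue buildup shows up (tool names vary slightly by distribution):

    # per-socket receive queue depth shows up in the Recv-Q column
    ss -u -n -a
    # system-wide UDP counters; look for "receive buffer errors" / RcvbufErrors
    netstat -su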

These instances are running Asterisk, which should be pulling these packets off the queue, but it appears to fall behind no matter how many resources we give it.

We can't seem to pin down a reason why we would see this behavior in KVM but not on metal. I'm hoping someone on the list might have some insight or ideas.

Thank You,

John

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



