openvswitch+dpdk 100% cpu usage of ovs-vswitchd

Sean Mooney smooney at redhat.com
Mon Nov 9 14:30:34 UTC 2020


On Mon, 2020-11-09 at 09:13 -0500, Satish Patel wrote:
> Thanks Sean,
> 
> I have Intel NIC
> 
> [root@infra-lxb-1 ~]# lspci | grep -i eth
> 06:00.0 Ethernet controller: Intel Corporation 82599 10 Gigabit Dual
> Port Backplane Connection (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation 82599 10 Gigabit Dual
> Port Backplane Connection (rev 01)
> 
> I was thinking that if I can create a couple of VFs out of the SR-IOV interface and,
> on a compute machine, create two bonding interfaces, bond-1 for mgmt
> and bond-2 for OVS+DPDK, then it will solve all my problems related to ToR
> switch redundancy.
> 
> I don't think we can add a VF as an interface in OVS for DPDK.
You can, but if you create the bond on the host first it basically defeats the reason for using DPDK:
the kernel bond driver will be a bottleneck. If you want to bond DPDK interfaces, you should
create that bond in OVS by adding the two VFs as DPDK ports and then creating an OVS bond on top of them, e.g.:
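
A rough sketch, assuming OVS is already built and configured with DPDK support (the PF names, the bridge name, and the VF PCI addresses below are placeholders; check the real VF addresses with lspci after creating them):

  # create one VF on each PF (eno1/eno2 are placeholder names for the two 82599 ports)
  echo 1 > /sys/class/net/eno1/device/sriov_numvfs
  echo 1 > /sys/class/net/eno2/device/sriov_numvfs

  # bind the VFs to vfio-pci so the DPDK PMD can drive them
  # (dpdk-devbind.py -b vfio-pci <address> works as well)
  driverctl set-override 0000:06:10.0 vfio-pci
  driverctl set-override 0000:06:10.1 vfio-pci

  # add both VFs as dpdk ports inside a single OVS bond on a userspace (netdev) bridge
  ovs-vsctl add-br br-dpdk -- set bridge br-dpdk datapath_type=netdev
  ovs-vsctl add-bond br-dpdk dpdkbond0 dpdk-vf0 dpdk-vf1 \
      -- set Interface dpdk-vf0 type=dpdk options:dpdk-devargs=0000:06:10.0 \
      -- set Interface dpdk-vf1 type=dpdk options:dpdk-devargs=0000:06:10.1
  ovs-vsctl set port dpdkbond0 bond_mode=balance-tcp lacp=active

Note that balance-tcp requires LACP on the switch side; active-backup avoids that if the two ToRs cannot form an MLAG pair.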

> On Mon, Nov 9, 2020 at 9:03 AM Sean Mooney <smooney at redhat.com> wrote:
> > 
> > On Mon, 2020-11-09 at 04:41 +0000, Tony Liu wrote:
> > > Bonding is a SW feature supported by either the kernel or the DPDK layer.
> > > In the case of SRIOV, it's not complicated to enable bonding inside the VM,
> > > and it has to be two NICs connecting to two ToRs.
> > > 
> > > Depending on the DPDK implementation, you might be able to use a VF.
> > > Anyway, it's always recommended to have a dedicated NIC for SRIOV.
> > For what it's worth, Mellanox does support bonding of VFs on the same card.
> > I have never used it, but bonding on the host is possible for SR-IOV.
> > I'm not sure whether it works with OpenStack, but I believe it does.
> > 
> > You will have to reach out to Mellanox to determine whether it is supported.
> > Most other NIC vendors do not support bonding, and it may limit other
> > features like bandwidth-based scheduling, since you can really only list the bandwidth
> > of one of the interfaces because you can't control which interface is actively being used.
> > 
> > > 
> > > 
> > > Thanks!
> > > Tony
> > > > -----Original Message-----
> > > > From: Satish Patel <satish.txt at gmail.com>
> > > > Sent: Sunday, November 8, 2020 6:51 PM
> > > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > > Cc: Laurent Dumont <laurentfdumont at gmail.com>; OpenStack Discuss
> > > > <openstack-discuss at lists.openstack.org>
> > > > Subject: Re: openvswitch+dpdk 100% cpu usage of ovs-vswitchd
> > > > 
> > > > Thank you Tony,
> > > > 
> > > > We are running an OpenStack cloud with SR-IOV and we are happy with the
> > > > performance, but there is one big issue: it doesn't support bonding on
> > > > compute nodes. We can do bonding inside the VM, but that is over-complicated
> > > > for that level of deployment, and without bonding it's always risky if a ToR
> > > > switch dies. That is why I started looking into DPDK, but it looks like I hit
> > > > the wall again because my compute node has only 2 NICs and I can't do bonding
> > > > while I am connected over the same NIC. Anyway, I will stick with SR-IOV in
> > > > that case to get more performance and less complexity.
> > > > 
> > > > On Sun, Nov 8, 2020 at 3:22 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> > > > > 
> > > > > SRIOV gives you the maximum performance, without any SW features
> > > > > (security groups, L3 routing, etc.), because it bypasses the SW.
> > > > > DPDK gives you less performance, with all SW features.
> > > > > 
> > > > > Depending on the use case, max perf vs. SW features, you will need to
> > > > > make a decision.
> > > > > 
> > > > > 
> > > > > Tony
> > > > > > -----Original Message-----
> > > > > > From: Laurent Dumont <laurentfdumont at gmail.com>
> > > > > > Sent: Sunday, November 8, 2020 9:04 AM
> > > > > > To: Satish Patel <satish.txt at gmail.com>
> > > > > > Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
> > > > > > Subject: Re: openvswitch+dpdk 100% cpu usage of ovs-vswitchd
> > > > > > 
> > > > > > I have limited hands-on experience with both, but they don't serve
> > > > > > the same purpose or have the same implementation. You use SRIOV to
> > > > > > allow Tenants to access the NIC cards directly and bypass any
> > > > > > inherent linux-vr/OVS performance limitations. This is key for NFV
> > > > > > workloads which expect large amounts of PPS + low latency
> > > > > > (because they are often just virtualized bare-metal products with
> > > > > > no real cloud-readiness/architecture ;) ). This means that a
> > > > > > Tenant with an SRIOV port can use DPDK and access the NIC through the
> > > > > > VF, which means (in theory) better performance than OVS+DPDK.
> > > > > > 
> > > > > > You use ovs-dpdk to increase the performance of OVS-based flows (so
> > > > > > provider networks + VXLAN-based internal tenant networks).
> > > > > > 
> > > > > > On Sun, Nov 8, 2020 at 11:13 AM Satish Patel <satish.txt at gmail.com> wrote:
> > > > > > 
> > > > > > 
> > > > > >       Thanks. Just curious then why people go directly for the SR-IOV
> > > > > >       implementation, where they get better performance and can also
> > > > > >       make more use of the same CPUs. What are the major advantages or
> > > > > >       features attracting the community to go with DPDK over SR-IOV?
> > > > > > 
> > > > > >       On Sun, Nov 8, 2020 at 10:50 AM Laurent Dumont <laurentfdumont at gmail.com> wrote:
> > > > > >       >
> > > > > >       > As far as I know, DPDK-enabled cores will show 100% usage at all times.
> > > > > >       >
> > > > > >       > On Sun, Nov 8, 2020 at 9:39 AM Satish Patel <satish.txt at gmail.com> wrote:
> > > > > >       >>
> > > > > >       >> Folks,
> > > > > >       >>
> > > > > >       >> Recently I have added some compute nodes in the cloud supporting
> > > > > >       >> openvswitch-dpdk for performance. I am seeing all my PMD CPU cores
> > > > > >       >> at 100% CPU usage in the Linux top command. It looks like normal
> > > > > >       >> behavior at first glance, but it's very scary to see 400% CPU usage
> > > > > >       >> in top. Can someone confirm it's normal before I assume it is, and
> > > > > >       >> what can we do to reduce it if it's too high?
> > > > > >       >>
> > > > > > 
> > > > > 
> > 
> > 
> 
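
On the original question about 100% CPU: that is expected, since the PMD threads busy-poll their rx queues whether or not packets are arriving, so top will always show those cores fully used. To see how loaded the PMDs really are, look at the busy vs. idle cycle counters instead of top, roughly like this:

  # per-PMD breakdown of processing vs. idle cycles
  ovs-appctl dpif-netdev/pmd-stats-show

  # clear the counters, wait a bit, then re-run pmd-stats-show
  # to get a fresh picture of the actual load
  ovs-appctl dpif-netdev/pmd-stats-clear

  # which rx queues are assigned to which PMD core
  ovs-appctl dpif-netdev/pmd-rxq-show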




