ovn-controller/OVS strange behaviour

Lucas Alvares Gomes lucasagomes at gmail.com
Fri Jun 24 09:17:29 UTC 2022


Hi,

I was going to forward this thread to the ovs-discuss ML (the ML that the
core OVN folks watch) but I see that you already posted it there and
already got some suggestions.

I think we should continue the discussion over there. From an
OpenStack perspective I don't think there's much we can do here,
because ML2/OVN is clearly updating the OVN NB DB correctly but
ovn-controller is not picking up the changes and installing the
appropriate flows.
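
For anyone comparing the two traces quoted below, here is a minimal
Python sketch of the check I would otherwise do by eye: find the
"Datapath actions" line in the ofproto/trace output and see whether the
packet is punted to the controller or sent over a tunnel. The function
name and the three labels are my own, and the string matching is based
only on the two outputs in this thread, not on any guaranteed OVS output
format.

```python
import re


def classify_datapath_actions(trace_output: str) -> str:
    """Label an ofproto/trace output as 'controller', 'tunnel' or 'unknown'.

    Hypothetical helper: the labels and the matching rules are assumptions
    derived from the two traces in this thread.
    """
    # Tolerate leading "> " quote markers in case the text is pasted from an email.
    match = re.search(r"^[>\s]*Datapath actions:\s*(.*)$", trace_output, re.MULTILINE)
    if match is None:
        return "unknown"
    actions = match.group(1)
    # Broken case in the thread: the packet ends in a userspace/controller action.
    if "userspace(" in actions and "controller(" in actions:
        return "controller"
    # Working case in the thread: tunnel metadata is set before output.
    if "set(tunnel(" in actions:
        return "tunnel"
    return "unknown"


# "Datapath actions" lines copied from the two traces in this thread.
BEFORE_RECOMPUTE = (
    "Datapath actions: ct(commit,zone=15,label=0/0x1,nat(src)),"
    "set(eth(src=fa:16:3e:ec:7f:dd,dst=00:00:00:00:00:00)),set(ipv4(ttl=63)),"
    "userspace(pid=3451843211,controller(reason=1,dont_send=1,continuation=0,"
    "recirc_id=12670898,rule_cookie=0x3e26215e,controller_id=0,max_len=65535))"
)
AFTER_RECOMPUTE = (
    "Datapath actions: ct(commit,zone=15,label=0/0x1,nat(src)),"
    "set(tunnel(tun_id=0x2a,dst=10.X6.X3.133,ttl=64,tp_dst=6081,"
    "geneve({class=0x102,type=0x80,len=4,0x30002}),flags(df|csum|key))),"
    "set(eth(src=fa:16:3e:ec:7f:dd,dst=00:00:5e:00:04:00)),set(ipv4(ttl=63)),2"
)

if __name__ == "__main__":
    print(classify_datapath_actions(BEFORE_RECOMPUTE))
    print(classify_datapath_actions(AFTER_RECOMPUTE))
```

A "controller" result for traffic that should leave through the gateway
is only a hint that flows may be missing on that chassis; it does not
say why they are missing.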

On Fri, Jun 24, 2022 at 10:13 AM Lucas Alvares Gomes
<lucasagomes at gmail.com> wrote:
>
> Hi,
>
> On Thu, Jun 23, 2022 at 9:30 PM Tiago Pires <tiagohp at gmail.com> wrote:
> >
> > Hi all,
> >
> > I'm trying to understand some strange behaviour in ovn-controller/OVS.
> > In my setup I have OVN 21.09/OVS 2.16 and Ubuntu Xena, and sometimes when a new VM is created it can reach other VMs with east-west traffic (even on different chassis) but it can't reach an external network (e.g. the Internet) through the gateway chassis.
> > I ran the following trace:
> > # ovs-appctl ofproto/trace br-int in_port="93",icmp,dl_src=fa:16:3e:26:34:ef,dl_dst=fa:16:3e:65:68:6e,nw_src=192.168.40.140,nw_dst=8.8.8.8,nw_ttl=64
> >
> > And I got this output:
> >
> > Final flow: recirc_id=0xc157b1,eth,icmp,reg0=0x300,reg11=0xd,reg12=0x10,reg13=0xf,reg14=0x3,reg15=0x2,metadata=0x29,in_port=93,vlan_tci=0x0000,dl_src=fa:16:3e:26:34:ef,dl_dst=fa:16:3e:65:68:6e,nw_src=192.168.40.140,nw_dst=8.8.8.8,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> > Megaflow: recirc_id=0xc157b1,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,icmp,in_port=93,dl_src=fa:16:3e:26:34:ef,dl_dst=fa:16:3e:65:68:6e,nw_src=192.168.40.128/26,nw_dst=8.0.0.0/7,nw_ttl=64,nw_frag=no
> > Datapath actions: ct(commit,zone=15,label=0/0x1,nat(src)),set(eth(src=fa:16:3e:ec:7f:dd,dst=00:00:00:00:00:00)),set(ipv4(ttl=63)),userspace(pid=3451843211,controller(reason=1,dont_send=1,continuation=0,recirc_id=12670898,rule_cookie=0x3e26215e,controller_id=0,max_len=65535))
> > It seems the datapath is sending the packet to the controller, and I do not understand why.
> >
> > So I ran an ovn-controller recompute (ovn-appctl -t ovn-controller recompute) on the chassis where the VM is placed to check whether it would change the behaviour. After that I could trace the packet successfully and the VM started to communicate with the Internet normally:
> >
> > Final flow: recirc_id=0x2,eth,icmp,reg0=0x300,reg11=0xd,reg12=0x10,reg13=0xf,reg14=0x3,reg15=0x2,metadata=0x29,in_port=93,vlan_tci=0x0000,dl_src=fa:16:3e:26:34:ef,dl_dst=fa:16:3e:65:68:6e,nw_src=192.168.40.140,nw_dst=8.8.8.8,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> > Megaflow: recirc_id=0x2,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,icmp,tun_id=0/0xffffff,tun_metadata0=NP,in_port=93,dl_src=fa:16:3e:26:34:ef,dl_dst=fa:16:3e:65:68:6e,nw_src=192.168.40.128/26,nw_dst=8.0.0.0/7,nw_ecn=0,nw_ttl=64,nw_frag=no
> > Datapath actions: ct(commit,zone=15,label=0/0x1,nat(src)),set(tunnel(tun_id=0x2a,dst=10.X6.X3.133,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(df|csum|key))),set(eth(src=fa:16:3e:ec:7f:dd,dst=00:00:5e:00:04:00)),set(ipv4(ttl=63)),2
> > The datapath action now uses the tunnel to the gateway chassis.
> >
>
> This sounds like a bug in the ovn-controller to me. The fact that it
> worked after a recompute which forces ovn-controller to recalculate
> all flows tells me that there may be a bug in the "incremental
> processing" mechanism (a mechanism that calculates the changes based
> on deltas).
>
> > It only happens with new VMs, and only sometimes. After running the recompute on the chassis, I created additional VMs and the issue did not happen.
> >
> > On my chassis I have also set these parameters:
> > ovn-monitor-all="true"
> > ovn-openflow-probe-interval="0"
> > ovn-remote-probe-interval="180000"
> >
> > I did some troubleshooting and I see this warning (in ovs-vswitchd) every time a VM is created on a chassis:
> > 2022-06-23T11:47:08.385Z|07907|bridge|WARN|could not open network device tap8a43df0c-fd (No such device)
> > 2022-06-23T11:47:09.282Z|07908|bridge|INFO|bridge br-int: added interface tap8a43df0c-fd on port 51
> > 2022-06-23T11:47:09.645Z|07909|bridge|INFO|bridge br-int: added interface tap3200bf1c-20 on port 52
> > 2022-06-23T11:47:19.329Z|07911|connmgr|INFO|br-int<->unix#1468: 430 flow_mods in the 7 s starting 10 s ago (410 adds, 20 deletes)
> >
>
> Hmm... At first glance it does not look related to the issue you are
> experiencing, but core OVN or OVS experts may know better.
>
> > This patch http://patchwork.ozlabs.org/project/ovn/patch/1608197000-637-1-git-send-email-dceara@redhat.com/ fixed something similar to my issue. It seems ovs-vswitchd is missing some flows, and running the recompute fixes that.
>
> Right yeah, we've seen a few bugs related to the incremental
> processing mechanism in the past. Things are much more stable nowadays
> but you may be hitting a new one.
>
>
> > So, to work around this issue, I'm currently testing running the recompute from a libvirt hook when a VM reaches the "started" state.
> >
> > Do you know whether this behaviour could be caused by a bug?
> >
> > Regards,
> >
> > Tiago Pires
> >
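
For reference, the libvirt-hook workaround mentioned above could look
roughly like the sketch below. It is a guess at the shape of such a
hook, not Tiago's actual script. It assumes the file is installed as
libvirt's qemu hook (usually /etc/libvirt/hooks/qemu) and that libvirt
passes the guest name as the first argument and the operation as the
second; the helper names and the 30 second timeout are made up for
illustration. The command itself is the one quoted in this thread.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Command taken from the thread; forces ovn-controller to recompute all flows.
RECOMPUTE_CMD = ["ovn-appctl", "-t", "ovn-controller", "recompute"]


def should_recompute(argv):
    """Return True when the hook was invoked for the "started" operation.

    Assumed argument layout: <script> <guest_name> <operation> <sub-operation> <extra>
    """
    return len(argv) >= 3 and argv[2] == "started"


def main(argv, run=subprocess.run):
    """Trigger a recompute on "started"; never fail the hook itself."""
    if should_recompute(argv):
        try:
            # check=False: a failing recompute should not block the guest.
            run(RECOMPUTE_CMD, check=False, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            # ovn-appctl missing or hung; ignore, this is only a workaround.
            pass
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

A full recompute is expensive on a busy chassis, so this only makes
sense as a stopgap until the underlying ovn-controller problem is
understood.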



More information about the openstack-discuss mailing list