[Openstack] VM can receive traffic, but not send it

Kaustubh Kelkar kaustubh.kelkar at casa-systems.com
Fri Mar 24 04:10:48 UTC 2017


From: Sterdnot Shaken <sterdnotshaken at gmail.com>
Sent: Thursday, March 23, 2017 2:04 PM
To: Adam Lawson <alawson at aqorn.com>
Cc: Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com>; openstack at lists.openstack.org
Subject: Re: [Openstack] VM can receive traffic, but not send it

Just to clarify: Version: Mitaka with OVS only. Firewall driver: Openvswitch, VM OS: Windows 10

Kaustubh: Thanks for your help on the mirroring part. In my reading yesterday, I came across a thread stating that you can't mirror a patch interface with OVS. That would explain why I wasn't seeing the expected traffic on the mirror output ports when mirroring those patch interfaces. Short of rewriting the flows that OpenStack installs in OVS to add an extra output port and then running tcpdump on that port, how would one effectively troubleshoot network traffic issues when patch interfaces are in use?

Adam: Thanks for chiming in on my issue! I appreciate it. The VMs are placed directly on a provider network (external, flat) and, as such, have a public IP assigned to their NICs. So for these VMs, the default gateway is a physical router outside of Openstack's control.
As a way to further isolate the issue, I moved all but one VM off of one compute node. Several symptoms show there is a problem, but (on the Windows VM) running something as simple as a speed test (speedtest.net) works great on the download, but totally fails on the upload. Looking at all the drop flows on br-int, I noticed that this flow was incrementing while the upload part of the test was active:

cookie=0xa9964f66f62764ad, duration=1494.495s, table=82, n_packets=5813, n_bytes=348780, idle_age=4, priority=50,ct_state=+inv+trk actions=drop
[Kaustubh] The flow is dropping an invalid packet for a tracked connection. From [1], maybe the nf_conntrack_* modules are not loaded on the compute? Without knowing the complete flow information, I may not be able to provide much help.

I still live in an era where security groups are implemented within iptables on Linux bridges!
[1] http://www.ovn.org/support/dist-docs-2.5/ovs-ofctl.8.pdf
-Kaustubh
So I added this flow to mirror what would have been dropped to a dummy interface (OpenFlow port 2) that I could tcpdump to see what was actually being dropped:

ovs-ofctl add-flow br-int table=82,priority=51,ct_state=+inv+trk,actions=output:2

From the tcpdump, I can see the traffic that the VM is missing that is likely causing this whole issue...
Anyone have any thoughts on this?
Thanks!

On Thu, Mar 23, 2017 at 11:49 AM, Adam Lawson <alawson at aqorn.com> wrote:
For downloads, you're probably using DNAT or SNAT. For uploads, you're using floating IPs, I'm guessing. Do uploads work for other VMs with a similar configuration? It's rare that this would occur, so I would presume it's firewall related (either a security group via OpenStack or a firewall on the VM itself).

Another question: are incoming connections timing out, and is the security group allowing connections from everyone or a subset? I ask because I haven't seen the easy questions asked up front.

//adam


Adam Lawson

Principal Architect
Office: +1-916-794-5706

On Wed, Mar 22, 2017 at 11:31 AM, Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com> wrote:
The select_all = 1 is supposed to mirror all the packets.

Referring to the documentation (http://openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.html),

"select_all: boolean
       If true, every packet arriving or departing on any port is
       selected for mirroring."

And for OVS 2.5,


"In Open vSwitch 2.5 and later, mirroring occurs just after a packet first
becomes eligible, using the packet as it exists at that point; ...
in Open vSwitch 2.4, the modifications are never visible to mirrors,
whereas in Open vSwitch 2.5 and later modifications made before the first
output that makes it eligible for mirroring to a particular destination
are visible."
I believe, if the very first flow is dropping unicast packets, you might not be able to mirror them.

Maybe you can monitor the flow-tables on each OVS bridge while sending traffic and see which flows' count increases. Something like,
watch -n 2 "ovs-ofctl dump-flows <bridge name>"
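
As an alternative to watching the output by eye, two snapshots of the flow table can be compared to list only the flows whose packet counters went up. The following is a minimal sketch, not a tested tool: the function name flows_increased and the snapshot file names are made up for illustration, and it assumes the dump-flows text format shown elsewhere in this thread (cookie=..., duration=..., n_packets=..., idle_age=...).

```shell
#!/bin/sh
# flows_increased BEFORE_FILE AFTER_FILE
# Prints "+<delta> <flow>" for every flow whose n_packets grew between two
# saved "ovs-ofctl dump-flows" outputs. Volatile fields (duration, counters,
# ages) are stripped so the same flow compares equal in both snapshots.
flows_increased() {
    awk '
    function key(line) {
        sub(/^[ \t]+/, "", line)
        gsub(/duration=[^,]*, /, "", line)
        gsub(/n_packets=[0-9]+, /, "", line)
        gsub(/n_bytes=[0-9]+, /, "", line)
        gsub(/(idle|hard)_age=[0-9]+, /, "", line)
        return line
    }
    function pkts(line) {
        # "n_packets=" is 10 characters long; return -1 for non-flow lines
        if (match(line, /n_packets=[0-9]+/))
            return substr(line, RSTART + 10, RLENGTH - 10) + 0
        return -1
    }
    # First file: remember the packet count of every flow
    FILENAME == ARGV[1] { if (pkts($0) >= 0) before[key($0)] = pkts($0); next }
    # Second file: report flows whose count increased (new flows count from 0)
    pkts($0) >= 0 {
        k = key($0)
        delta = pkts($0) - ((k in before) ? before[k] : 0)
        if (delta > 0) printf "+%d %s\n", delta, k
    }' "$1" "$2"
}
```

Typical use would be to save "ovs-ofctl dump-flows br-int" to one file, run the failing upload for a few seconds, save a second snapshot, and then call flows_increased on the two files. Drop flows that show up in the result are the first candidates to inspect.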

-Kaustubh

From: Sterdnot Shaken [mailto:sterdnotshaken at gmail.com]
Sent: Wednesday, March 22, 2017 12:24 PM
To: Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com>
Subject: Re: [Openstack] VM can receive traffic, but not send it

Here was my first mirror setup:
ip link add name dummy3 type dummy
ip link set dev dummy3 up

ovs-vsctl add-port br-ex3 dummy3

ovs-vsctl -- set bridge br-ex3 mirrors=@m \
-- --id=@src get port pat-ex3-bss \
-- --id=@mir get port dummy3 \
-- --id=@m create mirror name=ovs_mirror3 select-dst-port=@src select-src-port=@src output-port=@mir select-all=true

And here's the one I did by copying your example:
ip link add name dummy3 type dummy
ip link set dev dummy3 up

ovs-vsctl add-port br-ex3 dummy3

ovs-vsctl -- set Bridge br-ex3 mirrors=@m  \
-- --id=@dummy3 get Port dummy3 \
-- --id=@pat-ex3-bss get Port pat-ex3-bss \
-- --id=@m create Mirror name=mirror0 \
select-dst-port=@pat-ex3-bss select-src-port=@pat-ex3-bss \
output-port=@dummy3 select_all=1

Both yield the same results. When I tcpdump the respective dummy interface attached to br-ex3, I only see broadcast traffic for the VM in question; I never see unicast traffic (case in point: if I ping the broadcast address on the VM, that traffic shows up in the tcpdump). I can do a tcpdump on the external interface and see the unicast traffic, but I need to see where it's breaking in the OVS bridges.
Is there some trick to mirror unicast dataplane traffic?
Thanks in advance!



On Wed, Mar 22, 2017 at 10:07 AM, Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com> wrote:

From: Sterdnot Shaken [mailto:sterdnotshaken at gmail.com]
Sent: Tuesday, March 21, 2017 8:54 PM
To: Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com>
Cc: Richard Jones <rjones at suse.com>; openstack at lists.openstack.org
Subject: Re: [Openstack] VM can receive traffic, but not send it

Thanks for everyone's kind help!
Steve: I will try and turn off the offload features and see if that helps. Thanks!
Neil: I will also check and make sure neither RPF nor TTL are posing any issues.

Kaustubh: Is there a reason the mirror approach only seems to work on some of the OVS bridges, but not others? If I follow your instructions, I can see traffic when I set up a mirror on some bridges, but not others... Do I need to put these OVS bridges into promiscuous mode before the mirror will work?
[Kaustubh] I don't recall putting the bridge in promiscuous mode, but it has been a while since I looked at this. How are you setting up the mirrors? You would need to mirror a specific port of the bridge, not the bridge itself.
Thanks!!

On Tue, Mar 21, 2017 at 9:42 AM, Kaustubh Kelkar <kaustubh.kelkar at casa-systems.com> wrote:
You can narrow down the point where the packets are being dropped by mirroring and tracing packets on OVS bridge ports. I use a script that does the following (as root):

ip link add name sniff0 type dummy
ip link set dev sniff0 up
ovs-vsctl add-port br1 sniff0
ovs-vsctl -- set Bridge br1 mirrors=@m  \
-- --id=@sniff0 get Port sniff0 \
-- --id=@eth0 get Port eth0 \
-- --id=@m create Mirror name=mirror0 \
select-dst-port=@eth0 select-src-port=@eth0 \
output-port=@sniff0 select_all=1

and to delete,
ovs-vsctl clear Bridge br1 mirrors
ovs-vsctl del-port br1 sniff0
ip link del dev sniff0

where eth0 is the point of packet capture and br1 is the bridge eth0 resides in. Then, you can run tcpdump on sniff0.
Create such mirror ports on
1) phy-br-ex on external OVS bridge
2) int-br-ex on integration bridge
3) qvo-xxx on integration bridge
Also capture packets on qvb-xxx on the linux bridge having the tap interface of the VM. Hopefully, this will provide us more clues.

-Kaustubh

From: Sterdnot Shaken [mailto:sterdnotshaken at gmail.com]
Sent: Monday, March 20, 2017 9:17 PM
To: Richard Jones <rjones at suse.com>
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] VM can receive traffic, but not send it

Wow! Thanks for answering both of my questions!
So, I did some things you suggested, including setting the MSS in iperf to something small (1000 bytes) and tested with no improvement. I then changed the VM running on Openstack to have an MTU of 1000 and retested with no improvement. I noticed that the node I was testing against was reporting back to the VM on Openstack that it had an MSS of 8960, so just for the heck of it, I changed the remote node's (server outside of Openstack) MTU also to 1000 bytes and retested with no improvement. (The effects of all of these tests were also validated by checking mss settings in the tcp header via tcpdump).
To simplify the equation, I ditched the iperf for the time being and just did a simple "telnet 'remote server' 8080" test from the remote server to the VM in Openstack, while capturing packets all along the way (4 different points along the network path). Every point saw the same packets, including the VM's tap interface as expected. I then reversed the test by initiating the tcp session on the VM in Openstack to the remote server while running the packet captures at those same points having set the remote server to respond with a TCP Reset. From VM to Remote server traffic looked correct with expected TCP SYN. The TCP Reset that the remote server responded with passed all 4 points of the network, including the external interface on the Compute node where the VM resides, but the TAP interface that connects to the VM NEVER sees the Reset. I can recreate this condition over and over.
So, thanks to your ideas Richard, I'm no longer convinced this is an MTU issue. What would prevent a TCP related response from being forwarded from the external interface to the intended VM? The security group we have applied to this VM is wide open, so I can't imagine that is the cause...
Here are 2 packet captures where I initiated a telnet to the remote server from the VM in Openstack. As said above, I set the remote server to respond with a reset. The top one is from the physical interface on the Compute node where the VM resides and the other, the tap interface to that VM:

[(openstack-mitaka) root@prv-0-18-compute user]# tcpdump -nni eth0 host x.y.120.23 and host x.y.224.45
tcpdump: WARNING: eth0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
19:10:13.143931 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,wscale 8,nop,nop,sackOK], length 0
19:10:13.147951 IP x.y.224.45.8080 > x.y.120.23.53877: Flags [R.], seq 0, ack 3131027442, win 0, length 0
19:10:16.156520 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,wscale 8,nop,nop,sackOK], length 0
19:10:16.157693 IP x.y.224.45.8080 > x.y.120.23.53877: Flags [R.], seq 0, ack 1, win 0, length 0
19:10:22.157407 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,nop,sackOK], length 0
19:10:22.158682 IP x.y.224.45.8080 > x.y.120.23.53877: Flags [R.], seq 0, ack 1, win 0, length 0


[(openstack-mitaka) root@prv-0-18-compute user]# tcpdump -nni tap3bbe0f9d-6b host x.y.120.23 and host x.y.224.45
tcpdump: WARNING: tap3bbe0f9d-6b: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap3bbe0f9d-6b, link-type EN10MB (Ethernet), capture size 65535 bytes
19:10:13.143739 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,wscale 8,nop,nop,sackOK], length 0
19:10:16.156499 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,wscale 8,nop,nop,sackOK], length 0
19:10:22.157384 IP x.y.120.23.53877 > x.y.224.45.8080: Flags [S], seq 3131027441, win 8192, options [mss 960,nop,nop,sackOK], length 0
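
The comparison between the two captures above can also be done mechanically, which helps once the captures get longer than a handful of packets. This is only a sketch under assumptions: the function name capture_diff and the file names are hypothetical, and it expects both captures saved as tcpdump text output with the default timestamp format used above. Timestamps are ignored because the same packet is stamped slightly differently at each capture point.

```shell
#!/bin/sh
# capture_diff UPSTREAM_FILE DOWNSTREAM_FILE
# Prints the packet lines present in the upstream capture (e.g. eth0) that
# have no matching line in the downstream capture (e.g. the tap interface).
capture_diff() {
    awk '
    # Drop the leading HH:MM:SS.ffffff timestamp so packets can be matched
    function body(line) { sub(/^[0-9:.]+ /, "", line); return line }
    # Skip tcpdump banner and warning lines, which carry no timestamp
    !/^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\./ { next }
    # Downstream file is read first: count each packet body seen there
    FILENAME == ARGV[1] { seen[body($0)]++; next }
    # Upstream file: consume one match per packet, print what is left over
    { b = body($0); if (seen[b] > 0) seen[b]--; else print }
    ' "$2" "$1"
}
```

Run against the two captures above (saved, say, as eth0.txt and tap.txt), "capture_diff eth0.txt tap.txt" would list only the three reset packets that reach eth0 but never show up on the tap interface.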
Any ideas? Thanks in advance for your help!!
Steve

On Mon, Mar 20, 2017 at 4:17 PM, Richard Jones <rjones at suse.com> wrote:
You might consider taking a packet trace of the start of an upload to see what the TCP MSS (Maximum Segment Size) options look like and perhaps compare between the different configs.  Also, you could consider either using netperf and having it tweak the MSS to a smaller value (test-specific -G option if I recall correctly), or just try dropping the MTU of your VM before you try the upload.

Another way to use netperf to "probe" without tweaking MSS or MTU settings would be to use the TCP_RR test with increasing request/response sizes.  If there is indeed an MTU issue somewhere along the way, as you walk the request/response size up to the local MTU, you should see the test performance drop off a cliff if not go fully to zero.
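
A rough sketch of that TCP_RR walk follows. The function names rr_sizes and rr_sweep, the 64-byte starting size, the doubling step, and the 5-second test length are all illustrative choices rather than anything prescribed above. It assumes netserver is already running on the remote host. The NETPERF variable exists only so the command can be swapped out (for example with echo) to preview what would run.

```shell
#!/bin/sh
# rr_sizes MAX
# Prints request/response sizes: 64, 128, 256, ... doubling while below MAX,
# then MAX itself as the final step.
rr_sizes() {
    max=$1
    size=64
    while [ "$size" -lt "$max" ]; do
        echo "$size"
        size=$((size * 2))
    done
    echo "$max"
}

# rr_sweep HOST [MAX]
# Runs one netperf TCP_RR test per size against HOST (MAX defaults to 1500).
# -l 5 limits each test to 5 seconds, -P 0 suppresses the banner, and the
# test-specific -r option sets the request and response sizes in bytes.
rr_sweep() {
    host=$1
    max=${2:-1500}
    for s in $(rr_sizes "$max"); do
        ${NETPERF:-netperf} -H "$host" -t TCP_RR -l 5 -P 0 -- -r "$s,$s"
    done
}
```

The sizes here are payload bytes, so the step where the transaction rate collapses will sit somewhat below the path MTU once TCP and IP headers are accounted for. A finer-grained step near the suspected MTU would narrow it down further.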

Does the port for the VM have a security group rule permitting ICMP traffic in?  Offhand I wouldn't expect that to be different between the two network setups you've described because I'd not have expected the virtual router to pay attention to an arriving ICMP Destination Unreachable, Datagram Too Big message to have the routed version work, but it seemed a reasonable straw at which to grasp.

rick jones

PS perhaps iperf has a similar option to set the TCP MSS, I've not looked.

>>> Sterdnot Shaken <sterdnotshaken at gmail.com> 03/20/17 3:07 PM >>>
Our info:

Openstack version: Mitaka (using OVS 2.5)
Firewall driver: Openvswitch

Anyone know why VMs that are directly on a Flat Provider Network (so the
VM would have a public IP directly assigned to it) can download data just
fine, but when we try and upload anything (iperf where the VM is the client
or something even like speedtest.net (upload portion)) the VM simply can't
get data out to the intended destination? Again, download works great,
upload doesn't.

If I take that VM and change its interface to be a tenant network one that
has an Openstack HA virtual router, everything (upload and download) works
perfectly. The problem only seems to be apparent when the VM is directly on
the external network.

It seems like an MTU issue, but I don't see how... Here are the MTUs of
the parts at play:

VM: 1500
br-int (specific interface connecting to VM) - 9216
br-ex - (can't tell what that MTU is set to)

Any help would be GREATLY appreciated.

Steve




_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


