[Openstack] Weird nova-network bridging problem with precise/essex

Nathanael Burton nathanael.i.burton at gmail.com
Tue Jul 17 02:20:32 UTC 2012


Narayan,

Are you doing bonding in conjunction with your bridging + vlans? Or is it
just a single interface backing the vlan_interface?

Nate
On Jul 16, 2012 9:55 PM, "Narayan Desai" <narayan.desai at gmail.com> wrote:

> We're running into what looks like a Linux bridging bug, which causes
> both substantial (20-40%) packet loss and DNS failures at about the
> same rate. We're running essex on precise, with dedicated
> nova-network servers and VLANManager. We see the same behavior on
> either of our nova-network servers. While tracking this down, I found
> the following when running tcpdump along the path between the VM
> instance and the n-net gateway.
>
> The packets appear to reach the nova-network server, and are
> correctly stripped of their dot1q (802.1Q) tags:
> root@m5-p:~# tcpdump -K -p -i vlan200 -v -vv udp port 53
> tcpdump: WARNING: vlan200: no IPv4 address assigned
> tcpdump: listening on vlan200, link-type EN10MB (Ethernet), capture size 65535 bytes
> 20:34:02.377711 IP (tos 0x0, ttl 64, id 59761, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:07.377942 IP (tos 0x0, ttl 64, id 59762, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378248 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378428 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
>     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)
>
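For reference, "stripped of dot1q tags" means the kernel (or the NIC, when VLAN offload is enabled) removes the 4-byte 802.1Q tag before handing the frame to vlan200, which is why tcpdump on vlan200 shows plain IP. A minimal Python sketch of that untagging step at the byte level, assuming a single tag on an Ethernet II frame (no QinQ); `strip_dot1q` is a hypothetical helper for illustration, not code from nova or the kernel.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame
ETH_ADDRS_LEN = 12   # destination MAC (6 bytes) + source MAC (6 bytes)


def strip_dot1q(frame: bytes) -> Tuple[Optional[int], bytes]:
    """Return (vlan_id, untagged_frame) for an Ethernet II frame.

    Hypothetical helper: handles a single 802.1Q tag only (no QinQ).
    Untagged frames come back unchanged with vlan_id None.
    """
    if len(frame) < ETH_ADDRS_LEN + 2:
        raise ValueError("frame too short for an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, ETH_ADDRS_LEN)
    if ethertype != TPID_8021Q:
        return None, frame
    # A tagged frame needs TPID + TCI (4 bytes) plus the inner EtherType.
    if len(frame) < ETH_ADDRS_LEN + 6:
        raise ValueError("truncated 802.1Q tag")
    # Tag Control Information: PCP (3 bits) | DEI (1 bit) | VLAN ID (12 bits)
    (tci,) = struct.unpack_from("!H", frame, ETH_ADDRS_LEN + 2)
    vlan_id = tci & 0x0FFF
    # Remove TPID + TCI; the inner EtherType now directly follows the MACs.
    untagged = frame[:ETH_ADDRS_LEN] + frame[ETH_ADDRS_LEN + 4:]
    return vlan_id, untagged
```

The VLAN ID is only the low 12 bits of the tag control field; the priority and DEI bits are masked off, so a frame tagged with priority 5 on VLAN 200 still yields 200.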
> But some packets don't make it all the way to the bridge interface; of
> the three identical queries above (IP ids 59761-59763), only the last
> one shows up on br200:
> root@m5-p:~# brctl show
> bridge name     bridge id               STP enabled     interfaces
> br200           8000.fa163e18927b       no              vlan200
>
> root@m5-p:~# tcpdump -K -p -i br200 -v -vv udp port 53
> tcpdump: listening on br200, link-type EN10MB (Ethernet), capture size 65535 bytes
> 20:34:12.378264 IP (tos 0x0, ttl 64, id 59763, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.3.54937 > 10.0.0.1.domain: 52874+ A? www.google.com. (32)
> 20:34:12.378424 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 170)
>     10.0.0.1.domain > 10.0.0.3.54937: 52874 q: A? www.google.com. 6/0/0 www.google.com. [1d3h55m19s] CNAME www.l.google.com., www.l.google.com. [1m33s] A 74.125.225.209, www.l.google.com. [1m33s] A 74.125.225.208, www.l.google.com. [1m33s] A 74.125.225.212, www.l.google.com. [1m33s] A 74.125.225.211, www.l.google.com. [1m33s] A 74.125.225.210 (142)
>
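The vlan200-versus-br200 comparison above can be mechanised when the captures get long. A minimal Python sketch, assuming both captures were saved as plain `tcpdump -v` text in the layout shown (IP header summary line, then an indented flow line); `packet_keys` and `missing_on_bridge` are hypothetical names. Each packet is keyed on source, destination and IP id, and the result lists what was seen on the VLAN device but never on the bridge.

```python
import re
from typing import List, Set, Tuple

PacketKey = Tuple[str, str, int]  # (source, destination, IP id)

# One packet in `tcpdump -v` text output: a timestamped IP header summary
# containing "id N", then the "src > dst:" flow, which tcpdump -v prints on
# the following indented line (hence DOTALL and the \s+ between them).
_PACKET_RE = re.compile(
    r"\d{2}:\d{2}:\d{2}\.\d+ IP \(.*?\bid (\d+),.*?\)\s+(\S+) > (\S+):",
    re.DOTALL,
)


def packet_keys(capture: str) -> Set[PacketKey]:
    """Extract (src, dst, ip_id) for every IPv4 packet in a tcpdump -v dump."""
    return {(src, dst, int(ip_id)) for ip_id, src, dst in _PACKET_RE.findall(capture)}


def missing_on_bridge(vlan_capture: str, bridge_capture: str) -> List[PacketKey]:
    """Packets seen on the VLAN interface that never showed up on the bridge.

    Caveat: packets that share an IP id (e.g. the DNS replies above, which
    all carry id 0 with DF set) collapse onto one key, so only packets with
    distinct ids, like the retransmitted queries, are told apart reliably.
    """
    return sorted(packet_keys(vlan_capture) - packet_keys(bridge_capture))
```

Fed the two captures above, it flags the queries with IP ids 59761 and 59762 as never reaching br200; only the third try (59763) got through and was answered.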
> I can't find any way that ipfilter could be implicated in this; no
> deny rules are being hit.
>
> Oddly enough, this doesn't seem to cause any loss of ICMP traffic, even
> with ping -f.
>
> So far, searching hasn't netted very much. I've found this similar
> sounding ubuntu bug report, but it looks like no one is working on it:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986043
>
> We're at 3.2.0-24. There is a 3.2.0-25, but it is reported not to fix
> this issue, and neither do the 3.4 kernels.
>
> It seems sad to try backrevving to an oneiric kernel, but that is on
> my list for tomorrow.  If this is a kernel bug, it will make the
> precise default kernel unusable for nova-network servers with dot1q
> (or whatever the appropriate feature interaction is).
>
> Does this ring any bells, or is there another course of action I should
> attempt?
> thanks in advance for any suggestions.
>  -nld
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>