[Openstack-operators] Network get unstable, put the whole system on halt

Mike Wilson geekinutah at gmail.com
Wed May 22 15:07:25 UTC 2013


Salman,

We use openvswitch in our deployment and haven't had this issue. Our setup
is completely different, however: we are on CentOS 6.3 with the latest 2.6.32
kernel, and we compiled our own openvswitch kmod from the latest 1.7 branch,
which has been very solid for us. That said, I think you will get more expert
eyes on the issue by raising it on the LKML or the openvswitch mailing lists.
This is specifically a problem with code running in the kernel, so I would go
down those avenues.
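
When posting to those lists, it may help to summarize the warning detail
lines instead of attaching the raw log. Below is a minimal Python sketch of
that idea, assuming each syslog record sits on a single line (the wrapping in
the quoted log below comes from the mail client). The names
parse_bad_offload, summarize, and SAMPLE are hypothetical, chosen only for
this illustration.

```python
import re
from collections import Counter

# The capability pair printed by the warning, e.g. caps=(0x..., 0x...).
CAPS_RE = re.compile(r"caps=\((0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\)")
# The decimal key=value fields that follow it (len, data_len, gso_size, ...).
FIELD_RE = re.compile(r"(\w+)=(\d+)")


def parse_bad_offload(line):
    """Return the numeric fields of one warning detail line, or None."""
    caps = CAPS_RE.search(line)
    if caps is None:
        return None
    # Only scan the text after the caps pair, so the hex values are skipped.
    rest = line[caps.end():]
    fields = {name: int(value) for name, value in FIELD_RE.findall(rest)}
    fields["caps"] = (int(caps.group(1), 16), int(caps.group(2), 16))
    return fields


def summarize(lines):
    """Count each (len, gso_size, gso_type, ip_summed) combination seen."""
    counts = Counter()
    for line in lines:
        fields = parse_bad_offload(line)
        if fields is not None:
            key = (fields.get("len"), fields.get("gso_size"),
                   fields.get("gso_type"), fields.get("ip_summed"))
            counts[key] += 1
    return counts


# The detail line from the quoted log, rejoined onto one line.
SAMPLE = ("May 20 06:31:04 ukko233-cern-controller kernel: [67607.598738] : "
          "caps=(0x00000000400158e9, 0x0000000000000000) len=2856 "
          "data_len=1402 gso_size=1402 gso_type=1 ip_summed=1")

if __name__ == "__main__":
    print(parse_bad_offload(SAMPLE))
    print(summarize([SAMPLE, "some unrelated log line"]))
```

As far as I know, gso_type=1 means TCPv4 segmentation and ip_summed=1 means
CHECKSUM_UNNECESSARY on kernels of this vintage, but please check
include/linux/skbuff.h for your kernel before relying on that.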

-Mike Wilson


On Wed, May 22, 2013 at 4:01 AM, Salman Toor <salman.toor at it.uu.se> wrote:

> Hi again,
>
> Can anyone share some thoughts on this matter?
>
> Regards..
> Salman.
>
>
> On May 20, 2013, at 11:57 AM, Salman Toor wrote:
>
> > Hi,
> >
> > We are running Grizzly with openvswitch for Quantum. The details of our
> > system follow.
> >
> > Controller and Compute nodes are running Ubuntu 12.04.5, kernel 3.5,
> > and OpenVSwitch version 1.4.0+build0 with GRE tunnels.
> >
> > With very little activity everything works fine, but as we increased
> > the load on the system, the kernel log on the controller started to grow
> > until it filled the entire disk and halted the whole system. This
> > happens within 2 to 3 hours.
> >
> > Most of the log is filled with the following messages
> >
> >
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598519] Call
> Trace:
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598519]  <IRQ>
> [<ffffffff81052c9f>] warn_slowpath_common+0x7f/0xc0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598524]
>  [<ffffffff81052d96>] warn_slowpath_fmt+0x46/0x50
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598526]
>  [<ffffffff8157501b>] ? skb_release_data.part.47+0xcb/0x110
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598528]
>  [<ffffffff8169abd0>] skb_warn_bad_offload+0xbe/0xc9
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598531]
>  [<ffffffff8157f396>] skb_gso_segment+0x246/0x2c0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598536]
>  [<ffffffffa03dd02f>] ovs_tnl_send+0x1ef/0xc90 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598539]
>  [<ffffffff8169e7de>] ? _raw_spin_lock+0xe/0x20
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598541]
>  [<ffffffff810e0001>] ? kdb_bc+0x191/0x240
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598544]
>  [<ffffffff810e4fe4>] ? handle_edge_irq+0x94/0x130
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598552]
>  [<ffffffffa03de52e>] ovs_vport_send+0x1e/0x50 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598561]
>  [<ffffffffa03d5552>] do_execute_actions+0x3e2/0x790 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598570]
>  [<ffffffffa03d5968>] ovs_execute_actions+0x68/0x110 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598579]
>  [<ffffffffa03d802e>] ovs_dp_process_received_packet+0x6e/0x150
> [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598589]
>  [<ffffffffa03de4ff>] ovs_vport_receive+0x5f/0x70 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598595]
>  [<ffffffffa03e0e07>] patch_send+0x27/0x50 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598599]
>  [<ffffffffa03de52e>] ovs_vport_send+0x1e/0x50 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598603]
>  [<ffffffffa03d5552>] do_execute_actions+0x3e2/0x790 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598607]
>  [<ffffffffa03de52e>] ? ovs_vport_send+0x1e/0x50 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598610]
>  [<ffffffffa03d5552>] ? do_execute_actions+0x3e2/0x790 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598613]
>  [<ffffffffa03d5968>] ovs_execute_actions+0x68/0x110 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598617]
>  [<ffffffffa03d802e>] ovs_dp_process_received_packet+0x6e/0x150
> [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598621]
>  [<ffffffffa045b9b2>] ? tcp_in_window+0x342/0x5e0 [nf_conntrack]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598626]
>  [<ffffffffa03de4ff>] ovs_vport_receive+0x5f/0x70 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598630]
>  [<ffffffffa03e0143>] internal_dev_xmit+0x23/0x30 [openvswitch]
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598632]
>  [<ffffffff815848b6>] dev_hard_start_xmit+0x256/0x550
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598634]
>  [<ffffffff81584e7c>] dev_queue_xmit+0x2cc/0x470
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598637]
>  [<ffffffff8159f87a>] ? eth_header+0x3a/0xf0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598640]
>  [<ffffffff8158c832>] neigh_resolve_output+0x122/0x210
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598642]
>  [<ffffffff815adf85>] ? nf_hook_slow+0x75/0x150
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598644]
>  [<ffffffff815ba840>] ? ip_fragment+0x810/0x810
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598646]
>  [<ffffffff815ba9be>] ip_finish_output+0x17e/0x2d0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598648]
>  [<ffffffff815bb4a6>] ip_output+0x66/0xa0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598650]
>  [<ffffffff815b58d0>] ? inet_del_protocol+0x40/0x40
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598653]
>  [<ffffffff815b7689>] ip_forward_finish+0x69/0x80
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598655]
>  [<ffffffff815b7931>] ip_forward+0x291/0x3e0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598657]
>  [<ffffffff815b59dd>] ip_rcv_finish+0x10d/0x370
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598660]
>  [<ffffffff815b6291>] ip_rcv+0x201/0x300
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598662]
>  [<ffffffff81582a13>] ? netif_receive_skb+0x23/0x90
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598664]
>  [<ffffffff81582576>] __netif_receive_skb+0x4c6/0x540
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598666]
>  [<ffffffff815835c1>] process_backlog+0xb1/0x190
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598668]
>  [<ffffffff815832f4>] net_rx_action+0x134/0x240
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598671]
>  [<ffffffff8105ba88>] __do_softirq+0xa8/0x210
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598679]
>  [<ffffffff8169e7de>] ? _raw_spin_lock+0xe/0x20
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598686]
>  [<ffffffff816a841c>] call_softirq+0x1c/0x30
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598694]
>  [<ffffffff81016245>] do_softirq+0x65/0xa0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598702]
>  [<ffffffff8105be6e>] irq_exit+0x8e/0xb0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598710]
>  [<ffffffff816a8c73>] do_IRQ+0x63/0xe0
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598715]
>  [<ffffffff8169ec6a>] common_interrupt+0x6a/0x6a
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598716]  <EOI>
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598717] ---[ end
> trace 4ed1c8725cfe8f94 ]---
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598733]
> ------------[ cut here ]------------
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598736] WARNING:
> at /build/buildd/linux-lts-quantal-3.5.0/net/core/dev.c:1904
> skb_warn_bad_offload+0xbe/0xc9()
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598737] Hardware
> name: PowerEdge M610
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598738] :
> caps=(0x00000000400158e9, 0x0000000000000000) len=2856 data_len=1402
> gso_size=1402 gso_type=1 ip_summed=1
> > May 20 06:31:04 ukko233-cern-controller kernel: [67607.598739] Modules
> linked in: 8021q garp xt_conntrack ipt_REDIRECT ip6table_filter ip6_tables
> ebtable_nat ebtables ipt_MASQUERADE xt_state ipt_REJECT xt_CHECKSUM bridge
> stp llc xt_tcpudp iptable_filter iptable_mangle iptable_nat nf_nat vesafb
> nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables
> openvswitch(O) iscsi_trgt(O) nfsd nfs lockd fscache auth_rpcgss nfs_acl
> sunrpc ib_iser ext2 rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi gpio_ich coretemp
> kvm_intel kvm dcdbas microcode wmi acpi_power_meter lpc_ich joydev ioatdma
> dca i7core_edac edac_core mac_hid lp parport hid_generic usbhid hid
> usb_storage uas mptsas mptscsih mptbase scsi_transport_sas bnx2x libcrc32c
> mdio bnx2
> >
> > The kernel log grows to more than 30 GB within a few hours.
> >
> > Has anybody else experienced this? Any hint that could help us fix this
> > problem would be appreciated.
> >
> > Regards.
> > Salman.
> >
> >
> >
> >
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators