[Openstack] [rhos-list] system panic after starting OVS agent
Xin Zhao
xzhao at bnl.gov
Tue Nov 5 22:31:51 UTC 2013
Hi Thomas,
Thanks for the reply. Here is the driver information:
1) NIC for the VM network connection:
$> ethtool -i eth1
driver: e1000e
version: 2.1.4-k
firmware-version: 1.6-12
bus-info: 0000:04:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
2) NIC for the external internet connection:
$> ethtool -i eth2
driver: myri10ge
version: 1.5.1-1.451
firmware-version: 1.4.52 -- 2010/10/28 21:27:06 m
bus-info: 0000:0b:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
Xin
On 11/5/2013 4:55 PM, Thomas Graf wrote:
> On 11/05/2013 10:43 PM, Xin Zhao wrote:
>> Adding the general Openstack list; sorry to those who are on both lists...
>>
>> On 11/5/2013 2:42 PM, Xin Zhao wrote:
>>> Hello,
>>>
>>> On my Grizzly Quantum/OVS network node, after I start
>>> quantum-openvswitch-agent, the system log shows the errors below,
>>> repeated every second from then on. The panic messages continue
>>> even after I stop all OpenStack daemons; only a system reboot
>>> clears them.
>>>
>>> Nov 5 14:13:58 cldnet01 kernel: qg-581539d2-ac: hw csum failure.
>>> Nov 5 14:13:58 cldnet01 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.123.2.openstack.el6.x86_64 #1
>>> Nov 5 14:13:58 cldnet01 kernel: Call Trace:
>>> Nov 5 14:13:58 cldnet01 kernel: <IRQ> [<ffffffff8144a252>] ? netdev_rx_csum_fault+0x42/0x50
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81442cc0>] ? __skb_checksum_complete_head+0x60/0x70
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81442ce1>] ? __skb_checksum_complete+0x11/0x20
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff814c8b7d>] ? nf_ip_checksum+0x5d/0x130
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa01b4d31>] ? udp_error+0xb1/0x1e0 [nf_conntrack]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa01aec98>] ? nf_conntrack_in+0x138/0xa00 [nf_conntrack]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa00721bb>] ? alloc_null_binding+0x5b/0xa0 [iptable_nat]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa0072441>] ? nf_nat_fn+0x91/0x260 [iptable_nat]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa01cc721>] ? ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81477459>] ? nf_iterate+0x69/0xb0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff814819e9>] ? ip_rcv_finish+0x199/0x440
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81477614>] ? nf_hook_slow+0x74/0x110
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81481ef4>] ? ip_rcv+0x264/0x350
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffffa024b503>] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81449e6b>] ? __netif_receive_skb+0x4ab/0x750
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8144a1aa>] ? process_backlog+0x9a/0x100
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8144f483>] ? net_rx_action+0x103/0x2f0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff810e1bb0>] ? handle_IRQ_event+0x60/0x170
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8151cd75>] ? do_IRQ+0x75/0xf0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
>>> Nov 5 14:13:58 cldnet01 kernel: <EOI> [<ffffffff81014907>] ? mwait_idle+0x77/0xd0
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8151931a>] ? atomic_notifier_call_chain+0x1a/0x20
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
>>> Nov 5 14:13:58 cldnet01 kernel: [<ffffffff8150cc00>] ? start_secondary+0x2ac/0x2ef
>>>
>>> The only other message in the syslog that's related to CSUM is the
>>> following:
>>> Nov 5 14:10:44 cldnet01 kernel: lo: Dropping TSO features since no CSUM feature.
>>> Nov 5 14:10:44 cldnet01 kernel: lo: Disabled Privacy Extensions
>>> (this message appears after starting the l3-agent)
>>>
>>> The network host runs RHEL 6.4 with kernel
>>> 2.6.32-358.123.2.openstack.el6.x86_64.
>>>
>>> All the daemons appear to be running, and an instance can start, but
>>> networking doesn't work for the instance.
>>>
>>> Any wisdom on what's going on?
>
> Most likely a driver issue, what NIC model and driver are you using?
> `ethtool -i $DEV` will help.