[Openstack] [rhos-list] system panic after starting OVS agent

Xin Zhao xzhao at bnl.gov
Fri Nov 8 16:39:45 UTC 2013


Hi Thomas,

I didn't see a similar error message on the compute node, where the OVS 
agent, libvirt, and nova-compute daemons are running.

For the network host, I did several things over the last couple of days 
to address the issue:

1) switched to a new physical machine for the network host; the VM 
network NIC info is now:

[root at cldnet01 bin]# ethtool -i em2
driver: bnx2
version: 2.2.3
firmware-version: 6.2.14 bc 5.2.3 NCSI 2.0.11
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


2) installed the latest OpenStack kernel and packages from RDO on the 
network host

3) noticed the VLAN tagging module was not loaded, so I loaded the 8021q 
module on both the network and compute nodes

4) set use_namespaces to True in both the L3 and DHCP agent config files 
on the network host

5) recreated the bridges (br-int/br-em2/br-ex) on the network host
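For reference, steps 3 and 5 amounted to roughly the following (a sketch; the bridge and interface names are from my setup and may differ on yours):

```shell
# Step 3: load the VLAN tagging module (run on both nodes)
modprobe 8021q
lsmod | grep 8021q    # confirm the module is loaded

# Step 5: recreate the bridges on the network host
ovs-vsctl --may-exist add-br br-int
ovs-vsctl --may-exist add-br br-ex
ovs-vsctl --may-exist add-br br-em2
# attach the physical VM-network NIC to its bridge
ovs-vsctl --may-exist add-port br-em2 em2
```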

Now if I start only the openvswitch daemon on the network host, the 
syslog shows similar errors (repeating), but once the 
quantum-openvswitch-agent daemon starts, the error messages stop. A 
snippet from the syslog appears below my signature.

Now if I start an instance, its eth0 still can't get an IP address.

Any wisdom on what's going on?

Thanks,
Xin

(start openvswitch service)

Nov  8 11:21:01 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.1.0
Nov  8 11:21:01 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
ovs-vsctl --no-wait set Open_vSwitch . ovs-version=1.11.0 
"external-ids:system-id=\"9161c89c-da3f-43da-a42d-712465fbf71c\"" 
"system-type=\"unknown\"" "system-version=\"unknown\""
Nov  8 11:21:04 cldnet01 kernel: br-int: hw csum failure.
Nov  8 11:21:04 cldnet01 kernel: Pid: 0, comm: swapper Not tainted 
2.6.32-358.123.2.openstack.el6.x86_64 #1
Nov  8 11:21:04 cldnet01 kernel: Call Trace:
Nov  8 11:21:04 cldnet01 kernel: <IRQ> [<ffffffff8144a252>] ? 
netdev_rx_csum_fault+0x42/0x50
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81442cc0>] ? 
__skb_checksum_complete_head+0x60/0x70
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81442ce1>] ? 
__skb_checksum_complete+0x11/0x20
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff814c8b7d>] ? 
nf_ip_checksum+0x5d/0x130
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa01a1d31>] ? 
udp_error+0xb1/0x1e0 [nf_conntrack]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025cfe2>] ? 
ovs_vport_send+0x22/0x90 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8143da2e>] ? 
__skb_clone+0x2e/0x120
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa019bc98>] ? 
nf_conntrack_in+0x138/0xa00 [nf_conntrack]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025bb46>] ? 
find_bucket+0x66/0x70 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025bd61>] ? 
ovs_flow_tbl_lookup+0x51/0xb0 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa01b9721>] ? 
ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81477459>] ? nf_iterate+0x69/0xb0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? 
ip_rcv_finish+0x0/0x440
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81477614>] ? 
nf_hook_slow+0x74/0x110
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? 
ip_rcv_finish+0x0/0x440
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481ef4>] ? ip_rcv+0x264/0x350
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025e503>] ? 
ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81449e6b>] ? 
__netif_receive_skb+0x4ab/0x750
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8144a1aa>] ? 
process_backlog+0x9a/0x100
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8144f483>] ? 
net_rx_action+0x103/0x2f0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff810770b1>] ? 
__do_softirq+0xc1/0x1e0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff810e1bb0>] ? 
handle_IRQ_event+0x60/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100c1cc>] ? 
call_softirq+0x1c/0x30
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8151cd75>] ? do_IRQ+0x75/0xf0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100b9d3>] ? 
ret_from_intr+0x0/0x11
Nov  8 11:21:04 cldnet01 kernel: <EOI> [<ffffffff812d48fe>] ? 
intel_idle+0xde/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff812d48e1>] ? 
intel_idle+0xc1/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81416247>] ? 
cpuidle_idle_call+0xa7/0x140
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8150cc00>] ? 
start_secondary+0x2ac/0x2ef
Nov  8 11:21:04 cldnet01 kernel: br-em2: hw csum failure.
Nov  8 11:21:04 cldnet01 kernel: Pid: 0, comm: swapper Not tainted 
2.6.32-358.123.2.openstack.el6.x86_64 #1
Nov  8 11:21:04 cldnet01 kernel: Call Trace:
Nov  8 11:21:04 cldnet01 kernel: <IRQ> [<ffffffff8144a252>] ? 
netdev_rx_csum_fault+0x42/0x50
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81442cc0>] ? 
__skb_checksum_complete_head+0x60/0x70
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81442ce1>] ? 
__skb_checksum_complete+0x11/0x20
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff814c8b7d>] ? 
nf_ip_checksum+0x5d/0x130
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa01a1d31>] ? 
udp_error+0xb1/0x1e0 [nf_conntrack]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025cfe2>] ? 
ovs_vport_send+0x22/0x90 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8143da2e>] ? 
__skb_clone+0x2e/0x120
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa019bc98>] ? 
nf_conntrack_in+0x138/0xa00 [nf_conntrack]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025bb46>] ? 
find_bucket+0x66/0x70 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025bd61>] ? 
ovs_flow_tbl_lookup+0x51/0xb0 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa01b9721>] ? 
ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81477459>] ? nf_iterate+0x69/0xb0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? 
ip_rcv_finish+0x0/0x440
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81477614>] ? 
nf_hook_slow+0x74/0x110
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? 
ip_rcv_finish+0x0/0x440
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81481ef4>] ? ip_rcv+0x264/0x350
Nov  8 11:21:04 cldnet01 kernel: [<ffffffffa025e503>] ? 
ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81449e6b>] ? 
__netif_receive_skb+0x4ab/0x750
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8144a1aa>] ? 
process_backlog+0x9a/0x100
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8144f483>] ? 
net_rx_action+0x103/0x2f0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff810770b1>] ? 
__do_softirq+0xc1/0x1e0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff810e1bb0>] ? 
handle_IRQ_event+0x60/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100c1cc>] ? 
call_softirq+0x1c/0x30
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8151cd75>] ? do_IRQ+0x75/0xf0
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8100b9d3>] ? 
ret_from_intr+0x0/0x11
Nov  8 11:21:04 cldnet01 kernel: <EOI> [<ffffffff812d48fe>] ? 
intel_idle+0xde/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff812d48e1>] ? 
intel_idle+0xc1/0x170
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81416247>] ? 
cpuidle_idle_call+0xa7/0x140
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov  8 11:21:04 cldnet01 kernel: [<ffffffff8150cc00>] ? 
start_secondary+0x2ac/0x2ef
...
...
...

(start quantum-openvswitch-agent service)

Nov  8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int patch-tun
Nov  8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int int-br-ex
Nov  8 11:21:13 cldnet01 kernel: device int-br-ex left promiscuous mode
Nov  8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-ex phy-br-ex
Nov  8 11:21:13 cldnet01 kernel: device phy-br-ex left promiscuous mode
Nov  8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-int int-br-ex
Nov  8 11:21:14 cldnet01 kernel: device int-br-ex entered promiscuous mode
Nov  8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-ex phy-br-ex
Nov  8 11:21:14 cldnet01 kernel: device phy-br-ex entered promiscuous mode
Nov  8 11:21:14 cldnet01 kernel: ADDRCONF(NETDEV_UP): int-br-ex: link is 
not ready
Nov  8 11:21:14 cldnet01 kernel: ADDRCONF(NETDEV_CHANGE): int-br-ex: 
link becomes ready
Nov  8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int int-br-em2
Nov  8 11:21:14 cldnet01 kernel: device int-br-em2 left promiscuous mode
Nov  8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-em2 phy-br-em2
Nov  8 11:21:14 cldnet01 kernel: device phy-br-em2 left promiscuous mode
Nov  8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #26 int-br-ex, 
fe80::68e7:66ff:fef7:f28b#123, interface stats: received=0, sent=0, 
dropped=0, active_time=1853 secs
Nov  8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #27 phy-br-ex, 
fe80::94dd:f8ff:fe75:e19b#123, interface stats: received=0, sent=0, 
dropped=0, active_time=1853 secs
Nov  8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #28 int-br-em2, 
fe80::bc7e:9aff:fe93:2a0c#123, interface stats: received=0, sent=0, 
dropped=0, active_time=1851 secs
Nov  8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #29 phy-br-em2, 
fe80::20b0:56ff:fe3e:7019#123, interface stats: received=0, sent=0, 
dropped=0, active_time=1851 secs
Nov  8 11:21:16 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-int int-br-em2
Nov  8 11:21:16 cldnet01 kernel: device int-br-em2 entered promiscuous mode
Nov  8 11:21:16 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-em2 phy-br-em2
Nov  8 11:21:16 cldnet01 kernel: device phy-br-em2 entered promiscuous mode
Nov  8 11:21:16 cldnet01 kernel: ADDRCONF(NETDEV_UP): int-br-em2: link 
is not ready
Nov  8 11:21:16 cldnet01 kernel: ADDRCONF(NETDEV_CHANGE): int-br-em2: 
link becomes ready
Nov  8 11:21:17 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 set Port qr-ed2e07cf-db tag=1
Nov  8 11:21:17 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as 
/usr/bin/ovs-vsctl --timeout=2 set Port tap4aafbf9c-49 tag=1
Nov  8 11:21:18 cldnet01 ntpd[2295]: Listening on interface #30 
int-br-ex, fe80::14d5:daff:feda:2941#123 Enabled
Nov  8 11:21:18 cldnet01 ntpd[2295]: Listening on interface #31 
phy-br-ex, fe80::5452:6bff:fe41:27bc#123 Enabled
Nov  8 11:21:20 cldnet01 ntpd[2295]: Listening on interface #32 
int-br-em2, fe80::4067:b1ff:fe50:194c#123 Enabled
Nov  8 11:21:20 cldnet01 ntpd[2295]: Listening on interface #33 
phy-br-em2, fe80::bc54:5dff:fedd:64fe#123 Enabled

(no more "hw csum failure" errors from here on)
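If the "hw csum failure" traces come back, one commonly suggested experiment (not something I have tried here) is to disable hardware receive checksum offload on the underlying NIC with ethtool, since these traces mean the kernel recomputed a checksum that the hardware claimed to have verified:

```shell
# Inspect the current offload settings on the VM-network NIC (em2 here)
ethtool -k em2

# Tentatively disable receive checksum offload; this trades some CPU
# for correctness and can be reverted with "ethtool -K em2 rx on"
ethtool -K em2 rx off
```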


On 11/8/2013 8:52 AM, Thomas Graf wrote:
> On 11/06/2013 02:29 AM, Paul Robert Marino wrote:
>> Which kernel are you running?
>> If you are running the stock Red Hat kernel, not the patched version that
>> comes with RHOS or RDO, that might explain it.
>>
>>
>>
>> -- Sent from my HP Pre3
>>
>> ------------------------------------------------------------------------
>> On Nov 5, 2013 17:33, Xin Zhao <xzhao at bnl.gov> wrote:
>>
>> Hi Thomas,
>>
>> Thanks for the reply, here is the info of the drivers:
>>
>> 1) NIC for the VM network connection:
>>
>> $> ethtool -i eth1
>> driver: e1000e
>> version: 2.1.4-k
>> firmware-version: 1.6-12
>> bus-info: 0000:04:00.1
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: no
>
> e1000e is a driver that is known to work. The error would indicate a
> silicon or firmware issue. Are you seeing this issue on multiple
> machines?




