[Openstack] [rhos-list] system panic after starting OVS agent
Xin Zhao
xzhao at bnl.gov
Fri Nov 8 16:39:45 UTC 2013
Hi Thomas,
I didn't see a similar error message on the compute node, where the OVS
agent, libvirt and nova-compute daemons are running.
On the network host, I have done several things over the last couple of
days to address the issue:
1) switched to a new physical machine for the network host; the VM network
NIC info is now:
[root@cldnet01 bin]# ethtool -i em2
driver: bnx2
version: 2.2.3
firmware-version: 6.2.14 bc 5.2.3 NCSI 2.0.11
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
2) installed the latest OpenStack kernel and packages from RDO on the
network host
3) noticed that the VLAN tagging module was not loaded, so I loaded the
8021q module on both the network and compute nodes
4) set use_namespace to True in both the L3 and DHCP agent config files
on the network host
5) recreated the bridges (br-int/br-em2/br-ex) on the network host
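A quick way to double-check step 3 (and that the openvswitch kernel module seen in the trace below is present) on each node is to look for the module names in `/proc/modules`, where the first whitespace-separated field of every line is the module name. This is a minimal sketch assuming a Linux host; `loaded_modules`, `missing_modules` and the `REQUIRED` tuple are hypothetical names, not part of any OpenStack or Open vSwitch tool.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical list of modules this setup relies on (taken from the steps above).
REQUIRED = ("8021q", "openvswitch")


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names in /proc/modules-formatted text.

    Each non-empty line starts with the module name, followed by its size,
    use count, dependencies, state and load address.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(required, proc_modules_text: str) -> list[str]:
    """Return the required module names that are not loaded, sorted."""
    return sorted(set(required) - loaded_modules(proc_modules_text))


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # only meaningful on a Linux host
        missing = missing_modules(REQUIRED, proc.read_text())
        print("missing modules:", ", ".join(missing) if missing else "none")
```

Running the same script on both the network and compute nodes would show whether either node is still missing a module.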
Now, if I start only the openvswitch daemon on the network host, the
syslog shows similar errors (repeatedly), but once the
quantum-openvswitch-agent daemon starts, the error messages stop. The
syslog snippet is below my signature.
However, if I start an instance, its eth0 still can't get an IP address.
Any wisdom on what's going on?
Thanks,
Xin
(start openvswitch service)
Nov 8 11:21:01 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.1.0
Nov 8 11:21:01 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . ovs-version=1.11.0 "external-ids:system-id=\"9161c89c-da3f-43da-a42d-712465fbf71c\"" "system-type=\"unknown\"" "system-version=\"unknown\""
Nov 8 11:21:04 cldnet01 kernel: br-int: hw csum failure.
Nov 8 11:21:04 cldnet01 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.123.2.openstack.el6.x86_64 #1
Nov 8 11:21:04 cldnet01 kernel: Call Trace:
Nov 8 11:21:04 cldnet01 kernel: <IRQ> [<ffffffff8144a252>] ? netdev_rx_csum_fault+0x42/0x50
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81442cc0>] ? __skb_checksum_complete_head+0x60/0x70
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81442ce1>] ? __skb_checksum_complete+0x11/0x20
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff814c8b7d>] ? nf_ip_checksum+0x5d/0x130
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa01a1d31>] ? udp_error+0xb1/0x1e0 [nf_conntrack]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025cfe2>] ? ovs_vport_send+0x22/0x90 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8143da2e>] ? __skb_clone+0x2e/0x120
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa019bc98>] ? nf_conntrack_in+0x138/0xa00 [nf_conntrack]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025bb46>] ? find_bucket+0x66/0x70 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025bd61>] ? ovs_flow_tbl_lookup+0x51/0xb0 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa01b9721>] ? ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81477459>] ? nf_iterate+0x69/0xb0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81477614>] ? nf_hook_slow+0x74/0x110
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481ef4>] ? ip_rcv+0x264/0x350
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025e503>] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81449e6b>] ? __netif_receive_skb+0x4ab/0x750
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8144a1aa>] ? process_backlog+0x9a/0x100
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8144f483>] ? net_rx_action+0x103/0x2f0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff810e1bb0>] ? handle_IRQ_event+0x60/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8151cd75>] ? do_IRQ+0x75/0xf0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Nov 8 11:21:04 cldnet01 kernel: <EOI> [<ffffffff812d48fe>] ? intel_idle+0xde/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff812d48e1>] ? intel_idle+0xc1/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81416247>] ? cpuidle_idle_call+0xa7/0x140
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8150cc00>] ? start_secondary+0x2ac/0x2ef
Nov 8 11:21:04 cldnet01 kernel: br-em2: hw csum failure.
Nov 8 11:21:04 cldnet01 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-358.123.2.openstack.el6.x86_64 #1
Nov 8 11:21:04 cldnet01 kernel: Call Trace:
Nov 8 11:21:04 cldnet01 kernel: <IRQ> [<ffffffff8144a252>] ? netdev_rx_csum_fault+0x42/0x50
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81442cc0>] ? __skb_checksum_complete_head+0x60/0x70
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81442ce1>] ? __skb_checksum_complete+0x11/0x20
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff814c8b7d>] ? nf_ip_checksum+0x5d/0x130
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa01a1d31>] ? udp_error+0xb1/0x1e0 [nf_conntrack]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025cfe2>] ? ovs_vport_send+0x22/0x90 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8143da2e>] ? __skb_clone+0x2e/0x120
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa019bc98>] ? nf_conntrack_in+0x138/0xa00 [nf_conntrack]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025bb46>] ? find_bucket+0x66/0x70 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025bd61>] ? ovs_flow_tbl_lookup+0x51/0xb0 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa01b9721>] ? ipv4_conntrack_in+0x21/0x30 [nf_conntrack_ipv4]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81477459>] ? nf_iterate+0x69/0xb0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81477614>] ? nf_hook_slow+0x74/0x110
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481850>] ? ip_rcv_finish+0x0/0x440
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81481ef4>] ? ip_rcv+0x264/0x350
Nov 8 11:21:04 cldnet01 kernel: [<ffffffffa025e503>] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81449e6b>] ? __netif_receive_skb+0x4ab/0x750
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8144a1aa>] ? process_backlog+0x9a/0x100
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8144f483>] ? net_rx_action+0x103/0x2f0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff810e1bb0>] ? handle_IRQ_event+0x60/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8151cd75>] ? do_IRQ+0x75/0xf0
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8100b9d3>] ? ret_from_intr+0x0/0x11
Nov 8 11:21:04 cldnet01 kernel: <EOI> [<ffffffff812d48fe>] ? intel_idle+0xde/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff812d48e1>] ? intel_idle+0xc1/0x170
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81416247>] ? cpuidle_idle_call+0xa7/0x140
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
Nov 8 11:21:04 cldnet01 kernel: [<ffffffff8150cc00>] ? start_secondary+0x2ac/0x2ef
...
...
...
(start quantum-openvswitch-agent service)
Nov 8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int patch-tun
Nov 8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int int-br-ex
Nov 8 11:21:13 cldnet01 kernel: device int-br-ex left promiscuous mode
Nov 8 11:21:13 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-ex phy-br-ex
Nov 8 11:21:13 cldnet01 kernel: device phy-br-ex left promiscuous mode
Nov 8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-int int-br-ex
Nov 8 11:21:14 cldnet01 kernel: device int-br-ex entered promiscuous mode
Nov 8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-ex phy-br-ex
Nov 8 11:21:14 cldnet01 kernel: device phy-br-ex entered promiscuous mode
Nov 8 11:21:14 cldnet01 kernel: ADDRCONF(NETDEV_UP): int-br-ex: link is not ready
Nov 8 11:21:14 cldnet01 kernel: ADDRCONF(NETDEV_CHANGE): int-br-ex: link becomes ready
Nov 8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-int int-br-em2
Nov 8 11:21:14 cldnet01 kernel: device int-br-em2 left promiscuous mode
Nov 8 11:21:14 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --if-exists del-port br-em2 phy-br-em2
Nov 8 11:21:14 cldnet01 kernel: device phy-br-em2 left promiscuous mode
Nov 8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #26 int-br-ex, fe80::68e7:66ff:fef7:f28b#123, interface stats: received=0, sent=0, dropped=0, active_time=1853 secs
Nov 8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #27 phy-br-ex, fe80::94dd:f8ff:fe75:e19b#123, interface stats: received=0, sent=0, dropped=0, active_time=1853 secs
Nov 8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #28 int-br-em2, fe80::bc7e:9aff:fe93:2a0c#123, interface stats: received=0, sent=0, dropped=0, active_time=1851 secs
Nov 8 11:21:16 cldnet01 ntpd[2295]: Deleting interface #29 phy-br-em2, fe80::20b0:56ff:fe3e:7019#123, interface stats: received=0, sent=0, dropped=0, active_time=1851 secs
Nov 8 11:21:16 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-int int-br-em2
Nov 8 11:21:16 cldnet01 kernel: device int-br-em2 entered promiscuous mode
Nov 8 11:21:16 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 -- --may-exist add-port br-em2 phy-br-em2
Nov 8 11:21:16 cldnet01 kernel: device phy-br-em2 entered promiscuous mode
Nov 8 11:21:16 cldnet01 kernel: ADDRCONF(NETDEV_UP): int-br-em2: link is not ready
Nov 8 11:21:16 cldnet01 kernel: ADDRCONF(NETDEV_CHANGE): int-br-em2: link becomes ready
Nov 8 11:21:17 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port qr-ed2e07cf-db tag=1
Nov 8 11:21:17 cldnet01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tap4aafbf9c-49 tag=1
Nov 8 11:21:18 cldnet01 ntpd[2295]: Listening on interface #30 int-br-ex, fe80::14d5:daff:feda:2941#123 Enabled
Nov 8 11:21:18 cldnet01 ntpd[2295]: Listening on interface #31 phy-br-ex, fe80::5452:6bff:fe41:27bc#123 Enabled
Nov 8 11:21:20 cldnet01 ntpd[2295]: Listening on interface #32 int-br-em2, fe80::4067:b1ff:fe50:194c#123 Enabled
Nov 8 11:21:20 cldnet01 ntpd[2295]: Listening on interface #33 phy-br-em2, fe80::bc54:5dff:fedd:64fe#123 Enabled
(no more "hw csum failure" errors from here on)
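To confirm over a longer syslog that the `hw csum failure` messages really stop once quantum-openvswitch-agent is running, one could tally them per device together with the timestamp of the last occurrence. This is a sketch that assumes the syslog line layout shown in the snippet above (`Mon D HH:MM:SS host kernel: <dev>: hw csum failure.`) and the default RHEL 6 path `/var/log/messages`; both are assumptions, and `csum_failures` is a hypothetical helper.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Matches lines like "Nov 8 11:21:04 cldnet01 kernel: br-int: hw csum failure."
# Groups: the syslog timestamp (first three fields) and the device name.
CSUM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<dev>\S+): hw csum failure"
)


def csum_failures(lines):
    """Count 'hw csum failure' lines per device and keep the last timestamp seen."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        m = CSUM_RE.match(line)
        if m:
            counts[m.group("dev")] += 1
            last_seen[m.group("dev")] = m.group("ts")
    return counts, last_seen


if __name__ == "__main__":
    # Default RHEL 6 syslog location (assumption); pass another path as argv[1].
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages")
    try:
        with path.open(errors="replace") as fh:
            counts, last_seen = csum_failures(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
    else:
        for dev, n in counts.most_common():
            print(f"{dev}: {n} failure(s), last at {last_seen[dev]}")
```

If the last timestamp for br-int and br-em2 is earlier than the agent start time, that matches the observation that the errors stop after the agent (re)creates its ports.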
On 11/8/2013 8:52 AM, Thomas Graf wrote:
> On 11/06/2013 02:29 AM, Paul Robert Marino wrote:
>> Which kernel are you running? If you are running the stock Red Hat
>> kernel rather than the patched version that comes with RHOS or RDO,
>> that might explain it.
>>
>> ------------------------------------------------------------------------
>> On Nov 5, 2013 17:33, Xin Zhao <xzhao at bnl.gov> wrote:
>>
>> Hi Thomas,
>>
>> Thanks for the reply. Here is the driver info:
>>
>> 1) NIC for the VM network connection:
>>
>> $> ethtool -i eth1
>> driver: e1000e
>> version: 2.1.4-k
>> firmware-version: 1.6-12
>> bus-info: 0000:04:00.1
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: no
>
> e1000e is a driver that is known to work. The error would indicate a
> silicon or firmware issue. Are you seeing this issue on multiple
> machines?
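Since Thomas asks whether the problem shows up on more than one machine, it may help to put the two `ethtool -i` reports from this thread side by side: the original network host used e1000e on eth1, the new one uses bnx2 on em2, and similar errors appear on both. A minimal sketch, assuming the `key: value` layout of `ethtool -i` output shown above; `parse_ethtool_info` and `diff_info` are hypothetical helpers, and only a subset of the reported fields is copied in.

```python
from __future__ import annotations

# Subsets of the two `ethtool -i` reports quoted in this thread.
OLD_ETH1 = """driver: e1000e
version: 2.1.4-k
firmware-version: 1.6-12
bus-info: 0000:04:00.1
"""
NEW_EM2 = """driver: bnx2
version: 2.2.3
firmware-version: 6.2.14 bc 5.2.3 NCSI 2.0.11
bus-info: 0000:01:00.1
"""


def parse_ethtool_info(text: str) -> dict[str, str]:
    """Parse `ethtool -i` output ("key: value" per line) into a dict.

    Only the first colon separates key from value, so values such as
    bus-info "0000:01:00.1" keep their own colons. Lines without a colon
    (e.g. a shell prompt) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_info(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    old, new = parse_ethtool_info(OLD_ETH1), parse_ethtool_info(NEW_EM2)
    for field, (o, n) in diff_info(old, new).items():
        print(f"{field}: {o} -> {n}")
```

Every compared field differs (driver, version, firmware, bus), which is one way to frame the answer to the "multiple machines" question: the errors are not tied to a single NIC driver or firmware.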