[Openstack] [Fuel] node name issue

Jim Okken jim at jokken.com
Thu Oct 19 03:37:14 UTC 2017


Hi all,

Please help us with an issue we are seeing on multiple compute nodes
running Newton (Ubuntu 16.04.3, kernel 4.4.0). After about 1 hour of running
our VoIP test application, the instances become unresponsive and can't be
pinged, and neither can the compute nodes.

Messages appear on the compute node console screens. A screenshot of them
is hosted here:

http://www.jokken.com/downloads/console.png

I'll try to attach it as well.

The first compute node we saw this on was running 2 instances; the second
was running only 1 instance. They were using only a portion of the 40
vCPUs available, and the load was moderate. After a cold boot these nodes
are fine again, until we run our application for about 1 hour.

Please let us know what you think. Thanks!

Not much shows up in the Nova and Neutron DEBUG logs on the compute node.

These logs are here:

http://www.jokken.com/downloads/logs.zip

I'll try to attach them too.

See also the question on ask.openstack.org:

https://ask.openstack.org/en/question/110748/soft-lockup-on-newton-compute-nodes/

/var/log/messages on the compute node shows many repeats of these messages:

2017-10-18T20:49:26.462309+00:00 node-58 kernel: [1297007.624935] Modules linked in: binfmt_misc nf_conntrack_netlink vhost_net vhost macvtap macvlan ip6table_raw xt_mac xt_tcpudp xt_physdev br_netfilter xt_set ip_set_hash_net ip_set nfnetlink veth ebtable_filter ebtables openvswitch ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ip6table_filter ip6_tables xt_multiport xt_conntrack iptable_filter xt_comment xt_CT iptable_raw ip_tables x_tables xfs ipmi_ssif 8021q garp mrp intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd serio_raw bridge stp llc sb_edac edac_core hpilo ioatdma lpc_ich shpchp dca ipmi_si 8250_fintek ipmi_msghandler acpi_power_meter mac_hid kvm_intel kvm irqbypass ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_round_robin ses enclosure uas usb_storage psmouse ahci lpfc be2iscsi libahci be2net iscsi_boot_sysfs libiscsi vxlan scsi_transport_fc ip6_udp_tunnel scsi_transport_iscsi udp_tunnel wmi fjes scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath

2017-10-18T20:49:26.462311+00:00 node-58 kernel: [1297007.625008] CPU: 27 PID: 860 Comm: qemu-system-x86 Not tainted 4.4.0-93-generic #116-Ubuntu
2017-10-18T20:49:26.462313+00:00 node-58 kernel: [1297007.625009] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
2017-10-18T20:49:26.462314+00:00 node-58 kernel: [1297007.625010] task: ffff881faaaa7000 ti: ffff881fa3a34000 task.ti: ffff881fa3a34000
2017-10-18T20:49:26.462315+00:00 node-58 kernel: [1297007.625011] RIP: 0010:[<ffffffff810cb29c>]  [<ffffffff810cb29c>] native_queued_spin_lock_slowpath+0x15c/0x170
2017-10-18T20:49:26.462316+00:00 node-58 kernel: [1297007.625018] RSP: 0018:ffff883fff143c30  EFLAGS: 00000202
2017-10-18T20:49:26.462317+00:00 node-58 kernel: [1297007.625019] RAX: 0000000000000101 RBX: ffff881f677603f0 RCX: 0000000000000001
2017-10-18T20:49:26.462337+00:00 node-58 kernel: [1297007.625020] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff881f677603ec
2017-10-18T20:49:26.462340+00:00 node-58 kernel: [1297007.625020] RBP: ffff883fff143c30 R08: 0000000000000101 R09: ffffffff81191e27
2017-10-18T20:49:26.462341+00:00 node-58 kernel: [1297007.625021] R10: ffffea00ffb09780 R11: 0000000000000a00 R12: ffff881f677603ec
2017-10-18T20:49:26.462342+00:00 node-58 kernel: [1297007.625022] R13: 0000000000000a00 R14: 00000000000a5000 R15: 0000000000000a00
2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625023] FS: 00007f0c53fb3c00(0000) GS:ffff883fff140000(0000) knlGS:0000000000000000
2017-10-18T20:49:26.462343+00:00 node-58 kernel: [1297007.625024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2017-10-18T20:49:26.462344+00:00 node-58 kernel: [1297007.625025] CR2: 00007fe018e2547e CR3: 0000003ec0b75000 CR4: 00000000001426e0

2017-10-18T20:49:26.462345+00:00 node-58 kernel: [1297007.625026] Stack:
2017-10-18T20:49:26.462347+00:00 node-58 kernel: [1297007.625026] ffff883fff143c40 ffffffff81842f71 ffff883fff143c60 ffffffff81841085
2017-10-18T20:49:26.462348+00:00 node-58 kernel: [1297007.625028] ffff881dc609ac00 ffff881f677604b0 ffff883fff143c70 ffffffff818410cb
2017-10-18T20:49:26.462349+00:00 node-58 kernel: [1297007.625029] ffff883fff143ca0 ffffffffc08c658d ffff883feff9d500 0000000000000a00

2017-10-18T20:49:26.462351+00:00 node-58 kernel: [1297007.625031] Call Trace:
2017-10-18T20:49:26.462353+00:00 node-58 kernel: [1297007.625032]  <IRQ>
2017-10-18T20:49:26.462354+00:00 node-58 kernel: [1297007.625039] [<ffffffff81842f71>] _raw_spin_lock+0x21/0x30
2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625041] [<ffffffff81841085>] __mutex_unlock_slowpath+0x25/0x50
2017-10-18T20:49:26.462356+00:00 node-58 kernel: [1297007.625042] [<ffffffff818410cb>] mutex_unlock+0x1b/0x20
2017-10-18T20:49:26.462357+00:00 node-58 kernel: [1297007.625076] [<ffffffffc08c658d>] ocfs2_dio_end_io+0x6d/0x80 [ocfs2]
2017-10-18T20:49:26.462358+00:00 node-58 kernel: [1297007.625080] [<ffffffff8124d34c>] dio_complete+0x11c/0x1c0
2017-10-18T20:49:26.462359+00:00 node-58 kernel: [1297007.625081] [<ffffffff8124d463>] dio_bio_end_aio+0x73/0x100
2017-10-18T20:49:26.462361+00:00 node-58 kernel: [1297007.625085] [<ffffffff813c2b9f>] bio_endio+0x3f/0x60
2017-10-18T20:49:26.462362+00:00 node-58 kernel: [1297007.625087] [<ffffffff813ca547>] blk_update_request+0x87/0x310
2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625091] [<ffffffff816bae96>] end_clone_bio+0x46/0x70
2017-10-18T20:49:26.462363+00:00 node-58 kernel: [1297007.625092] [<ffffffff813c2b9f>] bio_endio+0x3f/0x60
2017-10-18T20:49:26.462364+00:00 node-58 kernel: [1297007.625093] [<ffffffff813ca547>] blk_update_request+0x87/0x310
2017-10-18T20:49:26.462365+00:00 node-58 kernel: [1297007.625097] [<ffffffff815c4583>] scsi_end_request+0x33/0x1d0
2017-10-18T20:49:26.462367+00:00 node-58 kernel: [1297007.625100] [<ffffffff815c7cb6>] scsi_io_completion+0x1b6/0x690
2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625104] [<ffffffff810beb66>] ? rebalance_domains+0x166/0x2d0
2017-10-18T20:49:26.462368+00:00 node-58 kernel: [1297007.625107] [<ffffffff815be8df>] scsi_finish_command+0xcf/0x120
2017-10-18T20:49:26.462377+00:00 node-58 kernel: [1297007.625109] [<ffffffff815c7444>] scsi_softirq_done+0x124/0x150
2017-10-18T20:49:26.462378+00:00 node-58 kernel: [1297007.625112] [<ffffffff813d2437>] blk_done_softirq+0x87/0xb0
2017-10-18T20:49:26.462379+00:00 node-58 kernel: [1297007.625116] [<ffffffff81085dd1>] __do_softirq+0x101/0x290
2017-10-18T20:49:26.462381+00:00 node-58 kernel: [1297007.625118] [<ffffffff810860d3>] irq_exit+0xa3/0xb0
2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625121] [<ffffffff81050e03>] smp_call_function_single_interrupt+0x33/0x40
2017-10-18T20:49:26.462382+00:00 node-58 kernel: [1297007.625124] [<ffffffff81844622>] call_function_single_interrupt+0x82/0x90
2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625125]  <EOI>
2017-10-18T20:49:26.462383+00:00 node-58 kernel: [1297007.625127] [<ffffffff81842f64>] ? _raw_spin_lock+0x14/0x30
2017-10-18T20:49:26.462385+00:00 node-58 kernel: [1297007.625129] [<ffffffff81840f72>] __mutex_lock_slowpath+0x72/0x130
2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625142] [<ffffffffc08dd099>] ? ocfs2_inode_unlock+0x119/0x120 [ocfs2]
2017-10-18T20:49:26.462387+00:00 node-58 kernel: [1297007.625143] [<ffffffff8184104f>] mutex_lock+0x1f/0x30
2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625155] [<ffffffffc08e677a>] ocfs2_file_write_iter+0x95a/0xdf0 [ocfs2]
2017-10-18T20:49:26.462388+00:00 node-58 kernel: [1297007.625158] [<ffffffff81224090>] ? poll_select_copy_remaining+0x140/0x140
2017-10-18T20:49:26.462389+00:00 node-58 kernel: [1297007.625169] [<ffffffffc08e5e20>] ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
2017-10-18T20:49:26.462391+00:00 node-58 kernel: [1297007.625171] [<ffffffff812601ba>] aio_run_iocb+0x26a/0x2d0
2017-10-18T20:49:26.462392+00:00 node-58 kernel: [1297007.625174] [<ffffffff8122d6b5>] ? __fget_light+0x25/0x60
2017-10-18T20:49:26.462394+00:00 node-58 kernel: [1297007.625175] [<ffffffff8122d703>] ? __fdget+0x13/0x20
2017-10-18T20:49:26.462395+00:00 node-58 kernel: [1297007.625177] [<ffffffff8126108f>] do_io_submit+0x25f/0x500
2017-10-18T20:49:26.462396+00:00 node-58 kernel: [1297007.625178] [<ffffffff81261340>] SyS_io_submit+0x10/0x20
2017-10-18T20:49:26.462398+00:00 node-58 kernel: [1297007.625181] [<ffffffff818431f2>] entry_SYSCALL_64_fastpath+0x16/0x71
2017-10-18T20:49:26.462399+00:00 node-58 kernel: [1297007.625181] Code: 01 48 8b 02 48 85 c0 75 0a f3 90 48 8b 02 48 85 c0 74 f6 c7 40 08 01 00 00 00 e9 63 ff ff ff 83 fa 01 75 07 e9 c4 fe ff ff f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 0f 1f 40 00 0f
-------------- next part --------------
A non-text attachment was scrubbed...
Name: console.png
Type: image/png
Size: 182301 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20171018/a53e632c/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: logs.zip
Type: application/zip
Size: 94847 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20171018/a53e632c/attachment.zip>

