[Openstack] BUG: soft lockup messages
Matej Mailing
mailing at tam.si
Wed Jan 7 10:40:17 UTC 2015
2015-01-07 9:06 GMT+01:00 Ritesh Raj Sarraf <rrs at researchut.com>:
>
>
> On 01/05/2015 03:46 PM, Matej Mailing wrote:
>> Hello Erhan,
>>
>> the soft lock-up has just happened again on the instance node. I am
>> monitoring traffic on the network interface to which the NFS server is
>> connected, and the interface has constantly been under 20% of its
>> capacity; the load on the NFS server is also low - though I am unsure
>> whether this is relevant at all...
>>
>> What makes me wonder - is it "normal" that both lock-ups are for the
>> same period of time (51s), even on two different CPUs and with two
>> different PIDs?
>>
>> The output from the log is:
>>
>> Jan 5 11:01:13 postar kernel: [477123.485080] NMI watchdog: BUG: soft
>> lockup - CPU#3 stuck for 51s! [mysqld:2612]
>> Jan 5 11:01:13 postar kernel: [477123.485151] Modules linked in:
>> xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
>> nf_reject_ipv4 xt_limit
>> xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
>> ip6table_filter ip6_tables nfsd nfs_acl auth_rpcgss
>> nf_conntrack_netbios_ns
>> nf_conntrack_broadcast nfs nf_nat_ftp nf_nat ppdev nf_conntrack_ftp
>> nf_conntrack fscache parport_pc parport lockd joydev pvpanic
>> iptable_filter cirrus
>> ip_tables 8250_fintek ttm psmouse drm_kms_helper sunrpc x_tables
>> serio_raw hid_generic drm grace mac_hid sysimgblt sysfillrect
>> syscopyarea i2c_piix4
>> usbhid hid floppy
>> Jan 5 11:01:13 postar kernel: [477123.485200] CPU: 3 PID: 2612 Comm:
>> mysqld Tainted: G L 3.18.1-031801-generic #201412170637
>> Jan 5 11:01:13 postar kernel: [477123.485202] Hardware name:
>> OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>> Jan 5 11:01:13 postar kernel: [477123.485205] task: ffff880037ab6400
>> ti: ffff880206e40000 task.ti: ffff880206e40000
>> Jan 5 11:01:13 postar kernel: [477123.485207] RIP:
>> 0010:[<ffffffff813a97dc>] [<ffffffff813a97dc>]
>> copy_user_generic_string+0x2c/0x40
>> Jan 5 11:01:13 postar kernel: [477123.485217] RSP:
>> 0018:ffff880206e43c90 EFLAGS: 00010246
>> Jan 5 11:01:13 postar kernel: [477123.485219] RAX: 0000000076793000
>> RBX: ffff880206e43ea0 RCX: 0000000000000200
>> Jan 5 11:01:13 postar kernel: [477123.485221] RDX: 0000000000000000
>> RSI: ffff880076793000 RDI: 00007fd365448000
>> Jan 5 11:01:13 postar kernel: [477123.485223] RBP: ffff880206e43d08
>> R08: ffffea0001d9e4dc R09: ffff880206e43ca0
>> Jan 5 11:01:13 postar kernel: [477123.485225] R10: ffff8801b76ca6f0
>> R11: 0000000000000293 R12: 0000000000000000
>> Jan 5 11:01:13 postar kernel: [477123.485227] R13: ffff880206e43ec8
>> R14: 0000000000001000 R15: 00007fd365448000
>> Jan 5 11:01:13 postar kernel: [477123.485235] FS:
>> 00007fd36c1fb700(0000) GS:ffff88023fd80000(0000)
>> knlGS:0000000000000000
>> Jan 5 11:01:13 postar kernel: [477123.485237] CS: 0010 DS: 0000 ES:
>> 0000 CR0: 0000000080050033
>> Jan 5 11:01:13 postar kernel: [477123.485239] CR2: 000000000b416003
>> CR3: 00000002317ee000 CR4: 00000000000006e0
>> Jan 5 11:01:13 postar kernel: [477123.485253] Stack:
>> Jan 5 11:01:13 postar kernel: [477123.485255] ffffffff811a0f75
>> 000000000012f2ac ffff8801b76ca6f0 ffff880206e43cc8
>> Jan 5 11:01:13 postar kernel: [477123.485259] ffffffff8117864e
>> ffff880076793000 ffffea0001d9e4c0 0000000000001000
>> Jan 5 11:01:13 postar kernel: [477123.485262] ffff880076793000
>> ffffea0001d9e4c0 ffffea0001d9e4c0 ffff880231e4b930
>> Jan 5 11:01:13 postar kernel: [477123.485265] Call Trace:
>> Jan 5 11:01:13 postar kernel: [477123.485273] [<ffffffff811a0f75>] ?
>> copy_page_to_iter_iovec+0xe5/0x300
>> Jan 5 11:01:13 postar kernel: [477123.485279] [<ffffffff8117864e>] ?
>> find_get_entry+0x1e/0x90
>> Jan 5 11:01:13 postar kernel: [477123.485282] [<ffffffff811a14a6>]
>> copy_page_to_iter+0x16/0x70
>> Jan 5 11:01:13 postar kernel: [477123.485286] [<ffffffff81179428>]
>> do_generic_file_read+0x1f8/0x490
>> Jan 5 11:01:13 postar kernel: [477123.485289] [<ffffffff8117a234>]
>> generic_file_read_iter+0xf4/0x150
>> Jan 5 11:01:13 postar kernel: [477123.485294] [<ffffffff810aade1>] ?
>> update_curr+0x141/0x1f0
>> Jan 5 11:01:13 postar kernel: [477123.485298] [<ffffffff811eef28>]
>> new_sync_read+0x78/0xb0
>> Jan 5 11:01:13 postar kernel: [477123.485301] [<ffffffff811f013b>]
>> vfs_read+0xab/0x180
>> Jan 5 11:01:13 postar kernel: [477123.485304] [<ffffffff811f0402>]
>> SyS_pread64+0x92/0xa0
>> Jan 5 11:01:13 postar kernel: [477123.485309] [<ffffffff817b376d>]
>> system_call_fastpath+0x16/0x1b
>> Jan 5 11:01:13 postar kernel: [477123.485311] Code: 66 90 83 fa 08 72
>> 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
>> ff c7 ff c9 75 f$
>> Jan 5 11:01:13 postar kernel: [477123.486574] NMI watchdog: BUG: soft
>> lockup - CPU#1 stuck for 51s! [mysqld:2282]
>> Jan 5 11:01:13 postar kernel: [477123.486633] Modules linked in:
>> xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
>> nf_reject_ipv4 xt_limit xt_tcpud$
>> Jan 5 11:01:13 postar kernel: [477123.486669] CPU: 1 PID: 2282 Comm:
>> mysqld Tainted: G L 3.18.1-031801-generic #201412170637
>> Jan 5 11:01:13 postar kernel: [477123.486671] Hardware name:
>> OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>> Jan 5 11:01:13 postar kernel: [477123.486673] task: ffff880232bbb200
>> ti: ffff880232bd8000 task.ti: ffff880232bd8000
>> Jan 5 11:01:13 postar kernel: [477123.486675] RIP:
>> 0010:[<ffffffff813a97dc>] [<ffffffff813a97dc>]
>> copy_user_generic_string+0x2c/0x40
>> Jan 5 11:01:13 postar kernel: [477123.486680] RSP:
>> 0018:ffff880232bdbc18 EFLAGS: 00010246
>> Jan 5 11:01:13 postar kernel: [477123.486683] RAX: 00007fd35f714000
>> RBX: 0000000000001000 RCX: 0000000000000200
>> Jan 5 11:01:13 postar kernel: [477123.486685] RDX: 0000000000000000
>> RSI: 00007fd35f714000 RDI: ffff88007131c000
>> Jan 5 11:01:13 postar kernel: [477123.486687] RBP: ffff880232bdbc28
>> R08: ffffea0001c4c700 R09: 00000000fffff000
>> Jan 5 11:01:13 postar kernel: [477123.486689] R10: ffff8801b77814e0
>> R11: 0000000000000293 R12: 0000000000001000
>> Jan 5 11:01:13 postar kernel: [477123.486691] R13: ffff880232bdbea0
>> R14: 0000000000000000 R15: 0000000000001000
>> Jan 5 11:01:13 postar kernel: [477123.486697] FS:
>> 00007fd35afe7700(0000) GS:ffff88023fc80000(0000)
>> knlGS:0000000000000000
>> Jan 5 11:01:13 postar kernel: [477123.486699] CS: 0010 DS: 0000 ES:
>> 0000 CR0: 0000000080050033
>> Jan 5 11:01:13 postar kernel: [477123.486702] CR2: 000000000cdc3001
>> CR3: 00000002317ee000 CR4: 00000000000006e0
>> Jan 5 11:01:13 postar kernel: [477123.486714] Stack:
>> Jan 5 11:01:13 postar kernel: [477123.486716] ffffffff811a0386
>> 0000000120094000 ffff880232bdbc78 ffffffff811a0a35
>> Jan 5 11:01:13 postar kernel: [477123.486719] ffff880232bdbc88
>> 0000000000000000 ffff880232bdbc78 0000000120094000
>> Jan 5 11:01:13 postar kernel: [477123.486722] 0000000000001000
>> ffff880232bdbea0 ffff880231e4b930 0000000000000000
>> Jan 5 11:01:13 postar kernel: [477123.486726] Call Trace:
>> Jan 5 11:01:13 postar kernel: [477123.486752] [<ffffffff811a0386>] ?
>> copy_from_user_atomic_iovec+0x56/0x80
>> Jan 5 11:01:13 postar kernel: [477123.486757] [<ffffffff811a0a35>]
>> iov_iter_copy_from_user_atomic+0xd5/0xe0
>> Jan 5 11:01:13 postar kernel: [477123.486761] [<ffffffff81177410>]
>> generic_perform_write+0xe0/0x1c0
>> Jan 5 11:01:13 postar kernel: [477123.486766] [<ffffffff8120a0f1>] ?
>> update_time+0x81/0xc0
>> Jan 5 11:01:13 postar kernel: [477123.486770] [<ffffffff8120e4a2>] ?
>> mnt_clone_write+0x12/0x30
>> Jan 5 11:01:13 postar kernel: [477123.486773] [<ffffffff81179e9f>]
>> __generic_file_write_iter+0x16f/0x350
>> Jan 5 11:01:13 postar kernel: [477123.486778] [<ffffffff8126b7d9>]
>> ext4_file_write_iter+0x119/0x3d0
>> Jan 5 11:01:13 postar kernel: [477123.486783] [<ffffffff810efcd8>] ?
>> get_futex_key+0x1f8/0x2e0
>> Jan 5 11:01:13 postar kernel: [477123.486786] [<ffffffff811ef0eb>]
>> new_sync_write+0x7b/0xb0
>> Jan 5 11:01:13 postar kernel: [477123.486789] [<ffffffff811eff67>]
>> vfs_write+0xc7/0x1f0
>> Jan 5 11:01:13 postar kernel: [477123.486792] [<ffffffff811f04a2>]
>> SyS_pwrite64+0x92/0xa0
>> Jan 5 11:01:13 postar kernel: [477123.486795] [<ffffffff817b376d>]
>> system_call_fastpath+0x16/0x1b
>> Jan 5 11:01:13 postar kernel: [477123.486797] Code: 66 90 83 fa 08 72
>> 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
>> ff c7 ff c9
>> 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 66 66 90 c3
>> 0f 1f 80 00 00 00 00 66
>>
>>
>> Thanks,
>> Matej
>
>
> The profile of this hardware looks virtualized. That makes me ask if you
> have installed the para-virtualized drivers ?
>
>
> --
> Given the large number of mailing lists I follow, I request you to CC me
> in replies for quicker response
>
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
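[Editorial note: regarding the para-virtualization question above — the device name vda in the iostat output later in this thread usually indicates a virtio disk, but the loaded drivers can be confirmed directly. A minimal sketch, assuming the conventional module names and sysfs paths; adjust the device name for your guest:]

```shell
#!/bin/sh
# Sketch: check whether virtio para-virtualized drivers are active in a guest.
# Module names and sysfs paths are the conventional ones; adjust for your VM.
check_virtio() {
    for mod in virtio_blk virtio_net virtio_pci; do
        if grep -q "^${mod} " /proc/modules 2>/dev/null; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded (it may be built into the kernel)"
        fi
    done
}

check_virtio
# The driver bound to the root disk ("vda" usually implies virtio-blk):
readlink /sys/block/vda/device/driver 2>/dev/null || echo "vda: no driver symlink found"
```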
Hi all, just an update:
today we got a stall and then a soft lock-up:
Jan 7 08:05:28 postar kernel: [639021.260041] INFO: rcu_sched
self-detected stall on CPU { 2} (t=91560 jiffies g=4180521 c=4180520
q=0)
Jan 7 08:05:28 postar kernel: [639021.260041] Task dump for CPU 2:
Jan 7 08:05:28 postar kernel: [639021.260041] /usr/sbin/spamd R
running task 0 9932 24127 0x00000008
Jan 7 08:05:28 postar kernel: [639021.260041] ffffffff81c55b00
ffff88023fd03cf8 ffffffff810a0ee8 0000000000000002
Jan 7 08:05:28 postar kernel: [639021.260041] ffffffff81c55b00
ffff88023fd03d18 ffffffff810a388f ffff88023fd03d58
Jan 7 08:05:28 postar kernel: [639021.260041] ffffffff81c55b00
ffff88023fd03d48 ffffffff810d3e7b ffffffff81c55b00
Jan 7 08:05:28 postar kernel: [639021.260041] Call Trace:
Jan 7 08:05:28 postar kernel: [639021.260041] <IRQ>
[<ffffffff810a0ee8>] sched_show_task+0xa8/0x110
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810a388f>]
dump_cpu_task+0x3f/0x50
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810d3e7b>]
rcu_dump_cpu_stacks+0x8b/0xc0
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810d5cb7>]
print_cpu_stall+0xd7/0x140
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810d6405>]
check_cpu_stall.isra.53+0xa5/0xb0
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810d6448>]
__rcu_pending+0x38/0x1d0
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810d8065>]
rcu_check_callbacks+0xa5/0x190
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810dd648>]
update_process_times+0x48/0x80
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810ed1a3>]
tick_sched_handle.isra.17+0x33/0x70
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810ed2dc>]
tick_sched_timer+0x4c/0x80
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810de106>]
__run_hrtimer+0x76/0x230
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810ed290>] ?
tick_nohz_handler+0xb0/0xb0
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff810de90f>]
hrtimer_interrupt+0xff/0x260
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff8104ae3b>]
local_apic_timer_interrupt+0x3b/0x60
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff817b65f5>]
smp_apic_timer_interrupt+0x45/0x60
Jan 7 08:05:28 postar kernel: [639021.260041] [<ffffffff817b46bd>]
apic_timer_interrupt+0x6d/0x80
....
and then:
Jan 7 08:05:28 postar kernel: [639377.621033] NMI watchdog: BUG: soft
lockup - CPU#1 stuck for 92s! [apache2:5279]
....
This time I have iostat stats for this exact moment since I kept it logging:
the last entry before Jan 7 08:05:28 is:
2015-01-07 07:59:31 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 07:59:31 0.00 0.00 48.46 0.00 17.75 33.79
2015-01-07 07:59:31
2015-01-07 07:59:31 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 07:59:31 vda 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2015-01-07 07:59:31
and then:
2015-01-07 08:05:28 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:28 5.88 0.00 11.76 0.00 17.65 64.71
2015-01-07 08:05:28
2015-01-07 08:05:28 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:28 vda 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2015-01-07 08:05:28
2015-01-07 08:05:29 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:29 8.59 0.00 15.46 16.15 13.40 46.39
2015-01-07 08:05:29
2015-01-07 08:05:29 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:29 vda 0.00 0.00 0.00 230.69
0.00 1679.21 14.56 32.52 140.98 0.00 140.98 3.30 76.04
2015-01-07 08:05:29
2015-01-07 08:05:30 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:30 4.21 0.00 52.87 5.36 21.84 15.71
2015-01-07 08:05:30
2015-01-07 08:05:30 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:30 vda 0.00 0.00 0.00 111.00
0.00 448.00 8.07 0.83 7.50 0.00 7.50 5.05 56.00
2015-01-07 08:05:30
2015-01-07 08:05:31 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:31 15.41 0.00 54.14 3.01 15.41 12.03
2015-01-07 08:05:31
2015-01-07 08:05:31 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:31 vda 0.00 0.00 0.00 266.00
0.00 1044.00 7.85 0.78 2.93 0.00 2.93 1.68 44.80
2015-01-07 08:05:31
2015-01-07 08:05:32 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:32 13.79 0.00 63.60 2.30 19.16 1.15
2015-01-07 08:05:32
2015-01-07 08:05:32 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:32 vda 0.00 0.00 0.00 120.59
0.00 478.43 7.93 0.38 3.15 0.00 3.15 2.05 24.71
2015-01-07 08:05:32
2015-01-07 08:05:33 avg-cpu: %user %nice %system %iowait %steal %idle
2015-01-07 08:05:33 18.61 0.00 33.94 8.39 14.60 24.45
2015-01-07 08:05:33
2015-01-07 08:05:33 Device: rrqm/s wrqm/s r/s w/s
rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2015-01-07 08:05:33 vda 0.00 0.00 0.00 274.26
0.00 6637.62 48.40 4.82 17.56 0.00 17.56 1.47 40.40
2015-01-07 08:05:33
Then the load slowly goes down. Can we blame the I/O load?
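[Editorial note: the samples above do show a latency burst — w_await of ~141 ms with an avgqu-sz of ~32 at 08:05:29 — so I/O stalls are at least plausible here. A sketch for scanning a timestamped iostat log in the layout shown above for such spikes; the field positions assume that exact extended-stats layout, and the threshold is an arbitrary choice:]

```shell
#!/bin/sh
# Sketch: flag iostat samples where write latency spikes.
# Assumes lines logged as: DATE TIME DEVICE rrqm/s wrqm/s r/s w/s rkB/s wkB/s
#   avgrq-sz avgqu-sz await r_await w_await svctm %util  (w_await = field 14)
flag_spikes() {
    # $1 = logfile, $2 = w_await threshold in ms
    awk -v dev=vda -v thr="$2" '
        $3 == dev && $14 + 0 > thr + 0 {
            printf "%s %s w_await=%sms util=%s%%\n", $1, $2, $14, $16
        }' "$1"
}

# Example: flag_spikes iostat.log 100
```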
Thanks,
Matej
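[Editorial note: one more observation worth the thread's attention — the avg-cpu lines above show 13-22% steal, and two lock-ups reporting the identical 51s duration at the same timestamp is consistent with the guest's vCPUs being held back by the hypervisor over the same window, rather than two independent in-guest hangs. This is a hypothesis, not a diagnosis. A sketch for computing the steal share from a /proc/stat-style cpu line, assuming the standard field order (user nice system idle iowait irq softirq steal):]

```shell
#!/bin/sh
# Sketch: report the steal-time share from a /proc/stat-style aggregate cpu line.
# Assumes the standard field order, where steal is the 9th whitespace-separated
# field (counting the "cpu" label as the 1st).
steal_pct() {
    awk '/^cpu / {
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        if (total > 0) printf "%.1f\n", 100 * $9 / total
    }' "$1"
}

# Example: steal_pct /proc/stat
# (These are cumulative counters; sample twice and diff for a current rate.)
```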