[Openstack] BUG: soft lockup messages

Matej Mailing mailing at tam.si
Mon Jan 5 10:16:18 UTC 2015


2015-01-04 21:20 GMT+01:00 Erhan Ekici <erhan.ekici at gmail.com>:
> Hi Matej,
>
> Can you change the values(both on compute and instance node) and see if soft
> lock-ups generated again? There are two approaches; one decreasing and the
> other increasing the cache.
>
> vm.dirty_background_ratio = 5
> vm.dirty_ratio = 10
>
> If I/O & Cache is not the source of the problem, we should focus on CPU
> settings or kernel.
>
> For example, you can try to use a different kernel (before 3.18.1) and see
> if it happens again?
>
> Thanks,
> Erhan

Hello Erhan,

soft lock-up has just happened again on the instance node. I am
monitoring network interface traffic on which the NFS server is
connected and the interface has constantly been under 20% of it's
capacity, also the load on the NFS server is load - though I am unsure
if this is relevant at all...

What makes me wonder - is is "normal" that both lock-ups are for the
same period of time (51s) even on two CPUs and with two different
PIDs?

The output from the log is:

Jan  5 11:01:13 postar kernel: [477123.485080] NMI watchdog: BUG: soft
lockup - CPU#3 stuck for 51s! [mysqld:2612]
Jan  5 11:01:13 postar kernel: [477123.485151] Modules linked in:
xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
nf_reject_ipv4 xt_limit
xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state
ip6table_filter ip6_tables nfsd nfs_acl auth_rpcgss
nf_conntrack_netbios_ns
nf_conntrack_broadcast nfs nf_nat_ftp nf_nat ppdev nf_conntrack_ftp
nf_conntrack fscache parport_pc parport lockd joydev pvpanic
iptable_filter cirrus
ip_tables 8250_fintek ttm psmouse drm_kms_helper sunrpc x_tables
serio_raw hid_generic drm grace mac_hid sysimgblt sysfillrect
syscopyarea i2c_piix4
usbhid hid floppy
Jan  5 11:01:13 postar kernel: [477123.485200] CPU: 3 PID: 2612 Comm:
mysqld Tainted: G             L 3.18.1-031801-generic #201412170637
Jan  5 11:01:13 postar kernel: [477123.485202] Hardware name:
OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
Jan  5 11:01:13 postar kernel: [477123.485205] task: ffff880037ab6400
ti: ffff880206e40000 task.ti: ffff880206e40000
Jan  5 11:01:13 postar kernel: [477123.485207] RIP:
0010:[<ffffffff813a97dc>]  [<ffffffff813a97dc>]
copy_user_generic_string+0x2c/0x40
Jan  5 11:01:13 postar kernel: [477123.485217] RSP:
0018:ffff880206e43c90  EFLAGS: 00010246
Jan  5 11:01:13 postar kernel: [477123.485219] RAX: 0000000076793000
RBX: ffff880206e43ea0 RCX: 0000000000000200
Jan  5 11:01:13 postar kernel: [477123.485221] RDX: 0000000000000000
RSI: ffff880076793000 RDI: 00007fd365448000
Jan  5 11:01:13 postar kernel: [477123.485223] RBP: ffff880206e43d08
R08: ffffea0001d9e4dc R09: ffff880206e43ca0
Jan  5 11:01:13 postar kernel: [477123.485225] R10: ffff8801b76ca6f0
R11: 0000000000000293 R12: 0000000000000000
Jan  5 11:01:13 postar kernel: [477123.485227] R13: ffff880206e43ec8
R14: 0000000000001000 R15: 00007fd365448000
Jan  5 11:01:13 postar kernel: [477123.485235] FS:
00007fd36c1fb700(0000) GS:ffff88023fd80000(0000)
knlGS:0000000000000000
Jan  5 11:01:13 postar kernel: [477123.485237] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jan  5 11:01:13 postar kernel: [477123.485239] CR2: 000000000b416003
CR3: 00000002317ee000 CR4: 00000000000006e0
Jan  5 11:01:13 postar kernel: [477123.485253] Stack:
Jan  5 11:01:13 postar kernel: [477123.485255]  ffffffff811a0f75
000000000012f2ac ffff8801b76ca6f0 ffff880206e43cc8
Jan  5 11:01:13 postar kernel: [477123.485259]  ffffffff8117864e
ffff880076793000 ffffea0001d9e4c0 0000000000001000
Jan  5 11:01:13 postar kernel: [477123.485262]  ffff880076793000
ffffea0001d9e4c0 ffffea0001d9e4c0 ffff880231e4b930
Jan  5 11:01:13 postar kernel: [477123.485265] Call Trace:
Jan  5 11:01:13 postar kernel: [477123.485273]  [<ffffffff811a0f75>] ?
copy_page_to_iter_iovec+0xe5/0x300
Jan  5 11:01:13 postar kernel: [477123.485279]  [<ffffffff8117864e>] ?
find_get_entry+0x1e/0x90
Jan  5 11:01:13 postar kernel: [477123.485282]  [<ffffffff811a14a6>]
copy_page_to_iter+0x16/0x70
Jan  5 11:01:13 postar kernel: [477123.485286]  [<ffffffff81179428>]
do_generic_file_read+0x1f8/0x490
Jan  5 11:01:13 postar kernel: [477123.485289]  [<ffffffff8117a234>]
generic_file_read_iter+0xf4/0x150
Jan  5 11:01:13 postar kernel: [477123.485294]  [<ffffffff810aade1>] ?
update_curr+0x141/0x1f0
Jan  5 11:01:13 postar kernel: [477123.485298]  [<ffffffff811eef28>]
new_sync_read+0x78/0xb0
Jan  5 11:01:13 postar kernel: [477123.485301]  [<ffffffff811f013b>]
vfs_read+0xab/0x180
Jan  5 11:01:13 postar kernel: [477123.485304]  [<ffffffff811f0402>]
SyS_pread64+0x92/0xa0
Jan  5 11:01:13 postar kernel: [477123.485309]  [<ffffffff817b376d>]
system_call_fastpath+0x16/0x1b
Jan  5 11:01:13 postar kernel: [477123.485311] Code: 66 90 83 fa 08 72
27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
ff c7 ff c9 75 f$
Jan  5 11:01:13 postar kernel: [477123.486574] NMI watchdog: BUG: soft
lockup - CPU#1 stuck for 51s! [mysqld:2282]
Jan  5 11:01:13 postar kernel: [477123.486633] Modules linked in:
xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
nf_reject_ipv4 xt_limit xt_tcpud$
Jan  5 11:01:13 postar kernel: [477123.486669] CPU: 1 PID: 2282 Comm:
mysqld Tainted: G             L 3.18.1-031801-generic #201412170637
Jan  5 11:01:13 postar kernel: [477123.486671] Hardware name:
OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
Jan  5 11:01:13 postar kernel: [477123.486673] task: ffff880232bbb200
ti: ffff880232bd8000 task.ti: ffff880232bd8000
Jan  5 11:01:13 postar kernel: [477123.486675] RIP:
0010:[<ffffffff813a97dc>]  [<ffffffff813a97dc>]
copy_user_generic_string+0x2c/0x40
Jan  5 11:01:13 postar kernel: [477123.486680] RSP:
0018:ffff880232bdbc18  EFLAGS: 00010246
Jan  5 11:01:13 postar kernel: [477123.486683] RAX: 00007fd35f714000
RBX: 0000000000001000 RCX: 0000000000000200
Jan  5 11:01:13 postar kernel: [477123.486685] RDX: 0000000000000000
RSI: 00007fd35f714000 RDI: ffff88007131c000
Jan  5 11:01:13 postar kernel: [477123.486687] RBP: ffff880232bdbc28
R08: ffffea0001c4c700 R09: 00000000fffff000
Jan  5 11:01:13 postar kernel: [477123.486689] R10: ffff8801b77814e0
R11: 0000000000000293 R12: 0000000000001000
Jan  5 11:01:13 postar kernel: [477123.486691] R13: ffff880232bdbea0
R14: 0000000000000000 R15: 0000000000001000
Jan  5 11:01:13 postar kernel: [477123.486697] FS:
00007fd35afe7700(0000) GS:ffff88023fc80000(0000)
knlGS:0000000000000000
Jan  5 11:01:13 postar kernel: [477123.486699] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jan  5 11:01:13 postar kernel: [477123.486702] CR2: 000000000cdc3001
CR3: 00000002317ee000 CR4: 00000000000006e0
Jan  5 11:01:13 postar kernel: [477123.486714] Stack:
Jan  5 11:01:13 postar kernel: [477123.486716]  ffffffff811a0386
0000000120094000 ffff880232bdbc78 ffffffff811a0a35
Jan  5 11:01:13 postar kernel: [477123.486719]  ffff880232bdbc88
0000000000000000 ffff880232bdbc78 0000000120094000
Jan  5 11:01:13 postar kernel: [477123.486722]  0000000000001000
ffff880232bdbea0 ffff880231e4b930 0000000000000000
Jan  5 11:01:13 postar kernel: [477123.486726] Call Trace:
Jan  5 11:01:13 postar kernel: [477123.486752]  [<ffffffff811a0386>] ?
copy_from_user_atomic_iovec+0x56/0x80
Jan  5 11:01:13 postar kernel: [477123.486757]  [<ffffffff811a0a35>]
iov_iter_copy_from_user_atomic+0xd5/0xe0
Jan  5 11:01:13 postar kernel: [477123.486761]  [<ffffffff81177410>]
generic_perform_write+0xe0/0x1c0
Jan  5 11:01:13 postar kernel: [477123.486766]  [<ffffffff8120a0f1>] ?
update_time+0x81/0xc0
Jan  5 11:01:13 postar kernel: [477123.486770]  [<ffffffff8120e4a2>] ?
mnt_clone_write+0x12/0x30
Jan  5 11:01:13 postar kernel: [477123.486773]  [<ffffffff81179e9f>]
__generic_file_write_iter+0x16f/0x350
Jan  5 11:01:13 postar kernel: [477123.486778]  [<ffffffff8126b7d9>]
ext4_file_write_iter+0x119/0x3d0
Jan  5 11:01:13 postar kernel: [477123.486783]  [<ffffffff810efcd8>] ?
get_futex_key+0x1f8/0x2e0
Jan  5 11:01:13 postar kernel: [477123.486786]  [<ffffffff811ef0eb>]
new_sync_write+0x7b/0xb0
Jan  5 11:01:13 postar kernel: [477123.486789]  [<ffffffff811eff67>]
vfs_write+0xc7/0x1f0
Jan  5 11:01:13 postar kernel: [477123.486792]  [<ffffffff811f04a2>]
SyS_pwrite64+0x92/0xa0
Jan  5 11:01:13 postar kernel: [477123.486795]  [<ffffffff817b376d>]
system_call_fastpath+0x16/0x1b
Jan  5 11:01:13 postar kernel: [477123.486797] Code: 66 90 83 fa 08 72
27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
ff c7 ff c9
75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0 66 66 90 c3
0f 1f 80 00 00 00 00 66


Thanks,
Matej


>
> On Sun, Jan 4, 2015 at 8:39 PM, Matej Mailing <mailing at tam.si> wrote:
>>
>> 2015-01-04 17:10 GMT+01:00 Erhan Ekici <erhan.ekici at gmail.com>:
>> > Hi Matej,
>> >
>> > I am not sure but most of the soft lock-up messages generated due to I/O
>> > issues.  From your log entries, I see mysqld issues as well as others:
>> >
>> > Jan  3 10:41:44 postar kernel: [303154.352793] NMI watchdog: BUG: soft
>> > lockup - CPU#3 stuck for 45s! [mysqld:4601]
>> > ....
>> > Jan  3 10:44:00 postar kernel: [303290.052498] INFO: task mysqld:2290
>> > blocked for more than 120 seconds.
>> > Jan  3 10:41:44 postar kernel: [303154.352958] CPU: 3 PID: 4601 Comm:
>> > mysqld
>> > Not tainted 3.18.1-031801-generic #201412170637
>> > ....
>> > Jan  3 10:44:00 postar kernel: [303290.052499]       Tainted: G L
>> > 3.18.1-031801-generic #201412170637
>> > Jan  3 10:44:00 postar kernel: [303290.052499] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Jan  3 10:44:00 postar kernel: [303290.052502] mysqld          D
>> > 0000000000000002     0  2290      1 0x00000000
>> >
>> > Can you post the output of the following command on both instance nodes
>> > OS
>> > and compute node OS and memory of Compute node and instances?
>> >
>> > [user at node] sudo sysctl -a | grep dirty
>> >
>> > vm.dirty_background_bytes = 0
>> > vm.dirty_background_ratio = 10
>> > vm.dirty_bytes = 0
>> > vm.dirty_expire_centisecs = 3000
>> > vm.dirty_ratio = 20
>> > vm.dirty_writeback_centisecs = 500
>> >
>> >
>> > Erhan,
>>
>> Hello Erhan,
>>
>> instance node output:
>>
>> error: permission denied on key 'net.ipv4.route.flush'
>> error: permission denied on key 'net.ipv6.route.flush'
>> error: permission denied on key 'vm.compact_memory'
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>>
>> Mem:   8175996k total,  4105708k used,  4070288k free,   218552k buffers
>>
>>
>> compute node output:
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>>
>> KiB Mem:  16432492 total, 16254932 used,   177560 free,     9316 buffers
>>
>> Lock-ups happen with different processes, above-mentioned mysqld is
>> just an example.
>>
>> Thanks,
>> Matej
>>
>>
>> >
>> > On Sun, Jan 4, 2015 at 2:36 PM, Matej Mailing <mailing at tam.si> wrote:
>> >>
>> >> 2015-01-03 17:56 GMT+01:00 John Griffith <john.griffith at solidfire.com>:
>> >> > On Sat, Jan 3, 2015 at 5:05 AM, Matej Mailing <mailing at tam.si> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> we are experiencing randomly-timed soft lockup messages in different
>> >> >> instances (running CentOS, Ubuntu, etc.) with different processes
>> >> >> and
>> >> >> on different compute nodes. I suspect that access to the cinder
>> >> >> storage via NFS could be the cause of issue, but then perhaps all
>> >> >> the
>> >> >> instances would trigger the error message on the same time?
>> >> >> Currently
>> >> >> they don't and no higher access loads are repoted on the NFS server
>> >> >> itself, neither the load is higher at those times ...
>> >> >> What are the best methods to debug this error? Any suggestions and
>> >> >> expeiences on fixing it will be very welcome :-)
>> >> >>
>> >> >> Thanks,
>> >> >> Matej
>> >> >>
>> >> >> _______________________________________________
>> >> >> Mailing list:
>> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >> >> Post to     : openstack at lists.openstack.org
>> >> >> Unsubscribe :
>> >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >> >
>> >> > Hi Matej,
>> >> >
>> >> > I'm assuming the kernel logs indicate NFS as the culprit here? What
>> >> > sort of IO load are you seeing on the system when these are
>> >> > encountered?
>> >> >
>> >> > A quick search on "soft lockup nfs" has a pretty lengthy list of
>> >> > results including some setting recommendations etc.  Maybe start
>> >> > there?
>> >> >
>> >> > Thanks,
>> >> > John
>> >>
>> >> Hello John,
>> >>
>> >> thank you very much for your answer. I can't find anything at all in
>> >> the compute node's logs and my NFS-idea is just a guess
>> >> actually.Example entry from Ubuntu instances is like:
>> >>
>> >> Jan  3 10:41:44 postar kernel: [303154.352793] NMI watchdog: BUG: soft
>> >> lockup - CPU#3 stuck for 45s! [mysqld:4601]
>> >> Jan  3 10:41:44 postar kernel: [303154.352890] Modules linked in:
>> >> xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
>> >> nf_reject_ipv4 xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4
>> >> nf_defrag_ipv4 xt_state ip6table_filter ip6_tables nfsd nfs_acl
>> >> auth_rpcgss nf_conntrack_netbios_ns
>> >> nf_conntrack_broadcast nfs nf_nat_ftp nf_nat ppdev nf_conntrack_ftp
>> >> nf_conntrack fscache parport_pc parport lockd joydev pvpanic
>> >> iptable_filter cirrus ip_tables 8250_fintek ttm psmouse drm_kms_helper
>> >> sunrpc x_tables serio_raw hid_generic drm grace mac_hid sysimgblt
>> >> sysfillrect syscopyarea
>> >> i2c_piix4 usbhid hid floppy
>> >> Jan  3 10:41:44 postar kernel: [303154.352958] CPU: 3 PID: 4601 Comm:
>> >> mysqld Not tainted 3.18.1-031801-generic #201412170637
>> >> Jan  3 10:41:44 postar kernel: [303154.352961] Hardware name:
>> >> OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>> >> Jan  3 10:41:44 postar kernel: [303154.352964] task: ffff880231b63c00
>> >> ti: ffff8801e6014000 task.ti: ffff8801e6014000
>> >> Jan  3 10:41:44 postar kernel: [303154.352966] RIP:
>> >> 0010:[<ffffffff813a97dc>]  [<ffffffff813a97dc>]
>> >> copy_user_generic_string+0x2c/0x40
>> >> Jan  3 10:41:44 postar kernel: [303154.352979] RSP:
>> >> 0018:ffff8801e6017c90  EFLAGS: 00010246
>> >> Jan  3 10:41:44 postar kernel: [303154.352981] RAX: 00000000a2418000
>> >> RBX: ffff8801e6017ea0 RCX: 0000000000000200
>> >> Jan  3 10:41:44 postar kernel: [303154.352983] RDX: 0000000000000000
>> >> RSI: ffff8800a2418000 RDI: 00007fd365b90000
>> >> Jan  3 10:41:44 postar kernel: [303154.352985] RBP: ffff8801e6017d08
>> >> R08: ffffea000289061c R09: ffff8801e6017ca0
>> >> Jan  3 10:41:44 postar kernel: [303154.352987] R10: ffff8800a5c1b258
>> >> R11: 0000000000000293 R12: 0000000000000000
>> >> Jan  3 10:41:44 postar kernel: [303154.352989] R13: ffff8801e6017ec8
>> >> R14: 0000000000001000 R15: 00007fd365b90000
>> >> Jan  3 10:41:44 postar kernel: [303154.352995] FS:
>> >> 00007fd36c28e700(0000) GS:ffff88023fd80000(0000)
>> >> knlGS:0000000000000000
>> >> Jan  3 10:41:44 postar kernel: [303154.352998] CS:  0010 DS: 0000 ES:
>> >> 0000 CR0: 0000000080050033
>> >> Jan  3 10:41:44 postar kernel: [303154.353000] CR2: 0000000002b9e003
>> >> CR3: 00000002317ee000 CR4: 00000000000006e0
>> >> Jan  3 10:41:44 postar kernel: [303154.353014] Stack:
>> >> Jan  3 10:41:44 postar kernel: [303154.353016]  ffffffff811a0f75
>> >> ffff8801e6017d88 ffff8800a5c1b258 ffff8801e6017cc8
>> >> Jan  3 10:41:44 postar kernel: [303154.353019]  ffffffff8117864e
>> >> ffff8800a2418000 ffffea0002890600 0000000000001000
>> >> Jan  3 10:41:44 postar kernel: [303154.353023]  ffff8800a2418000
>> >> 0000000000000000 ffffea0002890600 ffff880231e4b930
>> >> Jan  3 10:41:44 postar kernel: [303154.353026] Call Trace:
>> >> Jan  3 10:41:44 postar kernel: [303154.353034]  [<ffffffff811a0f75>] ?
>> >> copy_page_to_iter_iovec+0xe5/0x300
>> >> Jan  3 10:41:44 postar kernel: [303154.353040]  [<ffffffff8117864e>] ?
>> >> find_get_entry+0x1e/0x90
>> >> Jan  3 10:41:44 postar kernel: [303154.353044]  [<ffffffff811a14a6>]
>> >> copy_page_to_iter+0x16/0x70
>> >> Jan  3 10:41:44 postar kernel: [303154.353047]  [<ffffffff81179428>]
>> >> do_generic_file_read+0x1f8/0x490
>> >> Jan  3 10:41:44 postar kernel: [303154.353051]  [<ffffffff8117a234>]
>> >> generic_file_read_iter+0xf4/0x150
>> >> Jan  3 10:41:44 postar kernel: [303154.353056]  [<ffffffff810aade1>] ?
>> >> update_curr+0x141/0x1f0
>> >> Jan  3 10:41:44 postar kernel: [303154.353060]  [<ffffffff810efcd8>] ?
>> >> get_futex_key+0x1f8/0x2e0
>> >> Jan  3 10:41:44 postar kernel: [303154.353065]  [<ffffffff811eef28>]
>> >> new_sync_read+0x78/0xb0
>> >> Jan  3 10:41:44 postar kernel: [303154.353068]  [<ffffffff811f013b>]
>> >> vfs_read+0xab/0x180
>> >> Jan  3 10:41:44 postar kernel: [303154.353071]  [<ffffffff811f0402>]
>> >> SyS_pread64+0x92/0xa0
>> >> Jan  3 10:41:44 postar kernel: [303154.353077]  [<ffffffff817b376d>]
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan  3 10:41:44 postar kernel: [303154.353079] Code: 66 90 83 fa 08 72
>> >> 27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca 8a 06 88 07 48 ff c6 48
>> >> ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 31 c0
>> >> 66 66 90 c3 0f 1f 80 00 00 00 00 66
>> >>
>> >>
>> >> and:
>> >>
>> >> Jan  3 10:44:00 postar kernel: [303164.436029] INFO: rcu_sched
>> >> detected stalls on CPUs/tasks: { 3} (detected by 0, t=33923 jiffies,
>> >> g=1793081, c=1793080, q=0)
>> >> Jan  3 10:44:00 postar kernel: [303164.436029] Task dump for CPU 3:
>> >> Jan  3 10:44:00 postar kernel: [303164.436029] mysqld          R
>> >> running task        0  4601      1 0x00000008
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  0000000000000001
>> >> 00007fd36286c000 0000000000004000 0000000000004000
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  00007fd36286c000
>> >> ffff8801e6017f28 ffffffff811f013b 00007fd36c28ba80
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  ffff8802316eaf01
>> >> ffff8802316eaf00 00007fd36286c000 0000000000004000
>> >> Jan  3 10:44:00 postar kernel: [303164.436029] Call Trace:
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  [<ffffffff811f013b>] ?
>> >> vfs_read+0xab/0x180
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  [<ffffffff811f0402>] ?
>> >> SyS_pread64+0x92/0xa0
>> >> Jan  3 10:44:00 postar kernel: [303164.436029]  [<ffffffff817b376d>] ?
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan  3 10:44:00 postar kernel: [303290.052393] INFO: task
>> >> jbd2/vda1-8:256 blocked for more than 120 seconds.
>> >> Jan  3 10:44:00 postar kernel: [303290.052395]       Tainted: G
>> >>      L 3.18.1-031801-generic #201412170637
>> >> Jan  3 10:44:00 postar kernel: [303290.052396] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan  3 10:44:00 postar kernel: [303290.052399] jbd2/vda1-8     D
>> >> 0000000000000002     0   256      2 0x00000000
>> >> Jan  3 10:44:00 postar kernel: [303290.052402]  ffff88023170b6a8
>> >> 0000000000000046 ffff88023170b648 0000000000f2f754
>> >> Jan  3 10:44:00 postar kernel: [303290.052404]  ffff88023170bfd8
>> >> 0000000000013640 ffff8800baef5f80 0000000000013640
>> >> Jan  3 10:44:00 postar kernel: [303290.052406]  ffff8802342ab200
>> >> ffff880231b64600 ffff88023170b678 ffff88023fd13f40
>> >> Jan  3 10:44:00 postar kernel: [303290.052407] Call Trace:
>> >> Jan  3 10:44:00 postar kernel: [303290.052413]  [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052415]  [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052419]  [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan  3 10:44:00 postar kernel: [303290.052423]  [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan  3 10:44:00 postar kernel: [303290.052425]  [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052428]  [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052430]  [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052432]  [<ffffffff81377a36>] ?
>> >> __blk_mq_run_hw_queue+0xf6/0x260
>> >> Jan  3 10:44:00 postar kernel: [303290.052434]  [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan  3 10:44:00 postar kernel: [303290.052438]  [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052440]  [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052442]  [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan  3 10:44:00 postar kernel: [303290.052446]  [<ffffffff81177789>] ?
>> >> unlock_page+0x69/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052449]  [<ffffffff81278b39>]
>> >> ext4_io_submit+0x29/0x50
>> >> Jan  3 10:44:00 postar kernel: [303290.052451]  [<ffffffff8127427b>]
>> >> ext4_writepage+0xcb/0x240
>> >> Jan  3 10:44:00 postar kernel: [303290.052454]  [<ffffffff811b16c0>] ?
>> >> SyS_msync+0x230/0x230
>> >> Jan  3 10:44:00 postar kernel: [303290.052457]  [<ffffffff81182b87>]
>> >> __writepage+0x17/0x50
>> >> Jan  3 10:44:00 postar kernel: [303290.052460]  [<ffffffff8118323a>]
>> >> write_cache_pages+0x20a/0x470
>> >> Jan  3 10:44:00 postar kernel: [303290.052462]  [<ffffffff81182b70>] ?
>> >> set_page_dirty_lock+0x60/0x60
>> >> Jan  3 10:44:00 postar kernel: [303290.052464]  [<ffffffff811834ea>]
>> >> generic_writepages+0x4a/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052468]  [<ffffffff812bb73f>]
>> >> journal_submit_data_buffers+0xcf/0x180
>> >> Jan  3 10:44:00 postar kernel: [303290.052470]  [<ffffffff812bc1c4>]
>> >> jbd2_journal_commit_transaction+0x394/0x1440
>> >> Jan  3 10:44:00 postar kernel: [303290.052474]  [<ffffffff812c09b3>]
>> >> kjournald2+0xb3/0x230
>> >> Jan  3 10:44:00 postar kernel: [303290.052476]  [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan  3 10:44:00 postar kernel: [303290.052478]  [<ffffffff812c0900>] ?
>> >> commit_timeout+0x10/0x10
>> >> Jan  3 10:44:00 postar kernel: [303290.052481]  [<ffffffff81093a09>]
>> >> kthread+0xc9/0xe0
>> >> Jan  3 10:44:00 postar kernel: [303290.052483]  [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan  3 10:44:00 postar kernel: [303290.052485]  [<ffffffff817b36bc>]
>> >> ret_from_fork+0x7c/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052487]  [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan  3 10:44:00 postar kernel: [303290.052498] INFO: task mysqld:2290
>> >> blocked for more than 120 seconds.
>> >> Jan  3 10:44:00 postar kernel: [303290.052499]       Tainted: G
>> >>      L 3.18.1-031801-generic #201412170637
>> >> Jan  3 10:44:00 postar kernel: [303290.052499] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan  3 10:44:00 postar kernel: [303290.052502] mysqld          D
>> >> 0000000000000002     0  2290      1 0x00000000
>> >> Jan  3 10:44:00 postar kernel: [303290.052504]  ffff88023122b9c8
>> >> 0000000000000082 ffff880200000002 0000000000f2f748
>> >> Jan  3 10:44:00 postar kernel: [303290.052506]  ffff88023122bfd8
>> >> 0000000000013640 ffff880231b98380 0000000000013640
>> >> Jan  3 10:44:00 postar kernel: [303290.052508]  ffff880037ab3200
>> >> ffff880233142800 ffff88023122b998 ffff88023fd13f40
>> >> Jan  3 10:44:00 postar kernel: [303290.052509] Call Trace:
>> >> Jan  3 10:44:00 postar kernel: [303290.052511]  [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052531]  [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052534]  [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan  3 10:44:00 postar kernel: [303290.052536]  [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan  3 10:44:00 postar kernel: [303290.052538]  [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052540]  [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052542]  [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052545]  [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan  3 10:44:00 postar kernel: [303290.052547]  [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052549]  [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052551]  [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan  3 10:44:00 postar kernel: [303290.052553]  [<ffffffff812b940d>] ?
>> >> jbd2__journal_start+0x4d/0x60
>> >> Jan  3 10:44:00 postar kernel: [303290.052555]  [<ffffffff81278b39>]
>> >> ext4_io_submit+0x29/0x50
>> >> Jan  3 10:44:00 postar kernel: [303290.052557]  [<ffffffff81277446>]
>> >> ext4_writepages+0x4f6/0x7b0
>> >> Jan  3 10:44:00 postar kernel: [303290.052560]  [<ffffffff810efcd8>] ?
>> >> get_futex_key+0x1f8/0x2e0
>> >> Jan  3 10:44:00 postar kernel: [303290.052563]  [<ffffffff81184730>]
>> >> do_writepages+0x20/0x40
>> >> Jan  3 10:44:00 postar kernel: [303290.052565]  [<ffffffff81179b29>]
>> >> __filemap_fdatawrite_range+0x59/0x60
>> >> Jan  3 10:44:00 postar kernel: [303290.052567]  [<ffffffff81179b67>]
>> >> filemap_write_and_wait_range+0x37/0x80
>> >> Jan  3 10:44:00 postar kernel: [303290.052571]  [<ffffffff8126c3a5>]
>> >> ext4_sync_file+0x105/0x290
>> >> Jan  3 10:44:00 postar kernel: [303290.052573]  [<ffffffff811f0036>] ?
>> >> vfs_write+0x196/0x1f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052576]  [<ffffffff8121faa8>]
>> >> do_fsync+0x58/0x80
>> >> Jan  3 10:44:00 postar kernel: [303290.052578]  [<ffffffff8121fe70>]
>> >> SyS_fsync+0x10/0x20
>> >> Jan  3 10:44:00 postar kernel: [303290.052580]  [<ffffffff817b376d>]
>> >> system_call_fastpath+0x16/0x1b
>> >> Jan  3 10:44:00 postar kernel: [303290.052590] INFO: task
>> >> kworker/u8:1:22405 blocked for more than 120 seconds.
>> >> Jan  3 10:44:00 postar kernel: [303290.052592]       Tainted: G
>> >>      L 3.18.1-031801-generic #201412170637
>> >> Jan  3 10:44:00 postar kernel: [303290.052592] "echo 0 >
>> >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >> Jan  3 10:44:00 postar kernel: [303290.052595] kworker/u8:1    D
>> >> 0000000000000002     0 22405      2 0x00000000
>> >> Jan  3 10:44:00 postar kernel: [303290.052600] Workqueue: writeback
>> >> bdi_writeback_workfn (flush-253:0)
>> >> Jan  3 10:44:00 postar kernel: [303290.052602]  ffff8801f38374c8
>> >> 0000000000000046 ffff8801f38374b8 0000000000f2f754
>> >> Jan  3 10:44:00 postar kernel: [303290.052604]  ffff8801f3837fd8
>> >> 0000000000013640 ffff8800baef5f80 0000000000013640
>> >> Jan  3 10:44:00 postar kernel: [303290.052606]  ffff880231b64600
>> >> ffff8801f397d000 ffff8801f3837498 ffff88023fd13f40
>> >> Jan  3 10:44:00 postar kernel: [303290.052607] Call Trace:
>> >> Jan  3 10:44:00 postar kernel: [303290.052609]  [<ffffffff817af549>]
>> >> schedule+0x29/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052611]  [<ffffffff817af61f>]
>> >> io_schedule+0x8f/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052614]  [<ffffffff81379f30>]
>> >> bt_get+0xf0/0x180
>> >> Jan  3 10:44:00 postar kernel: [303290.052616]  [<ffffffff810b4bd0>] ?
>> >> prepare_to_wait_event+0x100/0x100
>> >> Jan  3 10:44:00 postar kernel: [303290.052618]  [<ffffffff8137a2f2>]
>> >> blk_mq_get_tag+0x82/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052620]  [<ffffffff813762da>]
>> >> __blk_mq_alloc_request+0x2a/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052622]  [<ffffffff81377d72>]
>> >> blk_mq_map_request+0x182/0x1f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052626]  [<ffffffff814692de>] ?
>> >> alloc_indirect+0x1e/0x50
>> >> Jan  3 10:44:00 postar kernel: [303290.052628]  [<ffffffff81378446>]
>> >> blk_sq_make_request+0x96/0x310
>> >> Jan  3 10:44:00 postar kernel: [303290.052630]  [<ffffffff81369c65>]
>> >> generic_make_request.part.75+0x75/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052633]  [<ffffffff8136a288>]
>> >> generic_make_request+0x68/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052634]  [<ffffffff8136a316>]
>> >> submit_bio+0x86/0x160
>> >> Jan  3 10:44:00 postar kernel: [303290.052638]  [<ffffffff81364e5b>] ?
>> >> bio_alloc_bioset+0xbb/0x220
>> >> Jan  3 10:44:00 postar kernel: [303290.052641]  [<ffffffff81223ff0>]
>> >> _submit_bh+0x110/0x150
>> >> Jan  3 10:44:00 postar kernel: [303290.052643]  [<ffffffff81224040>]
>> >> submit_bh+0x10/0x20
>> >> Jan  3 10:44:00 postar kernel: [303290.052646]  [<ffffffff81225cc0>]
>> >> __block_write_full_page.constprop.35+0x1a0/0x380
>> >> Jan  3 10:44:00 postar kernel: [303290.052648]  [<ffffffff81226110>] ?
>> >> I_BDEV+0x10/0x10
>> >> Jan  3 10:44:00 postar kernel: [303290.052650]  [<ffffffff81226110>] ?
>> >> I_BDEV+0x10/0x10
>> >> Jan  3 10:44:00 postar kernel: [303290.052652]  [<ffffffff81226039>]
>> >> block_write_full_page+0xa9/0xe0
>> >> Jan  3 10:44:00 postar kernel: [303290.052654]  [<ffffffff81226b38>]
>> >> blkdev_writepage+0x18/0x20
>> >> Jan  3 10:44:00 postar kernel: [303290.052656]  [<ffffffff81182b87>]
>> >> __writepage+0x17/0x50
>> >> Jan  3 10:44:00 postar kernel: [303290.052658]  [<ffffffff8118323a>]
>> >> write_cache_pages+0x20a/0x470
>> >> Jan  3 10:44:00 postar kernel: [303290.052660]  [<ffffffff81182b70>] ?
>> >> set_page_dirty_lock+0x60/0x60
>> >> Jan  3 10:44:00 postar kernel: [303290.052663]  [<ffffffff811834ea>]
>> >> generic_writepages+0x4a/0x70
>> >> Jan  3 10:44:00 postar kernel: [303290.052665]  [<ffffffff81184730>]
>> >> do_writepages+0x20/0x40
>> >> Jan  3 10:44:00 postar kernel: [303290.052668]  [<ffffffff81217fe5>]
>> >> __writeback_single_inode+0x45/0x1c0
>> >> Jan  3 10:44:00 postar kernel: [303290.052670]  [<ffffffff812199be>]
>> >> writeback_sb_inodes+0x22e/0x340
>> >> Jan  3 10:44:00 postar kernel: [303290.052673]  [<ffffffff81219b6e>]
>> >> __writeback_inodes_wb+0x9e/0xd0
>> >> Jan  3 10:44:00 postar kernel: [303290.052675]  [<ffffffff81219e2b>]
>> >> wb_writeback+0x28b/0x330
>> >> Jan  3 10:44:00 postar kernel: [303290.052678]  [<ffffffff8120ac32>] ?
>> >> get_nr_dirty_inodes+0x52/0x80
>> >> Jan  3 10:44:00 postar kernel: [303290.052681]  [<ffffffff81219f6f>]
>> >> wb_check_old_data_flush+0x9f/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052683]  [<ffffffff8121a0b4>]
>> >> wb_do_writeback+0x134/0x1c0
>> >> Jan  3 10:44:00 postar kernel: [303290.052685]  [<ffffffff810908bf>] ?
>> >> set_worker_desc+0x6f/0x80
>> >> Jan  3 10:44:00 postar kernel: [303290.052688]  [<ffffffff8121c048>]
>> >> bdi_writeback_workfn+0x78/0x1f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052729]  [<ffffffff8108d50e>]
>> >> process_one_work+0x14e/0x460
>> >> Jan  3 10:44:00 postar kernel: [303290.052732]  [<ffffffff8108de8b>]
>> >> worker_thread+0x11b/0x3f0
>> >> Jan  3 10:44:00 postar kernel: [303290.052735]  [<ffffffff8108dd70>] ?
>> >> create_worker+0x1e0/0x1e0
>> >> Jan  3 10:44:00 postar kernel: [303290.052737]  [<ffffffff81093a09>]
>> >> kthread+0xc9/0xe0
>> >> Jan  3 10:44:00 postar kernel: [303290.052739]  [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >> Jan  3 10:44:00 postar kernel: [303290.052741]  [<ffffffff817b36bc>]
>> >> ret_from_fork+0x7c/0xb0
>> >> Jan  3 10:44:00 postar kernel: [303290.052743]  [<ffffffff81093940>] ?
>> >> flush_kthread_worker+0x90/0x90
>> >>
>> >> The instance's qcow2 file is on a NFS share, mounted on a compute node.
>> >>
>> >> Different processes are triggering the soft lock-up, what is
>> >> interesting is that we see soft lock-ups showing up on the same time
>> >> on all the instances running on the same compute node. I will examine
>> >> iostat and top outputs closely when it happens, it's relatively hard
>> >> to catch that moment since it happens for a relatively short period of
>> >> time.
>> >>
>> >> Thank you for any ideas in advance.
>> >> Matej
>> >>
>> >> _______________________________________________
>> >> Mailing list:
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >> Post to     : openstack at lists.openstack.org
>> >> Unsubscribe :
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >
>> >
>
>




More information about the Openstack mailing list