[Openstack] [Swift] block I/O all disks

Heiko Krämer hkraemer at anynines.com
Tue Nov 10 11:59:19 UTC 2015


Hi Mark,

Thanks for your response.
I changed the dirty page settings again, and I have a logging server
that receives the syslog and Swift logs.
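
To record which dirty page values are active when a hang occurs, a small script can dump them from /proc. This is only a sketch: the function name `read_dirty_settings` is made up for illustration, and it assumes the standard Linux `vm.dirty_*` sysctls under /proc/sys/vm.

```python
from pathlib import Path

# Writeback-related sysctls exposed under /proc/sys/vm on Linux.
DIRTY_KEYS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return {sysctl name: int value} for the keys that exist under base.

    read_dirty_settings is a hypothetical helper name, not part of any tool.
    """
    settings = {}
    for key in DIRTY_KEYS:
        path = Path(base) / key
        try:
            settings[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable on this kernel: skip rather than guess.
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

Running it before and after a settings change gives a record that can be kept next to the logs on the logging server.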

Nov 10 08:28:17 storage3 kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for more than 120 seconds.
Nov 10 08:28:17 storage3 kernel: [92476.204635]       Not tainted 3.19.0-30-generic #34~14.04.1-Ubuntu
Nov 10 08:28:17 storage3 kernel: [92476.204655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 08:28:17 storage3 kernel: [92476.204680] xfsaild/sdd     D ffff880813ae3d28     0  1108      2 0x00000000
Nov 10 08:28:17 storage3 kernel: [92476.204688]  ffff880813ae3d28 ffff880811a31d70 0000000000013e80 ffff880813ae3fd8
Nov 10 08:28:17 storage3 kernel: [92476.204695]  0000000000013e80 ffff880814939d70 ffff880811a31d70 0000000000000286
Nov 10 08:28:17 storage3 kernel: [92476.204701]  ffff88040ebb4128 ffff880811a31d70 0000000000000000 ffff88040ebb4000
Nov 10 08:28:17 storage3 kernel: [92476.204707] Call Trace:
Nov 10 08:28:17 storage3 kernel: [92476.204721]  [<ffffffff817b2a99>] schedule+0x29/0x70
Nov 10 08:28:17 storage3 kernel: [92476.204792]  [<ffffffffc0373a21>] _xfs_log_force+0x171/0x270 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204801]  [<ffffffff810a0a90>] ? wake_up_state+0x20/0x20
Nov 10 08:28:17 storage3 kernel: [92476.204807]  [<ffffffff810dab60>] ? internal_add_timer+0x80/0x80
Nov 10 08:28:17 storage3 kernel: [92476.204851]  [<ffffffffc0373b4a>] xfs_log_force+0x2a/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204895]  [<ffffffffc037e2d0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204939]  [<ffffffffc037e410>] xfsaild+0x140/0x5a0 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204983]  [<ffffffffc037e2d0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 10 08:28:17 storage3 kernel: [92476.204991]  [<ffffffff81093822>] kthread+0xd2/0xf0
Nov 10 08:28:17 storage3 kernel: [92476.204997]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0
Nov 10 08:28:17 storage3 kernel: [92476.205004]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
Nov 10 08:28:17 storage3 kernel: [92476.205009]  [<ffffffff81093750>] ? kthread_create_on_node+0x1c0/0x1c0


But these messages do not help to find the problem. They only show
stalled processes, and it seems the hang starts on the SSD RAID1
partition and then cascades to all the SATA HDDs.
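
One way to check which device stalled first would be to pull every hung-task report out of the collected syslog and sort by the kernel uptime stamp. The sketch below assumes unwrapped syslog lines in the format shown above. The helper name `first_hung_tasks` and the log path in the usage note are assumptions, not part of any existing tool.

```python
import re

# Matches the kernel's hung-task report on a single unwrapped syslog line,
# e.g. "... kernel: [92476.204616] INFO: task xfsaild/sdd:1108 blocked for
# more than 120 seconds."
HUNG_TASK = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def first_hung_tasks(lines):
    """Return (uptime, task, pid) tuples, earliest report first.

    first_hung_tasks is a hypothetical helper name used for illustration.
    """
    reports = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            reports.append(
                (float(match["uptime"]), match["task"], int(match["pid"]))
            )
    # Tuples sort by uptime first, so the earliest stall comes out on top.
    return sorted(reports)
```

It could be fed from the file the logging server writes for this host, for example `first_hung_tasks(open("/var/log/remote/storage3.log"))`, where that path is only a placeholder. The task name carries the device (`xfsaild/sdd` here), so the first few entries show where the stall began.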

Thanks and cheers
Heiko




On 09.11.2015 at 21:58, Mark Kirkwood wrote:
> On 10/11/15 00:41, Eren Türkay wrote:
>> On 09-11-2015 12:39, Heiko Krämer wrote:
>>> You're right, only a hard reboot can solve the problem: neither SSH
>>> login nor any other command could be executed because the whole
>>> system was frozen.
>>
>> Hello Heiko,
>>
>> I just want to give some tips about debugging these kinds of issues. I
>> had a completely different problem which hung the machine, and I needed
>> to debug it. Since I couldn't access the logs, I set up kernel logging
>> to a remote server. It is called netconsole. You may want to set up a
>> netconsole listener on one of the working servers outside of your Swift
>> cluster, and set up netconsole on the servers that hang. Here is the
>> information about how to set it up:
>>
>> https://www.kernel.org/doc/Documentation/networking/netconsole.txt
>>
>> At least you may be able to see the kernel log just before the
>> machines hang. I hope you find it useful.
>>
>>
>
> Also might be worth looking at using the remote logging capability of
> rsyslog to collect your swift logs on another server, so you can see
> what is/was happening on the swift side of things immediately before
> any hang.
>
> regards
>
> Mark


-- 
anynines.com
