[Openstack] [Swift] block I/O all disks
Heiko Krämer
hkraemer at anynines.com
Mon Nov 9 10:39:09 UTC 2015
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Mark,
nothing can be logged because all disks are stalled.
The only output I see is on the IPMI console:
blocked for more than 120 seconds
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
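If the console text can be captured somewhere off the frozen host (for example from a serial console redirected over IPMI, which is an assumption about the setup), the hung-task reports can be pulled out with a few lines of Python. This is only a sketch: the sample log lines and the task name in them are hypothetical, and the pattern assumes the usual kernel wording "INFO: task <name>:<pid> blocked for more than N seconds".

```python
import re

# Matches the kernel's hung-task report line, e.g.
# "INFO: task foo:123 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


# Hypothetical sample: the task name and timestamps are made up for illustration.
sample = (
    "[12345.678901] INFO: task swift-object-se:4242 "
    "blocked for more than 120 seconds.\n"
    '[12345.678910] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" '
    "disables this message.\n"
)

for comm, pid, secs in find_hung_tasks(sample):
    print(f"task {comm} (pid {pid}) blocked for more than {secs}s")
```

Knowing which tasks are blocked (Swift workers, the flusher threads, or something else) would at least narrow down where the I/O is stuck.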
I found that write caching on the disks could be the problem, so I
have reduced the amount of dirty page cache the kernel allows:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
I hope this will solve the issue.
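To get a feeling for what these two ratios mean in bytes, here is a rough sketch. The kernel applies the percentages to available memory (free plus reclaimable pages), not to installed RAM, so using the full 32GB of these servers as input is an assumption that only gives an upper bound. The function name is made up for illustration.

```python
GIB = 1024 ** 3


def dirty_thresholds(available_bytes, background_ratio, dirty_ratio):
    """Return (background, blocking) dirty-page thresholds in bytes.

    background: level at which the flusher threads start writing back.
    blocking:   level at which writing processes must write back themselves.
    """
    background = available_bytes * background_ratio // 100
    blocking = available_bytes * dirty_ratio // 100
    return background, blocking


# Assumption: treat all 32 GiB as available memory (upper bound only).
background, blocking = dirty_thresholds(32 * GIB, 5, 10)
print(f"background writeback starts at about {background / GIB:.1f} GiB")
print(f"writers are throttled at about {blocking / GIB:.1f} GiB")
```

With lower ratios less dirty data can pile up before writeback starts, so a single flush has less to push to the 12 SATA disks at once.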
You're right, only a hard reboot solves the problem: SSH logins and
other commands can't be executed because the whole system is frozen.
That's the problem: I can't get any deeper information about what
happened.
Thanks and cheers
Heiko
On 07.11.2015 00:19, Mark Kirkwood wrote:
> Do you reboot the machine? It might be interesting to just restart
> the swift storage services and see if that brings everything right
> again.
>
> Also check out the swift logs for what it is doing when things
> start to hang, and also what dmesg is saying at the time.
>
> A bit more info about your setup would be good - I'm guessing you
> have 12 swift object devices (on sata) and ? account and container
> ones (on ssd)?
>
> Approx how many containers and objects live on each server (or if
> easier tell us how many servers you have + replication level and
> how many accounts, containers and objects in total)?
>
> Regards
>
> Mark
>
> On 04/11/15 22:41, Heiko Krämer wrote:
>> Hi guys,
>>
>> we are seeing problems with the disks on our Swift storage nodes.
>> After some time they block all I/O requests. The server then
>> suddenly stops working and needs a reboot.
>>
>> Server setup:
>> * Kernel 3.19.x
>> * Ubuntu 14.04
>> * Swift (Kilo)
>> * 12 x 2TB SATA (JBOD)
>> * 2 x 480GB SSD (RAID1)
>> * 32GB RAM
>> * 8 Cores CPU
>>
>> Our first attempt was to upgrade the RAID controller firmware and
>> drivers. The second was to run write and read tests against each
>> disk. I can't reproduce the issue, but I heard at the last summit
>> that Swift itself can cause this problem.
>>
>> Has anyone solved this problem?
>>
>>
>> - Heiko
>>
>>
>>
>> _______________________________________________ Mailing list:
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack at lists.openstack.org Unsubscribe :
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
- --
Anynines.com
B.Sc. Informatik
CIO
Heiko Krämer
Twitter: @anynines
- - ----
Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbrücken
Avarteq GmbH
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAEBAgAGBQJWQHfNAAoJELxFogM4ixOFZecIAL6s+lYMCLV+Dcs8AlRJ2WrD
PT9B3JDiF12vkJYMYs8WiKRQxQ4PXvI86vdKF7HM/xpZsnk44zjVjUYGZQlrF/Uk
kuNURH47iZ7g2Kmib9rXyGJgzlWNwIV8pi7cQe9UxLk39kDZJBeO18CavX56L6oT
zr4ZL/rcgKBWr9TG2oQvsunUJsPzyIXIA4yAc+C7R3VqAWipTzffyUY8Fgdzgzw2
gCvkuDzFBHMrrvzyh0Gz1q9+x3QA6pmcZkNm8qdcu8F1okGa3wpjkK+79q/hO8wC
0dxFXyPv/w6AcP2NnqtbakX2Htgp5DDjrNhpe9ZHHANJmd26G3NkrfjMsunybhk=
=zw2Y
-----END PGP SIGNATURE-----