[Openstack] [Swift] block I/O all disks

Heiko Krämer hkraemer at anynines.com
Mon Nov 9 10:39:09 UTC 2015


Hi Mark,

Nothing can be logged because all disks are stalled.
The only output I see is on the IPMI console:

blocked for more than 120 seconds
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
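The message comes from the kernel's hung task detector, which warns when a task has stayed in uninterruptible sleep (state `D`, usually waiting for I/O) for longer than `hung_task_timeout_secs`, 120 seconds by default. If a node can still be reached while the stall builds up, a minimal Python sketch like the one below, assuming a Linux `/proc` filesystem, can list which processes are stuck in `D` state. The names `parse_stat` and `blocked_tasks` are hypothetical helpers, not part of Swift or any existing tool.

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so split on the first " (" and the last ")" instead of on spaces.
    """
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]  # single-letter state, e.g. 'R', 'S', 'D'
    return int(pid_part), comm, state


def blocked_tasks(proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, comm) for every task in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/sys
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `print(blocked_tasks())` periodically from a still-responsive shell would show whether Swift processes (or kernel flusher threads) are the ones piling up in `D` state before the node freezes completely.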

I found that there could be a problem with write caching for the disks,
so I have reduced the amount of dirty page cache the kernel may hold:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
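For scale: both ratios are percentages of the memory the kernel considers dirtyable (free plus reclaimable pages, so somewhat less than total RAM). Background writeback starts at `vm.dirty_background_ratio`, and processes that write are throttled into doing writeback themselves at `vm.dirty_ratio`. The sketch below estimates upper bounds for the 32GB nodes from the original post, treating all RAM as dirtyable (an assumption that overstates the real thresholds) and comparing against the common kernel defaults of 10 and 20; `dirty_thresholds` is a hypothetical helper.

```python
from __future__ import annotations


def dirty_thresholds(mem_gb: float, background_ratio: int, dirty_ratio: int) -> tuple[float, float]:
    """Upper-bound estimate of the dirty page cache thresholds, in GB.

    Assumes all of mem_gb is dirtyable; the kernel really uses free +
    reclaimable pages, so the actual thresholds are lower.
    """
    background = mem_gb * background_ratio / 100  # flusher threads start writing back
    throttle = mem_gb * dirty_ratio / 100          # writing processes get throttled
    return background, throttle


print(dirty_thresholds(32, 10, 20))  # common defaults -> (3.2, 6.4)
print(dirty_thresholds(32, 5, 10))   # new settings    -> (1.6, 3.2)
```

So with the new values, at most roughly 3.2 GB of dirty data can accumulate before writers block, instead of roughly 6.4 GB, which shrinks the burst of writeback the SATA disks have to absorb at once.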


I hope this will solve the issue.

You're right, only a hard reboot helps: SSH logins and other commands
can't be executed because the whole system is frozen.
That's why I can't get any deeper information about what happened.



Thanks and cheers
Heiko



On 07.11.2015 00:19, Mark Kirkwood wrote:
> Do you reboot the machine? It might be interesting to just restart
> the Swift storage services and see if that brings everything back
> to normal.
> 
> Also check out the swift logs for what it is doing when things
> start to hang, and also what dmesg is saying at the time.
> 
> A bit more info about your setup would be good. I'm guessing you
> have 12 Swift object devices (on SATA) and ? account and container
> ones (on SSD)?
> 
> Approx how many containers and objects live on each server (or if
> easier tell us how many servers you have + replication level and
> how many accounts, containers and objects in total)?
> 
> Regards
> 
> Mark
> 
> On 04/11/15 22:41, Heiko Krämer wrote:
>> Hi guys,
>> 
>> we are noticing problems with the disks on our Swift storage nodes.
>> After some time, all I/O requests to the disks block, so the server
>> suddenly stops working and needs a reboot.
>> 
>> Server setup:
>> * Kernel 3.19.x
>> * Ubuntu 14.04
>> * Swift (Kilo)
>> * 12 x 2TB SATA (JBOD)
>> * 2 x 480GB SSD (RAID 1)
>> * 32GB RAM
>> * 8-core CPU
>> 
>> First we upgraded the RAID controller firmware and drivers. Then we
>> ran write and read tests on each disk. I can't reproduce this issue,
>> but I heard at the last summit that Swift can cause this problem.
>> 
>> Has anyone solved this problem?
>> 
>> 
>> - Heiko
>> 
>> 
>> 
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack at lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> 

-- 
Anynines.com

B.Sc. Computer Science
CIO
Heiko Krämer


Twitter: @anynines

----
Managing Directors: Alexander Faißt, Dipl.-Inf. (FH) Julian Fischer
Commercial register: Amtsgericht (Local Court) Saarbrücken, HRB 17413, VAT ID: DE262633168
Registered office: Saarbrücken
Avarteq GmbH