[Openstack] [Swift] block I/O all disks

Heiko Krämer hkraemer at anynines.com
Wed Nov 18 08:26:53 UTC 2015


OK,

that didn't work. In the week since the change we have had 3 outages on
one machine. I'm out of ideas at this point.

My current kernel settings:

# disable TIME_WAIT.. wait..
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1

# disable syn cookies
net.ipv4.tcp_syncookies = 0

# double amount of allowed conntrack
net.ipv4.netfilter.ip_conntrack_max = 524288
net.netfilter.nf_conntrack_max = 524288

net.ipv4.ip_local_port_range = 7000    65535
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 1
net.netfilter.nf_conntrack_tcp_timeout_established=600
net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=5


# Disk I/O
vm.dirty_background_ratio = 3
vm.dirty_ratio = 6


# Reduce TIME_WAIT connections when there are many short-lived connections
# tcp_fin_timeout (default: 60), our Swift system: 15
# In TIME_WAIT state it costs the host less to answer than to set up a
# new connection;
# on the other hand, resources for connections are freed sooner so that
# more connections can be processed.
# A suitable middle ground has to be found here, but as a rule this
# should be twice the packet TTL
net.ipv4.tcp_fin_timeout = 20


# Network buffers
net.core.rmem_max = 13421568
net.core.wmem_max = 13421568

# TCP buffers
net.ipv4.tcp_rmem = 4096 87380 6710272
net.ipv4.tcp_wmem = 4096 87380 6710272

# Input Queue
net.core.netdev_max_backlog = 25000

# H-TCP congestion control (Hamilton TCP; not the Hyper Text Caching
# Protocol of RFC 2756)
#net.ipv4.tcp_congestion_control=htcp
# list the available modules via net.ipv4.tcp_available_congestion_control
# load the htcp module if necessary

# Recommended for hosts with jumbo frames (default: 0, off); Swift: 1
net.ipv4.tcp_mtu_probing=1

# Tweaks for many connections through a proxy,
# e.g. for "possible SYN flooding on port 80. Sending cookies" messages,
# which occur when the port range is exhausted.
# (A proxy always needs 2 connections per request,
# client<->proxy<->upstream)
# Default: 7000 65535
net.ipv4.ip_local_port_range = 1024 65535
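
Two keys in the listing above are assigned twice with different values: net.ipv4.tcp_fin_timeout (15, then 20) and net.ipv4.ip_local_port_range (7000 65535, then 1024 65535). If all of these lines sit in one sysctl.conf-style file that is applied top to bottom, the later assignment is the one that ends up active. A minimal sketch for spotting such repeats, assuming that file layout; the function name and the SAMPLE excerpt are illustrative only:

```python
from collections import defaultdict


def find_repeated_keys(text):
    """Return {key: [values in file order]} for keys assigned more than once."""
    seen = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and wrapped comment text without '='.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse whitespace so "7000    65535" compares equal to "7000 65535".
        seen[key.strip()].append(" ".join(value.split()))
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt of the settings above (values copied from the listing).
SAMPLE = """\
net.ipv4.ip_local_port_range = 7000    65535
net.ipv4.tcp_fin_timeout=15
# tcp_fin_timeout (default: 60), our Swift system: 15
net.ipv4.tcp_fin_timeout = 20
net.ipv4.ip_local_port_range = 1024 65535
"""

if __name__ == "__main__":
    for key, values in find_repeated_keys(SAMPLE).items():
        print(f"{key}: {values} (last assignment: {values[-1]})")
```

The values the kernel is actually using can be read back with `sysctl net.ipv4.tcp_fin_timeout net.ipv4.ip_local_port_range`.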



Is there anything wrong here that could cause this kind of issue?
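
For reference, the dirty-cache sizes implied by the vm.dirty_* ratios can be worked out roughly as follows. This is only a sketch: it assumes 32G of RAM (the figure from the quoted reply below) and applies the percentages to total RAM, whereas the kernel applies them to available (free plus reclaimable) memory, so the real thresholds will be somewhat lower. The function name is hypothetical.

```python
GIB = 1024 ** 3


def dirty_thresholds(mem_bytes, background_ratio, dirty_ratio):
    """Approximate byte thresholds for vm.dirty_background_ratio / vm.dirty_ratio."""
    background = mem_bytes * background_ratio // 100  # background writeback starts
    limit = mem_bytes * dirty_ratio // 100            # writers are throttled
    return background, limit


if __name__ == "__main__":
    ram = 32 * GIB  # assumption: 32G of RAM, as mentioned in the quoted reply
    # (5, 10) were the earlier settings, (3, 6) are the current ones.
    for bg_ratio, hard_ratio in [(5, 10), (3, 6)]:
        bg, limit = dirty_thresholds(ram, bg_ratio, hard_ratio)
        print(f"{bg_ratio}/{hard_ratio}: background {bg / GIB:.2f} GiB, "
              f"limit {limit / GIB:.2f} GiB")
```

With 3 and 6 this lands at roughly 1 and 2 GiB, which matches the range given in the quoted reply.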

Thanks and Cheers
Heiko

On 09.11.2015 21:53, Mark Kirkwood wrote:
> Right,
>
> I agree that the dirty size of the cache could be the issue
> (particularly with SATA drives in a JBOD array without the help of any
> writeback caches etc).
>
> You might even want to wind those settings down a bit more (3 and 6
> perhaps which means with 32G of ram your dirty cache size is between 1
> and 2G), should stop the massive IO stall.
>
> regards
>
> Mark
>
>
> On 09/11/15 23:39, Heiko Krämer wrote:
>>
>> Hi Mark,
>>
>> nothing can be logged because all disks are stalled.
>> I only see this output via IPMI:
>>
>> blocked for more than 120 seconds
>> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>>
>> I found that there could be a problem with the write caching of the
>> disks, so I have reduced the available dirty cache:
>>
>> vm.dirty_background_ratio = 5
>> vm.dirty_ratio = 10
>>
>>
>> I hope this will solve the issue.
>>
>> You're right, only a hard reboot solves the problem: SSH logins and
>> other commands can't be executed because the whole system is frozen.
>> That's why I can't get any deeper information about what happened.
>>
>>
>
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


- -- 
anynines.com




