[Openstack] [Swift] [Storage node] Lots of timeouts in load test after several hours, around 1,000,000 operations

Kuo Hugo tonytkdk at gmail.com
Tue Jul 10 08:09:18 UTC 2012


Hello, folks,

It seems most of the time is consumed by the following code in obj/server.py:

iter(lambda: reader(self.network_chunk_size), '')

L591 - L605
https://github.com/openstack/swift/blob/master/swift/obj/server.py#L591
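
For context, the line above uses the iter(callable, sentinel) pattern: reader() is called repeatedly until it returns '' (end of the request body). A minimal sketch of that pattern follows; the function name, the wsgi_input argument, and the 65536-byte chunk size are my assumptions for illustration, not the exact Swift code. Note that if the client or the network feeds data slowly, the wall-clock time gets charged to this loop even though the object server is mostly idle waiting for the next chunk.

network_chunk_size = 65536  # assumed chunk size, for illustration only

def read_request_body(wsgi_input, chunk_size=network_chunk_size):
    # iter(callable, sentinel) keeps calling reader(chunk_size) until it
    # returns '' (body exhausted or socket closed).
    reader = wsgi_input.read
    total = 0
    for chunk in iter(lambda: reader(chunk_size), ''):
        # The real object server writes each chunk into a temp file on the
        # target disk here; this sketch only counts the bytes received.
        total += len(chunk)
    return total

# usage sketch (Python 2, which Swift ran on at the time):
# from StringIO import StringIO
# print read_request_body(StringIO('x' * 100000))   # -> 100000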



10323  Jul 10 14:34:42 object-server WTF: InitTime: 0.000183#012
SavingTime: 0.055627#012 OS-Write 0.000015 MetadataTime: 15.848296#012
UpdateContainerTime: 0.042656 X-Trans-ID :
tx7a2181d0e9444ef5a13f9f60f657288f

10324  Jul 10 14:34:42 object-server WTF: InitTime: 0.000248#012
SavingTime: 0.862101#012 OS-Write 0.000014 MetadataTime: 0.089192#012
UpdateContainerTime: 0.003802 X-Trans-ID :
tx37f8a2e958734083ba064f898e9fdcb2

*10325  Jul 10 14:34:42 object-server WTF: InitTime: 0.000379#012
SavingTime: 14.094034#012 OS-Write 0.000013 MetadataTime: 0.033566#012
UpdateContainerTime: 0.004655 X-Trans-ID :
tx9ef952731e5e463daa05a0c973907f32*

10326  Jul 10 14:34:42 object-server WTF: InitTime: 0.000310#012
SavingTime: 0.801216#012 OS-Write 0.000017 MetadataTime: 0.122491#012
UpdateContainerTime: 0.008453 X-Trans-ID :
tx6a5a0c634bf9439282ea4736e7ba7422

10327  Jul 10 14:34:42 object-server WTF: InitTime: 0.000176#012
SavingTime: 0.006937#012 OS-Write 0.000011 MetadataTime: 15.642381#012
UpdateContainerTime: 0.297634 X-Trans-ID :
tx1b0f4e03daef48d68cbfdc6c6e915a0b
10328  Jul 10 14:34:42 object-server WTF: InitTime: 0.000268#012
SavingTime: 0.012993#012 OS-Write 0.000016 MetadataTime: 0.001211#012
UpdateContainerTime: 0.001846 X-Trans-ID :

As the results above show, an occasional slow request drags the average speed down.

What would cause iter(lambda: reader(self.network_chunk_size), '') to consume so much time?

Too many files in XFS, or something else? Could it possibly be a bug?
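
For reference, the InitTime / SavingTime / MetadataTime / UpdateContainerTime figures above come from ad-hoc timing instrumentation in the PUT path, not from stock Swift. A minimal sketch of how such per-phase timings can be collected looks roughly like this; the class, the phase names, and the log format are illustrative assumptions:

import time

class PhaseTimer(object):
    # Hypothetical helper: record the wall-clock delta between successive
    # phases of a request and emit them as one log line, similar to the
    # "WTF:" lines above.
    def __init__(self):
        self.marks = []
        self.last = time.time()

    def mark(self, name):
        now = time.time()
        self.marks.append((name, now - self.last))
        self.last = now

    def summary(self):
        return ' '.join('%s: %.6f' % (name, delta)
                        for name, delta in self.marks)

# usage sketch inside the object server's PUT handler (names illustrative):
# timer = PhaseTimer()
# ... parse headers, pick the disk ...       then timer.mark('InitTime')
# ... read and write the body chunks ...     then timer.mark('SavingTime')
# ... write metadata, rename into place ...  then timer.mark('MetadataTime')
# ... send the container update ...          then timer.mark('UpdateContainerTime')
# logger.info('WTF: %s X-Trans-ID: %s' % (timer.summary(), trans_id))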


Thanks


2012/7/4 Kuo Hugo <tonytkdk at gmail.com>

> I found that running the updater and the replicator can alleviate this issue.
>
> In my original setup, to get the best performance, I only started the main
> workers (account-server, container-server, object-server), and kept
> uploading / downloading / deleting objects over 1,000,000 times.
>
> Issues:
>
> 1. XFS or Swift consumes lots of memory for some reason. Does anyone know
> what is being cached (or buffered; the reported cache usage is not that high) in
> memory in this scenario? After running the container/object replicators, that
> memory is all released. I'm curious what is held in memory. Is it all
> object metadata, or something else?
>
> 2. Plenty of 10s timeouts in the proxy-server's log, caused by timing out while
> getting the final status of a PUT to an object from the storage node.
> At the beginning, the object workers complain about 3s timeouts when updating the
> container (it goes async later), but there aren't too many complaints. As more and
> more PUT / GET / DELETE operations run, more and more timeouts happen.
> It seems the updater can alleviate this issue.
> Is this behavior related to the amount of data in the pending pickles?
>
>
> Thanks
> Hugo
>
>
> 2012/7/2 Kuo Hugo <tonytkdk at gmail.com>
>
>> Hi all,
>>
>> I did several load tests of Swift in recent days.
>>
>> I'm facing an issue and hope you can share your thoughts on it.
>>
>> My environment:
>> Swift-proxy with TempAuth on one server: 4 cores / 32 GB RAM
>>
>> Swift-object + Swift-account + Swift-container on 3 storage nodes, each
>> with: 8 cores / 32 GB RAM, 7 x 2 TB SATA HDDs
>>
>> =====================================================================================
>> bench.conf :
>>
>> [bench]
>> auth = http://172.168.1.1:8082/auth/v1.0
>> user = admin:admin
>> key = admin
>> concurrency = 200
>> object_size = 4048
>> num_objects = 100000
>> num_gets = 100000
>> delete = yes
>> =====================================================================
>>
>> After 70 rounds .....
>>
>> PUT operations get lots of failures, but GETs still work properly.
>> *ERROR log:*
>> Jul  1 04:35:03 proxy-server ERROR with Object server
>> 192.168.100.103:36000/DISK6 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/af5862e653054f7b803d8cf1728412d2_6/24fc2f997bcc4986a86ac5ff992c4370:
>> Timeout (10s) (txn: txd60a2a729bae46be9b667d10063a319f) (client_ip:
>> 172.168.1.2)
>> Jul  1 04:34:32 proxy-server ERROR with Object server
>> 192.168.100.103:36000/DISK2 re: Expect: 100-continue on
>> /AUTH_admin/af5862e653054f7b803d8cf1728412d2_19/35993faa53b849a89f96efd732652e31:Timeout (10s)
>>
>>
>> And the kernel starts to report failure messages as below.
>> *kernel failure log:*
>> 76666 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.020736] w83795
>> 0-002f: Failed to read from register 0x03c, err -6
>>    76667 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.052654]
>> w83795 0-002f: Failed to read from register 0x015, err -6
>>    76668 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.080613]
>> w83795 0-002f: Failed to read from register 0x03c, err -6
>>    76669 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.112583]
>> w83795 0-002f: Failed to read from register 0x016, err -6
>>    76670 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.144517]
>> w83795 0-002f: Failed to read from register 0x03c, err -6
>>    76671 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.176468]
>> w83795 0-002f: Failed to read from register 0x017, err -6
>>    76672 Jul  1 16:37:50 angryman-storage-03 kernel: [350840.208455]
>> w83795 0-002f: Failed to read from register 0x03c, err -6
>>    76673 Jul  1 16:37:51 angryman-storage-03 kernel: [350840.240410]
>> w83795 0-002f: Failed to read from register 0x01b, err -6
>>    76674 Jul  1 16:37:51 angryman-storage-03 kernel: [350840.272
>> Jul  1 17:05:28 angryman-storage-03 kernel: imklog 6.2.0, log source = /proc/kmsg started.
>>
>> PUTs become slower and slower, from 1,200/s down to 200/s.
>>
>> I'm not sure if this is a bug or a limitation of XFS. If it's an XFS
>> limitation, how can it be improved?
>>
>> An additional question: XFS seems to consume lots of memory. Does anyone
>> know the reason for this behavior?
>>
>>
>> Much appreciated.
>>
>>
>> --
>> +Hugo Kuo+
>> tonytkdk at gmail.com
>> +886 935004793
>>
>>
>
>
> --
> +Hugo Kuo+
> tonytkdk at gmail.com
> +886 935004793
>
>
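
P.S. Regarding the async-pending question in my 7/4 mail quoted above: one quick way to see whether the backlog of pending container updates grows along with the timeouts is to count the pickle files under each device's async_pending directory. A rough sketch, assuming the default /srv/node devices root and the usual async_pending layout:

import os

DEVICES = '/srv/node'  # assumption: default devices root

def count_async_pendings(devices=DEVICES):
    # Walk each device's async_pending directory and count the queued
    # container-update pickles.
    counts = {}
    for device in os.listdir(devices):
        pending_dir = os.path.join(devices, device, 'async_pending')
        if not os.path.isdir(pending_dir):
            continue
        total = 0
        for dirpath, dirnames, filenames in os.walk(pending_dir):
            total += len(filenames)
        counts[device] = total
    return counts

if __name__ == '__main__':
    for device, count in sorted(count_async_pendings().items()):
        print('%s: %d pending updates' % (device, count))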


-- 
+Hugo Kuo+
tonytkdk at gmail.com
+886 935004793