[Openstack] Bad perf on swift servers...

Shyam Prasad N nspmangalore at gmail.com
Fri May 30 12:48:38 UTC 2014


Hi Hugo,
Thanks for the reply, and sorry for the delayed response.

A couple of disks in one of the swift servers were accidentally wiped a
couple of days ago, and swift has been trying hard to restore the data to
those disks. It looks like this was definitely contributing to the CPU
load.
Does swift use rsync to perform this data restoration? Also, is there a way
to configure swift or rsync to reduce the priority of this replication
traffic? I realize that since my replica count is 2, it makes sense for
swift to try hard to restore the data. But would it be any different if the
replica count were higher, say 3 or 4?
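
If rsync is indeed what's doing the restoration, would capping concurrent
transfers per module in /etc/rsyncd.conf be the right knob? A sketch of
what I have in mind (module name and paths are illustrative, not my actual
config):

    # /etc/rsyncd.conf -- cap how many replication transfers run at once
    [object]
    max connections = 2
    path = /srv/node
    read only = false
    lock file = /var/lock/object.lock

Or should I be tuning the replicator itself in object-server.conf instead,
something like:

    # object-server.conf -- make replication passes less aggressive
    [object-replicator]
    concurrency = 1
    run_pause = 60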

Regarding the troubleshooting of the account-server CPU usage, the cluster
is currently down due to some other issues. I will report back if the issue
persists after I reboot the setup.
As for the topology, I have 4 symmetric swift servers
(proxy+object+container+account), each with 4GB of RAM and a 10G Ethernet
card, communicating with each other and with clients through a 10G switch
on a private network.

Regards,
Shyam



On Fri, May 30, 2014 at 7:49 AM, Kuo Hugo <tonytkdk at gmail.com> wrote:

> Hi ,
>
> 1. Correct! Once you add new devices and rebalance the rings, a portion of
> the partitions will be reassigned to the new devices. If those partitions
> hold objects, the object-replicator will move the data to the new devices.
> You should see object-replicator log entries showing objects being
> transferred from one device to another by invoking rsync.
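>
> For example, something like this on a storage node should show that
> activity (just a sketch; adjust the path if your distro routes swift's
> syslog output elsewhere):
>
>     grep object-replicator /var/log/syslog | tail -20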
>
> 2. Regarding the busy swift-account-server, that's pretty abnormal. Are
> there any logs indicating that the account-server is doing any work? One
> possibility is that a ring contains a wrong port number, directing other
> workers' traffic to the account-server. Perhaps you can paste your full
> ring layouts to http://paste.openstack.org/ . Running strace on the
> account-server process may also help track down the activity.
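>
> For example (a rough sketch; this assumes one account-server process per
> node, so pick the right PID if you run several workers):
>
>     strace -f -tt -p $(pgrep -f swift-account-server | head -1)
>
> And to double-check the ports recorded in the rings:
>
>     swift-ring-builder /etc/swift/account.builder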
>
> 3. In a deployment where the outward-facing interface shares the same
> network resources as the cluster-facing interface, there will definitely
> be contention for network bandwidth. Hence the frontend traffic is now
> being impacted by the replication traffic.
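>
> If you can add a second NIC or VLAN on the storage nodes, recent swift
> releases also support a dedicated replication network, declared per device
> when building the rings. A sketch (IPs and device name are placeholders):
>
>     swift-ring-builder object.builder add \
>       --region 1 --zone 1 --ip 10.0.0.1 --port 6000 \
>       --replication-ip 10.1.0.1 --replication-port 6000 \
>       --device sdb1 --weight 100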
>
> 4. A detailed network topology diagram would help.
>
> Hugo Kuo
>
>
> 2014-05-29 1:06 GMT+08:00 Shyam Prasad N <nspmangalore at gmail.com>:
>
>> Hi,
>>
>> I'm not sure which mailing list is right for this question, so I'm
>> including both openstack and openstack-dev in the CC list.
>>
>> I'm running a swift cluster with 4 nodes.
>> All 4 nodes are symmetrical, i.e. proxy, object, container, and account
>> servers run on each node with similar storage configurations and conf
>> files. The I/O traffic to this cluster mainly consists of uploads of
>> dynamic large objects (typically 1GB chunks (sub-objects), with around
>> 5-6 chunks under each large object).
>>
>> The setup is running and serving data, but I've begun to see a few perf
>> issues as the traffic increases. I want to understand the reason behind
>> some of these issues, and make sure that there is nothing wrong with the
>> setup's configuration.
>>
>> 1. High CPU utilization from rsync. I have set the replica count in each
>> of the account, container, and object rings to 2. From what I've read,
>> this assigns 2 devices to each partition in the storage cluster. For each
>> PUT, the 2 replicas should be written synchronously, and for each GET,
>> the I/O goes through one of the object servers. So nothing here should be
>> asynchronous in nature. What, then, is causing the rsync traffic?
>>
>> I ran a ring rebalance command after adding a new node recently.
>> Could this be causing the issue?
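>>
>> (The rebalance I'm referring to is the standard sequence, roughly the
>> following; the IP and device name are placeholders, not my actual values:
>>
>>     swift-ring-builder /etc/swift/object.builder add r1z1-10.0.0.5:6000/sdb1 100
>>     swift-ring-builder /etc/swift/object.builder rebalance
>>
>> followed by copying the new .ring.gz files to all 4 nodes.)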
>>
>> 2. High CPU utilization from swift-account-server threads. All my
>> frontend traffic uses 1 account and 1 container on the servers. There are
>> hundreds of these objects in the same container. I don't understand
>> what's keeping the account servers busy.
>>
>> 3. I've started noticing that the 1GB object transfers in the frontend
>> traffic are taking significantly longer than they used to (more than
>> double the time). Could this be because I'm using the same subnet for
>> both the internal and the frontend traffic?
>>
>> 4. Can someone give me some pointers/tips for improving perf with my
>> cluster configuration? (I think I've given most of the details above;
>> feel free to ask if you need more.)
>>
>> As always, thanks in advance for your replies. Appreciate the support. :)
>> --
>> -Shyam
>>
>>
>>
>


-- 
-Shyam