[Openstack] [Swift] CPU utilization consistently at 100%

Shrinand Javadekar shrinand at maginatics.com
Fri Apr 3 18:47:16 UTC 2015


Thanks Clay.

On Fri, Apr 3, 2015 at 3:03 AM, Clay Gerrard <clay.gerrard at gmail.com> wrote:
> On a single node where network transfers are cheaper, and a small object
> size request rate oriented workload - a good load generator should be able
> to reach cpu limits with enough concurrency.  If you're targeting a disk
> saturating throughput oriented workload - larger object sizes (1-10MB) are
> the way to go.

Yes, I am aware of this. But the object sizes may not be in my
control. Therefore, I will have to stick to 256K objects.

>
> Is the load generator also running on the same box?  You should try to
> validate your observations with a well-known Swift benchmarking tool like
> ssbench.  What's your total requests per second?

Nope, the load generator is running on a separate machine connected to
the Swift instance by a 1G link.

I want to get as much throughput from Swift as possible. During these
experiments, I have 256 PUTs happening in parallel and a total of
102400 PUTs. I have seen ~300 Obj/s. But, I'm getting this at the cost
of 100% CPU utilization.
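
As a rough sanity check on these numbers, the arithmetic can be sketched as below. The sketch assumes "256K" means 256 KiB and that the 1G link carries 10^9 bits/s; the replica count of 2 is the one mentioned further down in this message. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope figures for the workload described in this thread.
# Assumptions: "256K" is 256 KiB, the "1G link" is 10**9 bits/s.
OBJECT_SIZE_BYTES = 256 * 1024      # size of each PUT
OBJECTS_PER_SEC = 300               # observed rate (~300 Obj/s)
REPLICAS = 2                        # each object is written twice on this node
LINK_BITS_PER_SEC = 1_000_000_000   # client-to-Swift link


def ingest_bytes_per_sec():
    """Bytes per second arriving from the load generator."""
    return OBJECT_SIZE_BYTES * OBJECTS_PER_SEC


def link_utilization():
    """Fraction of the link consumed by object payload alone."""
    return ingest_bytes_per_sec() * 8 / LINK_BITS_PER_SEC


def replica_bytes_per_sec():
    """Bytes per second handled by the object servers across all replicas."""
    return ingest_bytes_per_sec() * REPLICAS


if __name__ == "__main__":
    mib = 1024 * 1024
    print("ingest:   %.1f MiB/s" % (ingest_bytes_per_sec() / mib))
    print("link use: %.1f%%" % (100 * link_utilization()))
    print("replicas: %.1f MiB/s" % (replica_bytes_per_sec() / mib))
```

Under these assumptions the payload alone is about 75 MiB/s, roughly 63% of the link, and the object servers on the single node write and checksum about 150 MiB/s in total.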

I am reasonably confident that the benchmarking tool is not at fault
here. We have tested several different object stores with the same
tool and the results there have been consistent with the expectations.

>
> My profiling in the past has revealed that the md5 checksumming in the
> object server(s) is the largest (but by far not the only) consumer of cpu -
> all of the other things you mentioned take cpu cycles - tanstaafl.  On a
> single node the problem is exacerbated per replica - what are your goals?

I see. I'm using 2 replicas; they're being written to two different disks.
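
To get a feel for how much CPU the checksumming could cost, a minimal micro-benchmark along these lines might help. It is only a sketch: it uses Python's standard `hashlib` in isolation rather than Swift's object server code, and the 64 KiB chunk size and the function names are assumptions chosen for illustration.

```python
import hashlib
import os
import time

CHUNK_SIZE = 64 * 1024  # hypothetical read size, not taken from Swift's config


def md5_object(data, chunk_size=CHUNK_SIZE):
    """Hash one object chunk by chunk, as a streaming server would."""
    digest = hashlib.md5()
    for offset in range(0, len(data), chunk_size):
        digest.update(data[offset:offset + chunk_size])
    return digest.hexdigest()


def measure_md5_rate(object_size=256 * 1024, objects=200):
    """Return MD5 throughput in bytes per second on a single core."""
    payload = os.urandom(object_size)
    start = time.perf_counter()
    for _ in range(objects):
        md5_object(payload)
    elapsed = time.perf_counter() - start
    return object_size * objects / elapsed


if __name__ == "__main__":
    rate = measure_md5_rate()
    print("MD5 throughput: %.1f MiB/s per core" % (rate / (1024 * 1024)))
```

Comparing the measured per-core rate with the bytes per second that have to be hashed (payload rate times replica count) gives a rough idea of how many cores the MD5 work alone could occupy.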

>
> Are you sure you're saturating all the cores evenly - what's it look like
> with something like `htop` - have you tried tuning your worker counts or any
> other config settings?

I have set workers to auto. Reducing the workers, especially the proxy
server workers, has resulted in lower throughput. Also, I have set
threads-per-disk in the object server to 4. I experimented with 8, but
didn't see too much difference. Analysis done using sysdig suggests
that CPU is the bottleneck; not disk.

I'll take a deeper look at this with htop and see what's happening.
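
For a scriptable alternative to watching htop, per-core utilization can be derived from two snapshots of `/proc/stat` on Linux. The following is a minimal sketch, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal) and counting idle plus iowait as not busy; the function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each cpuN line."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        ticks = [int(value) for value in parts[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks)
        cores[parts[0]] = (total - idle, total)
    return cores


def per_core_busy(before, after):
    """Percent busy per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    result = {}
    for name, (busy_now, total_now) in second.items():
        busy_then, total_then = first.get(name, (0, 0))
        delta = total_now - total_then
        result[name] = 100.0 * (busy_now - busy_then) / delta if delta > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as stat_file:
        before = stat_file.read()
    time.sleep(interval)
    with open(path) as stat_file:
        after = stat_file.read()
    return per_core_busy(before, after)
```

Calling `sample()` while the benchmark runs returns a dict such as `{"cpu0": ..., "cpu1": ...}`. A few cores near 100% with the rest mostly idle would point at too few workers rather than a box that is out of CPU overall.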

-Shri

P.S. "tanstaafl": Knew the phrase, but learnt the acronym just now...
Learn something new every day!! :-).

>
> -Clay
>
> On Thu, Apr 2, 2015 at 10:12 PM, Shrinand Javadekar
> <shrinand at maginatics.com> wrote:
>>
>> Top shows the CPUs pegged at ~100%. Writes are done by a tool built
>> in-house which is similar in functionality to other object store
>> benchmarking tools. As I mentioned, there are 256 parallel object
>> writes (PUTS), each of 256K bytes.
>>
>> On Thu, Apr 2, 2015 at 9:21 PM, Yogesh Girikumar <yogeshg1987 at gmail.com>
>> wrote:
>> > Also how are you doing the object writes to benchmark it? Are you using
>> > dd?
>> >
>> > On 3 April 2015 at 09:50, Yogesh Girikumar <yogeshg1987 at gmail.com>
>> > wrote:
>> >>
>> >> What does top say?
>> >>
>> >> On 3 April 2015 at 02:34, Shrinand Javadekar <shrinand at maginatics.com>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I have a single node Swift instance. It has 16 cpus, 8 disks and 64GB
>> >>> memory. As part of testing, I am doing 256 object writes in parallel
>> >>> for ~10 mins. Each object is also 256K bytes in size.
>> >>>
>> >>> While my experiment is running, I see that the CPU utilization of the
>> >>> box is always ~100%. I am trying to understand what is causing this
>> >>> high CPU utilization. Some of this could be attributed to:
>> >>>
>> >>> 1. MD5 checksum calculation done to verify every PUT.
>> >>> 2. MD5 checksum calculation by the auditor (if it runs during this
>> >>> interval).
>> >>> 3. Hash calculation of the path to decide which partition the object
>> >>> goes to.
>> >>>
>> >>> Are there any other CPU intensive operations happening on the system
>> >>> that I should be aware of?
>> >>>
>> >>> I see that the proxy-server has a "PUT" queue. Is there some
>> >>> processing of the data in this queue? Would simply putting data in and
>> >>> out of the queue, streaming the data between the proxy and object
>> >>> server use considerable CPU?
>> >>>
>> >>> Thanks in advance.
>> >>> -Shri
>> >>>
>> >>> _______________________________________________
>> >>> Mailing list:
>> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >>> Post to     : openstack at lists.openstack.org
>> >>> Unsubscribe :
>> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> >>
>> >>
>> >
>>
>
>



