[Openstack] [Swift] CPU utilization consistently at 100%

Yogesh Girikumar yogeshg1987 at gmail.com
Fri Apr 3 19:07:27 UTC 2015


Hi,

I was going to suggest htop. That would tell you which process is taking so
much CPU. Also, can you post the result of 'uptime'?

--
Y

On 4 April 2015 at 00:17, Shrinand Javadekar <shrinand at maginatics.com>
wrote:

> Thanks Clay.
>
> On Fri, Apr 3, 2015 at 3:03 AM, Clay Gerrard <clay.gerrard at gmail.com>
> wrote:
> > On a single node where network transfers are cheaper, with a
> > small-object, request-rate-oriented workload, a good load generator
> > should be able to reach cpu limits with enough concurrency.  If you're
> > targeting a disk-saturating, throughput-oriented workload, larger
> > object sizes (1-10MB) are the way to go.
>
> Yes, I am aware of this. But the object sizes may not be in my
> control. Therefore, I will have to stick to 256K objects.
>
> >
> > Is the load generator also running on the same box?  You should try to
> > validate your observations with a well-known Swift benchmarking tool
> > like ssbench.  What's your total requests per second?
>
> Nope, the load generator is running on a separate machine connected to
> the Swift instance by a 1G link.
>
> I want to get as much throughput from Swift as possible. During these
> experiments, I have 256 PUTs happening in parallel and a total of
> 102400 PUTs. I have seen ~300 Obj/s. But, I'm getting this at the cost
> of 100% CPU utilization.
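
[A quick back-of-the-envelope check, added for illustration and not from the
thread itself: at 256 KB per object and ~300 objects/s, the 1G link is
nowhere near saturated, which is consistent with CPU rather than network
being the limit.]

```python
# Sanity-check the reported numbers: 256 KB objects at ~300 PUTs/s.
# Is the 1 Gb/s link the bottleneck, or something else (CPU/disk)?
obj_size_bytes = 256 * 1024    # 256 KB per object
obj_per_sec = 300              # observed PUT rate

throughput_MBps = obj_size_bytes * obj_per_sec / 1e6
throughput_Mbps = throughput_MBps * 8

print(f"{throughput_MBps:.0f} MB/s ~= {throughput_Mbps:.0f} Mb/s")
# -> 79 MB/s ~= 629 Mb/s: well under the 1 Gb/s link, so the network
# is not saturated; that points back at CPU (or disk) as the limit.
```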
>
> I am reasonably confident that the benchmarking tool is not at fault
> here. We have tested several different object stores with the same
> tool and the results there have been consistent with the expectations.
>
> >
> > My profiling in the past has revealed that the md5 checksumming in the
> > object server(s) is the largest (but by far not the only) consumer of
> > cpu - all of the other things you mentioned take cpu cycles -
> > tanstaafl.  On a single node the problem is exacerbated per replica -
> > what are your goals?
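
[An illustrative aside, not Swift's actual code path: the per-object MD5
cost mentioned above can be estimated by hashing a 256 KB payload directly
and extrapolating to the observed PUT rate.]

```python
import hashlib
import time

# Rough estimate of the MD5 cost per PUT: hash one 256 KB payload the
# way an object server checksums incoming data, averaged over many runs.
payload = b"x" * 256 * 1024    # one 256 KB object body

iterations = 1000
start = time.perf_counter()
for _ in range(iterations):
    hashlib.md5(payload).hexdigest()
elapsed = time.perf_counter() - start

per_object_ms = elapsed / iterations * 1000
print(f"~{per_object_ms:.2f} ms of MD5 per object, per replica")
# With 2 replicas, each object body is checksummed by every object
# server that stores it, so the cost scales with the replica count.
```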
>
> I see. I'm using 2 replicas; they're being written to two different disks.
>
> >
> > Are you sure you're saturating all the cores evenly - what does it look
> > like with `htop` - have you tried tuning your worker counts or any
> > other config settings?
>
> I have set workers to auto. Reducing the worker count, esp. the proxy
> server workers, has resulted in lower throughput. Also, I have set
> threads_per_disk in the object server to 4. I experimented with 8, but
> didn't see much difference. Analysis done using sysdig suggests that
> CPU is the bottleneck, not disk.
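
[For readers following along, a hypothetical sketch of the two knobs
discussed above; exact section placement and defaults can vary between
Swift releases, so treat this as illustrative only.]

```ini
# Illustrative excerpt, e.g. /etc/swift/object-server.conf
[DEFAULT]
workers = auto            ; "auto" spawns one worker per CPU core

[app:object-server]
threads_per_disk = 4      ; size of the per-disk I/O threadpool
```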
>
> I'll take a deeper look at this with htop and see what's happening.
>
> -Shri
>
> P.S. "tanstaafl": Knew the phrase, but learnt the acronym just now...
> Learn something new every day!! :-).
>
> >
> > -Clay
> >
> > On Thu, Apr 2, 2015 at 10:12 PM, Shrinand Javadekar
> > <shrinand at maginatics.com> wrote:
> >>
> >> Top shows the CPUs pegged at ~100%. Writes are done by a tool built
> >> in-house which is similar in functionality to other object store
> >> benchmarking tools. As I mentioned, there are 256 parallel object
> >> writes (PUTS), each of 256K bytes.
> >>
> >> On Thu, Apr 2, 2015 at 9:21 PM, Yogesh Girikumar <yogeshg1987 at gmail.com>
> >> wrote:
> >> > Also how are you doing the object writes to benchmark it? Are you
> >> > using dd?
> >> >
> >> > On 3 April 2015 at 09:50, Yogesh Girikumar <yogeshg1987 at gmail.com>
> >> > wrote:
> >> >>
> >> >> What does top say?
> >> >>
> >> >> On 3 April 2015 at 02:34, Shrinand Javadekar <shrinand at maginatics.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> I have a single node Swift instance. It has 16 CPUs, 8 disks and
> >> >>> 64GB memory. As part of testing, I am doing 256 object writes in
> >> >>> parallel for ~10 mins. Each object is 256K bytes in size.
> >> >>>
> >> >>> While my experiment is running, I see that the CPU utilization of
> >> >>> the box is always ~100%. I am trying to understand what is causing
> >> >>> this high CPU utilization. Some of this could be attributed to:
> >> >>>
> >> >>> 1. MD5 checksum calculation done to verify every PUT.
> >> >>> 2. MD5 checksum calculation by the auditor (if it runs during this
> >> >>> interval).
> >> >>> 3. Hash calculation of the object path to decide which partition
> >> >>> the object goes to.
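
[Aside: item 3 above can be sketched in a few lines. This is a simplified
model of Swift's ring lookup, not its actual implementation; the salt value
and part_power here are made up for illustration - real deployments read
them from swift.conf and the ring files.]

```python
import hashlib
import struct

HASH_PATH_SUFFIX = b"changeme"   # placeholder salt, normally from swift.conf
PART_POWER = 10                  # placeholder: 2**10 partitions

def get_partition(account, container, obj):
    # The salted object path is MD5-hashed; the top bits of the
    # digest select the partition.
    path = f"/{account}/{container}/{obj}".encode()
    digest = hashlib.md5(path + HASH_PATH_SUFFIX).digest()
    return struct.unpack(">I", digest[:4])[0] >> (32 - PART_POWER)

print(get_partition("AUTH_test", "bench", "obj-00042"))
# Some integer in [0, 2**PART_POWER) -- one MD5 per request, cheap
# compared to checksumming the 256K object body, but not free.
```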
> >> >>>
> >> >>> Are there any other CPU intensive operations happening on the system
> >> >>> that I should be aware of?
> >> >>>
> >> >>> I see that the proxy-server has a "PUT" queue. Is there some
> >> >>> processing of the data in this queue? Would simply putting data in
> >> >>> and out of the queue, and streaming the data between the proxy and
> >> >>> object server, use considerable CPU?
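
[Aside: the queue question can be tested in isolation with a toy model.
This is not Swift's actual proxy code, and the 64 KB chunk size is an
assumption; it only shows that shuffling bytes through a queue is cheap
relative to the hashing and socket work around it.]

```python
import queue
import threading
import time

# Toy model of the PUT path: one thread feeds 64 KB chunks of a 256 KB
# object into a bounded queue, another drains it, and we time the shuffle.
CHUNK = 64 * 1024
chunks_per_object = 4
objects = 1000

q = queue.Queue(maxsize=8)

def producer():
    body = b"x" * CHUNK
    for _ in range(objects * chunks_per_object):
        q.put(body)
    q.put(None)   # sentinel: no more chunks

def consumer():
    while q.get() is not None:
        pass

start = time.perf_counter()
t = threading.Thread(target=producer)
t.start()
consumer()
t.join()
elapsed = time.perf_counter() - start
print(f"~{objects / elapsed:.0f} objects/s through the queue alone")
# The queue itself is not the expensive part; in a real proxy the cost
# per chunk is dominated by socket reads/writes and WSGI overhead.
```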
> >> >>>
> >> >>> Thanks in advance.
> >> >>> -Shri
> >> >>>
> >> >>> _______________________________________________
> >> >>> Mailing list:
> >> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >> >>> Post to     : openstack at lists.openstack.org
> >> >>> Unsubscribe :
> >> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >> >>
> >> >>
> >> >
> >>
> >
> >
>
>

