[OpenStack-Infra] Poor nodepool performance under load (help needed!)

James E. Blair corvus at inaugust.com
Wed Mar 4 16:33:53 UTC 2015


"Heald, Mike" <mike.heald at hp.com> writes:

> This shows the time taken in seconds by each nodepool task
> (e.g. AddFloatingIPTask). Yes, it's slow, but consistent. During high
> load, the tasks only get more densely packed, they don't get slower.

From a quick scan of our upstream logs, I think we are seeing roughly
similar times for hpcloud.

> This shows the number of individual nodepool tasks
> (e.g. AddFloatingIPTask) waiting in the queue. Guess when a load of
> jobs hit us!

Again, from a skim, our queues are usually around 50-80 deep.

> That shows the amount of time the nodes spend in the delete state,
> from going from used to delete, to all the delete tasks having run and
> the node getting removed. Take a look at what happens when there's a
> lot of stuff in the queue. Ouchy.

It usually takes on the order of 5 minutes for us to delete a server.

> Our 'rate' is the default of 1.0. Any ideas or help would be appreciated!

I think there are two major differences.  Our rate is set to 0.1,
meaning that we issue requests 10 times as fast as you do (the option
is misleadingly named, sorry; it is actually the delay in seconds
between API tasks, so smaller is faster).  Also, due to networking
limitations, we have 5 "providers" configured for hpcloud, each
servicing about 100 nodes and each running its own task queue.  This
means that we are performing operations in parallel, up to a point.
If you are managing more than 100 nodes with a single provider, that
parallelism will complicate any performance comparison between our
setups.
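
For reference, the relevant part of our nodepool.yaml looks roughly
like this; the names and numbers are illustrative, and a real provider
stanza also carries credentials, image, and network settings:

providers:
  - name: hpcloud-b1
    rate: 0.1          # delay in seconds between API tasks (~10/sec)
    max-servers: 96    # cap this provider at roughly 100 nodes
  - name: hpcloud-b2
    rate: 0.1
    max-servers: 96
  # ...three more stanzas like these, so five providers in all,
  # each with its own task queue running in parallel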

I think the biggest factor here is that our rate is set very low, and
is therefore fast.  I would recommend finding the actual API rate
limit for your cloud account and setting the nodepool rate so that
your request rate matches, but does not exceed, it.  You might even
ask about having that limit increased for your account.
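
To put rough numbers on it: since rate is the delay in seconds between
API tasks, the default of 1.0 means one request per second per
provider.  A queue 50-80 tasks deep therefore takes 50-80 seconds to
drain, and every new task waits that long before it even starts; at
0.1, the same queue drains in 5-8 seconds.  Since deleting a single
server involves several tasks, the difference compounds quickly under
load.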

-Jim


