[Openstack-operators] [Scale][Performance] <nodes_with_something> / compute_nodes ratio experience

Gustavo Randich gustavo.randich at gmail.com
Tue Mar 15 20:50:22 UTC 2016


We recently had a power outage, and it exposed one controller capacity-planning
scenario worth considering: all of the compute nodes starting at once, or in
large batches, when power is restored.
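For what it's worth, a sketch of how startup could be staggered after an outage (host names, batch size, and delay below are made up for illustration; a real run would pull the host list from inventory and actually ssh out):

```shell
# Hypothetical sketch: bring nova-compute up in small batches so that
# nova-conductor is not hammered by every node reporting in at once.
# Hosts, batch size, and delay are illustrative values only.
batch_size=2
delay_seconds=1
started=0
for host in node01 node02 node03 node04 node05; do
    # In a real run this would be:
    #   ssh "$host" systemctl start openstack-nova-compute
    echo "would start nova-compute on $host"
    started=$((started + 1))
    if [ $((started % batch_size)) -eq 0 ]; then
        sleep "$delay_seconds"   # let conductor absorb the batch
    fi
done
```

The point is only the shape of it: bounded batches with a pause between them, sized to whatever your conductors can actually absorb.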

We painfully learned that our nova-conductor was short on workers/cores, but
at the time we doubted whether it was a problem with our deployment. Now we
know that nova-conductor is very resource hungry.
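For reference, the conductor worker count is tunable in nova.conf; a fragment like the following is what we ended up adjusting (the value 8 is only an example, not a recommendation):

```ini
# nova.conf on the controller node: number of nova-conductor worker
# processes. Defaults to the CPU count; 8 here is purely illustrative.
[conductor]
workers = 8
```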

Official recommendations about node ratios would be very appreciated.
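In the absence of official numbers, even a crude back-of-envelope model would help. Something like the following, where every throughput figure is an invented placeholder rather than a measurement, is the kind of arithmetic I mean:

```shell
# Hypothetical back-of-envelope estimate of conductor worker count.
# All rates below are invented placeholders, not measured values.
computes=250                 # compute nodes in the cell
calls_per_compute_min=2      # avg conductor RPC calls per compute per minute
calls_per_worker_min=60      # calls one conductor worker handles per minute
total=$((computes * calls_per_compute_min))
# ceiling division: workers = ceil(total / calls_per_worker_min)
workers=$(( (total + calls_per_worker_min - 1) / calls_per_worker_min ))
echo "estimated conductor workers: $workers"
```

Plugging in real per-worker throughput for your hardware and your cloud's actual RPC rate is exactly the data this thread is asking operators to share.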



On Thu, Nov 19, 2015 at 8:36 PM, Rochelle Grober <rochelle.grober at huawei.com
> wrote:

> Sorry this doesn't thread properly, but cut and pasted out of the digest...
>
>
>
> > As providing the OpenStack community with understandable recommendations
> > and instructions on performant OpenStack cloud deployments is part of
> > the Performance Team mission, I'm kindly asking you to share your
> > experience on safe cloud deployment ratios between the various types of
> > nodes you're running right now, and the possible issues you observed (as
> > an example: the discussed GoDaddy cloud has 3 conductor boxes vs.
> > 250 computes in the cell, and there was an opinion that this is simply
> > not enough, and that's it).
>
>
>
> That was my opinion, and it was based on an apparently incorrect
> assumption that they had a lot of things coming and going on their cloud. I
> think they've demonstrated at this point that (other issues
>
> aside) three is enough for them, given their environment, workload, and
> configuration.
>
>
>
> This information is great for building rules of thumb, so to speak.
> GoDaddy has an example configuration that is adequate for low-frequency
> construct/destruct (low number of VM create/destroy) cloud architectures.
> This provides a lower bound and might be representative of a lot of
> enterprise cloud deployments.
>
>
>
> The problem with coming up with any sort of metric that will apply to
> everyone is that it's highly variable. If you have 250 compute nodes and
> never create or destroy any instances, you'll be able to get away with
>
> *many* fewer conductors than if you have a very active cloud. Similarly,
> during a live upgrade (or following any upgrade where we do some online
> migration of data), your conductor load will be higher than normal. Of
> course, 4-core and 96-core conductor nodes aren't equal either.
>
>
>
> And here we have another rule of thumb, but no numbers put to it yet. If
> you have a low-frequency construct/destruct cloud model, you will need to
> temporarily increase your number of conductors by {x amount OR x%} when
> performing OpenStack live upgrades.
>
>
>
> So, by all means, we should gather information on what people are doing
> successfully, but keep in mind that it depends *a lot* on what sort of
> workloads the cloud is supporting.
>
>
>
> Right, but we can start applying fuzzy logic (the human kind, not machine)
> and get a better understanding of working configurations and **why** they
> work, then start examining where the transition states between
> configurations are. You need data before you can create information ;-)
>
>
>
> --Rocky
>
>
>
> --Dan
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>