[ops][largescale-sig] How many compute nodes in a single cluster ?

Sean Mooney smooney at redhat.com
Tue Feb 2 17:50:27 UTC 2021


On Tue, 2021-02-02 at 17:37 +0000, Arnaud Morin wrote:
> Hey all,
> 
> I will start the answers :)
> 
> At OVH, our hard limit is around 1500 hypervisors on a region.
> It also depends a lot on number of instances (and neutron ports).
> The effects if we try to go above this number:
> - load on control plane (db/rabbit) is increasing a lot
> - "burst" load is hard to manage (e.g. restart of all neutron agent or
>   nova computes is putting a high pressure on control plane)
> - and of course, failure domain is bigger
> 
> Note that we don't use cells.
> We are deploying multiple regions, but this is painful to manage /
> understand for our clients.
> We are looking for a solution to unify the regions, but we did not find
> anything which could fit our needs for now.

I assume you do not see cells v2 as a replacement for multiple regions because they
do not provide independent fault domains, and also because they are only a nova feature,
so they do not solve scaling issues in other services like neutron, which are stretched across
all cells.

Cells are a scaling mechanism, but the larger the cloud, the harder it is to upgrade, and cells do not
help with that. In fact, by adding more controllers, they hinder upgrades.

Separate regions can all be upgraded independently and can be fault tolerant if you don't share services
between regions and use federation to avoid sharing keystone.


Glad to hear you can manage 1500 compute nodes, by the way.

The old value of 500 nodes max has not been true for a very long time.
However, rabbitmq and the db still tend to be the bottleneck when scaling beyond 1500 nodes,
outside of the operational overhead.

> 
> Cheers,
> 





More information about the openstack-discuss mailing list