<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
> <span style="font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;font-size:14.6667px;background-color:rgb(255, 255, 255);display:inline !important">the old value of 500
nodes max has not been true for a very long time</span><br>
<span style="font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;font-size:14.6667px;background-color:rgb(255, 255, 255);display:inline !important">rabbitmq and the db still
tend to be the bottleneck to scaling beyond 1500 nodes, however,</span><br style="font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;font-size:14.6667px;background-color:rgb(255, 255, 255)">
<span style="font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;font-size:14.6667px;background-color:rgb(255, 255, 255);display:inline !important">outside of the operational
overhead.</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family:"Segoe UI", "Segoe UI Web (West European)", "Segoe UI", -apple-system, BlinkMacSystemFont, Roboto, "Helvetica Neue", sans-serif;font-size:14.6667px;background-color:rgb(255, 255, 255);display:inline !important"><br>
</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We manage our scale with regions as well. <span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">With 1k nodes our RabbitMQ isn't breaking a sweat, and there are no signs that the database is hitting any limits. </span><span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">Our
issues have been limited to scaling Neutron and to VM scheduling on Nova, mostly due to NUMA pinning.</span></div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Sean Mooney <smooney@redhat.com><br>
<b>Sent:</b> Tuesday, February 2, 2021 9:50 AM<br>
<b>To:</b> openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org><br>
<b>Subject:</b> Re: [ops][largescale-sig] How many compute nodes in a single cluster ?</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">On Tue, 2021-02-02 at 17:37 +0000, Arnaud Morin wrote:<br>
> Hey all,<br>
> <br>
> I will start the answers :)<br>
> <br>
> At OVH, our hard limit is around 1500 hypervisors in a region.<br>
> It also depends a lot on the number of instances (and neutron ports).<br>
> The effects if we try to go above this number:<br>
> - load on control plane (db/rabbit) is increasing a lot<br>
> - "burst" load is hard to manage (e.g. restart of all neutron agent or<br>
> nova computes is putting a high pressure on control plane)<br>
> - and of course, failure domain is bigger<br>
> <br>
> Note that we don't use cells.<br>
> We are deploying multiple regions, but this is painful to manage /<br>
> understand for our clients.<br>
> We are looking for a solution to unify the regions, but we did not find<br>
> anything which could fit our needs for now.<br>
<br>
i assume you do not see cells v2 as a replacement for multiple regions because they<br>
do not provide independent fault domains, and also because they are only a nova feature,<br>
so they do not solve scaling issues in other services like neutron, which are stretched across<br>
all cells.<br>
<br>
cells are a scaling mechanism, but the larger the cloud the harder it is to upgrade, and cells do not<br>
help with that; in fact, by adding more controllers they hinder upgrades.<br>
<br>
separate regions can all be upgraded independently and can be fault tolerant if you don't share services<br>
between regions and use federation to avoid sharing keystone.<br>
<br>
<br>
glad to hear you can manage 1500 compute nodes by the way.<br>
<br>
the old value of 500 nodes max has not been true for a very long time.<br>
rabbitmq and the db still tend to be the bottleneck to scaling beyond 1500 nodes, however,<br>
outside of the operational overhead.<br>
<br>
> <br>
> Cheers,<br>
> <br>
<br>
<br>
<br>
</div>
</span></font></div>
</body>
</html>