<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 12, 2019 at 11:38 AM Belmiro Moreira <<a href="mailto:moreira.belmiro.email.lists@gmail.com">moreira.belmiro.email.lists@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">Dan Smith just point me the conductor groups that were added in Stein.<div><a href="https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html" target="_blank">https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html</a></div><div>This is an interesting way to partition the deployment much better than the multiple nova-computes setup.</div></div></div></blockquote><div><br></div><div>Just a note, they aren't mutually exclusive. You can run multiple nova-computes to manage a single conductor group, whether for HA or because you're using groups for some other construct (cells, racks, halls, network zones, etc) which you want to shard further.</div><div><br></div><div>// jim</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><br></div><div>Thanks,</div><div>Belmiro</div><div>CERN</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira <<a href="mailto:moreira.belmiro.email.lists@gmail.com" target="_blank">moreira.belmiro.email.lists@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div>using several cells for the Ironic deployment would be great however it doesn't work with the current architecture.</div><div>The nova ironic driver gets all the nodes available in Ironic. This means that if we have several cells all of them will report the same nodes!</div><div>The other possibility is to have a dedicated Ironic instance per cell, but in this case it will be very hard to manage a large deployment.</div><div><br></div><div>What we are trying is to shard the ironic nodes between several nova-computes.</div><div>nova/ironic deployment supports several nova-computes and it will be great if the RT nodes cycle is sharded between them.</div><div><br></div><div>But anyway, this will also require speeding up the big lock.</div><div>It would be great if a compute node can handle more than 500 nodes.</div><div>Considering our use case: 15k/500 = 30 compute nodes.</div><div><br></div><div>Belmiro</div><div>CERN</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann <<a href="mailto:mriedemos@gmail.com" target="_blank">mriedemos@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 11/11/2019 7:03 AM, Chris Dent wrote:<br>
> Or using separate processes? For the ironic and vSphere contexts,
> increased CPU usage by the nova-compute process does not impact the
> workload resources, so parallelization is likely a good option.

I don't know how much it would help - someone would have to actually
test it out and get metrics - but one easy win might just be using a
thread or process executor pool here [1] so that N compute nodes could
be processed through the update_available_resource periodic task
concurrently, maybe $ncpu or some factor thereof. By default, make it
serialized for backward compatibility and non-ironic deployments. Making
it too highly concurrent could have negative impacts on other things
running on that host, like the neutron agent, or could end up storming
conductor/rabbit with a ton of DB requests from that compute.
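
Something like this standalone sketch is roughly what I mean - it is
illustrative only, not actual nova code; the "workers" knob and the
update_node() helper are stand-ins for whatever the real config option
and per-node RT update would be:

from concurrent.futures import ThreadPoolExecutor


def update_node(nodename):
    # Placeholder for the per-node update_available_resource work
    # (inventory collection, placement/DB updates, etc).
    print('updating resources for %s' % nodename)


def update_available_resource(nodenames, workers=1):
    if workers <= 1:
        # Default: serialized, matching today's behavior.
        for nodename in nodenames:
            update_node(nodename)
        return
    # Bounded concurrency so a single compute host doesn't storm
    # conductor/rabbit with DB requests or starve other agents.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(update_node, n) for n in nodenames]
        for fut in futures:
            fut.result()  # surface any per-node failures


update_available_resource(['node-%d' % i for i in range(8)], workers=4)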

That doesn't help with the scenario where the big
COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while
spawning, moving, or deleting an instance that also needs access to the
big lock to update the resource tracker, but baby steps, if any steps are
taken in this area of the code at all, would be my recommendation.
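
To make that contention concrete, here is a toy, stdlib-only illustration
(again, not nova code - the sleep stands in for the real per-node work and
the numbers are arbitrary) of an instance operation stuck behind a
periodic pass that holds the big lock for hundreds of nodes:

import threading
import time

COMPUTE_RESOURCE_SEMAPHORE = threading.Lock()


def periodic_update(nodenames):
    # Stands in for update_available_resource holding the big lock while
    # it walks every node on the compute (slow with hundreds of nodes).
    with COMPUTE_RESOURCE_SEMAPHORE:
        for _ in nodenames:
            time.sleep(0.01)  # pretend per-node DB/placement work


def instance_claim(instance):
    # Stands in for a spawn/move/delete needing the same lock to update
    # the resource tracker; it can't proceed until the periodic pass ends.
    with COMPUTE_RESOURCE_SEMAPHORE:
        print('claimed resources for %s' % instance)


t = threading.Thread(target=periodic_update,
                     args=(['node-%d' % i for i in range(500)],))
t.start()
time.sleep(0.1)  # let the periodic task grab the lock first
start = time.time()
instance_claim('instance-1')
print('claim waited %.1f seconds behind the periodic task'
      % (time.time() - start))
t.join()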

[1] https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629

--

Thanks,

Matt