<div dir="ltr"><div dir="ltr">Dan Smith just pointed me to the conductor groups that were added in Stein.<div><a href="https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html">https://specs.openstack.org/openstack/nova-specs/specs/stein/implemented/ironic-conductor-groups.html</a></div><div>This is an interesting way to partition the deployment, and much better than the multiple nova-computes setup.</div><div><br></div><div>Thanks,</div><div>Belmiro</div><div>CERN</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Nov 12, 2019 at 5:06 PM Belmiro Moreira &lt;<a href="mailto:moreira.belmiro.email.lists@gmail.com">moreira.belmiro.email.lists@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi,<div>using several cells for the Ironic deployment would be great; however, it doesn't work with the current architecture.</div><div>The nova ironic driver gets all the nodes available in Ironic. 
This means that if we have several cells, all of them will report the same nodes!</div><div>The other possibility is to have a dedicated Ironic instance per cell, but in this case it will be very hard to manage a large deployment.</div><div><br></div><div>What we are trying to do is shard the Ironic nodes across several nova-computes.</div><div>A nova/ironic deployment supports several nova-computes, and it would be great if the resource tracker (RT) node cycle were sharded between them.</div><div><br></div><div>But anyway, this will also require speeding up the big lock.</div><div>It would be great if a compute node could handle more than 500 nodes.</div><div>Considering our use case: 15k/500 = 30 compute nodes.</div><div><br></div><div>Belmiro</div><div>CERN</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann &lt;<a href="mailto:mriedemos@gmail.com" target="_blank">mriedemos@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">On 11/11/2019 7:03 AM, Chris Dent wrote:<br>

> Or using<br>

> separate processes? For the ironic and vsphere contexts, increased<br>

> CPU usage by the nova-compute process does not impact on the<br>

> workload resources, so parallelization is likely a good option.<br>

<br>

I don't know how much it would help - someone would have to actually <br>

test it out and get metrics - but one easy win might just be using a <br>

thread or process executor pool here [1] so that N compute nodes could <br>

be processed through the update_available_resource periodic task <br>

concurrently, maybe $ncpu or some factor thereof. By default make it <br>

serialized for backward compatibility and non-ironic deployments. Making <br>

that too highly concurrent could have negative impacts on other things <br>

running on that host, like the neutron agent, or potentially storming <br>

conductor/rabbit with a ton of DB requests from that compute.<br>
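<br>
A minimal sketch of the executor-pool idea, not nova code: update_node and update_all_nodes are hypothetical names, and max_workers=1 stands in for the serialized default. It only shows N nodes being processed concurrently with a bounded pool.<br>

```python
from concurrent.futures import ThreadPoolExecutor


def update_node(nodename):
    """Hypothetical stand-in for the per-node resource update."""
    return f"updated {nodename}"


def update_all_nodes(nodenames, max_workers=1, update_fn=update_node):
    """Run update_fn over every node, at most max_workers at a time.

    max_workers=1 keeps the updates serialized (the assumed default).
    Returns (results, failures), both dicts keyed by node name.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool bounds the concurrency.
        futures = {name: pool.submit(update_fn, name) for name in nodenames}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing node should not stop the remaining updates.
                failures[name] = exc
    return results, failures
```

Raising max_workers trades a shorter periodic cycle for more concurrent load on the host and on the services behind it, so a small bounded value is the cautious choice.<br>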

<br>

That doesn't help with the scenario that the big <br>

COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while <br>

spawning, moving, or deleting an instance that also needs access to the <br>

big lock to update the resource tracker, but baby steps if any steps in <br>

this area of the code would be my recommendation.<br>

<br>

[1] <br>

<a href="https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629</a><br>

<br>

-- <br>

<br>

Thanks,<br>

<br>

Matt<br>

<br>

</blockquote></div>

</blockquote></div>