[nova][ironic][ptg] Resource tracker scaling issues

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Tue Nov 12 16:06:17 UTC 2019


Hi,
using several cells for the Ironic deployment would be great, but it
doesn't work with the current architecture.
The Nova Ironic driver retrieves all the nodes available in Ironic, so if
we have several cells, all of them will report the same nodes!
The other possibility is to have a dedicated Ironic instance per cell, but
in that case a large deployment becomes very hard to manage.

What we are trying to do is shard the Ironic nodes between several
nova-computes.
A nova/ironic deployment already supports several nova-computes, and it
would be great if the resource tracker (RT) node cycle were sharded
between them.
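
To give an idea of what we mean by sharding, here is a toy consistent
hash ring that splits node UUIDs across the nova-compute hostnames. It is
purely illustrative and untested; the class and variable names are made
up and this is not Nova's real hash ring code:

    # Purely illustrative: assign each Ironic node UUID to exactly one
    # nova-compute host, so each RT cycle only touches its own shard.
    import bisect
    import hashlib

    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    class SimpleHashRing(object):
        def __init__(self, hosts, replicas=64):
            # Put many points per host on the ring to even out the split.
            self._ring = sorted(
                (_hash('%s-%d' % (host, i)), host)
                for host in hosts for i in range(replicas))
            self._keys = [k for k, _ in self._ring]

        def host_for(self, node_uuid):
            idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._keys)
            return self._ring[idx][1]

    hosts = ['compute-%02d' % i for i in range(1, 31)]   # 30 nova-computes
    nodes = ['node-%05d' % i for i in range(15000)]      # 15k Ironic nodes
    ring = SimpleHashRing(hosts)
    my_nodes = [n for n in nodes if ring.host_for(n) == 'compute-01']
    print(len(my_nodes))   # roughly 15000 / 30 = ~500 nodes on this host

The nice property of a ring like this is that adding or removing a
nova-compute only moves a small fraction of the nodes to a different
host, instead of reshuffling all of them.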

In any case, this will also require speeding up the work done under the
big lock.
It would be great if a single compute node could handle more than 500
nodes. Considering our use case: 15k / 500 = 30 compute nodes.
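
To make Matt's executor-pool suggestion (quoted below) a bit more
concrete, here is a rough, untested sketch. update_node() is only a
stand-in for the real per-node RT update in nova/compute/manager.py, and
big_lock stands in for COMPUTE_RESOURCE_SEMAPHORE:

    # Rough sketch only: fan the per-node updates out to a small pool.
    # With max_workers=1 this degenerates to today's serial behaviour.
    import concurrent.futures
    import threading
    import time

    big_lock = threading.Lock()   # stand-in for COMPUTE_RESOURCE_SEMAPHORE

    def update_node(nodename):
        # Stand-in for the per-node update: pretend part of the work
        # (talking to Ironic / placement) can run outside the big lock,
        # while the final RT update still has to hold it.
        time.sleep(0.05)
        with big_lock:
            time.sleep(0.005)
        return nodename

    def update_available_resource(nodenames, max_workers=1):
        with concurrent.futures.ThreadPoolExecutor(
                max_workers=max_workers) as pool:
            futures = {pool.submit(update_node, n): n for n in nodenames}
            for fut in concurrent.futures.as_completed(futures):
                node = futures[fut]
                try:
                    fut.result()
                except Exception:
                    # Mirror the current behaviour: log and keep going.
                    print('update failed for %s' % node)

    update_available_resource(['node-%03d' % i for i in range(500)],
                              max_workers=8)

Even then, anything that still runs under the big per-host lock stays
serialized, which is why speeding up the lock itself matters to us as
much as the parallelism.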

Belmiro
CERN



On Mon, Nov 11, 2019 at 9:13 PM Matt Riedemann <mriedemos at gmail.com> wrote:

> On 11/11/2019 7:03 AM, Chris Dent wrote:
> > Or using
> > separate processes? For the ironic and vsphere contexts, increased
> > CPU usage by the nova-compute process does not impact the
> > workload resources, so parallelization is likely a good option.
>
> I don't know how much it would help - someone would have to actually
> test it out and get metrics - but one easy win might just be using a
> thread or process executor pool here [1] so that N compute nodes could
> be processed through the update_available_resource periodic task
> concurrently, maybe $ncpu or some factor thereof. By default make it
> serialized for backward compatibility and non-ironic deployments. Making
> that too highly concurrent could have negative impacts on other things
> running on that host, like the neutron agent, or potentially storming
> conductor/rabbit with a ton of DB requests from that compute.
>
> That doesn't help with the scenario where the big
> COMPUTE_RESOURCE_SEMAPHORE lock is held by the periodic task while
> spawning, moving, or deleting an instance that also needs access to the
> big lock to update the resource tracker, but baby steps, if any steps in
> this area of the code, would be my recommendation.
>
> [1]
> https://github.com/openstack/nova/blob/20.0.0/nova/compute/manager.py#L8629
>
> --
>
> Thanks,
>
> Matt
>
>