Small follow-up:

What makes things much worse in this scenario is having a trait requirement that can only be satisfied in some AZs (GPUs, for example).
Then this becomes really, really painful.
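
To put a rough number on it, here is a minimal Python sketch. It is only a toy model, not Nova code: the three AZ names, the assumption that GPU hosts live in az1 only, and the `schedule_gpu_request` helper are all made up for illustration.

```python
import random

# Hypothetical layout: three AZs, GPU hosts exist only in az1.
AZS = ["az1", "az2", "az3"]
GPU_AZS = {"az1"}


def schedule_gpu_request(rng):
    """Model one GPU boot request where the user gave no AZ."""
    # The load balancer picks an API server, and that server's
    # default_schedule_zone pins the request to one AZ, whether or
    # not that AZ has any GPU hosts.
    pinned_az = rng.choice(AZS)
    return pinned_az in GPU_AZS


rng = random.Random(0)  # fixed seed so the run is repeatable
attempts = 10_000
failures = sum(not schedule_gpu_request(rng) for _ in range(attempts))
failure_rate = failures / attempts
print(f"GPU requests failing: {failure_rate:.0%}")
```

Under these assumptions roughly two out of three GPU requests land in an AZ that cannot satisfy them, unless the user passes the AZ explicitly.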


On Tue, 24 Dec 2024, 12:23 Dmitriy Rabotyagov, <noonedeadpunk@gmail.com> wrote:

I would really question how well it works, as it is a blind random method. There's no awareness of resource availability, so it works well only until you run out of resources in one of the AZs. To prevent random scheduling failures, you then have to track capacity externally and reconfigure the controllers in advance.

While I agree this works, there's quite a lot of room for improvement.
That improvement is not going to be an easy one, though, for sure.
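
A minimal Python sketch of the failure mode I mean. Again this is a toy model rather than anything from Nova: the per-AZ capacities, the request count and the `simulate` helper are assumptions picked for illustration, and each AZ is reduced to a simple counter of free slots.

```python
import random


def simulate(capacity, requests, seed=0):
    """Pin each request to a random AZ and count the ones that fail.

    A request fails when its randomly chosen AZ is already full, even
    if other AZs still have room (the choice is blind to capacity).
    """
    rng = random.Random(seed)
    free = dict(capacity)
    zones = sorted(free)
    failed = 0
    for _ in range(requests):
        az = rng.choice(zones)  # stands in for default_schedule_zone
        if free[az] > 0:
            free[az] -= 1
        else:
            failed += 1
    return failed, free


# Hypothetical numbers: az1 is much smaller than the other two.
capacity = {"az1": 10, "az2": 100, "az3": 100}
failed, free = simulate(capacity, requests=150)
print(f"failed: {failed}, free slots left: {free}")
```

Requests start failing as soon as az1 fills up, while az2 and az3 still have plenty of free slots. Avoiding that means somebody has to watch capacity and change `default_schedule_zone` on the controllers before it happens.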


On Tue, 24 Dec 2024, 12:18 Thomas Goirand, <zigo@debian.org> wrote:
On 12/19/24 14:19, Arnaud Morin wrote:
> Ok, thank you all for your answers.
> I see multiple options on the table here:
> - have my controllers equally split across AZs and set a different
>    default_schedule_zone on each (sounds hacky to me)

That's exactly what Sylvain B. (PTL of Nova) suggested to me. We
implemented it, and it works very well. As our users query Nova to
schedule new VMs, they end up on a random API server (one of the 3
backend nova-api servers behind our haproxy, in our case), and then the
VMs are scheduled randomly in one of the 3 AZs, as expected.

Cheers,

Thomas Goirand (zigo)