All VMs fail when --max exceeds available resources
mriedemos at gmail.com
Thu Nov 21 14:45:58 UTC 2019
On 11/21/2019 6:04 AM, Sean Mooney wrote:
> i think the behavior might change if the max value exceeds the batch size. we group the requests in sets of 10? by default
> if all the vms in a batch go active and later vms in a different set fail, the first vms will remain active.
> i can't remember which config option controls that but there is one. it's max concurrent builds or something like that.
That batch size option is per-compute. For what Albert was hitting, it
failed with NoValidHost in the scheduler, so the compute isn't involved.
What you're describing is likely legacy behavior where the scheduler
said, "yup sure putting 20 instances on a few computes is probably OK"
and then they raced to do the RT (ResourceTracker) claim on the compute and failed late
and went to ERROR while some went ACTIVE. That window was closed for
vcpu/ram/disk claims in Pike when the scheduler started using placement
to create atomic resource allocation claims.
So if someone can reproduce this issue with --max and some go active
while some go error in the same request post-Pike I'd be surprised.
Doing that in *concurrent* requests I could understand since the
scheduler could be a bit split brain there but placement still would not be.
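
To make the difference concrete, here is a rough sketch of the
all-or-nothing behavior described above. It is a toy model, not Nova or
placement code: ToyPlacement, schedule() and the host names are all made
up for illustration, and the only thing borrowed from the thread is the
idea that claims are made atomically up front and the whole request
fails with NoValidHost if any instance cannot be placed.

```python
# Toy model only: this is NOT Nova or placement code. Every name here
# (ToyPlacement, schedule, the host names) is hypothetical.
import threading


class NoValidHost(Exception):
    """Raised when the whole multi-create request cannot be satisfied."""


class ToyPlacement:
    """In-memory stand-in for placement: tracks free VCPUs per host."""

    def __init__(self, free_vcpus):
        self._free = dict(free_vcpus)
        self._lock = threading.Lock()

    def claim(self, host, vcpus):
        """Atomically claim vcpus on host; return True on success."""
        with self._lock:
            if self._free[host] >= vcpus:
                self._free[host] -= vcpus
                return True
            return False

    def release(self, host, vcpus):
        """Give back a previous claim."""
        with self._lock:
            self._free[host] += vcpus

    def free(self, host):
        with self._lock:
            return self._free[host]


def schedule(placement, hosts, num_instances, vcpus_each):
    """Claim resources for every instance up front, or for none of them."""
    claimed = []
    for _ in range(num_instances):
        # Pick the first host where the atomic claim succeeds.
        host = next((h for h in hosts if placement.claim(h, vcpus_each)), None)
        if host is None:
            # Roll back the claims already made so nothing is left behind,
            # then fail the whole request.
            for h in claimed:
                placement.release(h, vcpus_each)
            raise NoValidHost("not enough capacity for %d instances" % num_instances)
        claimed.append(host)
    return claimed
```

With two hosts of 4 free VCPUs each, asking for 8 single-VCPU instances
succeeds, while asking for 9 raises NoValidHost and leaves both hosts
with all 4 VCPUs free, so no instance from that request ends up claimed
while another fails.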