All VMs fail when --max exceeds available resources

Matt Riedemann mriedemos at gmail.com
Thu Nov 21 14:45:58 UTC 2019


On 11/21/2019 6:04 AM, Sean Mooney wrote:
> I think the behavior might change if the max value exceeds the batch 
> size. We group the requests in sets of 10 by default.
> If all the VMs in a batch go active and later VMs in a different set 
> fail, the first VMs will remain active.
> I can't remember which config option controls that, but there is one. 
> It's max concurrent builds or something like that.

That batch size option is per-compute. For what Albert was hitting, it 
failed with NoValidHost in the scheduler, so the compute isn't involved.
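
For reference, I'm pretty sure the option Sean couldn't remember is 
max_concurrent_builds, which lives on the compute:

   # nova.conf on each compute node; 10 is the default batch size
   # (sketch - this just bounds parallel builds per host)
   [DEFAULT]
   max_concurrent_builds = 10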

What you're describing is likely legacy behavior where the scheduler 
said, "yup, sure, putting 20 instances on a few computes is probably 
OK", and then the instances raced to do the resource tracker (RT) claim 
on the compute, failed late, and went to ERROR while others went 
ACTIVE. That window was closed for VCPU/RAM/disk claims in Pike when 
the scheduler started using placement to create atomic resource 
allocation claims.
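
To illustrate, since Pike the scheduler claims resources by writing 
allocations to placement before anything hits the compute. Roughly 
(sketch only - the token, URL and UUIDs are placeholders, and the exact 
payload shape depends on the placement microversion):

   # 1. ask placement which providers can satisfy the flavor
   curl -s -H "X-Auth-Token: $TOKEN" \
     -H "OpenStack-API-Version: placement 1.12" \
     "$PLACEMENT_URL/allocation_candidates?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:20"

   # 2. atomically claim one candidate for the instance; a 409 means
   #    something else consumed the inventory first and the claim fails
   curl -s -X PUT -H "X-Auth-Token: $TOKEN" \
     -H "OpenStack-API-Version: placement 1.12" \
     -H "Content-Type: application/json" \
     -d '{"allocations": {"<rp-uuid>": {"resources":
          {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}}},
          "project_id": "<project>", "user_id": "<user>"}' \
     "$PLACEMENT_URL/allocations/<instance-uuid>"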

So if someone can reproduce this issue post-Pike with --max where some 
instances go ACTIVE while others go to ERROR in the same request, I'd 
be surprised. Doing that in *concurrent* requests I could understand, 
since the scheduler could be a bit split-brain there, but placement 
still would not be.
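
If you want to try to trigger that, it would have to look something 
like this (the flavor/image/network names are made up):

   # sketch: two multi-create requests racing each other
   openstack server create --min 1 --max 10 --flavor m1.large \
     --image cirros --network private race-a &
   openstack server create --min 1 --max 10 --flavor m1.large \
     --image cirros --network private race-b &
   wait
   # even if both pick the same hosts, the losing allocation PUT gets
   # a 409 from placement rather than a late RT claim failure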

-- 

Thanks,

Matt


