If I create 20 VMs at once, at least one of them fails with "Exceeded maximum number of retries." When I look at the logs, I see that the scheduler sent the VM to a host that doesn't have enough CPU: "Free vcpu 14.00 VCPU < requested 16 VCPU."
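To make the two error messages easier to connect, here is a toy model of a vCPU claim on a compute node followed by rescheduling. It is a sketch under stated assumptions, not nova's code. The claim message is shaped after the log line above, and the retry loop only roughly mirrors nova's `[scheduler] max_attempts` option. `check_vcpu_claim`, `build_with_retries`, `MaxRetriesExceeded` and the value 3 are hypothetical illustrations.

```python
from typing import List, Optional, Tuple


class MaxRetriesExceeded(Exception):
    """Hypothetical stand-in for the error seen after too many reschedules."""


def check_vcpu_claim(limit: float, used: float, requested: int) -> Optional[str]:
    """Return None if the claim fits, otherwise a message shaped like the log line.

    `limit` is the host's usable vCPU count (including any allocation ratio)
    and `used` is what instances on the host already consume.
    """
    free = limit - used
    if requested > free:
        return "Free vcpu %.02f VCPU < requested %d VCPU" % (free, requested)
    return None


def build_with_retries(
    candidates: List[Tuple[str, float, float]], requested: int, max_attempts: int = 3
) -> str:
    """Try candidate hosts (name, limit, used) in order; give up after max_attempts."""
    for name, limit, used in candidates[:max_attempts]:
        reason = check_vcpu_claim(limit, used, requested)
        if reason is None:
            return name
        # The compute node rejects the build and it is rescheduled elsewhere.
        print(f"{name}: {reason}")
    raise MaxRetriesExceeded("Exceeded maximum number of retries.")


print(check_vcpu_claim(limit=64, used=50, requested=16))
# Free vcpu 14.00 VCPU < requested 16 VCPU
```

In this model, a single "Free vcpu" rejection is recoverable. "Exceeded maximum number of retries." only appears when every alternate host the build is sent to also fails its claim.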
https://paste.fedoraproject.org/paste/6N3wcDzlbNQgj6hRApHiDQ
I thought that this must be caused by a race condition, so I stopped the scheduler and conductor on 2 controllers and then created 20 more VMs. Now I see the logs only on controller 3. Some of the failures now say "Unable to establish connection to <LB>", but I still see the single scheduler sending VMs to a host that lacks resources: "Free vcpu 14.00 VCPU < requested 16 VCPU."
https://paste.fedoraproject.org/paste/lGlVpfbB9C19mMzrWQcHCQ
I'm looking at my nova.conf but don't see anything misconfigured. My filters are pretty standard:
enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter, ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter
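As a rough illustration of what a filter chain like this does, the sketch below models two filters in the spirit of `CoreFilter` and `RamFilter`. It is a toy model under stated assumptions, not nova's implementation. The names `HostState`, `core_filter`, `ram_filter` and `filter_hosts` are hypothetical, and the reading of `s1.16cx120g` as 16 vCPUs and 120 GB of RAM is inferred from the flavor name. The point is that a host must pass every enabled filter, and the filters only see the scheduler's view of each host.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HostState:
    """The scheduler's (possibly stale) view of one compute node."""
    name: str
    vcpus_total: int
    vcpus_used: int
    free_ram_mb: int


def core_filter(host: HostState, vcpus: int, ram_mb: int,
                cpu_allocation_ratio: float = 1.0) -> bool:
    # Usable vCPUs are physical vCPUs scaled by the allocation ratio.
    limit = host.vcpus_total * cpu_allocation_ratio
    return limit - host.vcpus_used >= vcpus


def ram_filter(host: HostState, vcpus: int, ram_mb: int) -> bool:
    return host.free_ram_mb >= ram_mb


Filter = Callable[[HostState, int, int], bool]


def filter_hosts(hosts: List[HostState], vcpus: int, ram_mb: int,
                 filters: List[Filter]) -> List[str]:
    """A host survives only if every enabled filter accepts it."""
    return [h.name for h in hosts if all(f(h, vcpus, ram_mb) for f in filters)]


hosts = [
    HostState("compute-1", vcpus_total=32, vcpus_used=18, free_ram_mb=200_000),
    HostState("compute-2", vcpus_total=32, vcpus_used=8, free_ram_mb=200_000),
]
# Assumes s1.16cx120g means 16 vCPUs and 120 GB of RAM.
print(filter_hosts(hosts, vcpus=16, ram_mb=120 * 1024, filters=[core_filter, ram_filter]))
# ['compute-2']
```

If the scheduler's `vcpus_used` for a host is out of date, that host can pass here and still fail the claim on the compute node.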
What should I be looking for here? Why would a single scheduler send a VM to a host that is too full? We have lots of compute hosts that are not full:
https://paste.fedoraproject.org/paste/6SX9pQ4V1KnWfQkVnfoHOw
This is the command line I used:
openstack server create --flavor s1.16cx120g --image QSC-P-CentOS6.6-19P1-v4 --network vg-network --max 20 alberttestB

On Tue, 2019-11-12 at 19:42 +0000, Albert Braden wrote:
What version of OpenStack are you running? If it's not using placement, then this behaviour is expected: the resources are not claimed until the VM is booted on the node, so there is an interval while the scheduler is selecting hosts in which you can race with other VM boots. If you are using placement and you are not using NUMA or PCI passthrough (which you do not appear to be, based on your enabled filters), then this should not happen and we should dig deeper, as there is likely a bug either in your configuration or in nova.
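The race described in the reply can be sketched as a toy model that compares a late claim on the compute node with a claim made at scheduling time. This is a simplified illustration under stated assumptions, not nova or placement code. `Host`, `pick_host`, `late_claim` and `early_claim` are hypothetical names, and the `generation` counter only loosely imitates the resource-provider generation that placement uses to detect concurrent updates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Host:
    """Hypothetical model of one compute node's vCPU inventory."""
    name: str
    vcpu_capacity: int
    vcpu_used: int = 0
    generation: int = 0  # bumped on every successful early claim

    def free(self) -> int:
        return self.vcpu_capacity - self.vcpu_used


def pick_host(view: Dict[str, int], requested: int) -> Optional[str]:
    """Scheduler step: choose the first host whose reported free vCPUs fit."""
    for name, free in view.items():
        if free >= requested:
            return name
    return None


def late_claim(host: Host, requested: int) -> bool:
    """Claim made on the compute node after the VM has been sent there."""
    if host.free() < requested:
        return False  # the node rejects the build and it is rescheduled
    host.vcpu_used += requested
    return True


def early_claim(host: Host, requested: int, seen_generation: int) -> bool:
    """Claim made at scheduling time; fails if the host changed since it was read."""
    if host.generation != seen_generation or host.free() < requested:
        return False  # the scheduler must re-read and pick again
    host.vcpu_used += requested
    host.generation += 1
    return True


# Late claims: two 16-vCPU requests read the same view of a 30-vCPU host
# before either build lands, so both are sent to it.
a = Host("a", vcpu_capacity=30)
view = {a.name: a.free()}
print(pick_host(view, 16), pick_host(view, 16))  # a a
print(late_claim(a, 16), late_claim(a, 16))      # True False (14 free < 16)

# Early claims: the second request sees the generation changed and retries
# host selection before anything is sent to a compute node.
b = Host("b", vcpu_capacity=30)
gen = b.generation
print(early_claim(b, 16, gen), early_claim(b, 16, gen))  # True False
```

In the late-claim case, the losing request only finds out on the compute node, which matches the "Free vcpu 14.00 VCPU < requested 16 VCPU" rejection. In the early-claim case, the conflict surfaces while the scheduler can still pick another host.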