I am running stable Queens with hundreds of ironic baremetal nodes. Things are mostly stable, but occasionally some baremetal node provisions fail. These failures have been traced to nova placement failures leading to 409 errors.

My nova and baremetal filters do NOT have the 3 filters you mention.

[root@sc-control03 objects]# grep filter /etc/nova/nova.conf | grep filters
# * enabled_filters
#enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
#use_baremetal_filters=false
#baremetal_enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ExactRamFilter,ExactDiskFilter,ExactCoreFilter

The baremetal nodes are all using resource class. My image does NOT have the changes for https://review.opendev.org/#/c/565841

Ultimately, nova-conductor reports "NoValidHost: No valid host was found. There are not enough hosts available".

This has been traced to nova-placement-api: "Allocation for CUSTOM_RRR430 on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1"

Any pointers on what next steps I should be looking at?

thanks,
Fred.

Relevant logs:

nova-conductor.log

2019-11-12 10:26:02.593 1666486 ERROR nova.conductor.manager [req-fa1bfb2e-c765-432d-aa66-e16db8329312 - - - - -] Failed to schedule instances: NoValidHost_Remote: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 226, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/manager.py", line 154, in select_destinations
    allocation_request_version, return_alternates)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 91, in select_destinations
    allocation_request_version, return_alternates)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 243, in _schedule
    claimed_instance_uuids)
  File "/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py", line 280, in _ensure_sufficient_hosts
    raise exception.NoValidHost(reason=reason)
NoValidHost: No valid host was found. There are not enough hosts available.

nova-placement-api.log

3cacac3f-9af0-4e39-9bc8-d1f362bdb730 = resource ID of baremetal node
84ea2b90-06b2-489e-92ea-24b859b3c997 = instance ID

2019-11-12 10:26:02.427 4161131 INFO nova.api.openstack.placement.requestlog [req-66a6dc45-8326-4e24-9216-fc77099303ba 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 200 len: 111 microversion: 1.0
2019-11-12 10:26:02.461 4161129 WARNING nova.objects.resource_provider [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] Allocation for CUSTOM_Z370_A on resource provider 3cacac3f-9af0-4e39-9bc8-d1f362bdb730 violates min_unit, max_unit, or step_size. Requested: 2, min_unit: 1, max_unit: 1, step_size: 1
2019-11-12 10:26:02.568 4161129 INFO nova.api.openstack.placement.requestlog [req-6d79841e-6abe-490e-b79b-8d88b04215af 1ee9f9bf77294e8e8bf50bb35c581689 acf8cd411e5e4751a61d1ed54e8e874d - default default] 10.33.24.13 "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997" status: 409 len: 383 microversion: 1.17

http_access_log

10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "GET /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 200 111 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5"
10.33.24.13 - - [12/Nov/2019:10:26:02 -0800] "PUT /allocations/84ea2b90-06b2-489e-92ea-24b859b3c997 HTTP/1.1" 409 383 "-" "nova-scheduler keystoneauth1/3.4.0 python-requests/2.14.2 CPython/2.7.5"

On Wednesday, November 13, 2019, 11:36:35 AM PST, Albert Braden <albert.braden@synopsys.com> wrote:

Removing these 3 obsolete filters appears to have fixed the problem. Thank you for your advice!

-----Original Message-----
From: Matt Riedemann <mriedemos@gmail.com>
Sent: Tuesday, November 12, 2019 1:14 PM
To: openstack-discuss@lists.openstack.org
Subject: Re: Scheduler sends VM to HV that lacks resources

On 11/12/2019 2:47 PM, Albert Braden wrote:
It's probably a config error. Where should I be looking? This is our nova config on the controllers:
If your deployment is pike or newer (I'm guessing rocky because your other email says rocky), then you don't need these filters:

RetryFilter - alternate hosts bp in queens release makes this moot
CoreFilter - placement filters on VCPU
RamFilter - placement filters on MEMORY_MB

--
Thanks,
Matt
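
For anyone who wants to check a controller for the three filters named in the reply above, here is a minimal sketch. It assumes the filter list lives in the `enabled_filters` option of the `[filter_scheduler]` section, and it uses Python's standard `configparser` as an approximation of how nova reads its config; the function name `find_obsolete_filters` and the sample config text are made up for illustration. A commented-out option falls back to nova's built-in default, which this sketch does not inspect.

```python
import configparser

# The three filters the reply above says are not needed on Pike or newer.
OBSOLETE_FILTERS = {"RetryFilter", "CoreFilter", "RamFilter"}


def find_obsolete_filters(conf_text, section="filter_scheduler",
                          option="enabled_filters"):
    """Return the obsolete filters explicitly listed in nova.conf text.

    Hypothetical helper: only reads options that are set (uncommented).
    """
    # interpolation=None keeps '%' in values literal; strict=False
    # tolerates repeated sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    raw = parser.get(section, option, fallback="")
    enabled = [name.strip() for name in raw.split(",") if name.strip()]
    return [name for name in enabled if name in OBSOLETE_FILTERS]


# Made-up sample config, not taken from either poster's deployment.
sample = """
[filter_scheduler]
enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter
"""

print(find_obsolete_filters(sample))
```

To run it against a real file, read the file contents and pass them in, e.g. `find_obsolete_filters(open("/etc/nova/nova.conf").read())`. An empty list only means none of the three filters is set explicitly.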