[openstack-dev] [nova] How to debug no valid host failures with placement

Eric Fried openstack at fried.cc
Tue Aug 14 16:29:13 UTC 2018


Folks-

	The patch mentioned below [1] has undergone several rounds of review
and collaborative revision, and we'd really like to get your feedback on
it. From the commit message:

Here are some examples of the debug output:

- A request for three resources with no aggregate or trait filters:

 found 7 providers with available 5 VCPU
 found 9 providers with available 1024 MEMORY_MB
       5 after filtering by previous result
 found 8 providers with available 1500 DISK_GB
       2 after filtering by previous result
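
  For reference, the request producing that output would look something
  like this (illustrative; the amounts are taken from the output above):

   GET /allocation_candidates?resources=VCPU:5,MEMORY_MB:1024,DISK_GB:1500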

- The same request, but with a required trait that nobody has, shorts
  out quickly:

 found 0 providers after applying required traits filter ({'HW_CPU_X86_AVX2': 65})
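
  Illustratively, such a request would be:

   GET /allocation_candidates?resources=VCPU:5,MEMORY_MB:1024,DISK_GB:1500&required=HW_CPU_X86_AVX2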

- A request for one resource with aggregates and forbidden (but no
  required) traits:

 found 2 providers after applying aggregates filter ([['3ed8fb2f-4793-46ee-a55b-fdf42cb392ca']])
 found 1 providers after applying forbidden traits filter ({u'CUSTOM_TWO': 201, u'CUSTOM_THREE': 202})
 found 3 providers with available 4 VCPU
       1 after applying initial aggregate and trait filters
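
  An illustrative request for this case (the !TRAIT forbidden-trait
  syntax requires placement microversion 1.22 or later):

   GET /allocation_candidates?resources=VCPU:4&member_of=3ed8fb2f-4793-46ee-a55b-fdf42cb392ca&required=!CUSTOM_TWO,!CUSTOM_THREE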

Thanks,
efried

[1] https://review.openstack.org/#/c/590041


> I've created a patch that (hopefully) will address some of the
> difficulty that folks have had in diagnosing which parts of a request
> caused all providers to be filtered out from the return of GET
> /allocation_candidates:
> 
> https://review.openstack.org/#/c/590041
> 
> This patch makes two primary changes:
> 
> 1) Query-splitting
> 
> The patch splits the existing monster SQL query -- which matched all
> requested resources, required traits, forbidden traits and required
> aggregate associations in a single statement -- into multiple queries,
> one for each requested resource. While this does increase the number
> of database queries executed for each call to GET
> /allocation_candidates, the change gives much better visibility into
> which parts of the request caused the pool of matching providers to be
> exhausted. We benchmarked the new patch and found that the performance
> impact of doing 3 queries versus 1 (for a request of 3 resources --
> VCPU, RAM and disk) is minimal: a few extra milliseconds of execution
> against a DB with 1K providers, each having inventory in all three
> resource classes.
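> 
> To sketch the idea (illustrative Python, not the literal code in the
> patch; providers_with_capacity() is a hypothetical helper): instead of
> one monster query, we run one capacity query per resource class and
> intersect each result with the providers that survived so far:
> 
>     def providers_matching(ctx, resources):
>         """Return the set of provider IDs with capacity for every
>         requested resource class (schematic)."""
>         matching = None
>         for rc_name, amount in resources.items():
>             # One capacity query per requested resource class
>             # (providers_with_capacity is a hypothetical helper).
>             providers = providers_with_capacity(ctx, rc_name, amount)
>             matching = providers if matching is None else matching & providers
>         return matching if matching is not None else set()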
> 
> 2) Diagnostic logging output
> 
> The patch adds debug log output within each loop iteration, so there
> is now logging output showing how many matching providers were found
> for each resource class involved in the request. The output looks like
> this in the logs:
> 
> [req-2d30faa8-4190-4490-a91e-610045530140] inside VCPU request loop. before applying trait and aggregate filters, found 12 matching providers
> [req-2d30faa8-4190-4490-a91e-610045530140] found 12 providers with capacity for the requested 1 VCPU.
> [req-2d30faa8-4190-4490-a91e-610045530140] inside MEMORY_MB request loop. before applying trait and aggregate filters, found 9 matching providers
> [req-2d30faa8-4190-4490-a91e-610045530140] found 9 providers with capacity for the requested 64 MEMORY_MB. before loop iteration we had 12 matches.
> [req-2d30faa8-4190-4490-a91e-610045530140] RequestGroup(use_same_provider=False, resources={MEMORY_MB:64, VCPU:1}, traits=[], aggregates=[]) (suffix '') returned 9 matches
> 
> If a request includes required traits, forbidden traits or required
> aggregate associations, there are additional log messages showing how
> many matching providers were found after applying the trait or
> aggregate filtering set operation. In other words, the log output
> shows the impact of the trait filter or aggregate filter in much the
> same way that the existing FilterScheduler logging shows the "before
> and after" impact that a particular filter had on a request.
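> 
> Schematically, those set operations and their logging look something
> like this (illustrative; the helper names are hypothetical, not the
> literal functions in the patch):
> 
>     candidates = set(all_provider_ids)
>     if required_aggs:
>         # Keep only providers associated with the required aggregates.
>         candidates &= providers_in_aggregates(required_aggs)
>         LOG.debug("found %d providers after applying aggregates "
>                   "filter (%s)", len(candidates), required_aggs)
>     if forbidden_traits:
>         # Drop providers having any of the forbidden traits.
>         candidates -= providers_with_any_trait(forbidden_traits)
>         LOG.debug("found %d providers after applying forbidden "
>                   "traits filter (%s)", len(candidates), forbidden_traits)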
> 
> Have a look at the patch in question and please feel free to add your
> feedback and comments on ways this can be improved to meet your needs.
> 
> Best,
> -jay