[openstack-dev] [nova] How to debug no valid host failures with placement

melanie witt melwittt at gmail.com
Thu Aug 2 19:04:41 UTC 2018


On Thu, 2 Aug 2018 13:20:43 -0500, Eric Fried wrote:
>> And we could do the same kind of approach with the non-granular request
>> groups by reducing the single large SQL statement that is used for all
>> resources and all traits (and all agg associations) into separate SELECT
>> statements.
>>
>> It could be slightly less performance-optimized but more readable and
>> easier to output debug logs like those above.
> 
> Okay, but first we should define the actual problem(s) we're trying to
> solve, as Chris says, so we can assert that it's worth the (possible)
> perf hit and (definite) dev resources, not to mention the potential for
> injecting bugs.
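To make the "separate SELECT statements" idea quoted above a bit more concrete, here is a minimal sketch. It uses an in-memory SQLite database with a hypothetical, heavily simplified schema (`inventories` and `allocations` keyed by provider name and resource class string). It is not placement's real schema or query. Each resource class gets its own SELECT, and the number of surviving providers is logged after each one, in the same "[start: N, end: M]" style the old filters used. The function name `providers_with_capacity` and the sample data are made up for illustration.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
LOG = logging.getLogger("placement-sketch")

# Hypothetical, simplified schema; the real placement tables differ.
SCHEMA = """
CREATE TABLE inventories (provider TEXT, resource_class TEXT, total INTEGER,
                          reserved INTEGER, allocation_ratio REAL);
CREATE TABLE allocations (provider TEXT, resource_class TEXT, used INTEGER);
"""

# One SELECT per resource class: providers with at least `amount` left.
PER_CLASS_SQL = """
SELECT i.provider
FROM inventories AS i
LEFT JOIN (SELECT provider, resource_class, SUM(used) AS used
           FROM allocations GROUP BY provider, resource_class) AS a
       ON a.provider = i.provider AND a.resource_class = i.resource_class
WHERE i.resource_class = ?
  AND (i.total - i.reserved) * i.allocation_ratio - COALESCE(a.used, 0) >= ?
"""


def providers_with_capacity(conn, request):
    """Narrow the candidate set one resource class at a time, logging counts."""
    candidates = {row[0] for row in
                  conn.execute("SELECT DISTINCT provider FROM inventories")}
    for rc, amount in request.items():
        passing = {row[0] for row in conn.execute(PER_CLASS_SQL, (rc, amount))}
        survivors = candidates & passing
        LOG.debug("%s=%d [start: %d, end: %d]",
                  rc, amount, len(candidates), len(survivors))
        candidates = survivors
    return candidates


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO inventories VALUES (?, ?, ?, ?, ?)", [
    ("cn1", "VCPU", 8, 0, 16.0), ("cn1", "MEMORY_MB", 16384, 512, 1.5),
    ("cn1", "DISK_GB", 100, 0, 1.0),
    ("cn2", "VCPU", 8, 0, 16.0), ("cn2", "MEMORY_MB", 16384, 512, 1.5),
    ("cn2", "DISK_GB", 100, 0, 1.0),
])
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?)",
                 [("cn1", "DISK_GB", 90), ("cn2", "DISK_GB", 95)])

print(providers_with_capacity(conn, {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}))
```

With this sample data, the VCPU and MEMORY_MB steps log "[start: 2, end: 2]" and the DISK_GB step logs "[start: 2, end: 0]", so the empty result points straight at disk. The cost is one query per resource class instead of one combined query, which is the performance trade-off mentioned above.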

The problem is an infamous one: your users try to boot instances, get 
"No Valid Host", and end up with an instance in ERROR state. They 
contact support, and support then has to determine why NoValidHost 
happened. In the past, support would set the nova-scheduler log level 
to DEBUG, retry the request, and look at the scheduler logs. When 
scheduling failed, they'd see a message such as "DiskFilter [start: 2, 
end: 0]" (there were 2 candidates before DiskFilter ran and 0 after it 
ran), indicating that scheduling failed because no compute host was 
reporting enough disk to fulfill the request. The key point is that 
they could see which resource was not available in their cluster.
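For comparison, the old behavior boils down to running filters one after another and logging how many hosts survive each one. The sketch below assumes hypothetical host dicts, a hypothetical request dict, and plain functions; they are stand-ins for illustration, not nova's actual HostState, RequestSpec, or filter classes.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
LOG = logging.getLogger("filter-sketch")

# Hypothetical host states and request; not nova's real objects.
HOSTS = [
    {"name": "cn1", "free_ram_mb": 8192, "free_disk_gb": 10, "vcpus_free": 4},
    {"name": "cn2", "free_ram_mb": 4096, "free_disk_gb": 5, "vcpus_free": 8},
]
REQUEST = {"ram_mb": 2048, "disk_gb": 20, "vcpus": 2}


def ram_filter(host, req):
    return host["free_ram_mb"] >= req["ram_mb"]


def disk_filter(host, req):
    return host["free_disk_gb"] >= req["disk_gb"]


def core_filter(host, req):
    return host["vcpus_free"] >= req["vcpus"]


# Each filter checks one resource, so each log line names one resource.
FILTERS = [("RamFilter", ram_filter),
           ("DiskFilter", disk_filter),
           ("CoreFilter", core_filter)]


def run_filters(hosts, req):
    """Apply filters in order; return survivors and the filter that emptied the list."""
    for name, passes in FILTERS:
        start = len(hosts)
        hosts = [h for h in hosts if passes(h, req)]
        LOG.debug("%s [start: %d, end: %d]", name, start, len(hosts))
        if not hosts:
            return [], name
    return hosts, None


survivors, culprit = run_filters(HOSTS, REQUEST)
print(survivors, culprit)
```

Here the run stops at DiskFilter with "[start: 2, end: 0]", which is exactly the kind of clue support relied on.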

Now, with placement, all the resources are checked in one go, so support 
can't tell which resource or trait caused the rejection (assuming it 
wasn't all of them). They want to know which resource or trait was 
rejected so they can find the problematic compute host, configuration, 
or other cause and fix it.

At present, I think the only approach support could take is to query a 
view of resource providers with their resource and trait availability, 
compare it against the flavor of the failed request, and work out which 
requested resources or traits exceed or are missing from what's 
reported as available.
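That manual comparison could look roughly like the sketch below. It assumes support has already assembled a per-provider view by hand, with "available" meaning capacity left after reservations, allocation ratio and existing usage. `ProviderView`, `explain_rejections`, `unmet_everywhere` and the sample numbers are all hypothetical. The sketch ignores min_unit/max_unit/step_size, aggregates and nested providers, so treat it as an illustration of the comparison, not of placement's real logic.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderView:
    """Hypothetical hand-built summary of one resource provider."""
    name: str
    available: dict                      # resource class -> capacity left
    traits: set = field(default_factory=set)


def explain_rejections(providers, resources, required_traits):
    """Return, per provider, the requested resources/traits it cannot satisfy."""
    report = {}
    for p in providers:
        failed = {rc for rc, amount in resources.items()
                  if p.available.get(rc, 0) < amount}
        failed |= set(required_traits) - p.traits
        report[p.name] = failed
    return report


def unmet_everywhere(report):
    """Requirements that no provider satisfies (empty if there are no providers)."""
    return set.intersection(*report.values()) if report else set()


providers = [
    ProviderView("cn1", {"VCPU": 126, "MEMORY_MB": 20000, "DISK_GB": 10},
                 {"HW_CPU_X86_AVX2"}),
    ProviderView("cn2", {"VCPU": 64, "MEMORY_MB": 2048, "DISK_GB": 15},
                 {"HW_CPU_X86_AVX2", "STORAGE_DISK_SSD"}),
]
flavor_resources = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20}
flavor_traits = {"STORAGE_DISK_SSD"}

report = explain_rejections(providers, flavor_resources, flavor_traits)
for name, failed in report.items():
    print(name, sorted(failed))
print("unmet everywhere:", sorted(unmet_everywhere(report)))
```

With this data, cn1 fails on DISK_GB and STORAGE_DISK_SSD, cn2 fails on DISK_GB and MEMORY_MB, and DISK_GB is the one requirement no provider meets, which is the same answer "DiskFilter [start: 2, end: 0]" used to give directly.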

Hope that helps.

-melanie
