[openstack-dev] [nova] How to debug no valid host failures with placement
melanie witt
melwittt at gmail.com
Thu Aug 2 19:04:41 UTC 2018
On Thu, 2 Aug 2018 13:20:43 -0500, Eric Fried wrote:
>> And we could do the same kind of approach with the non-granular request
>> groups by reducing the single large SQL statement that is used for all
>> resources and all traits (and all agg associations) into separate SELECT
>> statements.
>>
>> It could be slightly less performance-optimized but more readable and
>> easier to output debug logs like those above.
>
> Okay, but first we should define the actual problem(s) we're trying to
> solve, as Chris says, so we can assert that it's worth the (possible)
> perf hit and (definite) dev resources, not to mention the potential for
> injecting bugs.

The problem is an infamous one: your users try to boot instances and
get "No Valid Host" and an instance in ERROR state. They contact
support, and now support has to determine why NoValidHost happened. In
the past, they would turn on DEBUG log level on the nova-scheduler, try
another request, and take a look at the scheduler logs. When scheduling
failed they'd see a message such as "DiskFilter [start: 2, end: 0]"
(there were 2 candidates before DiskFilter ran and 0 after it ran),
indicating that scheduling failed because no computes were reporting
enough disk to fulfill the request. The key thing here is that they
could see which resource was not available in their cluster.
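
To make that concrete, here is a rough sketch of the kind of
filter-by-filter narrowing that produced those log lines. The host
dicts, the request dict and the two filter functions are made up for
illustration; this is not nova's actual filter code, only the shape of
the "start/end" output support relied on.

```python
# Illustrative only: hypothetical host data and filter functions,
# not nova's real scheduler filter classes.

def ram_filter(host, request):
    return host["free_ram_mb"] >= request["ram_mb"]


def disk_filter(host, request):
    return host["free_disk_gb"] >= request["disk_gb"]


def run_filters(hosts, request, filters):
    """Apply filters in order, recording candidate counts before and after each."""
    log = []
    for name, func in filters:
        start = len(hosts)
        hosts = [h for h in hosts if func(h, request)]
        log.append(f"{name} [start: {start}, end: {len(hosts)}]")
        if not hosts:
            # Nothing left to filter; the last log line names the culprit.
            break
    return hosts, log


hosts = [
    {"name": "compute1", "free_ram_mb": 8192, "free_disk_gb": 10},
    {"name": "compute2", "free_ram_mb": 4096, "free_disk_gb": 5},
]
request = {"ram_mb": 2048, "disk_gb": 20}

survivors, log = run_filters(
    hosts, request, [("RamFilter", ram_filter), ("DiskFilter", disk_filter)]
)
for line in log:
    print(line)
```

With this made-up data the last line printed is the DiskFilter one, which
is exactly the hint support used to get.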

Now, with placement, all the resources are checked in one go and support
can't tell which resource or trait was rejected, assuming it wasn't all
of them. They want to know which resource or trait was rejected so that
they can find the problematic compute host, configuration, or other
cause and fix it.

At present, I think the only approach support could take is to query a
view of resource providers with their resource and trait availability
and compare it against the flavor of the failed request, to figure out
which requested resources or traits are not satisfied by what's reported
as available.
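
As a minimal sketch of that manual comparison, assuming support has
already pulled a view of providers into plain dicts (the provider data,
the request values and the explain_rejections helper below are all
hypothetical, not part of placement or nova), each requested resource
class and trait can be checked on its own:

```python
# Illustrative only: the provider view and the request are made-up data;
# explain_rejections is a hypothetical helper, not a placement API.

def explain_rejections(providers, resources, required_traits):
    """Map each requested resource class or trait to the providers that satisfy it."""
    report = {}
    for rc, amount in resources.items():
        report[rc] = [
            p["name"] for p in providers if p["available"].get(rc, 0) >= amount
        ]
    for trait in required_traits:
        report[trait] = [p["name"] for p in providers if trait in p["traits"]]
    return report


providers = [
    {
        "name": "compute1",
        "available": {"VCPU": 8, "MEMORY_MB": 16384, "DISK_GB": 10},
        "traits": {"HW_CPU_X86_AVX2"},
    },
    {
        "name": "compute2",
        "available": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 5},
        "traits": set(),
    },
]
resources = {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20}
required_traits = ["HW_CPU_X86_AVX2"]

report = explain_rejections(providers, resources, required_traits)
# Anything with no satisfying provider explains the NoValidHost by itself.
unsatisfied = [name for name, names in report.items() if not names]
print(unsatisfied)
```

Note that this only catches the case where a single resource or trait
rules out every provider. A request can also fail because no one
provider satisfies everything at once, and then you have to look at the
per-item provider lists in report and see where they stop overlapping.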

Hope that helps.

-melanie