[openstack-dev] [nova] How to debug no valid host failures with placement
Ben Nemec
openstack at nemebean.com
Wed Aug 1 17:17:36 UTC 2018
On 08/01/2018 11:23 AM, Chris Friesen wrote:
> On 08/01/2018 09:58 AM, Andrey Volkov wrote:
>> Hi,
>>
>> It seems you first need to check what placement knows about the
>> resources in your cloud.
>> This can be done with either the REST API [1] or osc-placement [2].
>> For osc-placement you could use:
>>
>> pip install osc-placement
>> openstack allocation candidate list --resource DISK_GB=20 \
>>     --resource MEMORY_MB=2048 --resource VCPU=1 \
>>     --os-placement-api-version 1.10
>>
>> And you can explore placement state with other commands like
>> openstack resource provider list, resource provider inventory list,
>> and resource provider usage show.
>>
>
> Unfortunately this doesn't help figure out what the missing resources
> were *at the time of the failure*.
>
> The fact that there is no real way to get the equivalent of the old
> detailed scheduler logs is a known shortcoming in placement, and will
> become more of a problem if/when we move more complicated things like
> CPU pinning, hugepages, and NUMA-awareness into placement.
>
> The problem is that getting useful logs out of placement would require
> significant development work.

Yeah, in my case I only had one compute node, so it was obvious what the
problem was. But if I had a scheduling failure on a busy cloud with
hundreds of nodes, I don't see how you would ever track it down. Maybe
we need to have a discussion with operators about how often they do
post-mortem debugging of this sort of thing?
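
For the REST API route mentioned in the quoted reply, here is a minimal
sketch of how the same allocation candidates query could be built by hand.
The function name, the example endpoint URL, and the token handling are
assumptions made for illustration; the query format
(resources=CLASS:AMOUNT,...) and the OpenStack-API-Version header follow
the placement API as I understand it. Like the CLI command, this only
reports what placement knows right now, not what it knew at the time of
the failure.

```python
from urllib.parse import urlencode


def allocation_candidates_request(placement_url, resources,
                                  microversion="1.10", token=None):
    """Build the URL and headers for GET /allocation_candidates.

    placement_url is the placement endpoint from the service catalog
    (hypothetical value in the usage example below); resources maps a
    resource class name to the requested amount.
    """
    # Placement expects a comma-separated list of CLASS:AMOUNT pairs.
    spec = ",".join(
        f"{rc}:{amount}" for rc, amount in sorted(resources.items())
    )
    # Keep ':' and ',' unescaped so the query stays readable.
    query = urlencode({"resources": spec}, safe=":,")
    url = f"{placement_url.rstrip('/')}/allocation_candidates?{query}"

    headers = {
        "Accept": "application/json",
        # Allocation candidates need placement microversion 1.10 or later.
        "OpenStack-API-Version": f"placement {microversion}",
    }
    if token:
        # Assumption: a Keystone token obtained elsewhere.
        headers["X-Auth-Token"] = token
    return url, headers


if __name__ == "__main__":
    url, headers = allocation_candidates_request(
        "http://placement.example/placement",  # hypothetical endpoint
        {"DISK_GB": 20, "MEMORY_MB": 2048, "VCPU": 1},
    )
    print(url)
    print(headers)
```

The returned URL and headers can be passed to any HTTP client. An empty
candidate list in the response means no provider can currently satisfy
the request.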