Hi there,
I like the idea, but historically Nova has steered away from giving more detail on why an instance failed to schedule, in order to avoid leaking information about the cloud.
I agree that it’s one of the more painful errors, but I see the purpose behind masking it from the user in an environment where the user is not the operator.
It would be good to hear from other devs, or to know whether this could be an admin-level thing.
Thanks
Mohammed
On Wed, Aug 25, 2021 at 9:53 AM Brito, Hugo Nicodemos <Hugo.Brito@windriver.com> wrote:
Hi,
In a prototype, we have improved Nova's scheduling error messages. This helps both developers and end users better understand the scheduler problems that occur on creation of an instance.
When a scheduler error happens during instance creation with upstream Nova, the Overview tab of the Horizon dashboard shows only the following message: "No valid host was found." This doesn't give us enough information about what really happened, so our solution was to add more detail to the instance's overview page, e.g.:
The **Fault:Message** attribute provides a summary of why each host cannot satisfy the instance’s resource requirements. For controller-0, for example, it indicates “No valid host was found. Not enough host cell CPUs to fit instance cell” (where a cell is a NUMA node or socket).
The **Fault:Details** attribute provides even more detail for each individual host. For example, it shows that the instance “required” 2 CPU cores and shows the “actual” CPU cores available on each “numa” node: “actual:0, numa:1” and “actual:1, numa:0”.
These details are also available through the OpenStack CLI, in the _fault_ attribute:
- openstack server show <instance>
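For anyone who wants to read that attribute from a script, here is a minimal sketch. It assumes the server was dumped as JSON (e.g. with `openstack server show <instance> -f json`) and that `fault` appears as an object with a `message` key and optional `code` and `details` keys. The function name `summarize_fault` and the sample payload are hypothetical and only for illustration; they are not part of the prototype.

```python
import json


def summarize_fault(server_json: str) -> str:
    """Return a short, printable summary of a server's `fault` attribute.

    Assumes `server_json` is the JSON dump of a single server and that
    `fault`, when present, is an object (an assumption about the output).
    """
    server = json.loads(server_json)
    fault = server.get("fault")
    if not isinstance(fault, dict):
        # No fault recorded, or it is rendered in a form we don't handle.
        return "no fault recorded"

    code = fault.get("code")
    message = fault.get("message", "")
    summary = f"{code}: {message}" if code is not None else message

    # `details` may be absent, e.g. when it is hidden from non-admin users.
    details = fault.get("details")
    if details:
        summary += f"\n{details}"
    return summary


# Hypothetical payload; the message text is the one quoted in this thread.
sample = json.dumps({
    "name": "vm-1",
    "status": "ERROR",
    "fault": {
        "code": 500,
        "message": "No valid host was found. "
                   "Not enough host cell CPUs to fit instance cell",
    },
})

print(summarize_fault(sample))
```

Treating `details` as optional keeps the same script usable whether or not the caller is allowed to see the per-host breakdown.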
With that in mind, we'd like to know whether you are open to considering such a change. We are willing to submit a spec and upstream the implementation.
Regards,
- nicodemos
--
Mohammed Naser
VEXXHOST, Inc.