+1 for option 3, too.
Check host/hypervisor_hostname first in the API layer so that we do not create
a VM that ends up in ERROR state with a NoValidHost exception.
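For illustration, here is a minimal Python sketch of the API-side checks in option 3 (cases a, b and c from the quoted message). It is not nova code. The classes `Deployment` and `Cell`, the exception `HostNodeNotFound` and the function `validate_host_and_node` are hypothetical in-memory stand-ins for HostMapping records in the API DB and ComputeNode records in each cell DB. The sketch does not model down cells or the placement lookup:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class HostNodeNotFound(Exception):
    """Hypothetical error; the API would map this to a 400 response."""


@dataclass
class Cell:
    name: str
    # Hypothetical stand-in for a cell DB: host -> {hypervisor_hostname: node_uuid}
    nodes_by_host: Dict[str, Dict[str, str]] = field(default_factory=dict)


@dataclass
class Deployment:
    # Hypothetical stand-in for HostMapping rows in the API DB: host -> cell name
    host_mappings: Dict[str, str]
    cells: Dict[str, Cell]


def validate_host_and_node(dep: Deployment,
                           host: Optional[str] = None,
                           node: Optional[str] = None) -> Optional[str]:
    """Fail fast if the requested host and/or node do not exist.

    Returns the compute node uuid when a node was requested, else None.
    """
    if host is not None:
        # (a)/(c): one lookup in the "API DB" tells us which cell owns the host.
        cell_name = dep.host_mappings.get(host)
        if cell_name is None:
            raise HostNodeNotFound(f"host {host!r} not found")
        if node is None:
            return None  # (a) host only: nothing more to check
        # (c) host and node: look for the node only in the mapped cell.
        nodes = dep.cells[cell_name].nodes_by_host.get(host, {})
        if node not in nodes:
            raise HostNodeNotFound(f"node {node!r} not found on host {host!r}")
        return nodes[node]
    if node is not None:
        # (b) node only: scan every cell. A placement query for a resource
        # provider with that name could replace this loop (not modeled here).
        for cell in dep.cells.values():
            for nodes in cell.nodes_by_host.values():
                if node in nodes:
                    return nodes[node]
        raise HostNodeNotFound(f"node {node!r} not found in any cell")
    return None  # neither host nor node requested: nothing to validate


dep = Deployment(
    host_mappings={"compute1": "cell1"},
    cells={"cell1": Cell("cell1", {"compute1": {"compute1": "uuid-1"}})},
)
print(validate_host_and_node(dep, host="compute1", node="compute1"))  # uuid-1
```

The returned uuid is the kind of value that could ride along on the RequestSpec (for example in the node uuid field on Destination floated in the thread), so the scheduler would not need to repeat the ComputeNode lookup for in_tree.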
>> It seems we've come to an impasse on this change [1] because of a
>> concern about where to validate the requested host and/or
>> hypervisor_hostname.
>>
>> The change is currently validating in the API to provide a fast fail 400
>> response to the user if the host and/or node don't exist. The concern is
>> that the lookup for the compute node will be done again in the scheduler
>> [2]. Also, if the host is not provided, then to validate the node we
>> have to iterate the cells looking for the given compute node (we could
>> use placement though, more on that below).
>>
>> I've added this to the nova meeting agenda for tomorrow but wanted to
>> try and enumerate what I see are the options before the meeting so we
>> don't have to recap all of this during the meeting.
>>
>> The options as I see them are:
>>
>> 1. Omit the validation in the API and let the scheduler do the validation.
>>
>> Pros: no performance impact in the API when creating server(s)
>>
>> Cons: if the host/node does not exist, the user will get a 202 response
>> and eventually a NoValidHost error, which is not a great user experience,
>> although it is what happens today with the availability_zone forced
>> host/node behavior we already have [3], so maybe it's acceptable.
>>
>> 2. Only validate host in the API since we can look up the HostMapping in
>> the API DB. If the user also provided a node then we'd just throw that
>> on the RequestSpec and let the scheduler code validate it.
>>
>> Pros: basic validation for the simple and probably most widely used case,
>> since for non-baremetal instances the host and node are going to be the same
>>
>> Cons: still could have a late failure in the scheduler with NoValidHost
>> error; does not cover the case that only node (no host) is specified
>>
>> 3. Validate both the host and node in the API. This can be broken down:
>>
>> a) If only host is specified, do #2 above.
>> b) If only node is specified, iterate the cells looking for the node (or
>> query a resource provider with that name in placement, which would avoid
>> down cell issues)
>> c) If both host and node are specified, get the HostMapping and from that
>> look up the ComputeNode in the given cell (per the HostMapping)
>>
>> Pros: fail fast behavior in the API if either the host and/or node do
>> not exist
>>
>> Cons: performance hit in the API to validate the host/node and
>> redundancy with the scheduler to find the ComputeNode to get its uuid
>> for the in_tree filtering on GET /allocation_candidates.
>>
>> Note that if we do find the ComputeNode in the API, we could also
>> (later?) make a change to the Destination object to add a node_uuid
IMHO, would it be better to call it compute_node_uuid? :)
>> field so we can pass that through on the RequestSpec from
>> API->conductor->scheduler and that should remove the need for the
>> duplicate query in the scheduler code for the in_tree logic.
>>
>> I'm personally in favor of option 3 since we know that users hate
>> NoValidHost errors and we have ways to mitigate the performance overhead
>> of that validation.
>
>Count me in the option 3 boat too, for the same reasons. I'd rather avoid
>NoValidHost, and there's mitigation we can do for the perf issue.
>
>-melanie
>
>> Note that this isn't necessarily something that has to happen in the
>> same change that introduces the host/hypervisor_hostname parameters to
>> the API. If we do the validation in the API I'd probably split the
>> validation logic into its own patch to make it easier to test and
>> review on its own.
>>
>> [1] https://review.opendev.org/#/c/645520/
>> [2] https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528
>> [3] https://docs.openstack.org/nova/latest/admin/availability-zones.html
>>

--
Boxiang