On Wed, 22 May 2019 17:13:48 -0500, Matt Riedemann <mriedemos@gmail.com> wrote:
It seems we've come to an impasse on this change [1] because of a concern about where to validate the requested host and/or hypervisor_hostname.
The change is currently validating in the API to provide a fast fail 400 response to the user if the host and/or node don't exist. The concern is that the lookup for the compute node will be done again in the scheduler [2]. Also, if the host is not provided, then to validate the node we have to iterate the cells looking for the given compute node (we could use placement though, more on that below).
I've added this to the nova meeting agenda for tomorrow but wanted to try and enumerate what I see are the options before the meeting so we don't have to re-cap all of this during the meeting.
The options as I see them are:
1. Omit the validation in the API and let the scheduler do the validation.
Pros: no performance impact in the API when creating server(s)
Cons: if the host/node does not exist, the user will get a 202 response and eventually a NoValidHost error, which is not a great user experience. That said, it is what happens today with the availability_zone forced host/node behavior we already have [3], so maybe it's acceptable.
2. Only validate host in the API since we can look up the HostMapping in the API DB. If the user also provided a node then we'd just throw that on the RequestSpec and let the scheduler code validate it.
Pros: basic validation for the simple and probably most widely used case since for non-baremetal instances the host and node are going to be the same
Cons: could still have a late failure in the scheduler with a NoValidHost error; does not cover the case where only node (no host) is specified
3. Validate both the host and node in the API. This can be broken down:
a) If only host is specified, do #2 above.
b) If only node is specified, iterate the cells looking for the node (or query placement for a resource provider with that name, which would avoid down-cell issues).
c) If both host and node are specified, get the HostMapping and from that look up the ComputeNode in the given cell (per the HostMapping).
Pros: fail fast behavior in the API if either the host and/or node do not exist
Cons: performance hit in the API to validate the host/node and redundancy with the scheduler to find the ComputeNode to get its uuid for the in_tree filtering on GET /allocation_candidates.
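To make the three sub-cases of option 3 concrete, here is a rough sketch of the decision logic. It is only an illustration: HOST_MAPPINGS, COMPUTE_NODES, BadRequest and validate_destination are hypothetical in-memory stand-ins, not nova's actual objects, DB API or exception classes, and the placement-based lookup for case (b) is not shown.

```python
# Hypothetical in-memory stand-ins for the API DB and the cell DBs.
# None of these names are nova's real API; they only illustrate the flow.
HOST_MAPPINGS = {"host1": "cell1", "ironic-host": "cell2"}  # host -> cell
COMPUTE_NODES = {  # cell -> list of (host, node) pairs
    "cell1": [("host1", "host1")],
    "cell2": [("ironic-host", "node-a"), ("ironic-host", "node-b")],
}


class BadRequest(Exception):
    """Stands in for the HTTP 400 the API would return."""


def validate_destination(host=None, node=None):
    """Return (cell, host, node) or raise BadRequest (fail fast)."""
    if host is None and node is None:
        return None  # nothing requested, nothing to validate

    if host is not None:
        # Cases (a) and (c): the HostMapping tells us which cell to look in.
        cell = HOST_MAPPINGS.get(host)
        if cell is None:
            raise BadRequest("host %s not found" % host)
        if node is None:
            return (cell, host, None)  # case (a): host only
        if (host, node) not in COMPUTE_NODES[cell]:
            raise BadRequest("node %s not found on host %s" % (node, host))
        return (cell, host, node)  # case (c): host and node

    # Case (b): node only, so iterate every cell looking for it.
    for cell, nodes in COMPUTE_NODES.items():
        for node_host, node_name in nodes:
            if node_name == node:
                return (cell, node_host, node_name)
    raise BadRequest("node %s not found" % node)
```

Case (b) is the expensive one in this sketch since it touches every cell, which is the performance concern raised above.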
Note that if we do find the ComputeNode in the API, we could also (later?) make a change to the Destination object to add a node_uuid field so we can pass that through on the RequestSpec from API->conductor->scheduler and that should remove the need for the duplicate query in the scheduler code for the in_tree logic.
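As a rough illustration of that idea, the sketch below uses a simplified, hypothetical Destination dataclass (not nova's real versioned object) with the proposed node_uuid field. The helper names lookup_node_uuid and in_tree_for are made up for this example; the point is only that the scheduler side can skip its own query when the API already filled in the uuid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Destination:
    """Simplified, hypothetical stand-in for the RequestSpec destination."""
    host: Optional[str] = None
    node: Optional[str] = None
    node_uuid: Optional[str] = None  # the proposed new field


LOOKUPS = []  # records each simulated ComputeNode query


def lookup_node_uuid(host, node):
    """Pretend ComputeNode query; stands in for the scheduler's lookup."""
    LOOKUPS.append((host, node))
    return "uuid-for-%s" % node


def in_tree_for(dest):
    """Pick the in_tree value for GET /allocation_candidates."""
    if dest.node_uuid is not None:
        return dest.node_uuid  # API already found it, no second query
    return lookup_node_uuid(dest.host, dest.node)
```

With node_uuid set by the API, in_tree_for() never calls the lookup, which is the duplicate query this would remove.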
I'm personally in favor of option 3 since we know that users hate NoValidHost errors and we have ways to mitigate the performance overhead of that validation.
Count me in the option 3 boat too, for the same reasons. Rather avoid NoValidHost and there's mitigation we can do for the perf issue. -melanie
Note that this isn't necessarily something that has to happen in the same change that introduces the host/hypervisor_hostname parameters to the API. If we do the validation in the API I'd probably split the validation logic into its own patch to make it easier to test and review on its own.
[1] https://review.opendev.org/#/c/645520/
[2] https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9...
[3] https://docs.openstack.org/nova/latest/admin/availability-zones.html