[nova] Validation for requested host/node on server create
Matt Riedemann
mriedemos at gmail.com
Wed May 22 22:13:48 UTC 2019
It seems we've come to an impasse on this change [1] because of a
concern about where to validate the requested host and/or
hypervisor_hostname.
The change is currently validating in the API to provide a fast fail 400
response to the user if the host and/or node don't exist. The concern is
that the lookup for the compute node will be done again in the scheduler
[2]. Also, if the host is not provided, then to validate the node we
have to iterate the cells looking for the given compute node (we could
use placement though, more on that below).
I've added this to the nova meeting agenda for tomorrow but wanted to
enumerate what I see as the options before the meeting so we don't have
to recap all of this there.
The options as I see them are:
1. Omit the validation in the API and let the scheduler do the validation.
Pros: no performance impact in the API when creating server(s)
Cons: if the host/node does not exist, the user will get a 202 response
and eventually a NoValidHost error, which is not a great user experience.
That said, it is what happens today with the availability_zone forced
host/node behavior we already have [3], so maybe it's acceptable.
2. Only validate host in the API since we can look up the HostMapping in
the API DB. If the user also provided a node then we'd just throw that
on the RequestSpec and let the scheduler code validate it.
Pros: basic validation for the simple and probably most widely used case,
since for non-baremetal instances the host and node are going to be the same
Cons: could still have a late failure in the scheduler with a NoValidHost
error; does not cover the case where only the node (no host) is specified
3. Validate both the host and node in the API. This can be broken down:
a) If only host is specified, do #2 above.
b) If only node is specified, iterate the cells looking for the node (or
query placement for a resource provider with that name, which would avoid
down cell issues).
c) If both host and node are specified, get the HostMapping and from that
look up the ComputeNode in the given cell (per the HostMapping).
Pros: fail-fast behavior in the API if the host and/or node does not
exist
Cons: performance hit in the API to validate the host/node and
redundancy with the scheduler to find the ComputeNode to get its uuid
for the in_tree filtering on GET /allocation_candidates.
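
To make the three sub-cases of option 3 concrete, here is a rough,
self-contained sketch. It is not nova code: HOST_MAPPINGS and CELLS are
hypothetical in-memory stand-ins for the API DB host mappings and the
per-cell databases, and DestinationNotFound and validate_destination are
made-up names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class DestinationNotFound(Exception):
    """Hypothetical error; the API would turn this into a 400 response."""


@dataclass(frozen=True)
class ComputeNode:
    host: str
    hypervisor_hostname: str
    uuid: str


# Hypothetical stand-in for the HostMapping table in the API DB: host -> cell.
HOST_MAPPINGS: Dict[str, str] = {"host1": "cell1", "ironic-host": "cell2"}

# Hypothetical stand-in for the compute nodes stored in each cell DB.
CELLS: Dict[str, List[ComputeNode]] = {
    "cell1": [ComputeNode("host1", "host1", "uuid-1")],
    "cell2": [
        ComputeNode("ironic-host", "node-a", "uuid-a"),
        ComputeNode("ironic-host", "node-b", "uuid-b"),
    ],
}


def validate_destination(
    host: Optional[str] = None, node: Optional[str] = None
) -> Optional[ComputeNode]:
    """Fail fast if the requested host and/or node does not exist.

    Returns the ComputeNode when one had to be looked up, otherwise None.
    """
    if host is None and node is None:
        return None  # Nothing was requested, so nothing to validate.

    if host is not None:
        cell = HOST_MAPPINGS.get(host)
        if cell is None:
            raise DestinationNotFound(f"host {host!r} does not exist")
        if node is None:
            return None  # Case (a): the host mapping lookup is enough.
        # Case (c): only search the cell that the host mapping points at.
        for cn in CELLS[cell]:
            if cn.host == host and cn.hypervisor_hostname == node:
                return cn
        raise DestinationNotFound(f"node {node!r} not found on host {host!r}")

    # Case (b): node only, so every cell has to be searched.
    for nodes in CELLS.values():
        for cn in nodes:
            if cn.hypervisor_hostname == node:
                return cn
    raise DestinationNotFound(f"node {node!r} does not exist")
```

In this sketch case (b) is the expensive one, since it touches every cell,
which is why a placement query by resource provider name is an attractive
alternative there.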
Note that if we do find the ComputeNode in the API, we could also
(later?) make a change to the Destination object to add a node_uuid
field so we can pass that through on the RequestSpec from
API->conductor->scheduler and that should remove the need for the
duplicate query in the scheduler code for the in_tree logic.
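
As a rough illustration of that idea (again, not nova code), the sketch
below uses a plain dataclass as a hypothetical stand-in for the
Destination object with the proposed node_uuid field. The in_tree_uuid
helper and its lookup callback are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Destination:
    """Hypothetical stand-in for the requested destination on the RequestSpec."""

    host: Optional[str] = None
    node: Optional[str] = None
    node_uuid: Optional[str] = None  # Proposed field, filled in by the API.


def in_tree_uuid(
    dest: Destination,
    lookup: Callable[[Optional[str], Optional[str]], str],
) -> Optional[str]:
    """Return the compute node uuid to use for in_tree filtering, if any.

    `lookup` represents the existing scheduler-side compute node query; it
    is only called when the API did not already pass the uuid through.
    """
    if dest.node_uuid is not None:
        return dest.node_uuid  # The API already found the node; reuse it.
    if dest.host is None and dest.node is None:
        return None  # No destination requested, so no in_tree filter.
    return lookup(dest.host, dest.node)
```

With something like this, a request validated in the API carries the uuid
along and the scheduler-side lookup is skipped, while requests without
node_uuid set behave as they do today.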
I'm personally in favor of option 3 since we know that users hate
NoValidHost errors and we have ways to mitigate the performance overhead
of that validation.
Note that this isn't necessarily something that has to happen in the
same change that introduces the host/hypervisor_hostname parameters to
the API. If we do the validation in the API I'd probably split the
validation logic into its own patch to make it easier to test and
review on its own.
[1] https://review.opendev.org/#/c/645520/
[2]
https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528
[3] https://docs.openstack.org/nova/latest/admin/availability-zones.html
--
Thanks,
Matt