Open Stack

Wed May 22 22:13:48 UTC 2019

It seems we've come to an impasse on this change [1] because of a 
concern about where to validate the requested host and/or 
hypervisor_hostname.

The change is currently validating in the API to provide a fast fail 400 
response to the user if the host and/or node don't exist. The concern is 
that the lookup for the compute node will be done again in the scheduler 
[2]. Also, if the host is not provided, then to validate the node we 
have to iterate the cells looking for the given compute node (we could 
use placement though, more on that below).

I've added this to the nova meeting agenda for tomorrow but wanted to 
try and enumerate what I see as the options before the meeting so we 
don't have to recap all of this during the meeting.

The options as I see them are:

1. Omit the validation in the API and let the scheduler do the validation.

Pros: no performance impact in the API when creating server(s)

Cons: if the host/node does not exist, the user will get a 202 response 
and eventually a NoValidHost error, which is not a great user experience. 
That said, it is what happens today with the availability_zone forced 
host/node behavior we already have [3], so maybe it's acceptable.

2. Only validate host in the API since we can look up the HostMapping in 
the API DB. If the user also provided a node then we'd just throw that 
on the RequestSpec and let the scheduler code validate it.

Pros: basic validation for the simple and probably most widely used case 
since for non-baremetal instances the host and node are going to be the same

Cons: could still have a late failure in the scheduler with a NoValidHost 
error; does not cover the case where only node (no host) is specified

3. Validate both the host and node in the API. This can be broken down:

a) If only host is specified, do #2 above.
b) If only node is specified, iterate the cells looking for the node (or 
query placement for a resource provider with that name, which would avoid 
down cell issues)
c) If both host and node are specified, get the HostMapping and from that 
look up the ComputeNode in the given cell (per the HostMapping)

Pros: fail fast behavior in the API if the host and/or node does not 
exist

Cons: performance hit in the API to validate the host/node and 
redundancy with the scheduler to find the ComputeNode to get its uuid 
for the in_tree filtering on GET /allocation_candidates.
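
To make the three cases in option 3 concrete, here is a minimal sketch 
of the branching. It is not nova code: FakeDeployment, 
validate_destination and HostOrNodeNotFound are hypothetical names, and 
plain dicts stand in for the HostMapping records in the API DB and the 
ComputeNode records in each cell.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


class HostOrNodeNotFound(Exception):
    """Hypothetical error the API layer would turn into a 400 response."""


@dataclass
class FakeDeployment:
    # host -> cell name; stands in for HostMapping records in the API DB.
    host_mappings: Dict[str, str] = field(default_factory=dict)
    # cell name -> {(host, node)}; stands in for ComputeNode records.
    cells: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)


def validate_destination(
    dep: FakeDeployment,
    host: Optional[str] = None,
    node: Optional[str] = None,
) -> Optional[str]:
    """Return the cell holding the requested host/node, or raise."""
    if host is None and node is None:
        return None  # nothing was requested, so nothing to validate

    if host is not None:
        # Cases (a) and (c): one lookup of the host mapping.
        cell = dep.host_mappings.get(host)
        if cell is None:
            raise HostOrNodeNotFound(f"host {host!r} not found")
        # Case (c): also check the node, but only in the mapped cell.
        if node is not None and (host, node) not in dep.cells.get(cell, set()):
            raise HostOrNodeNotFound(
                f"node {node!r} not found on host {host!r}")
        return cell

    # Case (b): only node given, so every cell has to be searched.
    for cell, compute_nodes in dep.cells.items():
        if any(n == node for _, n in compute_nodes):
            return cell
    raise HostOrNodeNotFound(f"node {node!r} not found")
```

Case (b) is the only branch whose cost grows with the number of cells, 
which is why querying placement by resource provider name is mentioned 
as an alternative for it.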

Note that if we do find the ComputeNode in the API, we could also 
(later?) make a change to the Destination object to add a node_uuid 
field so we can pass that through on the RequestSpec from 
API->conductor->scheduler and that should remove the need for the 
duplicate query in the scheduler code for the in_tree logic.
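
As a rough illustration of that idea (an assumption-laden sketch, not 
the real nova object): the Destination below is a plain dataclass 
rather than nova's versioned object, node_uuid is the proposed field, 
and in_tree_uuid and lookup_node_uuid are hypothetical names for the 
scheduler-side logic and its existing ComputeNode query.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Destination:
    host: Optional[str] = None
    node: Optional[str] = None
    # Proposed field: filled in by the API if it already found the node.
    node_uuid: Optional[str] = None


def in_tree_uuid(
    destination: Destination,
    lookup_node_uuid: Callable[[Optional[str], Optional[str]], str],
) -> str:
    """Return the compute node uuid to use for in_tree filtering."""
    if destination.node_uuid is not None:
        # The API already resolved the node; skip the duplicate query.
        return destination.node_uuid
    # Fall back to the lookup the scheduler does today.
    return lookup_node_uuid(destination.host, destination.node)
```

With the field populated in the API, the fallback lookup only runs for 
request specs that did not carry the uuid.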

I'm personally in favor of option 3 since we know that users hate 
NoValidHost errors and we have ways to mitigate the performance overhead 
of that validation.

Note that this isn't necessarily something that has to happen in the 
same change that introduces the host/hypervisor_hostname parameters to 
the API. If we do the validation in the API I'd probably split the 
validation logic into its own patch to make it easier to test and 
review on its own.

[1] https://review.opendev.org/#/c/645520/
[2] https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528
[3] https://docs.openstack.org/nova/latest/admin/availability-zones.html

-- 

Thanks,

Matt

[nova] Validation for requested host/node on server create