+1 for option 3, too.

Checking host/hypervisor_hostname first in the API layer means we will not
create an "ERROR" VM with a "NoValidHost" exception.
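
To make sure I read 3 a)/b)/c) in the quoted mail correctly, here is a rough
sketch of the API-side check. This is not nova code: HOST_MAPPINGS, CELLS,
ComputeNode, HostNotFound and validate_destination are hypothetical in-memory
stand-ins for the HostMapping lookup in the API DB and the per-cell
ComputeNode lookup. The returned uuid is what could later be carried on the
Destination object.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class HostNotFound(Exception):
    """Hypothetical error; the API layer would turn this into a 400."""


@dataclass(frozen=True)
class ComputeNode:
    host: str
    hypervisor_hostname: str
    uuid: str


# Hypothetical stand-in for HostMapping records in the API DB: host -> cell.
HOST_MAPPINGS: Dict[str, str] = {"compute1": "cell1", "ironic-host": "cell2"}

# Hypothetical stand-in for the compute nodes stored in each cell DB.
CELLS: Dict[str, List[ComputeNode]] = {
    "cell1": [ComputeNode("compute1", "compute1", "uuid-1")],
    "cell2": [
        ComputeNode("ironic-host", "bm-node-1", "uuid-2"),
        ComputeNode("ironic-host", "bm-node-2", "uuid-3"),
    ],
}


def validate_destination(host: Optional[str] = None,
                         node: Optional[str] = None) -> Optional[str]:
    """Fail fast if the requested host and/or node does not exist.

    Returns the compute node uuid when a node was looked up, else None.
    """
    if host is not None:
        cell = HOST_MAPPINGS.get(host)
        if cell is None:
            raise HostNotFound(f"host {host!r} has no host mapping")
        if node is None:
            # (a) host only: the HostMapping lookup is the whole check.
            return None
        # (c) host and node: search only the cell named by the HostMapping.
        for cn in CELLS[cell]:
            if cn.host == host and cn.hypervisor_hostname == node:
                return cn.uuid
        raise HostNotFound(f"node {node!r} not found on host {host!r}")
    if node is not None:
        # (b) node only: iterate every cell looking for the node.
        for nodes in CELLS.values():
            for cn in nodes:
                if cn.hypervisor_hostname == node:
                    return cn.uuid
        raise HostNotFound(f"node {node!r} not found in any cell")
    # Neither was requested, so there is nothing to validate.
    return None
```

Case (b) is the expensive one in this sketch since it walks every cell, which
is where the placement lookup suggested in the quoted mail would help.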

>> It seems we've come to an impasse on this change [1] because of a
>> concern about where to validate the requested host and/or
>> hypervisor_hostname.
>> 
>> The change is currently validating in the API to provide a fast fail 400
>> response to the user if the host and/or node don't exist. The concern is
>> that the lookup for the compute node will be done again in the scheduler
>> [2]. Also, if the host is not provided, then to validate the node we
>> have to iterate the cells looking for the given compute node (we could
>> use placement though, more on that below).
>> 
>> I've added this to the nova meeting agenda for tomorrow but wanted to
>> try and enumerate what I see are the options before the meeting so we
>> don't have to re-cap all of this during the meeting.
>> 
>> The options as I see them are:
>> 
>> 1. Omit the validation in the API and let the scheduler do the validation.
>> 
>> Pros: no performance impact in the API when creating server(s)
>> 
>> Cons: if the host/node does not exist, the user will get a 202 response
>> and eventually a NoValidHost error which is not a great user experience,
>> although it is what happens today with the availability_zone forced
>> host/node behavior we already have [3] so maybe it's acceptable.
>> 
>> 2. Only validate host in the API since we can look up the HostMapping in
>> the API DB. If the user also provided a node then we'd just throw that
>> on the RequestSpec and let the scheduler code validate it.
>> 
>> Pros: basic validation for the simple and probably most widely used case
>> since for non-baremetal instances the host and node are going to be the same
>> 
>> Cons: still could have a late failure in the scheduler with NoValidHost
>> error; does not cover the case that only node (no host) is specified
>> 
>> 3. Validate both the host and node in the API. This can be broken down:
>> 
>> a) If only host is specified, do #2 above.
>> b) If only node is specified, iterate the cells looking for the node (or
>> query a resource provider with that name in placement which would avoid
>> down cell issues)
>> c) If both host and node are specified, get the HostMapping and from that
>> look up the ComputeNode in the given cell (per the HostMapping)
>> 
>> Pros: fail fast behavior in the API if either the host and/or node do
>> not exist
>> 
>> Cons: performance hit in the API to validate the host/node and
>> redundancy with the scheduler to find the ComputeNode to get its uuid
>> for the in_tree filtering on GET /allocation_candidates.
>> 
>> Note that if we do find the ComputeNode in the API, we could also
>> (later?) make a change to the Destination object to add a node_uuid

IMHO, would it be better to call it compute_node_uuid? :)

>> field so we can pass that through on the RequestSpec from
>> API->conductor->scheduler and that should remove the need for the
>> duplicate query in the scheduler code for the in_tree logic.
>> 
>> I'm personally in favor of option 3 since we know that users hate
>> NoValidHost errors and we have ways to mitigate the performance overhead
>> of that validation.
>
>Count me in the option 3 boat too, for the same reasons. Rather avoid 
>NoValidHost and there's mitigation we can do for the perf issue.
>
>-melanie
>
>> Note that this isn't necessarily something that has to happen in the
>> same change that introduces the host/hypervisor_hostname parameters to
>> the API. If we do the validation in the API I'd probably split the
>> validation logic into its own patch to make it easier to test and
>> review on its own.
>> 
>> [1] https://review.opendev.org/#/c/645520/
>> [2]
>> https://github.com/openstack/nova/blob/2e85453879533af0b4d0e1178797d26f026a9423/nova/scheduler/utils.py#L528
>> [3] https://docs.openstack.org/nova/latest/admin/availability-zones.html
>> 

--
Boxiang