[openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

Matt Riedemann mriedemos at gmail.com
Mon Apr 23 19:48:38 UTC 2018


We seem to be at a bit of an impasse in this spec amendment [1] so I 
want to try and summarize the alternative solutions as I see them.

The overall goal of the blueprint is to allow defining traits via image 
properties, like flavor extra specs. Those image-defined traits are used 
to filter hosts during scheduling of the instance. During server create, 
that filtering happens during the normal "GET /allocation_candidates" 
call to placement.
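
As a rough sketch of the create path, and assuming image properties use
the same "trait:<NAME>=required" convention as flavor extra specs, the
required traits could be pulled out of the image and folded into the
query like this. The helper names and the resource amounts are made up
for illustration; this is not the actual nova code.

```python
from urllib.parse import urlencode

# Assumed prefix, mirroring the flavor extra spec convention.
TRAIT_PREFIX = "trait:"


def required_traits(image_properties):
    """Return the trait names marked 'required' in the image properties."""
    return {
        key[len(TRAIT_PREFIX):]
        for key, value in image_properties.items()
        if key.startswith(TRAIT_PREFIX) and value == "required"
    }


def allocation_candidates_query(resources, traits):
    """Build the path and query string for GET /allocation_candidates."""
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())
        )
    }
    if traits:
        params["required"] = ",".join(sorted(traits))
    # Keep ':' and ',' readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,")


props = {"trait:HW_CPU_X86_AVX2": "required", "os_distro": "ubuntu"}
url = allocation_candidates_query(
    {"VCPU": 1, "MEMORY_MB": 512}, required_traits(props)
)
```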

The problem arises during rebuild with a new image that specifies new 
required traits. A rebuild is not a move operation, but we still run 
through the scheduler filters to make sure the new image (if one is 
specified) is valid for the host on which the instance is currently running.

We don't currently call "GET /allocation_candidates" during rebuild 
because that could inadvertently filter out the host we know we need 
[2]. Also, since flavors don't change for rebuild, we haven't had a need 
for getting allocation candidates during rebuild since we're not 
allocating new resources (pretend bug 1763766 [3] does not exist for now).

Now that we know the problem, here are some of the solutions that have 
been discussed in the spec amendment, again, only for rebuild with a new 
image that has new traits:

1. Fail in the API saying you can't rebuild with a new image with new 
required traits.

Pros:

- Simple way to keep the new image off a host that doesn't support it.
- Similar solution to volume-backed rebuild with a new image.

Cons:

- Confusing user experience, since users might be able to rebuild with 
some new images but not others, with no clear explanation of the 
difference.

2. Have the ImagePropertiesFilter call 
"GET /resource_providers/{rp_uuid}/traits" and compare the compute node 
root provider traits against the new image's required traits.

Pros:

- Avoids having to call "GET /allocation_candidates" during rebuild.
- Simple way to compare the required image traits against the compute 
node provider traits.

Cons:

- Does not account for nested providers so the scheduler could reject 
the image due to its required traits which actually apply to a nested 
provider in the tree. This is somewhat related to bug 1763766.
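
For what it's worth, the comparison in option 2 boils down to a set 
difference. This is only a sketch: the function name is hypothetical, 
the trait sets are assumed to have already been fetched from placement, 
and CUSTOM_FAST_NIC is an invented trait standing in for something that 
lives on a nested provider.

```python
def image_fits_root_provider(required_traits, root_provider_traits):
    """Check the image's required traits against the root provider only.

    Returns (fits, missing), where missing is the set of required traits
    the root provider does not expose.
    """
    missing = set(required_traits) - set(root_provider_traits)
    return not missing, missing


root_traits = {"HW_CPU_X86_AVX2"}

# Trait is on the root provider: the image is accepted.
ok, _ = image_fits_root_provider({"HW_CPU_X86_AVX2"}, root_traits)

# CUSTOM_FAST_NIC (hypothetical) sits on a nested provider, so checking
# the root alone rejects the image even though the host supports it.
rejected, missing = image_fits_root_provider(
    {"HW_CPU_X86_AVX2", "CUSTOM_FAST_NIC"}, root_traits
)
```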

3. Slight variation on #2 except build a set of all traits from all 
providers in the same tree.

Pros:

- Handles the nested provider traits issue from #2.

Cons:

- Duplicates filtering in ImagePropertiesFilter that could otherwise 
happen in "GET /allocation_candidates".
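
A minimal sketch of option 3, assuming the filter has already collected 
the traits for each provider in the tree (for example by listing the 
tree with "GET /resource_providers?in_tree=<uuid>" and then fetching the 
traits for each provider). The function names, the UUID keys and the 
CUSTOM_* traits are placeholders.

```python
def tree_traits(traits_by_provider):
    """Merge the traits of every provider in one tree into a single set."""
    merged = set()
    for traits in traits_by_provider.values():
        merged.update(traits)
    return merged


def image_fits_tree(required_traits, traits_by_provider):
    """True if every required trait appears somewhere in the tree."""
    return set(required_traits) <= tree_traits(traits_by_provider)


# Hypothetical tree: a compute node root plus one nested provider.
tree = {
    "root-uuid": {"HW_CPU_X86_AVX2"},
    "nested-nic-uuid": {"CUSTOM_FAST_NIC"},
}
fits = image_fits_tree({"HW_CPU_X86_AVX2", "CUSTOM_FAST_NIC"}, tree)
```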

4. Add a microversion to change "GET /allocation_candidates" to make two 
changes:

a) Add an "in_tree" filter like in "GET /resource_providers". This would 
be needed to limit the scope of what gets returned since we know we only 
want to check against one specific host (the current host for the instance).

b) Make "resources" optional since on a rebuild we don't want to 
allocate new resources (again, notwithstanding bug 1763766).

Pros:

- We can call 
"GET /allocation_candidates?in_tree=<current node rp UUID>&required=<new image required traits>" 
and if nothing is returned, we know the new image's required traits 
don't work with the current node.
- The filtering is baked into "GET /allocation_candidates" and not 
client-side in ImagePropertiesFilter.

Cons:

- Requires changes to the "GET /allocation_candidates" API, which is 
going to be more complicated and more up-front work. I don't have a good 
idea of how hard this would be to add, though we already have the same 
"in_tree" logic in "GET /resource_providers".
- Potentially slows down the completion of the overall blueprint.
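
To make option 4 concrete, here is a sketch of what the rebuild check 
might look like from the nova side. Note that "in_tree" on 
"GET /allocation_candidates" and optional "resources" are only proposed 
here and would need the new microversion. The function names are 
hypothetical, and I'm assuming the response body keeps its 
"allocation_requests" / "provider_summaries" keys.

```python
from urllib.parse import urlencode


def rebuild_check_query(current_node_rp_uuid, required_traits):
    """Build the proposed query: no 'resources', scoped to one tree."""
    params = {
        "in_tree": current_node_rp_uuid,  # proposed filter, not yet in the API
        "required": ",".join(sorted(required_traits)),
    }
    return "/allocation_candidates?" + urlencode(params, safe=",")


def image_valid_for_host(response_body):
    """Treat an empty result as 'required traits don't work on this node'.

    Which key would be populated when no resources are requested is an
    open question, so this checks both.
    """
    return bool(
        response_body.get("allocation_requests")
        or response_body.get("provider_summaries")
    )


query = rebuild_check_query(
    "abc-123", {"HW_CPU_X86_AVX2", "CUSTOM_FAST_NIC"}
)
```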

===========

Personally, I don't like option 1 since it adds technical debt which 
we'll eventually need to solve anyway (think about [4]). I feel the same 
way about #2. #3 might be a short-term solution until #4 is done, but I 
think the best long-term solution to this problem is #4.

[1] https://review.openstack.org/#/c/560718/
[2] https://review.openstack.org/#/c/546357/
[3] https://bugs.launchpad.net/nova/+bug/1763766
[4] https://review.openstack.org/#/c/532407/

-- 

Thanks,

Matt


