Re: [nova][ptg] Summary: Implicit trait-based filters

8 May 2019

      On 5/6/2019 1:44 PM, Eric Fried wrote:
...
Addendum:
There's another implicit trait-based filter that bears mentioning:
Excluding disabled compute hosts.
We have code that disables a compute service when "something goes wrong"
in various ways. This code should decorate the compute node's resource
provider with a COMPUTE_SERVICE_DISABLED trait, and every GET
/allocation_candidates request should include
?required=!COMPUTE_SERVICE_DISABLED, so that we don't retrieve
allocation candidates for disabled hosts.
mriedem has started to prototype the code for this [1].
Action: Spec to be written. Code to be polished up. Possibly aspiers to
be involved in this bit as well.
efried
[1] https://review.opendev.org/#/c/654596/
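
For anyone skimming, here is a minimal sketch of what the implicit filter
in the quoted summary amounts to on the request side. The helper name
and its arguments are hypothetical (this is not the PoC code); only the
COMPUTE_SERVICE_DISABLED trait and the ?required=!... syntax come from
the summary above.

```python
from urllib.parse import urlencode

DISABLED_TRAIT = "COMPUTE_SERVICE_DISABLED"


def allocation_candidates_query(resources, required=(),
                                forbidden=(DISABLED_TRAIT,)):
    """Build a GET /allocation_candidates URL path with forbidden traits.

    Hypothetical helper: `resources` maps resource class -> amount,
    `required` and `forbidden` are iterables of trait names. Forbidden
    traits are expressed with a leading '!' in the `required` parameter.
    """
    res = ",".join(
        f"{rc}:{amount}" for rc, amount in sorted(resources.items())
    )
    traits = list(required) + [f"!{t}" for t in forbidden]
    params = [("resources", res)]
    if traits:
        params.append(("required", ",".join(traits)))
    # Keep ':', ',' and '!' readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,!")
```

Because the forbidden trait is a default argument, every caller gets the
exclusion without having to ask for it, which is the "implicit" part.
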
Here is the spec [1]. There are noted TODOs and quite a few alternatives 
listed, mostly alternatives to the proposed design and what's in my PoC.

One thing my PoC didn't cover was the service group API, which 
automatically reports a service as up or down. I think that will have 
to be incorporated into this, but doing so without spreading the 
'disabled' trait management everywhere might be tricky. My PoC 
tries to make the compute the single place we manage the trait, but 
that's also problematic if we lose a race with the API to disable a 
compute before the compute dies, or if MQ drops the call, etc.

We might need/want to hook into the update_available_resource periodic 
to heal / sync the trait if we have an issue like that, or on startup 
during upgrade, and we likely also need a CLI to sync the trait status 
manually - at least to aid with the upgrade.
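
To make the heal / sync idea concrete, here is a rough sketch of the 
reconciliation step, assuming the service record's disabled flag is the 
source of truth. The function name and arguments are hypothetical and 
this is not what the PoC does; it only shows the decision a periodic 
task or CLI would have to make before writing traits back to placement.

```python
DISABLED_TRAIT = "COMPUTE_SERVICE_DISABLED"


def sync_disabled_trait(service_disabled, provider_traits):
    """Reconcile the disabled trait with the service's disabled flag.

    Hypothetical helper: `service_disabled` is the flag from the compute
    service record, `provider_traits` is the set of traits currently on
    the compute node's resource provider. Returns (desired_traits,
    changed), where `changed` says whether an update needs to be sent.
    """
    current = set(provider_traits)
    if service_disabled:
        desired = current | {DISABLED_TRAIT}
    else:
        desired = current - {DISABLED_TRAIT}
    return desired, desired != current
```

The operation is idempotent, so running it from the periodic, on 
startup, and from a CLI would all converge on the same state even if an 
earlier update was lost.
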

Who knew that managing a status reporting daemon could be complicated 
(oh right, everyone).

[1] https://review.opendev.org/#/c/657884/

-- 

Thanks,

Matt

Matt Riedemann