[nova][ptg] Summary: Implicit trait-based filters
Summary: In keeping with the first proposed cycle theme [1] (though we didn't land on that until later in the PTG), we would like to be able to add required traits to the GET /allocation_candidates query to reduce the number of results returned - i.e. do more filtering in placement rather than in the scheduler (or worse, the compute). You can already do this by explicitly adding required traits to flavor/image; we want to be able to do it implicitly based on things like:

- If the instance requires multiattach, make sure it lands on a compute that supports multiattach [2].
- If the image is in X format, make sure it lands on a compute that can read X format [3].

Currently the proposals in [2],[3] work by modifying the RequestSpec.flavor right before select_destinations calls GET /allocation_candidates. This just happens to be okay because we don't persist that copy of the flavor back to the instance (which we wouldn't want to do, since we don't want these implicit additions to e.g. show up when we GET server details, or to affect other lifecycle operations). But this isn't a robust design.

What we would like to do instead is exploit the RequestSpec.requested_resources field [4] as it was originally intended, accumulating all the resource/trait/aggregate/etc. criteria from the flavor, image, *and* request_filter-y things like the above. However, gibi started on this [5] and it turns out to be difficult to express the unnumbered request group in that field for... reasons.

Action: Since gibi is going to be pretty occupied and unlikely to have time to resolve [5], aspiers has graciously (been) volunteered to take it over; and then follow [2] and [3] to use that mechanism once it's available.

efried

[1] https://review.opendev.org/#/c/657171/1/priorities/train-priorities.rst@13
[2] https://review.opendev.org/#/c/645316/
[3] https://review.opendev.org/#/q/topic:bp/request-filter-image-types+(status:o...)
[4] https://opendev.org/openstack/nova/src/commit/5934c5dc6932fbf19ca7f3011c4ccc...
[5] https://review.opendev.org/#/c/647396/
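For concreteness, a minimal sketch of the kind of placement query this boils down to, whether the trait comes explicitly from flavor extra_specs (trait:<NAME>=required) or implicitly from a pre-filter. The resources= and required= query parameters are real GET /allocation_candidates parameters; the endpoint, token, microversion, amounts and the example trait are assumptions for illustration.

import requests

PLACEMENT = "http://placement.example.com"        # assumed endpoint
HEADERS = {
    "X-Auth-Token": "<token>",                     # assumed credentials
    # assumed to be new enough to accept required traits on this call
    "OpenStack-API-Version": "placement 1.22",
}

params = {
    "resources": "VCPU:2,MEMORY_MB:2048,DISK_GB:20",
    # explicit (flavor extra_specs) or implicit (pre-filter), it ends up here:
    "required": "COMPUTE_VOLUME_MULTI_ATTACH",
}
resp = requests.get(PLACEMENT + "/allocation_candidates",
                    params=params, headers=HEADERS)
candidates = resp.json()["allocation_requests"]    # one entry per viable host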
Addendum: There's another implicit trait-based filter that bears mentioning: excluding disabled compute hosts.

We have code that disables a compute service when "something goes wrong" in various ways. This code should decorate the compute node's resource provider with a COMPUTE_SERVICE_DISABLED trait, and every GET /allocation_candidates request should include ?required=!COMPUTE_SERVICE_DISABLED, so that we don't retrieve allocation candidates for disabled hosts. mriedem has started to prototype the code for this [1].

Action: Spec to be written. Code to be polished up. Possibly aspiers to be involved in this bit as well.

efried

[1] https://review.opendev.org/#/c/654596/
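For illustration, a minimal sketch at the placement API level of what the addendum proposes. The endpoint, token and provider UUID are assumptions; the /resource_providers/{uuid}/traits API and the "!" forbidden-trait syntax on required= are real placement features, and COMPUTE_SERVICE_DISABLED is the trait name proposed in this thread.

import requests

PLACEMENT = "http://placement.example.com"            # assumed endpoint
HEADERS = {"X-Auth-Token": "<token>",                  # assumed credentials
           "OpenStack-API-Version": "placement 1.22"}  # assumed to support "!" traits
rp_uuid = "<compute-node-rp-uuid>"                     # assumed provider UUID

# 1. When the service is disabled, decorate its resource provider with the trait.
current = requests.get(
    "%s/resource_providers/%s/traits" % (PLACEMENT, rp_uuid),
    headers=HEADERS).json()
traits = set(current["traits"]) | {"COMPUTE_SERVICE_DISABLED"}
requests.put(
    "%s/resource_providers/%s/traits" % (PLACEMENT, rp_uuid),
    headers=HEADERS,
    json={"resource_provider_generation": current["resource_provider_generation"],
          "traits": sorted(traits)})

# 2. Every allocation candidates query then forbids the trait.
resp = requests.get(
    PLACEMENT + "/allocation_candidates", headers=HEADERS,
    params={"resources": "VCPU:1,MEMORY_MB:512",
            "required": "!COMPUTE_SERVICE_DISABLED"})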
On 5/6/2019 1:44 PM, Eric Fried wrote:
Addendum: There's another implicit trait-based filter that bears mentioning: Excluding disabled compute hosts.
We have code that disables a compute service when "something goes wrong" in various ways. This code should decorate the compute node's resource provider with a COMPUTE_SERVICE_DISABLED trait, and every GET /allocation_candidates request should include ?required=!COMPUTE_SERVICE_DISABLED, so that we don't retrieve allocation candidates for disabled hosts.
mriedem has started to prototype the code for this [1].
Action: Spec to be written. Code to be polished up. Possibly aspiers to be involved in this bit as well.
efried
Here is the spec [1]. There are noted TODOs and quite a few alternatives listed, mostly alternatives to the proposed design and what's in my PoC.

One thing my PoC didn't cover was the service group API automatically reporting a service as up or down. I think that will have to be incorporated into this, but how best to do that without having this 'disabled' trait management everywhere might be tricky. My PoC tries to make the compute the single place we manage the trait, but that's also problematic if we lose a race with the API to disable a compute before the compute dies, or if MQ drops the call, etc.

We might need/want to hook into the update_available_resource periodic to heal/sync the trait if we have an issue like that, or on startup during upgrade, and we likely also need a CLI to sync the trait status manually - at least to aid with the upgrade.

Who knew that managing a status reporting daemon could be complicated (oh right, everyone).

[1] https://review.opendev.org/#/c/657884/

--
Thanks,
Matt
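A rough sketch, not nova code, of the heal/sync idea: on each update_available_resource periodic run, reconcile the service's disabled status against the trait on the compute node's resource provider. The report client helpers and service attributes below are illustrative stand-ins, not real nova interfaces.

TRAIT = "COMPUTE_SERVICE_DISABLED"

def sync_disabled_trait(report_client, service, rp_uuid):
    """Make the provider's trait agree with the service's disabled flag."""
    traits = set(report_client.get_provider_traits(rp_uuid))  # hypothetical helper
    has_trait = TRAIT in traits

    if service.disabled and not has_trait:
        # The API disabled the service but the trait never landed
        # (lost race, dropped RPC, ...): add it so placement stops
        # returning this host.
        report_client.set_provider_traits(rp_uuid, traits | {TRAIT})
    elif not service.disabled and has_trait:
        # The service was re-enabled but the trait is still set: remove it.
        report_client.set_provider_traits(rp_uuid, traits - {TRAIT})
    # Otherwise the trait already reflects reality; nothing to do.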
On Mon, May 6, 2019 at 8:03 PM, Eric Fried <openstack@fried.cc> wrote:
Summary: In keeping with the first proposed cycle theme [1] (though we didn't land on that until later in the PTG), we would like to be able to add required traits to the GET /allocation_candidates query to reduce the number of results returned - i.e. do more filtering in placement rather than in the scheduler (or worse, the compute). You can already do this by explicitly adding required traits to flavor/image; we want to be able to do it implicitly based on things like:
- If the instance requires multiattach, make sure it lands on a compute that supports multiattach [2].
- If the image is in X format, make sure it lands on a compute that can read X format [3].
Currently the proposals in [2],[3] work by modifying the RequestSpec.flavor right before select_destinations calls GET /allocation_candidates. This just happens to be okay because we don't persist that copy of the flavor back to the instance (which we wouldn't want to do, since we don't want these implicit additions to e.g. show up when we GET server details, or to affect other lifecycle operations).
But this isn't a robust design.
What we would like to do instead is exploit the RequestSpec.requested_resources field [4] as it was originally intended, accumulating all the resource/trait/aggregate/etc. criteria from the flavor, image, *and* request_filter-y things like the above. However, gibi started on this [5] and it turns out to be difficult to express the unnumbered request group in that field for... reasons.
Sorry that I was not able to describe the problems with the approach at the PTG. I will try now in a mail.

This patch [5] tries to create the unnumbered group in RequestSpec.requested_resources based on the other fields (flavor, image, ...) in the RequestSpec early enough that the above mentioned pre-filters can add traits to this group instead of adding them to the flavor extra_specs.

The current sequence is the following:

* The RequestSpec is created in three different ways:
  1) RequestSpec.from_components(): used during server create (and cold migrate if a legacy compute is present)
  2) RequestSpec.from_primitives(): deprecated but still used during re-schedule
  3) RequestSpec.__init__(): oslo OVO deepcopy calls __init__ then copies over every field one by one

* Before the nova scheduler sends the placement a_c query it calls nova.scheduler.utils.resources_from_request_spec(RequestSpec). That code uses the RequestSpec fields and collects all the request groups and all the other parameters (e.g. limit, group_policy).

What we would need in the end:

* When the RequestSpec is created in any way, we need to populate the RequestSpec.requested_resources field based on the other RequestSpec fields. Note that __init__ cannot be used for this, as all three instantiations of the object create an empty object first with __init__ and then populate the fields one by one.

* When any of the interesting fields (flavor, image, is_bfv, force_*, ...) is updated on the RequestSpec, the request groups in RequestSpec.requested_resources need to be updated to reflect the change. However, we have to be careful not to blindly re-generate such data, as the unnumbered group might already contain traits that are not coming from any of these direct sources but from the above mentioned implicit required traits code paths.

* When the placement a_c query is generated, it needs to be generated from RequestSpec.requested_resources.

There are a couple of problems:

1) Detecting a change of a RequestSpec field cannot be done via wrapping the field in a property due to OVO limitations [6]. Even if it were possible, given the way we create the RequestSpec object (init an empty object then set fields one by one) the field setters might be called on an incomplete object.

2) Regeneration of RequestSpec.requested_resources would need to distinguish between data that can be regenerated from the other fields of the RequestSpec and the traits added from outside (implicit required traits).

3) The request pre-filters [7] run before the placement a_c query is generated. But today these change the fields of the RequestSpec (e.g. requested_destination), which would mean the regeneration of RequestSpec.requested_resources would be needed. This is probably solvable by changing the pre-filters to work directly on RequestSpec.requested_resources after we have solved all the other issues.

4) The numbered request groups can come from multiple places. When they come from the flavor the numbers are stable, as provided by the person who created the flavor. But when they come from a Neutron port the number is generated (the next unoccupied int). So a re-generation of such groups would potentially re-number the groups. This makes debugging hard, as well as mapping a numbered group back to the entity that requested the resource (the port) after allocation. This is probably solvable by using the proposed placement extension that allows a string in the numbered group name instead of just a single int [8]. This way the port UUID can be used as the identity for the numbered group, making the identity stable.

Cheers,
gibi

[6] https://bugs.launchpad.net/oslo.versionedobjects/+bug/1821619
[7] https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter....
[8] https://storyboard.openstack.org/#!/story/2005575
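For concreteness, a toy sketch in plain Python (not nova's real RequestSpec/RequestGroup objects) of what "populate requested_resources from the flavor" means and why blind regeneration loses implicitly added traits. The "resources:<RC>" / "trait:<NAME>=required" extra_specs syntax is real; everything else here is illustrative.

import dataclasses

@dataclasses.dataclass
class RequestGroup:                     # illustrative stand-in only
    resources: dict = dataclasses.field(default_factory=dict)
    required_traits: set = dataclasses.field(default_factory=set)

def unnumbered_group_from_flavor(extra_specs):
    """Build the unnumbered group from the un-suffixed resources:/trait: keys."""
    group = RequestGroup()
    for key, value in extra_specs.items():
        if key.startswith("resources:"):
            group.resources[key[len("resources:"):]] = int(value)
        elif key.startswith("trait:") and value == "required":
            group.required_traits.add(key[len("trait:"):])
    return group

extra_specs = {"resources:VCPU": "2", "trait:HW_CPU_X86_AVX2": "required"}
group = unnumbered_group_from_flavor(extra_specs)

# A pre-filter later injects an implicit required trait directly into the group...
group.required_traits.add("COMPUTE_VOLUME_MULTI_ATTACH")

# ...so if we naively rebuild the group from the flavor alone (e.g. because the
# flavor field changed), the implicit trait silently disappears - problem 2) above.
group = unnumbered_group_from_flavor(extra_specs)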
Action: Since gibi is going to be pretty occupied and unlikely to have time to resolve [5], aspiers has graciously (been) volunteered to take it over; and then follow [2] and [3] to use that mechanism once it's available.
aspiers, ping me if you want to talk about these on IRC.

Cheers,
gibi
efried
[1] https://review.opendev.org/#/c/657171/1/priorities/train-priorities.rst@13
[2] https://review.opendev.org/#/c/645316/
[3] https://review.opendev.org/#/q/topic:bp/request-filter-image-types+(status:open+OR+status:merged)
[4] https://opendev.org/openstack/nova/src/commit/5934c5dc6932fbf19ca7f3011c4ccc07b0038ac4/nova/objects/request_spec.py#L93-L100
[5] https://review.opendev.org/#/c/647396/
On 5/7/2019 2:19 AM, Balázs Gibizer wrote:
3) The request pre-filters [7] run before the placement a_c query is generated. But today these change the fields of the RequestSpec (e.g. requested_destination), which would mean the regeneration of RequestSpec.requested_resources would be needed. This is probably solvable by changing the pre-filters to work directly on RequestSpec.requested_resources after we have solved all the other issues.
Yeah, this is something I ran into while hacking on the routed networks aggregate stuff [1]. I added information to the RequestSpec so I could use it in a pre-filter (required aggregates), but I can't add that to the requested_resources in the RequestSpec without resources (and in the non-bw-port case there is no RequestSpec.requested_resources yet). So what I did was hack the unnumbered RequestGroup after the pre-filters and after the RequestSpec was processed by resources_from_request_spec, but before the code that makes the GET /a_c call. It's definitely ugly and I'm not even sure it works yet (it would need functional testing).

What I've wondered is whether there is a way we could merge request groups in resources_from_request_spec, so that if a pre-filter added an unnumbered RequestGroup to the RequestSpec (via the requested_resources attribute), resources_from_request_spec would then merge in the flavor information. That's what I initially tried with the multiattach required traits patch [2], but the groups weren't merged for whatever reason and GET /a_c failed because I had a group with a required trait but no resources.

[1] https://review.opendev.org/#/c/656885/3/nova/scheduler/manager.py
[2] https://review.opendev.org/#/c/645316/

--
Thanks,
Matt
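For illustration, a rough sketch of the merge idea in plain Python (not nova's real resources_from_request_spec; the trait names are just examples): fold a pre-filter's trait-only unnumbered group into the flavor-derived one, so GET /a_c never sees a group that has a required trait but no resources.

def merge_unnumbered(flavor_resources, flavor_traits, extra_traits):
    """Return one unnumbered group: flavor resources plus all required traits."""
    return dict(flavor_resources), set(flavor_traits) | set(extra_traits)

resources, traits = merge_unnumbered(
    {"VCPU": 2, "MEMORY_MB": 2048},      # from the flavor
    {"HW_CPU_X86_AVX2"},                 # explicit trait from the flavor
    {"COMPUTE_VOLUME_MULTI_ATTACH"},     # injected by a pre-filter, no resources
)
# -> one group with both the resources and all the required traits, instead of
#    a separate resource-less group that GET /a_c rejects.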
On Wed, May 8, 2019 at 5:58 PM, Matt Riedemann <mriedemos@gmail.com> wrote:
On 5/7/2019 2:19 AM, Balázs Gibizer wrote:
3) The request pre-filters [7] run before the placement a_c query is generated. But today these change the fields of the RequestSpec (e.g. requested_destination), which would mean the regeneration of RequestSpec.requested_resources would be needed. This is probably solvable by changing the pre-filters to work directly on RequestSpec.requested_resources after we have solved all the other issues.
Yeah, this is something I ran into while hacking on the routed networks aggregate stuff [1]. I added information to the RequestSpec so I could use it in a pre-filter (required aggregates), but I can't add that to the requested_resources in the RequestSpec without resources (and in the non-bw-port case there is no RequestSpec.requested_resources yet). So what I did was hack the unnumbered RequestGroup after the pre-filters and after the RequestSpec was processed by resources_from_request_spec, but before the code that makes the GET /a_c call. It's definitely ugly and I'm not even sure it works yet (it would need functional testing).

What I've wondered is whether there is a way we could merge request groups in resources_from_request_spec, so that if a pre-filter added an unnumbered RequestGroup to the RequestSpec (via the requested_resources attribute), resources_from_request_spec would then merge in the flavor information. That's what I initially tried with the multiattach required traits patch [2], but the groups weren't merged for whatever reason and GET /a_c failed because I had a group with a required trait but no resources.
If we only need to merge once, then it feels doable: we just add the new things to the pre-existing unnumbered group from the flavor and image. But if we ever need to update what we already merged into the unnumbered group, then we would need access to the old flavor/image to first subtract them from the unnumbered group and then add the requests from the new flavor/image.

The other way would be to store the extra traits separately in the RequestSpec as well, and only generate the unnumbered group from all the inputs when needed.

Cheers,
gibi
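A toy sketch of that second alternative, with illustrative names rather than real RequestSpec fields: keep the pre-filter traits in their own bucket and rebuild the unnumbered group from flavor plus extras whenever the query is generated, so replacing the flavor cannot drop the implicit traits.

class FakeRequestSpec:                   # illustrative, not the real RequestSpec
    def __init__(self, flavor_resources, flavor_traits):
        self.flavor_resources = dict(flavor_resources)
        self.flavor_traits = set(flavor_traits)
        self.extra_required_traits = set()        # filled only by pre-filters

    def unnumbered_group(self):
        """Rebuild the unnumbered group from all inputs at query time."""
        return (dict(self.flavor_resources),
                self.flavor_traits | self.extra_required_traits)

spec = FakeRequestSpec({"VCPU": 2}, {"HW_CPU_X86_AVX2"})
spec.extra_required_traits.add("COMPUTE_VOLUME_MULTI_ATTACH")  # pre-filter

spec.flavor_resources = {"VCPU": 4}      # e.g. the flavor is replaced on resize
resources, traits = spec.unnumbered_group()
# The implicit trait survives because it was never folded into the flavor data.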
[1] https://review.opendev.org/#/c/656885/3/nova/scheduler/manager.py [2] https://review.opendev.org/#/c/645316/
--
Thanks,
Matt
participants (3): Balázs Gibizer, Eric Fried, Matt Riedemann