HTML: https://anticdent.org/placement-update-19-08.html

Welcome back to the placement update. If I've read the signs correctly, I should now be back to this as a regular thing. Apologies for the gap; I had to attend to some other responsibilities.

# Most Important

A lot has changed in the past few months, so it's hard to single out one most important thing; it will depend on who is reading. Review What's Changed below for a summary of the important stuff.

# What's Changed

* Placement is now its own official project. Until elections are held (it looks like nominations start this coming Tuesday), Mel is the PTL.
* [Setting up storyboard](https://review.openstack.org/#/c/639445/) for placement-related projects is in progress. For the time being we are continuing to use launchpad for most tracking. See a [related email thread](http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003102....).
* Deleting placement code from nova has been put on hold until Train to make it easier for certain types of upgrades to happen. New installs should prefer the extracted code, as the nova-side is frozen but the placement side is not.
* A large stack of code to remove oslo.versionedobjects from placement has merged. This has resulted in a significant change in performance on the `perfload` test that runs in the gate. While not a complete representation of the entire system, it's enough to say "yeah, that was worth it": a request for allocation candidates that used to take around 2.5 seconds now takes 1.2. That refactoring continues (see below), seeking additional simplifications.
* Microversion 1.31 adds `in_tree` and `in_treeN` query parameters to GET /allocation_candidates. This is useful in a variety of nested resource provider scenarios, including the big bandwidth QoS changes that are in progress in nova and neutron. (A sketch of such a query follows this list.)
* Placement is now publishing [install docs](https://docs.openstack.org/placement/latest/install/) but it is important to note that those docs have not been validated (as far as I'm aware) by the packagers. That's a thing that needs to happen, presumably by the packagers.
* os-resource-classes 0.3.0 has been [released](https://pypi.org/p/os-resource-classes) with a `normalize_name` function (a usage example follows this list).
* There are some pending specs from nova which are primarily placement feature specs. We'll continue with those as is (see below), but come the next cycle the plan is to manage specs in the placement repo, not have a separate repo, and not have separate spec cores.
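To make the new parameter concrete, here is a minimal sketch of an allocation candidates request pinned to a single provider tree. The endpoint, token, and UUID are hypothetical placeholders; only the `in_tree` parameter, the `resources` syntax, and the 1.31 microversion header come from the API itself.

```python
# Minimal sketch of GET /allocation_candidates using the in_tree
# parameter from microversion 1.31. The endpoint, token, and UUID
# below are hypothetical placeholders, not real values.
import requests

PLACEMENT_URL = 'http://placement.example.com'  # hypothetical endpoint
TOKEN = 'a-valid-keystone-token'                # hypothetical token
COMPUTE_RP_UUID = '4e8e5957-649f-477b-9e5b-f1f75b21c03c'  # placeholder

resp = requests.get(
    PLACEMENT_URL + '/allocation_candidates',
    headers={
        'X-Auth-Token': TOKEN,
        # in_tree is only available at microversion 1.31 or later.
        'OpenStack-API-Version': 'placement 1.31',
    },
    params={
        'resources': 'VCPU:1,MEMORY_MB:1024',
        # Only return candidates from providers in the tree rooted at
        # this resource provider. The numbered form (e.g. in_tree1
        # paired with resources1) scopes the restriction to a single
        # request group.
        'in_tree': COMPUTE_RP_UUID,
    },
)
resp.raise_for_status()
print(resp.json()['allocation_requests'])
```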
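And since `normalize_name` is new, a tiny usage sketch. The expected output reflects my reading of the release (prefix with `CUSTOM_`, uppercase, replace invalid characters), so treat it as illustrative:

```python
# Illustrative use of os-resource-classes 0.3.0's normalize_name,
# which (as I understand it) turns an arbitrary string into a valid
# custom resource class name: CUSTOM_ prefix, uppercased, characters
# outside [A-Z0-9_] replaced with underscores.
import os_resource_classes as orc

name = orc.normalize_name('pci device.27')
print(name)  # expected: CUSTOM_PCI_DEVICE_27
```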
# Specs/Blueprints/Features

## Near to Done

* [Filter Allocation Candidates by Provider Tree](http://specs.openstack.org/openstack/nova-specs/specs/stein/approved/alloc-c...) has been mostly completed by Tetsuro, but there's a [pending update to the spec](https://review.openstack.org/639033).

## Not yet Done

* [Support filtering by forbidden aggregate membership](http://specs.openstack.org/openstack/nova-specs/specs/stein/approved/negativ...)
* [Support any traits in allocation_candidates query](http://specs.openstack.org/openstack/nova-specs/specs/stein/approved/placeme...)
* [Support mixing required traits with any traits](http://specs.openstack.org/openstack/nova-specs/specs/stein/approved/placeme...)

## Not yet Approved

* [Update alloc-candidates-in-tree](https://review.openstack.org/#/c/639033/) updates the in-tree spec above to reflect what was learned while doing the actual implementation, notably how numbered `in_tree` parameters impact results.
* [Resource provider - request group mapping in allocation candidate](https://review.openstack.org/#/c/597601/) has had a recent resurgence in attention.

# Bugs

* Placement related [bugs not yet in progress](https://goo.gl/TgiPXb): 15.
* [In progress placement bugs](https://goo.gl/vzGGDQ): 17.

# osc-placement

osc-placement is currently behind by 14 microversions. Code for 1.18 is [under review](https://review.openstack.org/#/c/639738/).

# Main Themes

This section now overlaps a bit with the Specs/Features bit above. This will settle out with a bit more clarity as we move along.

## Nested

* Reshaper handling in nova keeps exposing additional things that need to be remembered on the nova-side, so there are a few patches remaining related to [vgpu reshaping](https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open) but it is mostly ready.
* The bandwidth-resource-provider topic has merged a vast amount of code but there is still [plenty left](https://review.openstack.org/#/q/topic:bp/bandwidth-resource-provider).

Related to all this nested stuff: The complex hardware models that drove the development of the nested resource provider system are challenging to test. The cloud hardware provided to OpenStack infrastructure does not expose the hardware that would allow real integration tests. If anyone reading this is in a position to provide third party CI with fancy hardware for NUMA, NFV, FPGA, and GPU related integration testing with nova, there's a significant need for that.

## Refactoring

(I think refactoring should be a constant theme. To reflect that, I'm going to have a section here. Editorial privilege or something.)

There's a collection of patches in progress, currently under the topic [scrub-Lists](https://review.openstack.org/#/q/topic:scrub-Lists), which is a follow-up to the patches that removed oslo versioned objects. That work pointed out some opportunities to DRY-up the List classes (e.g., UsageList) to remove some duplication and simplify. Then, after looking at that, it became clear that entirely removing the List classes, in favor of using python native lists, would further simplify the code. (A hypothetical sketch of the shape of this change appears below, after the Other Placement list.)

Apart from the previously mentioned performance and simplicity benefits, these changes have also managed to expose and fix a few bugs, simply because we were looking at things and moving them around. If you pick up rocks, you can see the bugs and squash them. If you don't, they breed.

# Other Placement

* <https://review.openstack.org/#/q/topic:improve-debug-log> A series of improvements leading to a better debug log when retrieving allocation candidates.
* <https://review.openstack.org/#/c/639628/> Docs: extract testing info to own sub-page.
* <https://review.openstack.org/#/q/topic:cd/gabbi-tempest-job> Gabbi-based integration tests of placement. These recently found a bug that none of the functional, grenade, nor tempest tests did.
* <https://review.openstack.org/#/c/619050/> Optionally migrate database at service startup (so you don't have to run `placement-manage db sync` if you don't want to); a sketch of the assumed configuration follows this list.
* <https://review.openstack.org/#/c/630216/> Add a vision-reflection (of the Technical Vision doc).
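For the database-at-startup change, a hedged sketch of what the configuration is expected to look like. The `sync_on_startup` option name in the `[placement_database]` group is my reading of the review, so verify it against the merged change:

```ini
# Assumed placement.conf snippet for optional startup migrations;
# check the option name against the merged change before relying on it.
[placement_database]
# When enabled, run outstanding database migrations when the placement
# service starts, removing the need to run `placement-manage db sync`.
sync_on_startup = True
```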
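And returning to the Refactoring section above, here is a hypothetical before-and-after sketch of the List-class removal. None of these names are the real placement internals (only `UsageList` is mentioned in the post); they just illustrate the shape of the change:

```python
# Hypothetical sketch of the scrub-Lists pattern: replace a per-type
# List wrapper class with a plain function returning a native list.


def _query_usages(context, rp_uuid):
    # Stand-in for the real database query.
    return [{'resource_class': 'VCPU', 'used': 4}]


class Usage:
    def __init__(self, resource_class, usage):
        self.resource_class = resource_class
        self.usage = usage


# Before: a wrapper class whose main job is to hold Usage objects.
class UsageList:
    def __init__(self, objects):
        self.objects = objects

    @classmethod
    def get_all_by_resource_provider_uuid(cls, context, rp_uuid):
        rows = _query_usages(context, rp_uuid)
        return cls([Usage(r['resource_class'], r['used']) for r in rows])


# After: a module-level function returning a plain Python list, which
# callers can iterate, slice, and len() without unwrapping anything.
def get_all_by_resource_provider_uuid(context, rp_uuid):
    rows = _query_usages(context, rp_uuid)
    return [Usage(r['resource_class'], r['used']) for r in rows]


usages = get_all_by_resource_provider_uuid(None, 'a-rp-uuid')
print([(u.resource_class, u.usage) for u in usages])
```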
# Other Service Users

## Nova

See also the several links above for more nova changes. Also, I'm a bit behind on my tracking in this area, so there is likely plenty of other stuff too. This will improve over time.

* <https://review.openstack.org/538498> Convert driver supported capabilities to compute node provider traits.
* <https://review.openstack.org/621494> Add descriptions of numbered resource classes and traits.
* <https://review.openstack.org/636412> Make move_allocations handle empty source allocations (part of a series on cross-cell resize).
* <https://review.openstack.org/#/q/topic:bp/count-quota-usage-from-placement> Using placement (from nova) for counting (some of) quota.

## Not Nova

* <https://review.openstack.org/#/q/topic:tripleo-nova-placement-removal>
* <https://review.openstack.org/#/q/topic:tripleo-placement-extraction>
* <https://review.openstack.org/#/q/topic:minimum-bandwidth-allocation-placement-api> Neutron side of minimum bandwidth.
* <https://review.openstack.org/#/q/topic:puppet-placement-extraction>
* <https://review.openstack.org/#/q/bp/no-affinity-instance-reservation> Blazar reservation handling, including some manipulation of inventory in placement.
* <https://review.openstack.org/633204> Blazar: Retry on inventory update conflict.

# End

Though this is long, it doesn't really bring us fully up to date. If something is missing that you think is important, please let me know. Once I'm back in the flow it should become increasingly complete.

--
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent
tw: @anticdent
Thanks for restarting this, Chris. I've always found it informative and helpful.
> * Microversion 1.31 adds `in_tree` and `in_treeN` query parameters to GET /allocation_candidates. This is useful in a variety of nested resource provider scenarios, including the big bandwidth QoS changes that are in progress in nova and neutron.
To expand on this: The bandwidth-related use case would be, for example, adding new QoS'd vifs to an existing instance. We would want to GET /allocation_candidates from only those providers associated with the compute node on which the instance already resides.

However, the original motivation for this effort was the force_hosts bug [1]. TL;DR: if you force_hosts in a big cloud, GET /allocation_candidates either returns an untenably long list, or you can `?limit` it, but then there's a possibility that your desired host won't be represented in the limited list. With `?in_tree` you can specify the UUID of the host RP you're forcing and get back a very small number of results (usually one, until you have bandwidth/VGPU/cyborg/etc. in your provider tree) that are already filtered to the host you want.

in_tree may also help us (partially) resolve the famous "doubled allocations on resize-to-same-host" bug [2]. But I don't fully understand that one, so I'll leave it as a teaser :)

[1] https://bugs.launchpad.net/nova/+bug/1777591
[2] https://bugs.launchpad.net/nova/+bug/1790204

-efried