[placement] update 18-49
HTML: https://anticdent.org/placement-update-18-49.html

This will be the last placement update of the year. I'll be travelling next Friday, and after that we'll be deep in the December lull. I'll catch us up next on January 4th.

# Most Important

As last week, progress continues on the work in ansible, puppet/tripleo, kolla, and loci to package placement up and establish upgrade processes. All of these things need review (see below). Work on GPU reshaping in virt drivers is getting close.

# What's Changed

* The perfload job, which used to run inside the nova-next job, now has its [own job](https://review.openstack.org/#/c/619248/) that runs on each change. This may be of general interest because it runs placement "live" but without devstack, and a full job run takes less than 4 minutes.

* We've decided to go ahead with the simple os-resource-classes idea, so a repo is [being created](https://review.openstack.org/#/c/621666/).

# Slow Reviews

(Reviews which need additional attention because they have unresolved questions.)

* <https://review.openstack.org/#/c/619126/> Set root_provider_id in the database

  This has some indecision because it does a data migration within schema migrations. For this particular case this is safe and quick, but there's concern that it softens a potentially useful boundary between schema and data migrations.

# Bugs

* Placement related [bugs not yet in progress](https://goo.gl/TgiPXb): 17 (-2)
* [In progress placement bugs](https://goo.gl/vzGGDQ): 13 (-1)

## Interesting Bugs

(Bugs that are sneaky and interesting and need someone to pick them up.)

* <https://bugs.launchpad.net/nova/+bug/1805858> placement/objects/resource_provider.py missing test coverage for several methods

  This is likely the result of the extraction. Tests in nova's test_servers and friends probably covered some of this stuff, but now we need placement-specific tests.
* <https://bugs.launchpad.net/nova/+bug/1804453> maximum recursion possible while setting aggregates in placement

  This can only happen under very heavy load with a very low number of placement processes, but the code that fails should probably change anyway: it's a potentially infinite loop with no safety breakout.

# Specs

Spec freeze is milestone 2, the week of January 7th. There was going to be a spec review sprint next week, but it was agreed that people are already sufficiently busy. This will certainly mean that some of these specs do not get accepted for this cycle.

None of the specs listed last week have merged.

* <https://review.openstack.org/#/c/544683/> Account for host agg allocation ratio in placement (Still in rocky/)
* <https://review.openstack.org/#/c/595236/> Add subtree filter for GET /resource_providers
* <https://review.openstack.org/#/c/597601/> Resource provider - request group mapping in allocation candidate
* <https://review.openstack.org/#/c/549067/> VMware: place instances on resource pool (still in rocky/)
* <https://review.openstack.org/#/c/555081/> Standardize CPU resource tracking
* <https://review.openstack.org/#/c/599957/> Allow overcommit of dedicated CPU (Has an alternative which changes allocations to a float)
* <https://review.openstack.org/#/c/591037/> Modelling passthrough devices for report to placement
* <https://review.openstack.org/#/c/603955/> Nova Cyborg interaction specification.
* <https://review.openstack.org/#/c/601596/> supporting virtual NVDIMM devices
* <https://review.openstack.org/#/c/603352/> Spec: Support filtering by forbidden aggregate
* <https://review.openstack.org/#/c/552924/> Proposes NUMA topology with RPs
* <https://review.openstack.org/#/c/569011/> Count quota based on resource class
* <https://review.openstack.org/#/c/141219/> Adds spec for instance live resize
* <https://review.openstack.org/#/c/612497/> Provider config YAML file
* <https://review.openstack.org/#/c/509042/> Propose counting quota usage from placement and API database
* <https://review.openstack.org/603545> Resource modeling in cyborg.
* <https://review.openstack.org/#/c/609960/> Support filtering of allocation_candidates by forbidden aggregates

# Main Themes

## Making Nested Useful

Progress continues on gpu-reshaping for libvirt and xen:

* <https://review.openstack.org/#/q/topic:bp/reshape-provider-tree+status:open>

Also making use of nested is bandwidth-resource-provider:

* <https://review.openstack.org/#/q/topic:bp/bandwidth-resource-provider>

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

* Stack at: <https://review.openstack.org/#/c/615677>

## Extraction

The [extraction etherpad](https://etherpad.openstack.org/p/placement-extract-stein-4) is starting to contain more strikethrough text than not. Progress is being made. The main tasks are the reshaper work mentioned above and the work to get deployment tools operating with an extracted placement:

* [TripleO](https://review.openstack.org/#/q/topic:tripleo-placement-extraction)
* [OpenStack Ansible](https://review.openstack.org/#/q/project:openstack/openstack-ansible-os_plac...)
* [Kolla](https://review.openstack.org/#/c/613589/)
* [Kolla Ansible](https://review.openstack.org/#/c/613629/)
* [Loci](https://review.openstack.org/#/c/617273/)

Documentation tuneups:

* Release-notes: <https://review.openstack.org/#/c/618708/> This is blocked until we refactor the release notes to reflect _now_ better.
* The main remaining task here is participating in [openstack-manuals](https://docs.openstack.org/doc-contrib-guide/doc-index.html).

The functional tests in nova that use [extracted placement](https://review.openstack.org/#/c/617941/) are working but not yet merged. A child of that patch [removes the placement code](https://review.openstack.org/#/c/618215/). Further work will be required to tune up the various pieces of documentation in nova that reference placement.

# Other

There are currently only 8 [open changes](https://review.openstack.org/#/q/project:openstack/placement+status:open) in placement itself. Most of the time-critical work is happening elsewhere (notably the deployment tool changes listed above). Of those placement changes, the [database-related](https://review.openstack.org/#/q/owner:nakamura.tetsuro%2540lab.ntt.co.jp+st...) ones from Tetsuro are the most important.

Outside of placement:

* <https://review.openstack.org/#/q/topic:bp/initial-allocation-ratios> Improve handling of default allocation ratios
* <https://review.openstack.org/#/q/topic:minimum-bandwidth-allocation-placement-api> Neutron minimum bandwidth implementation
* <https://review.openstack.org/#/c/602160/> Add OWNERSHIP $SERVICE traits
* <https://review.openstack.org/#/c/586960/> zun: Use placement for unified resource management
* <https://review.openstack.org/#/q/project:openstack/blazar+topic:bp/placement-api> Blazar using the placement-api
* <https://review.openstack.org/619626> Tenks doing some node management, with a bit of optional placement.
* <https://review.openstack.org/#/c/620485/> Sync placement database to the current version (in grenade)
* <https://review.openstack.org/#/c/621645/> WIP: add Placement aggregates tests (in tempest)

# End

In case it hasn't been clear: things being listed here is an explicit invitation (even plea) for _you_ to help out by reviewing or fixing. Thank you.

-- 
Chris Dent ٩◔̯◔۶ https://anticdent.org/
freenode: cdent tw: @anticdent
On 12/7/2018 6:58 AM, Chris Dent wrote:
> * <https://review.openstack.org/#/c/544683/> Account for host agg allocation ratio in placement (Still in rocky/)
Given https://bugs.launchpad.net/nova/+bug/1804125 and the discussion with the operator in there, I've been thinking about this again lately. We will soon [1] at least have the restriction documented, but the comment from the operator in that bug is painful:

"""
What's more distressing is that this appears to have produced a schism between the intended, documented functions of Nova scheduler and the actual operation of those functions on several consecutive releases of OpenStack. If the Aggregate* filters are no longer functional, and are no longer intended to be so, then I would think they should reasonably have been removed from the documentation and from the project so that deployers wouldn't expect to rely on them.
"""

With https://review.openstack.org/#/q/topic:bp/initial-allocation-ratios we at least have some sanity in nova-compute, and you can control the allocation ratios per compute node (resource provider) either via nova config (the CERN use case) or via the placement API using RBAC (the mgagne scenario, with placement RBAC added in Rocky). What is missing is something user-friendly for those who want to control allocation ratios in aggregate from the API.

In Dublin we said we'd write an osc-placement CLI to help with this:

https://etherpad.openstack.org/p/nova-ptg-rocky-placement ~L37

But that didn't happen, unfortunately. That doesn't mean we couldn't still easily add it, although that solution does require tooling changes from deployers.

The other alternative is Jay's spec, which is to have nova-api mirror/proxy allocation ratio information from the compute host aggregates API to the placement API. Since Rocky the compute API already mirrors aggregate information to placement, so this would be building on that to also set allocation ratio information on each resource provider within said aggregate in placement.
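For illustration, the per-provider step of that mirroring could look roughly like the minimal sketch below. It assumes the aggregate metadata keys read by the existing AggregateCoreFilter, AggregateRamFilter and AggregateDiskFilter (`cpu_allocation_ratio`, `ram_allocation_ratio`, `disk_allocation_ratio`) and the body shape used by placement's `GET`/`PUT /resource_providers/{uuid}/inventories`. The function name `mirror_aggregate_ratios` is hypothetical, the HTTP calls are left out, and this is not the implementation proposed in Jay's spec.

```python
import copy

# Assumed mapping: aggregate metadata keys used by the Aggregate*Filter
# classes -> the standard placement resource classes they would affect.
AGGREGATE_RATIO_KEYS = {
    "cpu_allocation_ratio": "VCPU",
    "ram_allocation_ratio": "MEMORY_MB",
    "disk_allocation_ratio": "DISK_GB",
}


def mirror_aggregate_ratios(aggregate_metadata, provider_inventories):
    """Build a PUT /resource_providers/{uuid}/inventories body (hypothetical helper).

    aggregate_metadata: key/value strings from the compute host aggregates API.
    provider_inventories: the body returned by
        GET /resource_providers/{uuid}/inventories for one provider.
    """
    # Work on a copy so the caller's view of the current inventory is untouched.
    body = copy.deepcopy(provider_inventories)
    for key, resource_class in AGGREGATE_RATIO_KEYS.items():
        raw = aggregate_metadata.get(key)
        inventory = body["inventories"].get(resource_class)
        if raw is None or inventory is None:
            # Nothing to mirror, or this provider has no such inventory.
            continue
        # Aggregate metadata values are strings; placement expects a number.
        inventory["allocation_ratio"] = float(raw)
    # resource_provider_generation is passed through unchanged, so placement
    # can reject the PUT with a 409 if another writer changed the provider.
    return body


# Example: one compute node provider, aggregate sets CPU and disk ratios.
current = {
    "resource_provider_generation": 7,
    "inventories": {
        "VCPU": {"total": 32, "reserved": 0, "min_unit": 1,
                 "max_unit": 32, "step_size": 1, "allocation_ratio": 16.0},
        "MEMORY_MB": {"total": 65536, "reserved": 512, "min_unit": 1,
                      "max_unit": 65536, "step_size": 1,
                      "allocation_ratio": 1.5},
    },
}
body = mirror_aggregate_ratios(
    {"cpu_allocation_ratio": "4.0", "disk_allocation_ratio": "2.0"}, current)
```

Applying this across a whole aggregate would mean repeating it for each provider returned by `GET /resource_providers?member_of=<aggregate uuid>`, retrying on a generation conflict.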
Part of me doesn't like that proxy work given our stance on no more proxies [2], but on the other hand we definitely regressed our own compute API (and scheduler) in Ocata, so it seems on us to provide the most user-friendly (no upgrade impact) way to solve that.

Either way we go, at this point, doesn't it mean we can deprecate the Aggregate* filters, since they are essentially useless when using the FilterScheduler and placement (remember the CachingScheduler is gone now)?

[1] https://review.openstack.org/#/q/Ifaf596a8572637f843f47daf5adce394b0365676
[2] https://docs.openstack.org/nova/latest/contributor/project-scope.html#api-sc...

-- 

Thanks,

Matt