[openstack-dev] [nova][placement] PTG Summary and Rocky Priorities
Jay Pipes
jaypipes at gmail.com
Thu Mar 8 12:51:27 UTC 2018
We had a productive PTG and were able to discuss a great many
scheduler-related topics. I've put together an etherpad [0] with a
summary, reproduced below.
Expect follow-up emails about each priority item in the scheduler track
from those contributors working on that area.
Best,
-jay
Placement/scheduler: Rocky PTG Summary
== Key topics ==
- Aggregates
- How we messed up operators using nova host aggregates for
allocation ratios
- Placement currently doesn't "auto-create" placement aggregates when
nova host aggregates change
- Standardizing trait handling for virt drivers
- Placement REST API
- Partial allocation patching
- Removing assumptions around generation 0
- Supporting policy/RBAC
- NUMA
- Supporting both shared and dedicated CPU on the same host as well
as the same instance
- vGPU handling
- Tracking ingress/egress bandwidth resources using placement
- Finally supporting live migration of CPU-pinned instances
== Agreements and decisions ==
- dansmith's "placement request filters" work is an important enabler of
a number of use cases, particularly around aggregate filtering. Spec is
already approved here: https://review.openstack.org/#/c/544585/
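To illustrate the idea (a standalone sketch, not code from the approved
spec; all names here are illustrative): a request filter runs before nova
asks placement for allocation candidates and narrows the query up front,
e.g. to tenant-specific aggregates:

    class RequestSpec(object):
        def __init__(self, project_id):
            self.project_id = project_id
            # Aggregate UUIDs the placement query will be limited to.
            self.requested_aggregates = []

    def require_tenant_aggregate(tenant_to_aggs, spec):
        # Hypothetical filter: only hosts in the tenant's aggregates
        # should be considered by the scheduler.
        spec.requested_aggregates.extend(
            tenant_to_aggs.get(spec.project_id, []))

    spec = RequestSpec(project_id='abc123')
    require_tenant_aggregate({'abc123': ['agg-uuid-1']}, spec)
    # The eventual placement call would then include member_of=agg-uuid-1.
    assert spec.requested_aggregates == ['agg-uuid-1']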
- We need a method of filtering providers that do NOT have a certain
trait. This is tentatively being called "forbidden traits". Spec review
here: https://review.openstack.org/548915
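As a hedged example of the shape under review: a forbidden trait would be
expressed with a "!" prefix in the existing required query parameter,
along the lines of

    GET /allocation_candidates?required=HW_CPU_X86_AVX2,!CUSTOM_SLOW_DISK

i.e. "must have AVX2, must NOT have the (custom) slow-disk trait". The
exact syntax is whatever the spec settles on.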
- For parity/consistency reasons, we should add the in_tree=<RP_UUID>
query parameter to GET /resource_providers
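Concretely, GET /resource_providers?in_tree=<RP_UUID> would return every
provider in the same tree as the named provider, i.e. its root provider
and all descendants of that root.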
- To assist operators, add some new osc-placement CLI commands for
applying traits and allocation ratios to batches of resource providers in
an aggregate
- We should allow image metadata to specify required traits in the same
fashion as flavor extra specs. Spec review here:
https://review.openstack.org/#/c/541507/
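Concretely (pending spec approval), an image would carry the same
trait:<TRAIT_NAME>=required syntax that flavor extra specs already use,
e.g.:

    openstack image set --property trait:HW_CPU_X86_AVX2=required my-image

so an instance booted from that image could only land on hosts whose
providers expose HW_CPU_X86_AVX2.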
- virt drivers should begin reporting their CPU features as traits. Spec
review here: https://review.openstack.org/#/c/497733/
- Furthermore, virt drivers should respect the cpu_model CONF option
for overriding CPU-related traits
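For reference, os-traits already defines standard traits for common CPU
flags (e.g. HW_CPU_X86_AVX2, HW_CPU_X86_SSE42, HW_CPU_X86_AESNI); the
spec is about drivers reporting those traits on their compute node
providers automatically.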
- We will eventually want to provide the ability to patch an already
existing allocation
- Hot-attaching a network interface is the canonical use case here.
We want to add the new NIC resources to the existing allocation for the
instance consumer without needing to re-PUT the entire allocation
- In order to do this, we will need to add a generation field to the
consumers table, allowing multiple allocation writers to ensure their
view of the consumer is consistent (TODO: need a blueprint/spec for this)
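One possible shape, purely illustrative since the spec has not been
written yet: PUT /allocations/{consumer_uuid} would carry the consumer
generation, so a writer adding NIC resources fails fast if someone else
changed the allocation underneath it:

    PUT /allocations/{instance_uuid}
    {
        "allocations": {
            "{compute_rp_uuid}": {"resources": {"VCPU": 2,
                                                "MEMORY_MB": 2048}},
            "{nic_rp_uuid}": {"resources": {"SRIOV_NET_VF": 1}}
        },
        "consumer_generation": 3
    }

A stale consumer_generation would be rejected with a conflict, prompting
the caller to re-read and retry.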
- We should extricate the standard resource classes currently defined in
`nova.objects.fields.ResourceClass` into a small `os-resource-classes`
library (TODO: need a blueprint/spec for this)
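The library itself would be tiny; a hypothetical sketch (nothing here is
decided) is just a module of shared string constants:

    # os_resource_classes/__init__.py (sketch)
    VCPU = 'VCPU'
    MEMORY_MB = 'MEMORY_MB'
    DISK_GB = 'DISK_GB'
    SRIOV_NET_VF = 'SRIOV_NET_VF'
    # Ordered list of the standard classes, mirroring what nova's
    # ResourceClass field defines today.
    STANDARDS = [VCPU, MEMORY_MB, DISK_GB, SRIOV_NET_VF]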
- We should use oslo.policy in the placement API (TODO: specless
blueprint for this)
- Use case here is making the transition to placement easy for
operators that currently use the os-aggregates interface for managing
compute resources
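A minimal sketch of the oslo.policy wiring (the rule name and its default
are illustrative, not decided):

    from oslo_config import cfg
    from oslo_policy import policy

    enforcer = policy.Enforcer(cfg.CONF)
    enforcer.register_default(policy.RuleDefault(
        'placement:resource_providers:list', 'role:admin',
        description='List resource providers.'))

    # Per request; creds would come from the keystone auth context.
    allowed = enforcer.authorize('placement:resource_providers:list',
                                 target={}, creds={'roles': ['admin']})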
- Calling code should not assume the initial generation for a resource
provider is zero. Spec review here: https://review.openstack.org/#/c/548903/
- Extracting placement into separate packages is not a priority, but we
think incremental progress toward extraction can be made in Rocky
- Placement's microversion handling should be extracted into a
separate library
- Trimming nova imports
- We should add some support to nova-manage to assist operators using
the caching scheduler to migrate to placement (and get rid of the caching
scheduler)
- VGPU_DISPLAY_HEAD resource class should be removed and replaced with
a set of os-traits traits that indicate the maximum supported number of
display heads for the vGPU type
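(Trait naming is purely illustrative at this point, e.g. one trait per
supported maximum such as VGPU_DISPLAY_HEAD_2; the actual names are up
to the os-traits review.)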
- A new PCPU resource class should be created to describe physical CPUs
(logical processors in the hardware). Virt drivers will be able to set
inventories of PCPU on resource providers representing NUMA nodes and
therefore use placement to track dedicated CPU resources (TODO: need a
blueprint/spec for this)
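The data shape is illustrative only until the spec exists, but the idea
is inventory like the following on each NUMA node child provider, using
the existing placement inventory fields:

    # What a virt driver might report for one NUMA node provider;
    # allocation_ratio stays 1.0 because dedicated CPUs are never
    # overcommitted.
    numa0_inventory = {
        'PCPU': {'total': 8, 'reserved': 0, 'min_unit': 1, 'max_unit': 8,
                 'step_size': 1, 'allocation_ratio': 1.0},
    }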
- artom is going to write a spec for supporting live migration of
CPU-pinned instances (and abandon the complicated old patches)
- Multiple agreements were reached about the strict minimum bandwidth
support feature in nova. The spec has already been updated accordingly:
https://review.openstack.org/#/c/502306/
- For now, the hostname remains the piece of information connecting the
nova-compute service and the neutron agent on the same host, but we are
aiming to use an FQDN as the hostname to avoid possible ambiguity.
- We agreed not to make this feature dependent on moving nova's port
creation to the conductor. The current scope is to support pre-created
neutron ports only.
- Neutron will provide the resource request in the port API, so this
feature does not depend on the neutron port binding API work
- Neutron will create resource providers in placement under the compute
RP, and will also report inventories on those RPs
- Nova will claim the port-related resources in placement, and the
consumer_id will be the instance UUID
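One possible shape for the port-level request (hedged: the spec is still
in review, and both the field and resource class names may change):

    "resource_request": {
        "resources": {
            "NET_BW_IGR_KILOBIT_PER_SEC": 1000,
            "NET_BW_EGR_KILOBIT_PER_SEC": 1000
        },
        "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_TYPE_NORMAL"]
    }

Nova would fold these amounts into the instance's existing allocation
when it does the claim.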
- We should mirror nova host aggregate information to placement using
an online data migration technique in the add/remove_host methods of
nova.objects.Aggregate and a `nova-manage db online_data_migrations`
command
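The placement side of the mirroring is the existing aggregates endpoint:
roughly, add_host/remove_host would end up doing the equivalent of a

    PUT /resource_providers/{compute_node_rp_uuid}/aggregates

that includes (or drops) the nova host aggregate's UUID; the exact
plumbing is what this work item will define.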
== Priorities for Rocky release cycle ==
1. Merge the update_provider_tree patch series (efried)
2. Placement request filters (dansmith)
3. Mirror aggregate information from nova to placement (jaypipes)
4. Forbidden traits (cdent)
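For context on priority 1, the interface in the series looks roughly
like this (hedged: details may still shift in review). The virt driver
gets handed a ProviderTree and edits it in place rather than returning
inventory dicts:

    class MyVirtDriver(object):
        def update_provider_tree(self, provider_tree, nodename):
            # Edit the passed-in tree in place: set inventory on the
            # compute node provider; child providers (e.g. NUMA nodes,
            # vGPU-capable devices) can be added via the same object.
            provider_tree.update_inventory(nodename, {
                'VCPU': {'total': 16, 'allocation_ratio': 4.0},
                'MEMORY_MB': {'total': 32768, 'allocation_ratio': 1.5},
            })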
== Non-priority Items for Rocky ==
- Add consumers.generation field and related API plumbing (efried and cdent)
- Support requested traits in image metadata (arvind)
- Provide CLI functionality to set traits and things like allocation
ratios for a batch of resource providers via aggregate (ttsurya)
- Migrating off of the caching scheduler and on to placement (mriedem)
- Create `os-resource-classes` library and write migration code to
replace `nova.objects.fields.ResourceClass` usage with calls to
os_resource_classes
- Policy/RBAC support in Placement REST API (mriedem)
- Extract placement's microversion handling into separate library (cdent)
- CPU-pinned instance live migration support (stephenfin and artom)
[0] https://etherpad.openstack.org/p/rocky-ptg-scheduler-placement-summary