[nova][placement] Yoga PTG summary

Sylvain Bauza sbauza at redhat.com
Mon Oct 25 16:50:12 UTC 2021


Well, here's an attempt: we discussed for 4 days, so this could be a large
summary ;)
You can see all the notes in a read-only etherpad here :
https://etherpad.opendev.org/p/r.e70aa851abf8644c29c8abe4bce32b81

### Cross-project discussions

# Cyborg cross-project discussion with Nova
We agreed on adding an OWNED_BY_NOVA trait to all Resource Providers
created by Nova, and Cyborg would provide its own OWNED_BY_CYBORG trait, so
we know which inventories are used by either Nova or Cyborg.
Cyborg contributors need to modify
https://review.opendev.org/c/openstack/nova-specs/+/780452
We also agreed on the fact that
https://blueprints.launchpad.net/nova/+spec/cyborg-suspend-and-resume is a
specless blueprint.
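
To illustrate how the ownership traits could be consumed, here is a minimal
sketch that groups providers by owner. The ResourceProvider class and the
helper names are hypothetical stand-ins, not Placement or Nova code, and the
trait names are the ones used in this summary (the names that finally land
may differ).

```python
from dataclasses import dataclass, field

# Trait names as written in this summary; treat them as assumptions.
OWNED_BY_NOVA = "OWNED_BY_NOVA"
OWNED_BY_CYBORG = "OWNED_BY_CYBORG"


@dataclass
class ResourceProvider:
    """Hypothetical, simplified stand-in for a Placement resource provider."""
    uuid: str
    name: str
    traits: set = field(default_factory=set)


def owner_of(provider):
    """Return 'nova', 'cyborg' or 'unknown' based on the ownership trait."""
    if OWNED_BY_NOVA in provider.traits:
        return "nova"
    if OWNED_BY_CYBORG in provider.traits:
        return "cyborg"
    return "unknown"


def group_by_owner(providers):
    """Map each owner to the names of the providers it owns."""
    groups = {"nova": [], "cyborg": [], "unknown": []}
    for rp in providers:
        groups[owner_of(rp)].append(rp.name)
    return groups
```

Providers without either trait end up in the 'unknown' group, which is
probably what we would see for providers created before the traits exist.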

# Oslo cross-project discussion with Nova
gmann agreed on providing a new flag for oslopolicy-sample-generator for
adding deprecated rules in the generated policy file.

# RBAC popup team discussion with Nova
Eventually, we found no consensus on this topic as there were still some
open questions about system-scope. FWIW, the popup team then discussed this
with the TC on Friday, so please look at the TC etherpad if you want to
know more.
A side effect is that
https://review.opendev.org/c/openstack/nova-specs/+/793011 is now
punted until we figure out a good path.

# Neutron cross-project discussion with Nova
Again, about RBAC for the external events API interaction, we discussed
the scopes and eventually punted the discussion.
About specific events related to Neutron backends, ralonso agreed to
provide documentation explaining which backends send which events, and we
agreed to merge https://review.opendev.org/c/openstack/nova/+/813419 as a
short-term solution, while the long-term solution we would like is for
Neutron to provide the event information via the port binding
information.
About testing move operations, we agreed on continuing to have a ML2/OVS
multinode job.
During another Nova session, we also agreed on changing libvirt to directly
unplug (not unbind) ports during VM shutdown.

# Interop cross-project discussion with Nova
We agreed on reviewing
https://review.opendev.org/c/openinfra/interop/+/811049/3/guidelines/2021.11.json

# Cinder cross-project discussion with Nova
We discussed
https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild
and we said the workflow should be something like user > nova > cinder >
nova. We also discussed some upgrade questions for this one, but we
eventually agreed on it.
We also discussed devstack-plugin-nfs gate issues and how to call the nova
events API for resize.

# Manila integration with libvirt driver
The first spec looks promising
https://review.opendev.org/c/openstack/nova-specs/+/813180
There were discussions about cloud-init and the Ironic case, but we agreed
on the smaller scope for the Yoga release that's proposed in the spec.


### Nova specific topics ###

# Xena retrospective
We agreed to stop having an Asian-friendly meeting timeslot once per
month as, unfortunately, no contributors attended those meetings.
We also agreed on modifying the review-priority label so that
contributors can use it too, but we first need to provide documentation
explaining it before we change Gerrit.

# Continuing to use Storyboard for Placement or not ?
We eventually said we would look at how to create a script for moving
Storyboard (SB) stories to Launchpad (as either features or bugs), but we
still want to keep checking whether contributors use Storyboard for
Placement bugs or feature requests, and bauzas agreed to provide
visibility on Placement SB stories during the weekly meeting.
gmann also agreed on asking contributors to move off the
#openstack-placement IRC channel to #openstack-nova so we can eventually
delete this IRC channel during this cycle.

# Yoga release deadlines for Nova
No real change in deadlines compared to the Xena timeframe. We agreed on
having two spec review days, one around mid-Nov (before yoga-1) and one
around mid-Dec (before Christmas period) in order to prioritize
implementation reviews after Jan 1st even if we can continue to review
specs until yoga-2.

# python version pinning with tox
We agreed on the fact this is a pain for developers. We then had a
consensus on modifying tox to accept multiple python versions automatically
with https://review.opendev.org/c/openstack/nova/+/804292 so it would fix
the issue.

# Nova DB issues with DB collation
We agreed on the fact it's a problem, so we'll document the fact that Nova
APIs are case-insensitive at the moment even if Python is case-sensitive,
which creates problems. What we propose in order to fix this is to provide
a nova-manage db upgrade command that would modify the DB tables to use
COLLATE utf8_bin, but we also agreed we can't ask operators to migrate
within one cycle and we accept the fact this command could be there for a
while.
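
To make the mismatch concrete, here is a rough sketch. It approximates a
case-insensitive collation with str.casefold() (real collations also differ
on accents and other rules), and it builds the kind of MySQL statement such
a command might issue. The function names are hypothetical and the table
names in the usage note are only illustrative; nothing here is the actual
nova-manage implementation.

```python
def find_ci_collisions(names):
    """Return names Python treats as distinct but that a case-insensitive
    collation would treat as equal, keyed by their folded form."""
    seen = {}
    for name in names:
        seen.setdefault(name.casefold(), []).append(name)
    # Keep only the folded keys that hide more than one distinct spelling.
    return {k: v for k, v in seen.items() if len(set(v)) > 1}


def alter_statements(tables, charset="utf8", collation="utf8_bin"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name.

    This only formats strings; it does not connect to any database.
    """
    return [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};"
        for t in tables
    ]
```

For instance, find_ci_collisions(["Foo", "foo", "bar"]) reports that "Foo"
and "foo" collide, and alter_statements(["instance_metadata"]) returns the
statement an operator would need to review before running it.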

# SQLAlchemy 2.0
The wave is coming and we need contributors to help us change what we have
in our nova DB modules to no longer use the deprecated calls. This looks
like low-hanging fruit and I'll try to find some contributors for this.

# Bumping minimum microversion from v2.1
No, we said no because $users use it.

# Unified limits
We agreed on providing a read-only API for knowing the limits but *not*
providing proxy APIs for setting limits *as a first attempt*. We prefer
operators to come back with usecases and feedback from their use of Unified
Limits 1.0 before we start drafting some proxy API to Keystone.
Also, we agreed on *not* having config-driven quotas.

# Central vncproxy in a multiple-cells environment
We understand the usecase, which is very specific, and we agreed to create
a central vncproxy service that would proxy calls to the cell-related
vncproxy service, but this is not a pattern we want to follow for every
cell-specific nova service.

# Move instances between projects
Well, we totally get the usecase but we absolutely lack the resources to
work on this very large effort that would span multiple services.

# Nova service healthchecks
We agreed on providing a way for monitoring tools to ping Nova services for
health over HTTP or a unix socket; every service would return a status
based on cached data. A spec has to be written.
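
Since the spec isn't written yet, the following is only a sketch of the
"status based on cached data" idea: the service records its own state from
time to time, and the probe handler only reads that cache. CachedHealth,
make_handler and the max_age value are all hypothetical names and choices,
not a proposed Nova API.

```python
import json
import time
from http.server import BaseHTTPRequestHandler


class CachedHealth:
    """Keeps the last known status so a probe never triggers real work."""

    def __init__(self, max_age=60.0, clock=time.monotonic):
        self._max_age = max_age
        self._clock = clock
        self._status = None
        self._updated_at = None

    def update(self, healthy, detail=""):
        # Called by the service itself, e.g. at the end of a periodic task.
        self._status = {"healthy": bool(healthy), "detail": detail}
        self._updated_at = self._clock()

    def report(self):
        """Return (http_code, body) computed from cached data only."""
        if self._updated_at is None:
            return 503, {"healthy": False, "detail": "no data yet"}
        if self._clock() - self._updated_at > self._max_age:
            # The service stopped refreshing its status: report unhealthy.
            return 503, {"healthy": False, "detail": "stale status"}
        code = 200 if self._status["healthy"] else 503
        return code, dict(self._status)


def make_handler(health):
    """Build an HTTP handler class bound to one CachedHealth instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            code, body = health.report()
            payload = json.dumps(body).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return Handler
```

The handler can be plugged into http.server.HTTPServer on a local address
for a quick try. Treating a stale cache as unhealthy is a design choice of
this sketch; the spec may well decide otherwise.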

# Zombie Resource Providers no longer corresponding to Nova resources
Thanks to the OWNED_BY_NOVA trait we agreed on when discussing with the
Cyborg team, we could find a way to know the ResourceProviders owned by
Nova that are no longer in use and we could consequently delete them, or
warn the operator if existing allocations are present.
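
The decision described above can be sketched as follows. The inputs are
plain dicts and sets standing in for what Placement and the Nova DB would
return, and classify_zombies is a hypothetical helper name, so take it as
an illustration of the logic rather than a design.

```python
def classify_zombies(providers, live_uuids, allocations):
    """Split Nova-owned providers that are no longer in use into two lists.

    providers:   dict of provider uuid -> set of trait names
    live_uuids:  set of provider uuids still backed by a Nova resource
    allocations: dict of provider uuid -> number of consumers allocated
    Returns (to_delete, to_warn), both sorted by uuid.
    """
    to_delete, to_warn = [], []
    for uuid, traits in sorted(providers.items()):
        if "OWNED_BY_NOVA" not in traits:
            continue  # not owned by Nova, so never touch it
        if uuid in live_uuids:
            continue  # still in use
        if allocations.get(uuid, 0) > 0:
            to_warn.append(uuid)  # zombie, but consumers still point at it
        else:
            to_delete.append(uuid)
    return to_delete, to_warn
```

Providers that only carry a Cyborg trait, or no ownership trait at all, are
left alone on purpose.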

# NUMA balancing
This is definitely a bug we will fix by providing a workaround config
option that will let operators define the packing or spreading strategy
they want for NUMA cell stacking (or not).
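
As an illustration of what packing versus spreading means here, this sketch
orders host NUMA cells by free CPUs before trying to fit a guest. The
function names, the strategy strings and the "free CPUs only" model are
assumptions made for the example; the real option and fitting logic will be
richer (memory, pinning, etc.).

```python
def order_numa_cells(cells, strategy="spread"):
    """Return cell ids in the order they would be tried.

    cells is a list of (cell_id, free_cpus) tuples.
    'spread' tries the emptiest cell first, 'pack' the fullest one first.
    """
    if strategy == "spread":
        def key(cell):
            return (-cell[1], cell[0])
    elif strategy == "pack":
        def key(cell):
            return (cell[1], cell[0])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [cell_id for cell_id, _ in sorted(cells, key=key)]


def pick_cell(cells, wanted_cpus, strategy="spread"):
    """Return the first cell, in strategy order, that fits the request."""
    free = dict(cells)
    for cell_id in order_numa_cells(cells, strategy):
        if free[cell_id] >= wanted_cpus:
            return cell_id
    return None
```

With cells [(0, 2), (1, 6), (2, 4)] and a 2-CPU guest, 'spread' picks cell 1
while 'pack' picks cell 0, which keeps the larger cells free for bigger
guests.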

# Deprecation of the novaclient shell command
Yes, now that the SDK is able to negotiate, we can deprecate the novaclient
CLI. More investigation work has to be done in order to know whether we can
also deprecate the novaclient library itself.

# Integration with Off-Path Network Backends
Lots of technicalities with this very large spec
https://review.opendev.org/c/openstack/nova-specs/+/787458. Long story
short, we asked the proposer to add a few more details in the spec about
upgrade scenarios, move operations and testing, but we also said that we
can't accept this spec until the Neutron one lands, as the Nova spec relies
on some of the extended Neutron APIs that are being proposed.

# 'pc' and 'q35' machine types
We agreed on changing the default machine type to 'q35' but we also agreed
on *not* deprecating the 'pc' type. Some documentation has to be written in
order to explain the intent of the migration.

# Nova use of privsep
sean-k-mooney agreed on working on a patch to remove usage of
CAP_DAC_OVERRIDE and on another patch to provide the new capabilities for
the monolith privsep context.

(a few other topics were discussed but I skipped them from my summary as
they only matter to a very few people rather than to operators or other
contributors - don't get me wrong, I like them but I don't wanna add
more verbosity to an already large summary)


### Nova painpoints
(taken from https://etherpad.opendev.org/p/pain-point-elimination)

# Ironic hashring failure recovery
That's a latent bug but we don't want the Ironic virt driver to modify the
host value of every instance. We'd rather have some operator action to
tell Nova it has to shuffle a few things. More brainstorming honestly has
to be done on this as we haven't clearly drafted a design yet.

# Problems with shelve, unshelve and then shelve back
Well, this is a nasty bug and we need to fix it, agreed. We also have a
testing gap we need to close.

# Naming cinder volumes after nova instance name
We said yes, why not. We also considered that 'delete on terminate' has to
change so it does delete the volume when you delete the instance (for a
boot-from-volume case).

# Orphaned instances due to underlying network issues
We agreed on the fact it would be nice to provide a tool for finding such
orphans and we also think it's important for the instance force-delete API
call to complete successfully in such a case.

# Remaining guests on a recovered compute node while instance records were purged
Well, we should avoid purging instance records from the Nova DB if we are
still unable to correctly delete the compute bits unless the operator
explicitly wants to (in the case of a non-recoverable compute for example).
A potential solution can be to add a config flag to *not* archive deleted
rows on instances that are running on down computes.
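
The behaviour such a flag might have can be sketched like this. The
skip_down_computes parameter and the function name are hypothetical (no
such option exists today), and the inputs are simplified to tuples and
sets.

```python
def rows_safe_to_archive(deleted_instances, down_hosts,
                         skip_down_computes=True):
    """Return the uuids of soft-deleted instances that may be archived.

    deleted_instances: list of (instance_uuid, host) tuples
    down_hosts:        set of compute hosts currently reported as down
    When skip_down_computes is True, rows for instances on down hosts are
    kept so the guest can still be cleaned up once the host comes back.
    """
    if not skip_down_computes:
        return [uuid for uuid, _ in deleted_instances]
    return [
        uuid for uuid, host in deleted_instances
        if host not in down_hosts
    ]
```

An operator who knows the compute is gone for good would turn the flag off
and archive everything, which matches the "explicitly wants to" case above.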

# Nova APIs leaking hypervisor hardware details
No, we don't want this but we can try to add more traits to give more
visibility on a case-by-case basis.

# RabbitMQ replacement
Well, we're unfortunately lacking resources on this. We recommend that the
community investigate the use of a NATS backend for oslo.messaging.

# Placement strictness with move operations
We agree this is a bug that needs to be fixed.



Okay, if you reach this point, you're very brave. Kudos to you.
I don't really want to repeat this exercise often, but I just hope it
helps you by summarizing a thousand-line etherpad.


-Sylvain