[nova][placement] Yoga PTG summary
Well, here's a try, given we discussed for 4 days and this could be a large summary ;) You can see all the notes in a read-only etherpad here: https://etherpad.opendev.org/p/r.e70aa851abf8644c29c8abe4bce32b81

### Cross-project discussions

# Cyborg cross-project discussion with Nova
We agreed on adding an OWNED_BY_NOVA trait for all Resource Providers created by Nova, so Cyborg would provide its own OWNED_BY_CYBORG trait for knowing which inventories are used by either Nova or Cyborg. Cyborg contributors need to modify https://review.opendev.org/c/openstack/nova-specs/+/780452
We also agreed that https://blueprints.launchpad.net/nova/+spec/cyborg-suspend-and-resume is a specless blueprint.

# Oslo cross-project discussion with Nova
gmann agreed to provide a new flag for oslopolicy-sample-generator to add deprecated rules to the generated policy file.

# RBAC popup team discussion with Nova
In the end, we found no consensus on this topic, as some questions about system scope were left open. FWIW, the popup team then discussed this with the TC on Friday, so please look at the TC etherpad if you want to know more. A side impact concerns https://review.opendev.org/c/openstack/nova-specs/+/793011 which is now punted until we figure out a good path.

# Neutron cross-project discussion with Nova
Again, regarding RBAC for the external events API interaction, we discussed the scopes and eventually punted the discussion.
Regarding specific events related to Neutron backends, ralonso agreed to provide documentation explaining which backends send which events, and we agreed to merge https://review.opendev.org/c/openstack/nova/+/813419 as a short-term solution, while we would like a long-term solution where Neutron provides the event information through the port binding information.
Regarding testing move operations, we agreed to keep an ML2/OVS multinode job.
During another Nova session, we also agreed on changing the libvirt driver to directly unplug (not unbind) ports during VM shutdown.

# Interop cross-project discussion with Nova
We agreed to review https://review.opendev.org/c/openinfra/interop/+/811049/3/guidelines/2021.11...

# Cinder cross-project discussion with Nova
We discussed https://blueprints.launchpad.net/nova/+spec/volume-backed-server-rebuild and said the workflow should be something like user > nova > cinder > nova. We also discussed some upgrade questions for this one, but we eventually agreed on it.
We also discussed the devstack-plugin-nfs gate issues and how to contact the Nova events API for resize.

# Manila integration with libvirt driver
The first spec looks promising: https://review.opendev.org/c/openstack/nova-specs/+/813180
There were discussions about cloud-init and the Ironic case, but we agreed on the smaller scope proposed in the spec for the Yoga release.

### Nova specific topics ###

# Xena retrospective
We agreed to stop holding an Asian-friendly meeting timeslot once per month as, unfortunately, no contributors attended those meetings.
We also agreed on modifying the review-priority label so that contributors can use it too, but we first need to provide documentation explaining it before we change Gerrit.

# Continuing to use Storyboard for Placement or not?
We eventually said we would look at how to create a script for moving Storyboard (SB) stories to Launchpad (as either features or bugs), but we still want to keep checking whether contributors use Storyboard for Placement bugs or feature requests, and bauzas agreed to provide visibility on Placement SB stories during the weekly meeting.
gmann also agreed to ask contributors to move from the #openstack-placement IRC channel to #openstack-nova, so that we can eventually delete that channel during this cycle.

# Yoga release deadlines for Nova
No real change in deadlines compared to the Xena timeframe.
We agreed on having two spec review days, one around mid-November (before yoga-1) and one around mid-December (before the Christmas period), in order to prioritize implementation reviews after January 1st, even though we can continue to review specs until yoga-2.

# Python version pinning with tox
We agreed this is a pain for developers. We then reached consensus on modifying our tox configuration to accept multiple Python versions automatically with https://review.opendev.org/c/openstack/nova/+/804292 which should fix the issue.

# Nova DB issues with DB collation
We agreed it's a problem, so we'll document that Nova APIs are currently case-insensitive even though Python is case-sensitive, which creates problems.
To fix this, we propose providing a nova-manage db upgrade command that would modify the DB tables to use COLLATE utf8_bin, but we also agree we can't ask operators to migrate within one cycle, and we accept that this command could stay around for a while.

# SQLAlchemy 2.0
The wave is coming and we need contributors to help us change our Nova DB modules so they no longer use the deprecated calls. This looks like low-hanging fruit and I'll try to find some contributors for this.

# Bumping the minimum microversion from v2.1
No, we said no because $users use it.

# Unified limits
We agreed on providing a read-only API for querying the limits but *not* providing proxy APIs for setting limits, *as a first attempt*. We prefer operators to come back with use cases and feedback from their use of Unified Limits 1.0 before we start drafting a proxy API to Keystone.
Also, we agreed on *not* having config-driven quotas.

# Central vncproxy in a multiple-cells environment
We understand the use case, which is very specific, and we accept creating a central vncproxy service that would proxy calls to the cell-specific vncproxy services, but this is not a pattern we want to follow for every cell-specific Nova service.

# Move instances between projects
Well, we totally get the use case, but we absolutely lack the resources to work on this very large effort, which would span multiple services.

# Nova service healthchecks
We agreed on providing a way for monitoring tools to ping Nova services for health through HTTP or a UNIX socket, with every service returning a status based on cached data. A spec has to be written.

# Zombie Resource Providers no longer corresponding to Nova resources
Thanks to the OWNED_BY_NOVA trait we agreed on with the Cyborg team, we could find a way to identify the Resource Providers owned by Nova that are no longer in use, and consequently delete them, or warn the operator if existing allocations are present.

# NUMA balancing
This is definitely a bug, which we will fix by providing a workaround config option that lets operators define the packing or spreading strategy they want for NUMA cell stacking (or not).

# Deprecation of the novaclient shell command
Yes, now that the SDK is able to negotiate, we can deprecate the novaclient CLI. More investigation is needed to know whether we can also deprecate the novaclient library itself.

# Integration with Off-Path Network Backends
Lots of technicalities with this very large spec: https://review.opendev.org/c/openstack/nova-specs/+/787458
Long story short, we asked the proposer to add a few more details to the spec about upgrade scenarios, move operations and testing, but we also said we can't accept this spec until the Neutron one lands, as the Nova spec proposes using some of the extended Neutron APIs.

# 'pc' and 'q35' machine types
We agreed on changing the default machine type to 'q35', but we also agreed on *not* deprecating the 'pc' type. Some documentation has to be written to explain the intent of the migration.
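Going back to the zombie Resource Providers item above, here is a minimal Python sketch of the cleanup logic as I understand it. It is not Nova or Placement code: `ResourceProvider`, `classify_zombies` and `known_compute_uuids` are hypothetical names, providers are plain in-memory objects instead of Placement API responses, and the trait name is an assumption since it is not settled yet (OWNED_BY_NOVA in this summary, OWNER_NOVA in the etherpad).

```python
from dataclasses import dataclass, field

# Assumed trait name: this summary uses OWNED_BY_NOVA, the etherpad uses
# OWNER_NOVA, and the final name still has to be agreed on the spec.
NOVA_OWNER_TRAIT = "OWNED_BY_NOVA"


@dataclass
class ResourceProvider:
    """Hypothetical, simplified stand-in for a Placement resource provider."""
    uuid: str
    traits: set = field(default_factory=set)
    has_allocations: bool = False


def classify_zombies(providers, known_compute_uuids):
    """Return (to_delete, to_warn) lists of zombie provider UUIDs.

    A zombie is a provider carrying the Nova ownership trait whose UUID no
    longer matches any resource Nova knows about. Zombies without
    allocations look safe to delete; those with allocations are only
    reported so the operator can decide.
    """
    to_delete, to_warn = [], []
    for rp in providers:
        if NOVA_OWNER_TRAIT not in rp.traits:
            continue  # owned by another service (e.g. Cyborg): never touch it
        if rp.uuid in known_compute_uuids:
            continue  # still backed by a Nova resource
        if rp.has_allocations:
            to_warn.append(rp.uuid)
        else:
            to_delete.append(rp.uuid)
    return to_delete, to_warn


if __name__ == "__main__":
    rps = [
        ResourceProvider("cn-1", {NOVA_OWNER_TRAIT}),
        ResourceProvider("cn-2", {NOVA_OWNER_TRAIT}),
        ResourceProvider("cn-3", {NOVA_OWNER_TRAIT}, has_allocations=True),
        ResourceProvider("fpga-1", {"OWNED_BY_CYBORG"}),
    ]
    print(classify_zombies(rps, {"cn-1"}))  # (['cn-2'], ['cn-3'])
```

Checking the ownership trait first is the whole point of the Cyborg agreement: providers created by another service are skipped before any deletion logic runs.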
# Nova use of privsep
sean-k-mooney agreed to work on a patch to remove the usage of CAP_DAC_OVERRIDE and on another patch to provide the new capabilities for the monolithic privsep context.

(A few other topics were discussed, but I left them out of my summary as they only matter to very few people, whether operators or other contributors. Don't get me wrong, I like them, but I don't wanna add more verbosity to an already large summary.)

### Nova pain points (taken from https://etherpad.opendev.org/p/pain-point-elimination)

# Ironic hashring failure recovery
That's a latent bug, but we don't want the Ironic virt driver to modify the host value of every instance. We'd rather have some operator action to tell Nova it has to shuffle a few things. Honestly, more brainstorming is needed, as we haven't clearly designed a solution yet.

# Problems with shelve, unshelve and then shelve back
Well, this is a nasty bug and we need to fix it, agreed. We also have a testing gap we need to close.

# Naming cinder volumes after nova instance name
We said yes, why not. We also considered that 'delete on terminate' has to change so it does delete the volume when you delete the instance (for a boot-from-volume case).

# Orphaned instances due to underlying network issues
We agreed it would be nice to provide a tool for finding such orphans, and we also think it's important for the instance force-delete API call to complete successfully in such a case.

# Leftover guests on a recovered compute node while instance records were purged
Well, we should avoid purging instance records from the Nova DB while we are still unable to correctly delete the compute bits, unless the operator explicitly wants to (in the case of a non-recoverable compute, for example). A potential solution could be to add a config flag to *not* archive deleted rows for instances that are running on down computes.
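To illustrate the config flag idea above, here is a minimal Python sketch of an archiving step that skips soft-deleted instances whose compute host is down. This is not how `nova-manage db archive_deleted_rows` is actually implemented: `InstanceRow`, `rows_to_archive` and the `skip_down_hosts` flag are hypothetical, and rows are plain in-memory objects instead of DB queries.

```python
from dataclasses import dataclass


@dataclass
class InstanceRow:
    """Hypothetical, simplified stand-in for a row of Nova's instances table."""
    uuid: str
    host: str
    deleted: bool


def rows_to_archive(rows, down_hosts, skip_down_hosts=True):
    """Return the UUIDs of soft-deleted instances that may be archived.

    skip_down_hosts models the proposed (hypothetical) config flag: when
    True, deleted instances whose host is currently down are kept, so the
    guest can still be cleaned up once that compute node recovers.
    """
    selected = []
    for row in rows:
        if not row.deleted:
            continue  # live instances are never archived
        if skip_down_hosts and row.host in down_hosts:
            continue  # compute bits may still exist on that host
        selected.append(row.uuid)
    return selected


if __name__ == "__main__":
    rows = [
        InstanceRow("a", "cmp1", deleted=True),
        InstanceRow("b", "cmp2", deleted=True),
        InstanceRow("c", "cmp1", deleted=False),
    ]
    print(rows_to_archive(rows, {"cmp2"}))                         # ['a']
    print(rows_to_archive(rows, {"cmp2"}, skip_down_hosts=False))  # ['a', 'b']
```

Turning the flag off corresponds to the "unless the operator explicitly wants to" case, e.g. when a compute is known to be non-recoverable.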
# Nova APIs leaking hypervisor hardware details
No, we don't want this, but we can try to add more traits to give more visibility on a case-by-case basis.

# RabbitMQ replacement
Well, unfortunately we're lacking resources on this. We recommend that the community investigate using a NATS backend for oslo.messaging.

# Placement strictness with move operations
We agree this is a bug that needs to be fixed.

Okay, if you've reached this point, you're very brave. Kudos to you. I don't really want to repeat this exercise often, but I just hope it helps you digest a thousand-line etherpad.
-Sylvain
On Mon, Oct 25 2021 at 06:50:12 PM +0200, Sylvain Bauza <sbauza@redhat.com> wrote:
> Well, here's a try, given we discussed for 4 days and this could be a large summary ;) You can see all the notes in a read-only etherpad here: https://etherpad.opendev.org/p/r.e70aa851abf8644c29c8abe4bce32b81

Thank you, Sylvain!

> ### Cross-project discussions
>
> # Cyborg cross-project discussion with Nova
> We agreed on adding an OWNED_BY_NOVA trait for all Resource Providers created by Nova, so Cyborg would provide its own OWNED_BY_CYBORG trait for knowing which inventories are used by either Nova or Cyborg. Cyborg contributors need to modify https://review.opendev.org/c/openstack/nova-specs/+/780452
> We also agreed that https://blueprints.launchpad.net/nova/+spec/cyborg-suspend-and-resume is a specless blueprint.

A small correction: the name of the trait did not change during the PTG discussion. It is still OWNER_<project> in the etherpad, so OWNED_BY_<project> seems like an honest mistake here.

[snip]

> # Zombie Resource Providers no longer corresponding to Nova resources
> Thanks to the OWNED_BY_NOVA trait we agreed on with the Cyborg team, we could find a way to identify the Resource Providers owned by Nova that are no longer in use, and consequently delete them, or warn the operator if existing allocations are present.

And also here.

[snip]

> Okay, if you've reached this point, you're very brave. Kudos to you. I don't really want to repeat this exercise often, but I just hope it helps you digest a thousand-line etherpad.
> -Sylvain
cheers, gibi
On Tue, Oct 26, 2021 at 10:57 AM Balazs Gibizer <balazs.gibizer@est.tech> wrote:
> A small correction: the name of the trait did not change during the PTG discussion. It is still OWNER_<project> in the etherpad, so OWNED_BY_<project> seems like an honest mistake here.
Yeah, just a clarification here: I didn't want to bikeshed about the trait name during our PTG session, but I'm also not sure we have a consensus about it. FWIW, I'll ask about it on the spec, but here, given I was saying "we agreed on adding a 'XXX' trait", I used an adjective instead of a name. TBC, we have both adjectives and names for our standard traits [1], so I'm fine with either trait name.

HTH,
-Sylvain

[1] https://docs.openstack.org/os-traits/latest/reference/traits.html

[snip]
participants (2): Balazs Gibizer, Sylvain Bauza