- The Cinder team agreed on using its volume metadata API for tracking the hardware models of Cinder volumes.
### Horizon/Nova ###
Horizon wanted to call the Placement API. We agreed on using openstacksdk for this, as Placement doesn't provide its own Python bindings. Some new methods may be added to the SDK for calling specific Placement APIs if we think we need them, instead of calling Placement directly with raw HTTP verbs and the API URIs.
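As a hedged sketch of what that could look like (the cloud name is a placeholder and the exact SDK proxy call is my assumption, not something we committed to):

```python
# Sketch only: listing Placement resource providers through openstacksdk's
# placement proxy instead of hand-crafting HTTP calls against the API URIs.

def list_resource_provider_names(conn):
    # Delegates to the SDK's Placement proxy, which wraps
    # GET /resource_providers on the Placement endpoint.
    return [rp.name for rp in conn.placement.resource_providers()]

if __name__ == "__main__":
    import openstack
    # "mycloud" is a placeholder clouds.yaml entry, not a real deployment.
    conn = openstack.connect(cloud="mycloud")
    for name in list_resource_provider_names(conn):
        print(name)
```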
### Nova-specific topics ###
# Antelope retrospective and process-related discussions
Antelope was a shorter cycle than Zed: 6 features landed (same as Zed), 46 contributors (down by 8 compared to Zed), 27 bugfixes (vs. 54 for Zed).
We were happy not to need an RC2 for the first time in Nova's history. The bug triage backlog stayed small.
- We want to discuss a better review cadence so we don't end up looking at all the implementations only in the last weeks of the cycle.
- We want to get rid of Storyboard stories in Placement. We found and agreed on a way forward.
- We will set bug reports that are five years old or more to Incomplete, giving bug reporters 90 days to confirm the bug is still present on master.
- We will add more contributor documentation explaining how one-off reviewers can find Gerrit changes and how to create changes.
- We agreed to no longer automatically re-approve a spec approved in a previous cycle if no code exists yet.
- All efforts on porting our client calls to the SDK should be tagged in Launchpad with the 'sdk' tag.
- A Bobcat schedule will be proposed at next Tuesday's weekly meeting, with feature and spec deadlines plus some review days.
- We also discussed what we should discuss in the Vancouver PTG.
# CI failures continuous improvement
- We want to test an alternative image to CirrOS (Alpine is a potential candidate) with a small footprint in some specific Nova jobs.
- We need to continue investigating the volume detach/attach failures. We may create a specific canary job that runs serialized volume checks, forcing the job to fail more frequently so we can dig down further.
- (related) Sylvain will propose a Vancouver PTG cross-project session for a CI debugging war room experience.
# The new release cadence
We argued at length about whether we should hold deprecations and removals this cycle, given the release notes tooling isn't in place yet. As a reminder, since Bobcat is a non-SLURP release, operators can skip it and only read the C release notes, so we want to make sure we forward-port upgrade notes for them. For the moment there is no consensus; we deferred the outcome to a subsequent Nova meeting and, in the meantime, raised the point with the TC for guidance.
# The problem with hard affinity group policies
- We agreed that hard affinity/anti-affinity policies aren't ideal (operators, if you read me, please prefer soft affinity/anti-affinity policies, for various reasons I won't explain here). But since we support those policies and the use case is quite understandable, we need to find a way to unblock instances that can't easily be migrated out of such groups (e.g. try to move your instances off a host when they have hard affinity between them; good luck).
- We reached a consensus to propose a new parameter for live-migrate, evacuate, and migrate that would skip only the hard affinity checks. A backlog spec will be written capturing this consensus.
- Since a group policy may then be violated, we'll also propose a server group API change that shows whether a group's policy is violated.
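To make the idea concrete, here is a hedged sketch of what such a request body might look like. The parameter name (`skip_affinity_checks`) is entirely hypothetical since the backlog spec isn't written yet; only the `os-migrateLive` action shape comes from the existing API.

```python
# Hypothetical sketch: building a live-migration request body with the
# proposed opt-out of hard affinity checks. The key name and API shape
# are not decided; this only illustrates the consensus.

def build_live_migrate_body(host=None, skip_affinity_checks=False):
    body = {"os-migrateLive": {"host": host, "block_migration": "auto"}}
    if skip_affinity_checks:
        # Hypothetical new parameter from the future backlog spec.
        body["os-migrateLive"]["skip_affinity_checks"] = True
    return body
```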
# Unified limits next steps
- We agreed that we should move on and enable unified limits by default in the near future, but we first need to provide a bit of tooling to help the migration.
- The tool could be a nova-manage command that dumps the existing quota limits as a YAML file so they can be injected into Keystone (to be further defined during Bobcat).
- A nova-status upgrade check would be nice to yell at operators who forgot to define their limits in Keystone before upgrading to the release (potentially C) that defaults to unified limits.
- Testing the migration in the grenade job would also be worthwhile.
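For illustration only, here is a hedged sketch of what such a nova-manage helper might emit: a flat YAML mapping of legacy quota limits that could then be fed into Keystone registered limits. The limit names and values are placeholders, and the real tool's output format is still to be defined.

```python
# Sketch: dump a flat mapping of legacy quota limits as minimal YAML
# (one "key: value" line per limit), hand-rolled to avoid dependencies.

def quota_limits_to_yaml(limits):
    # Sorted for deterministic output; flat scalars need no quoting here.
    return "\n".join(
        f"{name}: {value}" for name, value in sorted(limits.items())
    ) + "\n"

if __name__ == "__main__":
    # Example legacy quota values (made up for illustration).
    print(quota_limits_to_yaml(
        {"servers": 10, "class:VCPU": 20, "class:MEMORY_MB": 51200}))
```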
# Frequently written objects may lead to exhaustion of DB primary keys
- We agreed that all new primary keys will use BigInteger as their data type. Existing keys should move to BigInteger too, but there is a huge upgrade impact to mitigate.
- Operators will be solicited about their preference between a costly one-shot data migration and a rolling online data migration for existing tables.
- We will document which DB tables are safe to reindex (since their primary keys aren't referenced as foreign keys elsewhere).
- A backlog spec will capture all of the above
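For illustration, a hedged SQLAlchemy sketch of the agreed direction for new tables; the table and column names are invented, not actual Nova schema:

```python
# Sketch: declare a new table whose primary key is BigInteger from the
# start, postponing key exhaustion on frequently written tables.
import sqlalchemy as sa

metadata = sa.MetaData()

events = sa.Table(
    "instance_events",  # hypothetical frequently-written table
    metadata,
    # BigInteger gives a 64-bit key space instead of Integer's 32 bits.
    # The SQLite variant keeps autoincrement working in tests, since
    # SQLite only treats plain INTEGER primary keys as rowid aliases.
    sa.Column("id", sa.BigInteger().with_variant(sa.Integer, "sqlite"),
              primary_key=True, autoincrement=True),
    sa.Column("instance_uuid", sa.String(36), nullable=False),
)

if __name__ == "__main__":
    engine = sa.create_engine("sqlite://")
    metadata.create_all(engine)
    print("created:", list(metadata.tables))
```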
# Leftover volume records when you call Cinder directly instead of Nova for detaching
- Yup, this is a known issue and we should discuss it with the Cinder team (unfortunately, we didn't get to it this week).
- We should maybe have a nova-manage script that cleans up the BDMs, but we're afraid of leaking some volume residues on the compute host OS.
- Anyway, we need to understand the main scope and discuss the preconditions (when should we call this tool, and what should it be doing?). Probably a spec.
# Misc but not the leasc (heh)
- We will file an RFE bug for lazy-loading the instance name from system metadata, or updating the nested field in the instance object (no additional DB write).
- We will add a new policy specifically for cold-migrate when a host is passed as a parameter (admin-only by default).
- We agreed to continue the effort on compute node hostname robustification (the Bobcat spec doesn't seem controversial).
- We agreed on a few things about the server show command and the use of the SDK. Some existing OSC patches may require a bit of rework, but they would cover most of the concerns we discussed.
- We should fix the post-copy issue when live-migrating a paused instance by removing the post-copy migration flag in that case.
- We could enable virtio-blk trim support by default. It's both a bugfix (we want to remove an exception) and a new feature (we want to enable it by default), so we'll discuss at an upcoming meeting whether we need a specless blueprint.
- We also discussed a generic vDPA support feature for Nova, and we asked the owner to provide a spec explaining the use case.
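The post-copy fix mentioned above can be sketched roughly as follows. The flag constant mirrors libvirt's VIR_MIGRATE_POSTCOPY but is defined locally for illustration; the actual fix in Nova's libvirt driver will look different.

```python
# Sketch of the agreed fix: post-copy cannot complete for a paused
# guest (the switchover relies on a running guest faulting memory
# pages in), so strip the flag when the instance is paused.

VIR_MIGRATE_LIVE = 1          # mirrors libvirt constants, for illustration
VIR_MIGRATE_POSTCOPY = 1 << 15

def adjust_migration_flags(flags, power_state):
    if power_state == "paused":
        # Fall back to plain pre-copy live migration for paused guests.
        flags &= ~VIR_MIGRATE_POSTCOPY
    return flags
```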
### That's it. ###
Yet again, I'm impressed. If you read this sentence, you're done reading. I just hope your coffee was good and that the time you took reading this email was worth it. If you have questions or want to reach me directly on IRC, that's simple: I'm bauzas on #openstack-nova.
HTH,
-Sylvain (on behalf of the whole Nova community)
[2] Unfortunately, I'm very good at creating bugs as I discovered that based on my 10-year unfortunate contributions to OpenStack.
[3] Or a tea, or whatever you prefer.