[nova] 2023.2 Bobcat PTG summary
Sylvain Bauza
sbauza at redhat.com
Tue Apr 4 06:52:56 UTC 2023
Whoops, made a typo. See below.
On Mon, Apr 3, 2023 at 16:57, Sylvain Bauza <sbauza at redhat.com> wrote:
> Hey folks, that's a wrap! I'm happy to say we had a very productive PTG
> week with more than 12 contributors each day. It was nice to see new
> faces.
>
> As a reminder, please don't open the main etherpad link [1] if you use a
> Web browser with an automatic translation feature. If you do, it will
> translate all the phrases directly in the etherpad, as if you were
> modifying the etherpad yourself. Please also make sure you don't
> accidentally remove or modify lines.
>
> In order to prevent any misuse, this etherpad is a read-only copy of the
> main one:
> https://etherpad.opendev.org/p/r.22d95e20f7133350dcad573c015ed7da
>
> Last but not least, I'm a human who usually writes bugs [2],
> including when writing a report of the past week's discussions
> and the actions taken. Don't feel offended or irritated if you read silly
> notes about the very beloved chat you had; please rather fix it the open
> source way by correcting my errors with a reply on this thread.
>
> Enough chat, please grab a coffee [3] (as it's quite a verbose email) and
> jump straight in below.
>
> ### Operator hour ###
> (specific and distinct read-only etherpad here :
> https://etherpad.opendev.org/p/r.f94e6dda608450b6a4a5c9afc76e8a88 )
> Yet again, we missed a few more operators in this room (at least we had 3
> of them, thanks to those who were there!). I'm still wondering why they
> are not joining a free session where they can discuss their concerns or
> some wanted features. Anyway, here is what we discussed:
> - for the moment, no operators have tried to use the Yoga unified limits
> feature.
> - operators are happy with the new default RBAC scopes. We could create a
> small and quick admin-guide section for explaining that and pointing to the
> oslo.policy documentation.
> - operators would love to use the (possibly) new virtiofs feature from
> Bobcat for Manila shares (if we are able to merge it). That being said,
> they think they couldn't provide it in their environments until Nova can
> live-migrate the instances using the shares (which is not possible for
> the moment, as even QEMU doesn't support it)
> - we also discussed a pain point from an operator and then we asked him
> to provide a bug report (related to the vm state continuing to be ACTIVE
> after shelving). For the moment, we can't reproduce the problem.
>
> ### Cross-project discussions ###
>
> ### Manila/Nova ###
>
> # Prevent share deletion while it's attached to an instance
> - Nova could add some traits to reflect the share type.
> - the Manila team will create a manila spec for adding a service-level
> (or admin-level) lock API with a 'locked-by' parameter (or using service
> tokens if set)
>
> ### Neutron/Nova ###
> Read Neutron summary email [4] if you want more details.
>
> - We agreed on a delete_on_termination feature for ports. We need to see
> if we can use some already existing Neutron API (tags) or if we need a
> whole new API.
> - We are quite OK with the Napatech LinkVirt SmartNics feature, but we
> need to review the spec and we also want some third-party CI support for it.
> - the Neutron team agreed to return an HTTP 409 Conflict response if a
> duplicated port binding action is done by Nova, and the Nova team agreed on
> handling it.
> - About the SRIOV port tracking in Placement, the Nova team needs to help
> the spec owner on how to work on it as it's a large and difficult design
> (but we agreed on the first part of the design).
>
> ### Glance/Nova ###
>
> Glance wants to add a new API for Image Direct URL access. Nova was quite
> OK with this new API, but the Nova team had some concerns about
> upgrades and client calls to the new API. The Manila team agreed to create
> a nova spec for discussing those.
>
> ### Cinder/Nova ###
>
> - The Nova team is quite OK with the use case for NFV encryption. Since the
> design is quite simple, maybe this could be a specless feature, just using
> traits to tell Placement which computes support NFV encryption.
> We're OK to test this with cinder NFV jobs and maybe also to test it with a
> Nova periodic job, depending on the implementation.
>
Hah, this should read NFS encryption (Network File System) and of course
*not* NFV encryption (Network Functions Virtualization) (or maybe you
wondered why we discussed this with the Cinder team? ;-) )
> - The Cinder team agreed on using its volume metadata API for tracking
> the cinder volumes hardware models.
>
> ### Horizon/Nova ###
>
> Horizon wanted to call the Placement API. We agreed on using the
> openstackSDK for this as Placement doesn't have python bindings directly.
> Some new methods may be added to the SDK for calling specific
> Placement APIs if we think we may need those (instead of directly calling
> Placement with the standard HTTP verbs and the API URIs)
>
> ### Nova-specific topics ###
>
> # Antelope retrospective and process-related discussions
>
> Shorter cycle than Zed, 6 features landed (same as Zed), 46 contributors
> (down by 8 compared to Zed), 27 bugfixes (vs. 54 for Zed).
> We were happy not to have an RC2 for the first time in Nova history.
> The bug triage backlog was kept small.
> - We want to discuss a better review cadence so that we don't review all
> the implementations in the last weeks.
> - We want to get rid of Storyboard stories in Placement. We found and
> agreed on a way forward.
> - We will set 5-years-old-and-more bug reports as Incomplete, so bug
> reporters have 90 days to confirm that the bug is still present on master.
> - We will add more contributor documentation explaining how to find
> Gerrit changes for one-off reviewers and how to create changes.
> - We agreed on no longer automatically accepting an already-approved spec
> from a previous cycle if no code exists yet.
> - All efforts on porting our client calls to the SDK should be tagged in
> Launchpad with the 'sdk' tag.
> - A Bobcat schedule will be proposed at next Tuesday's weekly meeting with
> feature and spec deadlines + some review days.
> - We also discussed what we should discuss at the Vancouver PTG.
>
> # CI failures continuous improvement
>
> - We want to test an alternative image to cirros (alpine as a potential
> candidate) with a small footprint in some specific nova jobs.
> - We need to continue investigating the volume detach/attach failures.
> We may create a specific job that would do serialized volume checks as a
> canary job, forcing the job to fail more frequently in order to dig
> further.
> - (related) Sylvain will propose a Vancouver PTG cross-project session for
> a CI debugging war room experience.
>
> # The new release cadence
>
> We had long arguments about whether we should hold deprecations and
> removals this cycle, given the releasenotes tooling isn't in place yet. As a
> reminder, since Bobcat is a non-SLURP release, operators are able to skip
> it and just look at the C releasenotes, so we want to make sure we
> forward-port upgrade notes for them. For the moment, there is no consensus;
> we deferred the outcome to a subsequent nova meeting and, in the meantime,
> raised the point with the TC for guidance.
>
> # The problem with hard affinity group policies
>
> - We agreed on the fact that hard affinity/anti-affinity policies aren't
> ideal (operators, if you read me, please prefer soft affinity/anti-affinity
> policies for various reasons I won't explain here) but since we support
> those policies and the use case is quite understandable, we need to find a
> way to unblock instances that can't be easily migrated from those kinds of
> groups (e.g. try to move your instances off a host with hard affinity
> between them, good luck)
> - A consensus has been reached to propose a new parameter for
> live-migrate, evacuate and migrate that would only skip the hard affinity
> checks. A backlog spec will be written capturing this consensus.
> - Since a group policy may be violated, we'll also propose a new server
> group API modification that would show if a group has its policy violated.
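To make the "policy violated" idea concrete, here is a minimal sketch of how such a check could look. It is only an illustration under stated assumptions: the function name and the list-of-hosts input are hypothetical, nothing here is the API that the backlog spec will define, and it ignores extra rules such as allowing several servers per host for anti-affinity.

```python
from typing import List, Optional


def is_policy_violated(policy: str, member_hosts: List[Optional[str]]) -> bool:
    """Return True if the members' current hosts break a hard group policy.

    Hypothetical helper: member_hosts holds one host name per group member,
    or None for a member that is not scheduled on any host yet.
    """
    hosts = [host for host in member_hosts if host is not None]
    if policy == "affinity":
        # Hard affinity: every scheduled member must share a single host.
        return len(set(hosts)) > 1
    if policy == "anti-affinity":
        # Hard anti-affinity: no two scheduled members may share a host.
        return len(set(hosts)) != len(hosts)
    # Soft policies are best-effort, so this sketch never flags them.
    return False


if __name__ == "__main__":
    # One member was moved off the shared host while skipping the check.
    print(is_policy_violated("affinity", ["host1", "host1", "host2"]))
```

With a flag like this, an operator could spot groups whose members were moved with the hard affinity checks skipped.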
>
> # Unified limits next steps
>
> - We agreed on the fact we should move on and enable unified limits by
> default in the near future, but we first need to provide a bit of tooling to
> help the migration.
> - the tool could be a nova-manage command that would provide the existing
> quota limits as a yaml file so we could inject those into keystone (to be
> further defined during Bobcat)
> - a nova-status upgrade check may be nice for yelling if operators forgot
> to define their limits in keystone before upgrading to the release
> (potentially C) that would default to unified limits
> - testing in the grenade job is a feat.
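As a rough illustration of the tooling idea above, the sketch below turns legacy quota values into a flat YAML mapping. Everything in it is an assumption made for the example: the function name, the output layout and the three name mappings are not the nova-manage command (which is still to be defined during Bobcat), and the mappings should be checked against the Nova unified limits documentation.

```python
import json

# Assumed mapping from legacy quota names to unified limit resource names.
# Illustrative only; verify against the Nova unified limits documentation.
LEGACY_TO_UNIFIED = {
    "instances": "servers",
    "cores": "class:VCPU",
    "ram": "class:MEMORY_MB",
}


def quotas_to_yaml(legacy_quotas: dict) -> str:
    """Render legacy quota values as a flat YAML mapping of unified limits."""
    lines = []
    for legacy_name, value in sorted(legacy_quotas.items()):
        unified_name = LEGACY_TO_UNIFIED.get(legacy_name)
        if unified_name is None:
            # Unknown quota: skip it rather than guess a resource name.
            continue
        # json.dumps yields a double-quoted string, which is valid YAML too.
        lines.append(f"{json.dumps(unified_name)}: {int(value)}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(quotas_to_yaml({"instances": 10, "cores": 20, "ram": 51200}), end="")
```

Skipping unknown names is a deliberate choice in this sketch: a real tool would more likely report them so that operators notice quotas that were not migrated.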
>
> # Frequently written objects may lead to exhaustion of DB primary keys
> - We agreed that all new primary keys should use BigInteger as their data
> type. Existing keys should use BigInteger too, but there is a huge upgrade
> impact to mitigate.
> - Operators will be asked about their preference between a costly one-shot
> data migration and a rolling online data migration for
> existing tables.
> - We will document which DB tables are safe to reindex (since their PKs
> aren't referenced as FKs somewhere else)
> - A backlog spec will capture all of the above
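To put the exhaustion concern in numbers, here is a back-of-the-envelope sketch. It assumes a signed 32-bit Integer key versus a signed 64-bit BigInteger key, and the insert rate is a made-up figure for illustration, not a measurement from any deployment.

```python
# Rough estimate of how long an auto-increment primary key lasts.
# The insert rate used below is a hypothetical figure, not measured data.

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit Integer key
INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit BigInteger key

SECONDS_PER_DAY = 24 * 60 * 60


def days_until_exhaustion(max_key: int, inserts_per_second: float) -> float:
    """Days before the key space runs out at a constant insert rate."""
    if inserts_per_second <= 0:
        raise ValueError("inserts_per_second must be positive")
    return max_key / inserts_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 100.0  # hypothetical: 100 rows written per second
    print(f"Integer:    {days_until_exhaustion(INT32_MAX, rate):,.0f} days")
    print(f"BigInteger: {days_until_exhaustion(INT64_MAX, rate):,.0f} days")
```

At that assumed rate a 32-bit key is used up in well under a year, while a 64-bit key is effectively inexhaustible, which is why the data type matters mostly for frequently written tables.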
>
> # Leftover volumes relationship when you directly call Cinder instead of
> Nova for detaching
> - Yup, this is a known issue and we should discuss this with the Cinder
> team (unfortunately, we didn't do it this week)
> - Maybe we should have a nova-manage script that would clean up the BDMs
> but we're afraid of leaking some volume residues in the compute OS
> - Anyway, we need to understand the main scope and we also need to discuss
> the preconditions ('when' should we call this tool and 'what' should this
> tool be doing?). Probably a spec.
>
> # Misc but not the leasc (heh)
>
> - we will file an RFE bug for lazy-loading instance name from system
> metadata or update the nested field in the instance object (no additional
> DB write)
> - we will add a new policy for cold-migrate specifically when a host is
> passed as a parameter (admin-only as a default)
> - We agreed on continuing the effort on compute node hostname
> robustification (the Bobcat spec seems not controversial)
> - We agreed on a few things about the server show command and the use of
> the SDK. Some existing OSC patch may require a bit of rework but it would
> cover most of the concerns we discussed.
> - we should fix the post-copy issue when you want to live-migrate a
> paused instance by removing the post-copy migration flag in that case.
> - we could enable virtio-blk trim support by default. It's both a bugfix
> (we want to remove an exception) and a new feature (we want to enable it by
> default) so we'll discuss whether we need a specless blueprint at an
> upcoming meeting.
> - we also discussed a generic vDPA support feature for Nova, and we
> asked the owner to provide us a spec explaining the use case.
>
> ### That's it. ###
>
> Yet again, I'm impressed. If you're reading this sentence, you're done
> reading. I just hope your coffee was good and the time you took reading
> this email was worth it. If you have questions or want to reach me directly
> on IRC, that's simple: I'm bauzas on #openstack-nova.
>
> HTH,
> -Sylvain (on behalf of the whole Nova community)
>
>
> [1] https://etherpad.opendev.org/p/nova-bobcat-pt g (please remove the
> empty char between 'pt' and 'g')
> [2] Unfortunately, I'm very good at creating bugs, as I discovered over
> my 10 years of unfortunate contributions to OpenStack.
> [3] Or a tea, or whatever you prefer.
> [4]
> https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033115.html
>
>