[nova] 2023.2 Bobcat PTG summary

Sylvain Bauza sbauza at redhat.com
Mon Apr 3 14:57:31 UTC 2023


Hey folks, that's a wrap! I'm happy to say we had a very productive PTG
week, with more than 12 contributors each day. It was nice to see new
faces.

As a reminder, please don't open the main etherpad link [1] with a web
browser that has an automatic translation feature enabled. If you do, the
translated phrases are written back into the etherpad, exactly as if you
had edited it by hand. Please also make sure you don't accidentally remove
or modify lines.

To prevent any mishaps, here is a read-only copy of the main etherpad:
https://etherpad.opendev.org/p/r.22d95e20f7133350dcad573c015ed7da

Last but not least, I'm a human who usually writes bugs [2], including
when writing a report of the past week's discussions and the actions taken.
Don't feel offended or irritated if you find silly notes about a chat you
hold dear; rather, fix it the open-source way by correcting my errors in a
reply to this thread.

Enough chat: please grab a coffee [3] (this is quite a verbose email) and
jump straight in below.

### Operator hour ###
(specific and distinct read-only etherpad here :
https://etherpad.opendev.org/p/r.f94e6dda608450b6a4a5c9afc76e8a88 )
Yet again, we missed having more operators in this room (at least we had 3
of them, thanks to those who were there!). I'm still wondering why more of
them don't join a free session where they can discuss their concerns or the
features they want. Anyway, here is what we discussed:
 - for the moment, no operators have tried to use the Yoga unified limits
feature.
 - operators are happy with the new default RBAC scopes. We could create a
short admin-guide section explaining them and pointing to the oslo.policy
documentation.
 - operators would love to use the (possibly) new virtiofs feature from
Bobcat for Manila shares (if we are able to merge it). That being said,
they think they couldn't offer it in their environments until Nova can
live-migrate instances using the shares (which is not possible for the
moment, as even QEMU doesn't support it).
 - we also discussed a pain point raised by an operator (the VM state
staying ACTIVE after shelving) and asked them to provide a bug report. For
the moment, we can't reproduce the problem.

### Cross-project discussions ###

### Manila/Nova ###

# Prevent share deletion while it's attached to an instance
 - Nova could add some traits to reflect the share type.
 - the Manila team will create a manila spec for adding a service-level (or
admin-level) lock API with a 'locked-by' parameter (or using service tokens
if set)

### Neutron/Nova ###
Read Neutron summary email [4] if you want more details.

 - We agreed on a delete_on_termination feature for ports. We need to see
if we can reuse an already existing Neutron API (tags) or if we need a
whole new API (a rough sketch of the tag option follows this list).
 - We are quite OK with the Napatech LinkVirt SmartNics feature, but we
need to review the spec and we also want some third-party CI support for it.
 - the Neutron team agreed to return an HTTP 409 Conflict code when a
duplicate port binding action is done by Nova, and the Nova team agreed to
handle it.
 - About the SR-IOV port tracking in Placement, the Nova team needs to
help the spec owner figure out how to work on it, as it's a large and
difficult design (but we agreed on the first part of the design).
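If the existing Neutron tags API ends up being enough for the
delete_on_termination idea, a purely illustrative sketch with openstacksdk
could look like the snippet below. The tag name, port name and cloud name
are all hypothetical; nothing here is an agreed design.

    import openstack

    # Hypothetical sketch only: mark a port with a tag that Nova could
    # honour at server deletion time. No such convention exists today.
    conn = openstack.connect(cloud="mycloud")

    port = conn.network.find_port("my-port", ignore_missing=False)
    # Assuming the network proxy's tag helper: replace the port's tag
    # list with our hypothetical marker, then read it back.
    conn.network.set_tags(port, ["delete_on_termination"])
    print(conn.network.get_port(port.id).tags)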

### Glance/Nova ###

Glance wants to add a new API for Image Direct URL access. Nova was quite
OK with this new API, but the Nova team had some concerns about upgrades
and client calls to the new API. The Glance team agreed to create a nova
spec for discussing those.

### Cinder/Nova ###

- The Nova team is quite OK with the use case for NFV encryption. Since
the design is quite simple, maybe this could be a specless feature, just
using traits to tell Placement which computes support NFV encryption. We're
OK to test this with the cinder NFV jobs and maybe also with a Nova
periodic job, depending on the implementation.
 - The Cinder team agreed on using its volume metadata API for tracking the
cinder volumes' hardware models.

### Horizon/Nova ###

Horizon wanted to call the Placement API. We agreed on using openstacksdk
for this, as Placement doesn't provide Python bindings of its own. Some new
methods may be added to the SDK for calling specific Placement APIs if we
think we need them (instead of calling Placement directly with the standard
HTTP verbs and API URIs). A rough sketch of the idea is below.
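As a purely illustrative sketch (assuming the placement proxy that
openstacksdk already ships; the cloud name is made up), calling Placement
through the SDK could look like this:

    import openstack

    # Illustrative sketch: Placement through openstacksdk rather than
    # hand-rolled HTTP calls. Assumes a clouds.yaml entry named "mycloud".
    conn = openstack.connect(cloud="mycloud")

    # Proxy-level helper, when the SDK exposes one for the resource.
    for rp in conn.placement.resource_providers():
        print(rp.name, rp.id)

    # The proxy is also a keystoneauth Adapter, so a raw call to an API
    # URI still benefits from the SDK's session and auth handling.
    resp = conn.placement.get("/resource_providers")
    print(resp.json())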

### Nova-specific topics ###

# Antelope retrospective and process-related discussions

Shorter cycle than Zed, 6 features landed (same as Zed), 46 contributors
(down by 8 compared to Zed), 27 bugfixes (vs. 54 for Zed).
We were happy to not need an RC2, for the first time in Nova history. The
bug triage backlog stayed small.
 - We want to discuss a better review cadence, so we don't end up looking
at all the implementations only in the last weeks of the cycle.
 - We want to get rid of Storyboard stories in Placement. We found and
agreed on a way forward.
 - We will set bug reports that are 5 years old or more to Incomplete, so
bug reporters have 90 days to confirm that the bug is still present on
master.
 - We will add more contributor documentation explaining how one-off
reviewers can find Gerrit changes and how to create changes of their own.
 - We agreed to no longer automatically re-accept a spec approved in a
previous cycle if no code for it exists yet.
 - All efforts on porting our client calls to the SDK should be tagged in
Launchpad with the 'sdk' tag.
 - A Bobcat schedule will be proposed at next Tuesday's weekly meeting,
with feature and spec deadlines plus some review days.
 - We also discussed what we should discuss at the Vancouver PTG.

# CI failures continuous improvement

- We want to test an alternative image to cirros (alpine being a potential
candidate) with a small footprint in some specific nova jobs.
- We need to continue investigating the volume detach/attach failures. We
may create a specific canary job doing serialized volume checks, forcing
the failure to happen more frequently so we can dig deeper.
- (related) Sylvain will propose a Vancouver PTG cross-project session for
a CI debugging war room experience.

# The new release cadence

We had long arguments about whether we should hold deprecations and
removals this cycle, given the release notes tooling isn't in place yet. As
a reminder, since Bobcat is a non-SLURP release, operators are able to skip
it and only read the C release notes, so we want to make sure we
forward-port upgrade notes for them. For the moment there is no consensus;
we deferred the outcome to a subsequent nova meeting and, in the meantime,
raised the point with the TC for guidance.

# The problem with hard affinity group policies

- We agreed that hard affinity/anti-affinity policies aren't ideal
(operators, if you read me, please prefer soft affinity/anti-affinity
policies, for various reasons I won't explain here) but since we support
those policies and the use case is quite understandable, we need to find a
way to unblock instances that can't be easily migrated out of those kinds
of groups (e.g. try to move your instances off a host when there is hard
affinity between them: good luck).
- A consensus has been reached to propose a new parameter for live-migrate,
evacuate and migrate that would only skip the hard affinity checks. A
backlog spec will be written capturing this consensus.
- Since a group policy may end up violated (for instance after such a
check-skipping move), we'll also propose a new server group API
modification that would show whether a group's policy is violated.

# Unified limits next steps

- We agreed that we should move on and enable unified limits by default in
the near future, but we first need to provide a bit of tooling to help the
migration.
- the tool could be a nova-manage command that would dump the existing
quota limits as a YAML file so we could inject them into keystone (to be
further defined during Bobcat; a purely illustrative sketch follows this
list).
- a nova-status upgrade check may be nice for yelling at operators who
forgot to define their limits in keystone before upgrading to the release
(potentially C) that would default to unified limits.
- testing this in the grenade job would be a feat.
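To make the idea concrete, a purely hypothetical dump from such a
nova-manage command could look like the YAML below. Neither the command nor
the exact format exists yet; the resource names simply follow the Yoga
unified limits convention of 'servers' and 'class:<RESOURCE_CLASS>'.

    # Hypothetical output only; the actual tooling is still to be defined.
    registered_limits:
      - service: nova
        resource_name: servers
        default_limit: 10
      - service: nova
        resource_name: class:VCPU
        default_limit: 20
      - service: nova
        resource_name: class:MEMORY_MB
        default_limit: 51200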

# Frequently written objects may lead to exhaustion of DB primary keys
- We agreed that all new primary keys should use BigInteger as their data
type (a small sketch of that direction follows this list). Existing keys
should also move to BigInteger, but there is a huge upgrade impact to
mitigate.
- Operators will be solicited for their preference between an expensive
one-shot data migration and a rolling online data migration for existing
tables.
- We will document which DB tables are safe to reindex (since their PKs
aren't referenced as FKs somewhere else).
- A backlog spec will capture all of the above
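As a tiny illustration of the direction for new tables (the model and
column names are made up; this is not an actual Nova schema change):

    from sqlalchemy import BigInteger, Column, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()


    class ExampleRecord(Base):
        # Illustrative model only: new tables would declare a BigInteger
        # primary key so frequently written rows can't exhaust a 32-bit
        # auto-increment id.
        __tablename__ = 'example_records'

        id = Column(BigInteger, primary_key=True, autoincrement=True)
        name = Column(String(255), nullable=False)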

# Leftover volume relationships when you call Cinder directly instead of
Nova for detaching
- Yup, this is a known issue and we should discuss it with the Cinder team
(unfortunately, we didn't do it this week).
- We should maybe have a nova-manage script that would clean up the BDMs,
but we're afraid of leaking some volume residue on the compute host OS.
- Anyway, we need to understand the main scope and we also need to discuss
the preconditions ('when' should we call this tool and 'what' should it be
doing?). Probably a spec.

# Misc but not the leasc (heh)

 - we will file an RFE bug for lazy-loading the instance name from system
metadata or updating the nested field in the instance object (no additional
DB write).
 - we will add a new policy for cold-migrate specifically when a host is
passed as a parameter (admin-only by default).
 - We agreed on continuing the effort on compute node hostname
robustification (the Bobcat spec doesn't seem controversial).
 - We agreed on a few things about the server show command and the use of
the SDK. An existing OSC patch may require a bit of rework, but it would
cover most of the concerns we discussed.
 - we should fix the post-copy issue when live-migrating a paused
instance, by removing the post-copy migration flag in that case.
 - we could enable virtio-blk trim support by default. It's both a bugfix
(we want to remove an exception) and a new feature (we want to enable it by
default), so we'll discuss whether we need a specless blueprint at a next
meeting (today's opt-in configuration is sketched after this list).
 - we also discussed a generic vDPA support feature for Nova, and we asked
the owner to provide us a spec explaining the use case.
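For reference, today's opt-in for discard/trim support is the existing
libvirt option below in nova.conf; the PTG item is about whether the
'unmap' behaviour should become the default instead of an opt-in.

    [libvirt]
    # Existing opt-in: ask QEMU to honour discard/TRIM requests from the
    # guest for instance disks.
    hw_disk_discard = unmap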

### That's it. ###

Yet again, I'm impressed: if you're reading this sentence, you're done
with reading. I just hope your coffee was good and that the time you took
reading this email was worth it. If you have questions or want to reach me
directly on IRC, that's simple: I'm bauzas on #openstack-nova.

HTH,
-Sylvain (on behalf of the whole Nova community)


[1] https://etherpad.opendev.org/p/nova-bobcat-pt g (please remove the
empty char between 'pt' and 'g')
[2] Unfortunately, I'm very good at creating bugs as I discovered that
based on my 10-year unfortunate contributions to OpenStack.
[3] Or a tea, or whatever you prefer.
[4]
https://lists.openstack.org/pipermail/openstack-discuss/2023-April/033115.html