As a reminder (like every cycle ;-) ), please don't open the main etherpad
link in a Web browser that has automatic translation enabled. If you do,
the browser will translate all the phrases in place, as if you were
editing the etherpad yourself. Please also make sure you don't
accidentally remove or modify lines.
Anyway, let's stop talking about the etherpad, and lemme give you the summary:
(oh, and I may have gotten some explanations wrong; no worries, just reply here if so)
(please also grab a coffee or whatever else, you may need it)
### Operator hour ###
So, we had around 4-5 operators for this operator hour, lovely.
This is what we discussed:
# flavors explosion for public clouds
Basically, there are multiple use cases here: it's hard to discover flavor properties except by looking at the description, and it's also difficult to filter flavors based on some properties since they can differ across clouds.
What we agreed (as a start):
- The Glance metadefs concept could help provide a catalog of flavor property definitions, if operators were using it.
- Nova could provide a new API microversion for filtering flavors based on a request of standard traits and/or resources (the latter queried within minimum and/or maximum bounds). We don't know yet who could own such a specification; contributors welcome.
- Another feature could be "Nova, tell me whether my flavor would have some candidates?", but again, we need hands on deck.
# Nova could get a better housekeeping system for orphaned resources #
(I'm literally paraphrasing the topic name)
OK, so, after discussing with the operator, we eventually identified two use cases:
- as an operator, I'd like nova to magically adopt a compute node coming from elsewhere (say another region or another nova deployment). Long story short (look up the etherpad for more details), the winning strategy we could implement would be:
#1 adopt a whole separate nova cell by using a specific nova-manage command that would create records in the target nova-api DB based on the records from the source API DB. The operator was happy to work on a PoC and discuss with the nova community to make it a quick win.
#2 live-migrate my instances between cells. Well, that one is more difficult to implement, so we could punt on it until #1 is done.
- as an operator, I sometimes need to rebuild a compute node's OS. On that one, we discussed it further in terms of "project cleanup". Again, long story short, we ended up explaining how to use the OSC 'project cleanup' feature, which would do the job.
# I want to be able to detach my root volume
Well, that one wasn't really discussed, but the topic was filled in on the etherpad. There is no real owner for such a feature request, which absolutely requires a large community design discussion since resolving it isn't trivial (and we probably need to make some trade-offs in terms of guest support). Anyway, if you're reading these lines and you do care about this use case, please understand that you need to bring resources and come talk to the community. I, as PTL, can help by providing some mentoring and onboarding if needed.
### Cross-project meetings ###
We had two cross-project sessions, one with Neutron and one with Cinder.
# Neutron cross-project session
We only discussed one topic, but a very important one: how would Neutron feel if Nova started to require some optional Neutron API extensions? :-)
Eventually, what we agreed with Neutron is:
- we'll forward-port our Bobcat deprecation notices into the Caracal release notes
- we'll merge all the required bits (mostly ORM changes) preparing the SQLA 2.0 upgrade in a later release
- we'll remove vmwareapi and hyperv support in Caracal
- for those virt driver removals, we agreed not to add an API microversion on the specific API endpoints but rather to return an HTTP 400
### python tooling we could use ###
- We agreed on using sphinx-lint and will change the docs job to use it.
- We agreed on trying the codespell tool in the pep8 job, and we'll eventually either keep it or drop it depending on whether it finds many false positives.
- We agreed on reviewing the pyupgrade patches as a trial, with no firm promises.
### Planning new features ###
- the Healthchecks spec is going to be reproposed this cycle; some implementation details about how to do RPC checks still have to be identified
- we agreed on exposing the value of the pinned AZ in GET /servers/<uuid>/details
- we agreed on doing a kind of soft-[anti-]affinity for AZs using instance groups (either through new policies or some new concept of 'destination')
- PCI device affinity for the libvirt driver seems to be an effort worth resuming. Testing it is a concern, but we'll discuss that in the spec
- the notion of PCI device grouping (imagine two PCI devices on the same card, you get the idea) seems an interesting use case, worth spec'ing. We could imagine a single Placement unit being a 'pci group', with one PCI device being the main object to track. The neighbouring aspect seems hard to implement, so we'll reserve it for later.
- Having console timeouts on token expiry seems a nice security feature; reviews welcome on the spec.
- VGPU efforts will be huge this cycle. More to be discussed below in a specific section
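To make the soft-anti-affinity-for-AZs idea concrete, here's an illustrative sketch (entirely hypothetical, not Nova code) of a weigher that prefers the AZ currently hosting the fewest members of the instance group:

```python
# Hypothetical soft-anti-affinity weigher for AZs (not Nova code):
# given where the group's instances already run, prefer a candidate
# host in the least-loaded AZ instead of hard-failing the request.
from collections import Counter

def pick_host(candidates, group_member_azs):
    """candidates: list of (host, az) tuples.
    group_member_azs: AZ names where the group's instances already run.
    Returns the chosen host name."""
    load = Counter(group_member_azs)
    # Sort by (members already in that AZ, host name) so ties are deterministic.
    return min(candidates, key=lambda c: (load[c[1]], c[0]))[0]

candidates = [("host1", "az1"), ("host2", "az2"), ("host3", "az1")]
print(pick_host(candidates, ["az1", "az1", "az2"]))
# -> host2 (az2 hosts fewer group members than az1)
```

Being "soft", the policy still places the instance even when every AZ already hosts group members; it only biases the choice.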
### Caracal cleanups, bugs and follow-ups ###
- we really want to change our default quota system to use unified limits soon, but there are some upgrade concerns with flavors that contain Placement resource classes, which require further work on our side with oslo.limit or Keystone so as not to break operators who chose the unified limits driver. We could also add a check to the nova-manage migration tool that would emit a warning if resource classes were used.
- our config with images_types_backend, use_cow_images and force_raw_images is confusing and error-prone. We agreed on fixing a migration bug in a backportable way, but also on deprecating use_cow_images and force_raw_images while providing meaningful config options instead.
- we're mixing IP addresses and URLs in our configs; we could be more consistent. What we agreed on is that a new migration_inbound_addr config option sounds cool, defaulting to the my_ip config value but also able to take an FQDN or hostname
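As a sketch, such a nova.conf could look like this (section placement and exact semantics are assumptions to be nailed down in the spec):

```ini
[DEFAULT]
# Existing option: still an IP address, used as the default.
my_ip = 192.0.2.10

[libvirt]
# Hypothetical new option: address, hostname or FQDN that other computes
# would use to reach this node for incoming live migrations.
# Falls back to my_ip when unset.
migration_inbound_addr = compute-01.example.com
```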
### The VGPU efforts ###
- we're actively working on providing upstream testing for mdevs, thanks to the mtty kernel samples framework. mtty live-migration is currently an ongoing kernel patch, but someone proposed backporting it into a custom kernel built specifically for our usage. Either way, mtty testing will be done as much as possible in nova-next, but we could create a new periodic/experimental job for targeting new features like live-migration.
- we plan to do mdev live-migration this cycle. There are lots of limitations that we need to document, but this sounds doable. We could add a pre-live-migration check on mdev types, which would require some object changes. We may also need to persist the mdev/allocation relationship, but that will be discussed in the spec
- nvidia is going to drop vfio-mdev for their SR-IOV GPUs and use a specific vfio-pci variant driver. This is still an early-stage task, and our Caracal efforts will consist of identifying the necessary nova adaptations. One is already identified (adding a managed flag) and we agreed on a solution.
- we're gonna change how nova creates the mediated devices: it will rather define a libvirt nodedev, which will also facilitate operator maintenance (they could just use mdevctl or define udev rules)
- we agreed on deprecating the config default that allows you to only specify mdev types without specifying the addresses.
- SR-IOV GPUs currently leak some unused VFs into Placement, leading to capacity issues. We could address that problem by either making the device's explicit definition mandatory or adding a new config option that would specify the number of VFs to use per type.
- some other bugs were discussed with an operator who was present. We agreed on doing further testing to clearly identify the problems; some patches are already up for review (like the evacuate case or the multi-create limitation) but need a rebase.
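For reference, defining an mdev as a libvirt nodedev (the approach mentioned above for how nova would create mediated devices) looks roughly like this per libvirt's node-device XML format; the parent PCI address, type id and uuid below are made-up examples:

```xml
<device>
  <parent>pci_0000_06_00_0</parent>
  <capability type='mdev'>
    <!-- mdev type exposed by the parent device (illustrative id) -->
    <type id='nvidia-610'/>
    <uuid>cb1e3a9f-0a33-41e7-88c6-7d52cc1eb582</uuid>
  </capability>
</device>
```

An operator could then manage it with the usual `virsh nodedev-define` / `nodedev-start` workflow, or with mdevctl/udev rules as noted above.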
Anyway, I guess I'm done now. Thanks for reading all the way down here, and I hope your coffee (or tea) was good.
HTH and thanks,