Hello everyone,

Last week was the PTG, thank you to all who joined and contributed to the discussions! I hope you enjoyed it and found it productive.

Once again this cycle, we had a packed agenda with active participation throughout the sessions. For the Nova sessions (Tuesday to Friday), we had an average of 15 to 20 participants per session. Attendance was slightly higher during the cross-project discussions, with an average of 25 to 30 people joining.

You can revisit all notes and discussions on the Etherpad here:
https://etherpad.opendev.org/p/r.9bac75a686ab0941572757b205c9415c

Below is a summary of the topics and outcomes from our PTG discussions. I've tried to keep it as concise as possible, though we covered quite a lot during the week.

**** Tuesday topics ****

#### 2025.2 Flamingo Retrospective ####

During the Flamingo cycle, 14 blueprints were accepted and 9 were fully implemented, resulting in a 64% completion rate. This is slightly lower than the 71% achieved during the previous Epoxy cycle. In addition, at least 26 bug fixes were merged during the cycle.

Review & Process Feedback

Eventlet Transition Plan: the migration continues to work as expected and remains well-coordinated. The release process was smooth, with no major issues reported. Recognition to Kamil for his work on the conductor, and to Sean, Dan, and others for their consistent review efforts.
✅ Continue to prioritize the Eventlet removal as a key goal for the G cycle.

Branch Opening Policy: some uncertainty remains about what can be merged between RC1 and GA, when master is technically open but still sensitive for late merges. Concerns were raised about backports and release stability.
✅ Allow merges on a case-by-case basis once the new master branch opens.
✅ Bauzas to propose a patch updating the Nova contributor process <https://docs.openstack.org/nova/latest/contributor/process.html>

Bug Triage: the number of untriaged bugs remains high (around 180). Efforts were noted, but time allocation and rotation remain challenging. The end-of-meeting triage format was found ineffective for real-time work, though still useful for discussing specific cases.
✅ Consider reviving a rotating triage roster if enough volunteers join.
✅ Keep triage discussions focused during the weekly meeting rather than doing live triage.

Review Load: contributors noted increasing difficulty in finding time for reviews. Maintaining review velocity remains a cross-cycle challenge.
✅ Encourage feature-specific syncs or review sprints for high-priority areas.

Feature Momentum: weekly syncs and structured updates help large features maintain progress. The VMware CI topic was cited as useful, though too many topics per meeting could cause overload.
✅ Keep dedicated weekly or bi-weekly syncs for major features (e.g. Eventlet removal).
✅ Limit meeting updates to 2-3 features per session.

Review Tracking: the previous Etherpad tracker was not effective. However, having a central list remains important to avoid missing patches and to monitor review health.
✅ Discuss an external or simplified review tracker during the next meeting with more core reviewers.
✅ Keep the tracker short and up to date to maintain visibility.

#### 2026.1 Gazpacho Planning ####

Hard spec freeze: December 4th
Milestone 2 (M2): January 8th
Feature Freeze (FF): February 26th
Release: End of March / early April

The team discussed adjusting the timing of the spec freeze.
A soft spec freeze was proposed for November 20th, giving contributors three weeks after the PTG to submit specs. Two spec review days will be scheduled between the soft and hard freezes. Additionally, two implementation review days are planned: one around M2 for early landings, and another roughly two weeks before M3.
✅ Adopt the proposed plan, while staying flexible on a case-by-case basis.
✅ Soft spec freeze on Nov 20th, followed by two spec review days before Dec 4th.
✅ Two implementation review days: one near M2 and one two weeks before M3.
✅ A +2 on a spec implies that reviewers commit to reviewing the corresponding implementation.

#### Upstream meeting schedule ####

✅ Move the weekly upstream meeting to Monday at 16:00 UTC.
✅ Keep the slot fixed (no alternating or skipping weeks).
✅ Uggla to update the meeting schedule and cycle accordingly.

#### Upstream bug triage follow up ####

The team discussed how to better organize bug triage follow-up for the Gazpacho cycle. A proposal was made to create a small group of volunteers who would receive a few bugs to triage each week. Each volunteer would handle up to three bugs, distributed by Uggla. During the weekly upstream meeting, the team will check whether a dedicated Q&A session is needed to discuss specific bugs. If required, such a session will be scheduled every two weeks, ideally right after the upstream meeting. Otherwise, discussions will take place within the regular meeting.
✅ The team agreed to give this approach a try in the next cycle. The overall sentiment seemed favorable, though details will likely need adjustment as we go.

#### Eventlet removal ####

Significant progress has been made since Flamingo. In Flamingo, n-api, n-metadata, and n-sch could already be switched to native threading via an environment variable, and the nova-next job runs those services without Eventlet. n-cond was nearly ready but missed feature freeze due to a late futurist bug, which has since been fixed and released. A py312-threading tox target and a Zuul job already validate a large part of the unit tests without Eventlet. In early Gazpacho, native threading for the conductor has successfully landed.

Planned tasks for Gazpacho:
- Add native threading support for n-novncproxy and other proxies.
- Add native threading support for n-compute.
- Run all unit and functional tests in both threading and Eventlet modes, keeping small exclude lists where needed.
- Conduct scale testing using the fake driver, possibly extended to the libvirt driver.
- Switch n-api, n-metadata, and n-sch to native threading by default, keeping Eventlet as fallback.
- Introduce a nova-eventlet (legacy) CI job to ensure backward compatibility.
- Keep nova-next as the only job testing conductor, novncproxy, and compute with threading until the H cycle.
- Fix bugs reported after Eventlet removal.
- After Eventlet removal, migrate from PyMySQL to libmysqlclient for better DB performance.

✅ Kamil volunteered to handle the novncproxy migration.
✅ Start with the nova-compute migration first to build confidence.
✅ Switch the default to threading as early as possible in Gazpacho.
✅ Add a dedicated Eventlet CI job to continue testing legacy mode.
✅ Use the hybrid plug job to test Eventlet mode even for services switched to threading by default.
✅ Keep the work independent, but document the known issue with long-running RPC calls.
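As a side note for anyone experimenting locally, the switch described above essentially comes down to whether Eventlet monkey patching is applied at startup. A minimal, purely illustrative sketch (the environment variable name and helpers below are hypothetical, not Nova's actual implementation):

    # Illustrative sketch only -- the variable and function names are
    # hypothetical, not the actual Nova mechanism.
    import os
    from concurrent import futures


    def maybe_monkey_patch():
        # In legacy mode (opt-in here for the sake of the example), patch the
        # stdlib with Eventlet before anything else runs; otherwise use plain
        # native threads.
        if os.environ.get('EXAMPLE_USE_EVENTLET', '').lower() in ('1', 'true', 'yes'):
            import eventlet  # requires the eventlet package
            eventlet.monkey_patch()


    def get_executor(max_workers=16):
        # With native threading, concurrency is bounded by a real thread pool
        # instead of cheap greenlets, which is why long blocking calls matter.
        return futures.ThreadPoolExecutor(max_workers=max_workers)


    if __name__ == '__main__':
        maybe_monkey_patch()
        with get_executor() as pool:
            print(pool.submit(sum, range(10)).result())

The practical consequence shows up in the next topic: with a bounded native thread pool, a call that blocks for minutes ties up a scarce worker rather than a cheap greenlet.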
#### Long-Running Sync REST API Calls ####

The team discussed how long-running synchronous REST API actions (like volume or interface attach) can block nova-api threads once Eventlet is removed. With native threading, the limited number of threads makes blocking calls more costly, unlike Eventlet, which could spawn many lightweight greenlets. This work is closely related to, but independent of, the Eventlet removal effort. Dan started a spec on the topic, and it may be split into two (volume attach and interface attach). Detach operations are already async, and Manila shares follow a similar model.
✅ Introduce a new microversion to keep backward compatibility.
✅ Convert volume and interface attach to async operations under that microversion.
✅ Coordinate with Rajesh's "show finish_time" work ( https://review.opendev.org/q/topic:%22bp/show-instance-action-finish-time%22 ) to avoid race conditions.
✅ Dan and Johannes to update the spec.
✅ Further investigation is needed to determine whether WSGI or API tuning could help mitigate blocking.

#### vTPM Live Migration ####

The team reviewed how to handle TPM secret security policies during instance operations. Changing the assigned policy during resize is not supported, as it adds complexity and can lead to image/flavor conflicts. Rebuilds are already blocked for vTPM instances, so once a policy is set via resize, it remains locked in. Existing instances from previous releases are unaffected.
✅ Do not allow changing the TPM secret security policy after assignment.
✅ Remove the option to select the policy from the image, for simplicity.
✅ The default policy is "user", but compute nodes support all policies by default.
✅ Document in the spec and release notes that deployers must define flavors with hw:tpm_secret_security if they want to enable this.
✅ Mention that [libvirt]supported_tpm_secret_security = ['user', 'host', 'deployment'] can be adjusted by operators.
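To make the deployer-facing impact concrete, here is a minimal sketch of what enabling this could look like once the spec lands, using the extra spec and config option named above. The flavor name, credentials, and chosen policy are placeholders, and python-novaclient is just one convenient way to set flavor extra specs:

    # Sketch: tag a flavor so instances built from it request a vTPM secret
    # security policy. Assumes the spec lands as discussed and that the target
    # compute nodes list the policy in [libvirt]supported_tpm_secret_security.
    from keystoneauth1 import loading, session
    from novaclient import client as nova_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='https://keystone.example.com/v3',   # placeholder
        username='admin', password='secret',
        project_name='admin', user_domain_id='default',
        project_domain_id='default')
    sess = session.Session(auth=auth)
    nova = nova_client.Client('2.1', session=sess)

    flavor = nova.flavors.find(name='vtpm-small')        # placeholder flavor
    flavor.set_keys({'hw:tpm_secret_security': 'host'})  # or 'user' / 'deployment'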
**** Wednesday topics ****

#### CPU Comparison During Live Migration ####

The team revisited the CPU compatibility check between source and destination hosts during live migration. If skip_cpu_compare_on_dest is enabled, some migrations may fail due to missing pre-checks; if disabled, the check can sometimes be too strict. The original goal is to avoid migrations to nodes running with QEMU emulation, which would cause serious performance issues.
✅ Include getDomainCapabilities in the Nova pre-check to improve accuracy.
✅ Move the configuration flag from the [workarounds] section to the [libvirt] section.
✅ Document potential unsafe edge cases so deployers can choose between the safer (Nova) or more permissive (libvirt-only) approach.

### Start cross session with Cinder ###

#### NFS / NetApp NFS Online Volume Extend ####

The team discussed adding online volume extend support for NFS and NetApp backends. Instance operations (like shutdown or migration) should be blocked during the extend process, and any new task_state must stay compatible with existing microversions.
✅ Avoid using the event API, as it's meant for fast operations.
✅ Introduce a dedicated API for this feature to manage versioning and states.
✅ Consider making the Cinder call synchronous for better control.
✅ Konrad will update the spec accordingly.

#### Hard-Coded io=native Setting for Cinder Backends ####

The team revisited the hard-coded io=native driver setting used for Cinder volume backends (NFS, FC, and iSCSI). This code dates back nearly ten years and has recently caused performance issues and guest hangs in some environments. The original reason for enforcing io=native is unclear.
✅ No one seems to know the reason for this original choice.
✅ Move the configuration to a standard libvirt configuration option.

### End cross session with Cinder ###

#### Preserve NVRAM After Reboots and Migrations ####

The team discussed the proposal to preserve NVRAM data across instance reboots and migrations. The feature is ready for review, pending available reviewer bandwidth.
✅ Request a definitive answer from libvirt, via a bug report, regarding file permission handling.

### Start cross session with Ironic/Cinder ###

A more complete summary of this joint session will be provided by the Ironic team, as they can better capture the discussions and decisions made. However, the Nova PTG notes still include a few relevant takeaways and references for those interested.

### End cross session with Ironic/Cinder ###

### Start cross session with Manila ###

The teams discussed testing and integration plans between Nova and Manila. Testing through Tempest was considered challenging due to configuration complexity and potential circular dependencies. Support for CephFS and NFS, hot attach, and live migration with newer OS versions was identified as a goal.
✅ Manila will draft initial UX and workflow details to share with Nova for review.
✅ Add a backlog spec in the Nova repository to track cross-team work.
✅ Manila contributors may help with open SDK and client patches as needed.
✅ Hot attach and live migration support are valuable goals but will need explicit demand and prioritization to move forward.

### End cross session with Manila ###

### Start cross session with Neutron ###

A more complete summary of this joint session will be provided by the Neutron team, as they can better capture the discussions and decisions made. However, the Nova PTG notes still include a few relevant takeaways and references for those interested.

### End cross session with Neutron ###

**** Thursday topics ****

#### Correct Firmware Detection for Stateless Firmware and AMD SEV/SEV-ES ####

The team discussed improving firmware detection for stateless firmware and AMD SEV/SEV-ES. Recent QEMU and libvirt versions now include firmware descriptor files that expose these capabilities, allowing libvirt to automatically select the right firmware without Nova duplicating that logic. The group agreed that leveraging libvirt's detection is the better long-term approach. To avoid regressions, Nova should continue to preserve existing firmware for running instances during hard reboots, while new instances will rely on libvirt's detection from the start.
✅ Use libvirt's firmware detection instead of maintaining custom logic in Nova.
✅ Preserve the existing firmware for current instances on hard reboot. Let libvirt select firmware for new ones.
✅ Takashi will write a spec describing the approach and detailing the upgrade path.
✅ He will also check UEFI/stateless firmware tests in CI and run local SEV/SEV-ES tests.

#### ROM-Type Firmware Support ####

The team continued the discussion on firmware selection, focusing on ROM-type firmware used by AMD SEV-SNP, Intel TDX, and future Arm CCA. Since recent libvirt versions can automatically select the appropriate firmware and handle SMM when needed, Nova no longer needs to manage this logic directly.
✅ Skip further work in Nova and rely on libvirt's firmware auto-selection mechanism.
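For background on what "letting libvirt decide" relies on, the data libvirt exposes can be inspected directly with getDomainCapabilities (the same call the live migration pre-check discussion above refers to). A rough sketch with libvirt-python, assuming a local qemu:///system connection; the exact XML layout can vary between libvirt versions:

    # Sketch: inspect what libvirt reports via getDomainCapabilities, e.g. the
    # firmware values it can auto-select and whether SEV is supported.
    # Assumes libvirt-python and a reachable qemu:///system URI.
    import xml.etree.ElementTree as ET
    import libvirt

    conn = libvirt.open('qemu:///system')
    caps_xml = conn.getDomainCapabilities(None, None, None, None, 0)
    root = ET.fromstring(caps_xml)

    firmwares = [v.text for v in root.findall("./os/enum[@name='firmware']/value")]
    print('auto-selectable firmware:', firmwares)   # e.g. ['bios', 'efi']

    sev = root.find('./features/sev')
    print('SEV supported:', sev is not None and sev.get('supported') == 'yes')

    conn.close()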
#### AMD SEV-SNP Support ####

The team discussed initial plans for SEV-SNP support, which depends on firmware and requires recent QEMU (≥9.2) and libvirt versions. Work may be postponed to the 2026.2 cycle due to these dependencies. For SEV-SNP, there is a need to provide hostData, an immutable and attested field (up to 32 bytes) that can be set by Nova or external tools. Using Nova metadata for this purpose was ruled out.
✅ Do not use Nova metadata to provide SEV-SNP host data.
✅ The instance POST API could be extended to allow providing this information.
✅ Takashi will write an initial spec, keeping the design generic for both SEV-SNP and TDX.

#### Generalize SEV / SEV-ES Code ####

The team reviewed the proposal to generalize the existing SEV/SEV-ES code in preparation for future Confidential Computing architectures such as Intel TDX and Arm CCA. The work focuses on refactoring internal Nova code to make memory encryption handling more generic, without changing any user-visible behavior or API.
✅ Treat this as a code refactoring only (no external impact).
✅ Ensure it does not conflict with Takashi's ongoing work.
✅ Implementation can proceed as a specless blueprint.
✅ No Tempest job is available. Manual validation is acceptable, ideally with screenshots or logs.
✅ Note: it's possible to run Tempest manually using a flavor requesting SEV/SEV-ES to verify correctness.

#### Arm CCA Support ####

The team discussed the roadmap for Arm CCA enablement in Nova. Upstream dependencies include the Linux kernel (Dec 2025), QEMU (Feb 2026), and libvirt (Mar 2026), with full availability expected around Ubuntu 26.04 (Apr 2026). As a result, development and testing in OpenStack are expected to start during the 2026.2 cycle.
✅ Features cannot merge until support is available in official distributions (not custom kernels).
✅ CentOS Stream 10 may serve as an early test platform if it gains CCA support before Ubuntu 26.04.
✅ Specs or PoCs can still be prepared in advance to ease future inclusion.
✅ Once libvirt support lands, the team will review the spec, targeting early in the H (2026.2) cycle.

#### Show finish_time in Instance Action Details ####

The proposal adds a finish_time field to the instance action show API, along with matching updates in the SDK and client. Some review comments still need to be addressed before re-proposing the patch for the G release.
✅ No objection to re-proposing the patch for Gazpacho (G).
✅ The existing spec remains valid.
✅ Gmaan will review previous comments to help move the implementation forward.

### Start cross session with Glance ###

A more complete summary of this joint session will be provided by the Glance team, as they can better capture the discussions and decisions made. However, the Nova PTG notes still include a few relevant takeaways and references for those interested.

### End cross session with Glance ###

#### Periodic Scheduler Update as Notification ####

The proposal suggests exposing Nova's periodic resource tracker updates as optional versioned notifications. This would help services like Watcher obtain a more complete and up-to-date view of compute resources, including NUMA topology and devices not yet modeled in Placement. While there are concerns about notification stability and data volume, the team agreed it could be useful for cross-service integrations.
✅ Support the idea in principle.
✅ Be mindful of the volume of data included in notifications.
✅ Sean will prepare a spec proposal detailing the approach.

#### Resource Provider Weigher and Preferred/Avoided Traits ####

The proposal introduces a resource provider weigher to influence scheduling decisions, with a longer-term goal of supporting preferred and avoided traits. This would allow external services, such as Watcher, to tag compute nodes (e.g., with CPU or memory pressure) and guide the scheduler to favor or avoid specific hosts.
✅ There are concerns about validating the weigher's behavior at scale, ensuring it works correctly with a large number of instances.
✅ Continue developing this approach.
✅ Improve the test framework, despite existing challenges.
✅ Add Monte Carlo–style functional tests to validate behavior in complex, non-symmetric scenarios.
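For readers less familiar with the scheduler plugin interface, the sketch below shows the general shape such a weigher could take using Nova's standard BaseHostWeigher hook. The trait names are placeholders, and how trait data would actually reach the weigher (here a hypothetical host_state attribute) is exactly what the spec needs to define:

    # Sketch of a host weigher built on Nova's weigher interface. The CUSTOM_*
    # trait names are placeholders for whatever the spec eventually defines;
    # a lower score pushes a host down the ranking.
    from nova.scheduler import weights


    class TraitPressureWeigher(weights.BaseHostWeigher):
        """Avoid hosts tagged as under pressure, favor lightly loaded ones."""

        AVOID = 'CUSTOM_MEMORY_PRESSURE'    # hypothetical trait set by e.g. Watcher
        PREFER = 'CUSTOM_LOW_UTILIZATION'   # hypothetical trait

        def _weigh_object(self, host_state, weight_properties):
            # How traits are surfaced to host_state is an open design question;
            # this attribute is illustrative only.
            traits = set(getattr(host_state, 'traits', []) or [])
            score = 0.0
            if self.AVOID in traits:
                score -= 1.0
            if self.PREFER in traits:
                score += 1.0
            return score

Operators would then enable it like any custom weigher, via [filter_scheduler]weight_classes.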
#### Proxying OVH's Monitoring-Aware Scheduling ####

The team discussed OVH's proposal to enhance scheduling decisions by integrating external monitoring data. The idea is to add a filter or weigher that delegates its logic to an external plugin, allowing operators to plug in custom metrics (e.g., Prometheus, InfluxDB) without modifying Nova directly. While the use case is relevant, existing out-of-tree filters and weighers already allow similar integrations. One of the main challenges identified was how to share and maintain such code consistently across the community.
✅ Current Nova interfaces likely already support this use case.
✅ Proposal to create a dedicated repository on OpenDev for sharing and maintaining custom filters and weighers.
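To illustrate the point that current interfaces likely already cover this, here is a rough sketch of an out-of-tree filter that defers to an external monitoring endpoint. The Prometheus URL, query, and threshold are placeholders, and a production version would need caching and better error handling:

    # Sketch of an out-of-tree scheduler filter that consults an external
    # monitoring system before accepting a host. URL, query and threshold are
    # placeholders.
    import requests

    from nova.scheduler import filters


    class ExternalMetricFilter(filters.BaseHostFilter):

        PROMETHEUS_URL = 'http://prometheus.example.com:9090/api/v1/query'  # placeholder
        MAX_CPU_RATIO = 0.85  # placeholder threshold

        def host_passes(self, host_state, spec_obj):
            query = ('instance:node_cpu_utilisation:rate5m{instance="%s"}'
                     % host_state.host)  # placeholder recording rule
            try:
                resp = requests.get(self.PROMETHEUS_URL,
                                    params={'query': query}, timeout=2)
                result = resp.json()['data']['result']
            except Exception:
                # Fail open if monitoring is unreachable.
                return True
            if not result:
                return True
            return float(result[0]['value'][1]) < self.MAX_CPU_RATIO

Such a filter is enabled through [filter_scheduler]enabled_filters like any other; the open question from the session is where code like this should live and be maintained.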
**** Friday topics ****

#### LY Corp – Graceful Shutdown Proposal ####

The team discussed LY Corp's proposal to implement graceful shutdown handling in Nova. The goal is to stop accepting new RPC requests while allowing in-progress operations, such as instance boots or live migrations, to complete safely. Different designs were compared, including stopping RPC listeners or using the upcoming oslo.messaging HTTP driver, which can handle clean shutdowns more reliably. The consensus was that the complete solution must involve oslo.messaging, allowing Nova services to dynamically subscribe to or unsubscribe from topics. This approach also implies an upgrade impact, requiring versioned changes to compute and scheduler topics.
✅ The current proposal needs further work, as live migration cases were not yet considered.
✅ Gmaan will continue the investigation and refine the approach based on oslo.messaging improvements.
✅ A revised spec will be submitted once ready.
✅ Gmaan confirmed commitment to follow up and lead this feature.

#### Offline VM Migration Using HTTP-Based File Server ####

The team discussed a proposal to enable offline VM migration using an HTTP-based file server. The preferred approach is to use WebDAV, which provides a standard protocol and existing server implementations. Nova itself will not host the web server, and access will be restricted to paths under /var/lib/nova.
✅ Use WebDAV as the preferred protocol.
✅ Avoid introducing non-standard protocols in Nova.
✅ Nova will not spawn or host the web server directly.
✅ Define a URL template in configuration for the remote filesystem driver (e.g., remotefs_url = https://%(host)s:9999/path/to/dav).
✅ Add DevStack support to test the feature in CI.

#### PCI and VGPU Scheduling Performance in Placement ####

The team revisited the performance issues with PCI and VGPU scheduling in Placement. Two sets of workarounds have already been implemented to reduce the number of valid and invalid candidates, but the core scalability problem of the GET /allocation_candidates query remains, especially in large, symmetric provider trees. A long-term constraint solver approach is under consideration, and the discussion will continue once more data is available from current deployments.
✅ Keep the current workarounds active and gather feedback over the next six months.
✅ Nova-next may run with these settings enabled to collect early results.
✅ Revisit the topic and defaulting decisions at the next PTG.
✅ Keep the issue open for urgent improvements if needed before the next cycle.
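For context on where the scaling pressure comes from, the request below is roughly the kind of query the scheduler ends up issuing; each granular resource group multiplies the candidate combinations Placement must compute on a symmetric provider tree. A sketch against the Placement HTTP API, with placeholder endpoint, token, and amounts:

    # Sketch: a granular GET /allocation_candidates request whose combinatorics
    # grow quickly on large, symmetric provider trees.
    # Endpoint, token, microversion and amounts are placeholders.
    import requests

    PLACEMENT = 'https://placement.example.com'   # placeholder
    HEADERS = {
        'X-Auth-Token': '<token>',                # placeholder
        'OpenStack-API-Version': 'placement 1.39',
    }

    params = {
        'resources': 'VCPU:8,MEMORY_MB:16384,DISK_GB:50',
        'resources_GPU': 'VGPU:1',        # one granular group per requested device
        'resources_NIC': 'SRIOV_NET_VF:2',
        'group_policy': 'none',           # required once more than one group is used
        'limit': 1000,
    }

    resp = requests.get(PLACEMENT + '/allocation_candidates',
                        params=params, headers=HEADERS, timeout=10)
    print(len(resp.json().get('allocation_requests', [])), 'candidates returned')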
#### Multithread and Multiqueue Support for Libvirt ####

The team discussed adding multithread and multiqueue support in Nova for libvirt-based instances. While QEMU already enables multiqueue for virtio-blk and virtio-scsi by default, the main focus is now on IOThread support and improving parallelism during I/O and live migrations.
✅ Enable IOThreads with one thread per device by default (handled as a specless blueprint).
✅ The virtio-blk and virtio-scsi specs are parked.
✅ Cancel the multiqueue work, since QEMU now manages it natively.
✅ The parallel migration blueprint is approved as specless.
✅ The earlier proposed spec will be abandoned in favor of this simplified path forward.

#### AZ-Aware Scheduling (Soft Anti-Affinity) ####

The team discussed improving availability zone–aware scheduling by introducing a soft anti-affinity policy for server groups.
✅ The overall direction is good, but the spec still needs more work.
✅ The team is broadly supportive of continuing in this direction.
✅ A spec will be proposed by Dmitriy.

#### NUMA-Aware Placement (CloudFerro Proposal) ####

The team discussed CloudFerro's proposal to enable NUMA-aware scheduling in Placement, based on the old Ussuri spec. The goal is to represent NUMA nodes as resource providers to allow better locality handling for CPU and memory resources. However, this raises potential performance concerns, especially when combined with PCI in Placement, as both increase candidate permutations.
✅ Ensure NUMA and PCI in Placement work together correctly before expanding scope.
✅ Do not move PCI devices under NUMA subtrees yet.
✅ Add functional tests to validate the combined behavior.
✅ Consider extending CI to run on hosts with real NUMA topology.
✅ Enable NUMA awareness in nova-next (single NUMA node for now).
✅ Investigate with the infra team the possibility of adding dual-NUMA test nodes.
✅ Improvements to Placement's group_policy remain out of scope for this work.
✅ The team is happy with the current direction and looks forward to future patches landing in master.

#### WSGI Script Removal ####

The team discussed removing the legacy wsgi_script support from Nova. No objections were raised, and no spec is required.
✅ Proceed with the removal; the team is aligned and ready to move forward.

#### Improving Cyborg Support ####

The team discussed reviving Cyborg integration efforts in Nova. Several aspects remain unfinished, including proper mdev GPU XML generation, support for additional device buses (block, NVMe, CXL, USB), and handling ownership overlap between Nova and Cyborg in Placement. The plan is to resume this work gradually, starting with the most mature components.
✅ Restart the Cyborg–Nova integration effort.
✅ Split the work into multiple specs, as each part is complex enough to warrant separate discussion.
✅ Focus first on mdev support, then extend to other buses (block, NVMe, etc.).
✅ Investigate Placement ownership to avoid inventory conflicts between Nova and Cyborg.

If you've read this far, thank you! 🙏
If you spot any mistakes or missing points, please don't hesitate to let me know.

Best regards,
René.