Hello everyone,

Last week was the PTG. Thank you to all who joined and contributed to the discussions! I hope you enjoyed it and found it productive.

This cycle we had another packed agenda with active participation throughout the sessions. For the Nova sessions (Monday to Friday), we had an average of 15 to 20 participants per session. Attendance was slightly higher during the cross-project discussions, with an average of 25 to 31 participants, especially for the Ironic session.

You can revisit all notes and discussions on the Etherpad here:
https://etherpad.opendev.org/p/nova-2026.2-ptg

Below is a summary of the topics and outcomes from our PTG discussions. I've tried to keep it as concise as possible, though we covered quite a lot during the week.

**** Monday topics ****

#### 2026.1 Gazpacho Retrospective ####

2026.1 Gazpacho results:
- Blueprints: 11 Accepted, 9.33 Implemented (84%)
- Bugs: At least 28 bugfixes merged

For comparison, 2025.2 Flamingo had 14 accepted blueprints with 9 implemented (64%) and at least 26 bugfixes merged.

Release closure proceeded smoothly post-Feature Freeze with minimal CI issues. The only exception involved a pkg_resources issue outside Nova's control. The approach of accepting fewer, smaller tasks proved more predictable than in previous cycles.

✅ Core +2 on a spec includes an implicit commitment to reviewing the implementation.
✅ Upstream meetings should explicitly communicate priorities and readiness status.
✅ Bug tracking for eventlet removal work via Launchpad tags.

#### 2026.2 Hibiscus Proposed Planning ####

The team agreed on the following schedule:
- Spec review day: May 21
- Soft spec freeze: May 28
- Spec review day: June 4
- Hard spec freeze: June 11
- M2: July 2
- Feature review day: August 4
- Feature review day: August 20
- Feature Freeze at M3: August 27
- Release: End of September / early October

✅ Adopt the proposed timeline, moving soft freeze two weeks earlier and scheduling review days one week before deadlines.
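For anyone who wants to sanity-check the "review day one week before each deadline" spacing, here is a small sketch. It assumes the dates fall in 2026 (inferred from the 2026.2 cycle name), and the pairing of each review day with a deadline is my own reading of the schedule, not something decided at the PTG.

```python
from datetime import date

YEAR = 2026  # assumption: inferred from the 2026.2 cycle name

# (review day, deadline) pairs; the pairing is an illustrative reading
# of the schedule above.
PAIRS = {
    "soft spec freeze": (date(YEAR, 5, 21), date(YEAR, 5, 28)),
    "hard spec freeze": (date(YEAR, 6, 4), date(YEAR, 6, 11)),
    "feature freeze (M3)": (date(YEAR, 8, 20), date(YEAR, 8, 27)),
}


def lead_times(pairs):
    """Return the gap between each review day and its deadline."""
    return {name: deadline - review for name, (review, deadline) in pairs.items()}


if __name__ == "__main__":
    for name, gap in lead_times(PAIRS).items():
        print(f"{name}: review day is {gap.days} days before the deadline")
```

The August 4 feature review day is left out of the pairs because no deadline in the list falls one week after it.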
### Start cross session with Cinder ###

#### Assisted Volume Extend ####

The team discussed how to handle volume extend for file-based volume drivers (such as NFS) that require hypervisor assistance. Four options were analyzed (A through D), ranging from async callbacks to a generic assisted-volume-action API.

✅ Proceed with a hybrid approach combining Options A and B.
✅ Complete Cinder and client-side work first.
✅ Ensure required Nova API microversion availability during Cinder calls.
✅ Use OpenStack SDK (novaclient no longer receives support for newer API microversions).

#### Expose Quiesce/Unquiesce API ####

The team discussed creating a new API for quiesce/unquiesce instance volumes callable from Cinder, independent of snapshot operations.

✅ Create a new API for quiesce/unquiesce instance volumes callable from Cinder.
✅ Support a list of volumes for consistency groups.
✅ API should be asynchronous but with success/failure reporting capability.
✅ Require new specifications for both Nova and Cinder projects.

### End cross session with Cinder ###

### Start cross session with Manila ###

Good progress has been made since the last cycle. The SDK change and the OpenStack Client update have both merged. Testing has also progressed, with scenario test specifications merged and active work on Tempest modifications, virtiofsd installation in Manila devstack, and scenario tests for both VirtioFS attachments and hot attach.

#### Hot Attach/Detach with VirtioFS ####

Hot attach serves as a prerequisite for live migration capability. The main discussion focused on whether a new API microversion is needed, since the request/response body doesn't change; only the server-side precondition does (allowing "active"/"paused" states instead of only "shutoff"/"error").

✅ New microversion preferred as a feature availability marker, especially considering in-progress upgrade scenarios with mixed host states.
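The precondition change can be pictured as a state check gated on the requested microversion. This is only a minimal sketch, not Nova's implementation: the microversion number, the constants, and the function names are all hypothetical, and it assumes the new states are allowed in addition to the existing ones.

```python
# Hypothetical microversion; the real number would be assigned when the
# change lands.
HOT_ATTACH_MIN_MICROVERSION = (2, 999)

# States accepted before the change (from the discussion above).
LEGACY_STATES = frozenset({"shutoff", "error"})
# Assumption: the new microversion adds states rather than replacing them.
HOT_ATTACH_STATES = LEGACY_STATES | {"active", "paused"}


def allowed_states(requested_version):
    """Return the instance states in which a share attach is accepted."""
    if requested_version >= HOT_ATTACH_MIN_MICROVERSION:
        return HOT_ATTACH_STATES
    return LEGACY_STATES


def can_attach_share(vm_state, requested_version):
    """Check the server-side precondition for the given microversion."""
    return vm_state in allowed_states(requested_version)


if __name__ == "__main__":
    print(can_attach_share("active", (2, 97)))
    print(can_attach_share("active", HOT_ATTACH_MIN_MICROVERSION))
```

The request and response bodies are identical in both branches; the microversion only tells the client whether the deployment accepts the wider set of states.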
#### Live Migration with VirtioFS Attachments ####

The team discussed requirements (virtiofsd >= 1.11, QEMU >= 9.x, libvirt >= 10.x) and the scope of blocked operations (unshelve, evacuate, rebuild, resize).

✅ Create two separate specifications for cold and live migration to decompose the workload.
✅ Cold migration absence represents an operator blocker that needs to be addressed.

#### Memfd Support ####

Adds an hw:memory_backend flavor extra spec enabling per-instance shared memory backing selection (memfd/file/hugepage/anonymous). This enables OVS-DPDK and virtio-fs support, reducing the complexity.

✅ Make memfd the default immediately for new instances rather than a two-phase rollout.

### End cross session with Manila ###

**** Tuesday topics ****

### Start cross session with Cyborg ###

A more complete summary of this joint session will be provided by the Cyborg team, as they can better capture the discussions and decisions made.

Reference: https://etherpad.opendev.org/p/cyborg-2026.2-ptg#L404

### End cross session with Cyborg ###

### Start cross session with Ironic ###

A more complete summary of this joint session will be provided by the Ironic team, as they can better capture the discussions and decisions made.

On a lighter note, there was a historic and somewhat surreal moment during the session where Dan and TheJulia actually agreed on a topic, and Sean said something positive about OTU. 😄

### End cross session with Ironic ###

#### Unpinning Availability Zone ####

The team discussed how to release instances pinned to specific AZs, since pinning prevents emergency migrations. Multiple approaches were considered.

✅ Allow setting the request_spec AZ to None or to the instance's current AZ to unpin.
✅ Default policy: admin-only.
✅ Document possible side effects of AZ unpinning.
✅ Preference for consistent behavior across all move operations.

#### PCI Grouping ####

The proposal moves from "single device per alias" to a "logical group of devices" that is requested and scheduled as a single unit.
Extensive discussion followed on traits (union approach), device types, PF/VF dependencies, resource class overrides, group lifecycle, and upgrade path.

✅ Union traits from grouped devices; devices within groups become non-consumable individually.
✅ Support multiple VFs from the same PF in different groups.
✅ Reject adding devices to allocated groups at compute startup.
✅ Handle device disappearance similarly to non-grouped devices.
✅ No reshape required; rolling upgrades supported.
✅ gibi takes the primary implementation role, with Sylvain and Sean as main reviewers.

#### Supporting Different Resource Classes for VFs (GPU/MIG Use Case) ####

Modern GPUs (NVIDIA H100 with MIG) use SR-IOV to expose different compute profiles as VFs under a single PF. The current Nova implementation prevents assigning different Resource Classes to these VFs.

✅ Team not opposed; specification required.
✅ Coordinate with gibi to prevent PCI Grouping conflicts.
✅ Priority for this work needs to be defined.

**** Wednesday topics ****

#### Confidential Computing: AMD SEV-SNP Support ####

Multiple parties report SEV-SNP running in production. The LY implementation is available and needs a rebase on master.

✅ Start from the LY implementation, refine via iteration.
✅ Takashi to update the specification regarding the hw_firmware_stateless definition.
✅ Avoid in-Nova workarounds for firmware; leverage libvirt detection.
✅ Documentation should describe firmware file modification requirements when needed.
✅ Omit idAuth/idBlock/hostData parameters from the initial specification.

#### Confidential Computing: TDX Spec Review ####

Discussion covered TDX requirements (host passthrough, UEFI firmware limitations) and how to handle default memory-encryption model selection. The team really appreciated the quality of the presentation on this topic.

✅ Mandate hw:mem_encryption_model specification for TDX (not addressing default logic within this spec).
✅ Use Placement reservations for TDX key tracking rather than auto-detection.
✅ Document UEFI firmware limitation regarding SCSI driver stack.

#### SEV Refactoring Remaining Issues ####

The conflict between hw:mem_encryption=true and hw:locked_memory=False was discussed.

✅ Documentation patch to clarify interaction and expectations.

#### Eventlet Removal ####

Significant progress since Gazpacho. Remaining work includes console proxy conversion, CLI/nova-manage handling, deprecation statement, config option conversion, and full test coverage.

✅ Kamil volunteered to drive the Eventlet removal effort in Hibiscus.
✅ Explore generic eventlet import prevention (possibly via oslo).
✅ Verify novnc architecture in eventlet mode before conversion.
✅ Current Rally perf-test proposal represents the starting point; significant additional scenario work is needed.

### Start cross session with Horizon ###

A more complete summary of this joint session will be provided by the Horizon team, as they can better capture the discussions and decisions made.

Reference: https://etherpad.opendev.org/p/apr2026-ptg-horizon#L127

### End cross session with Horizon ###

#### Cross-AZ Scheduling Weigher ####

The team discussed a weigher proposal for AZ-aware scheduling. Multiple concerns were raised about the approach.

✅ The pain point is understood, but the proposed weigher solution lacks fit.
✅ Alternative approaches involving server groups API or unified hard anti-affinity policies merit further exploration.
✅ No weigher adoption, topic requires further discussion.

#### Add Typing ####

The team discussed adopting type hints in Nova. Majority of dependencies now include type hints.

✅ Set ignore_errors = false for non-test code.
✅ Don't target strict typing.
✅ In conclusion, we have reached a consensus: type hints are welcome and should not be a reason for rejection. However, we reserve the right to push back if the hints provide no tangible benefit, the criteria for which will be refined over time.
**** Thursday topics ****

#### AI/LLM Knowledge and Hard Rules ####

Discussion on adding knowledge prose and hard rules to Nova repository for LLM-aware development.

✅ Keep Nova knowledge in Nova repo without duplication.
✅ AGENTS.md should be minimal and point to in-repo documentation.
✅ Separate LLM-specific information from human-readable content.
✅ Prefer deterministic checks (scripts, linters, fixtures) over LLM prose guidance.
✅ Information useful to both humans and LLMs should live in documentation.
✅ Bring broader OpenStack governance questions about AI assistance to the TC and Board.

#### Send os-server-external-events via SDK Instead of Novaclient ####

Projects (Neutron, Cyborg, Cinder) send notifications to Nova via os-server-external-events API using novaclient. OpenStack SDK currently lacks this API.

✅ Add service-to-service API to SDK.
✅ Lajos to discuss with Stephen about sharing knowledge for similar API usage elsewhere.

#### Bidirectional RPC Liveness Handshake ####

Under certain conditions, Nova-Compute loses the ability to receive RPCs while the heartbeat continues, and the compute appears alive despite being unable to process requests.

✅ Specification accepted.
✅ Two-step implementation: unconditional first, then optimized.
✅ Failure threshold requirement; optimize later.
✅ Address timeout values, failure thresholds, and echo request pileup concerns in the spec.

#### TempURL Usage in Nova ####

The team briefly discussed the proposal for TempURL usage in Nova.

✅ General agreement on the approach.

#### Attestation Integration ####

Rather than Nova performing attestation, an external service will maintain host attestation traits. Nova auto-deletes these traits on startup, signaling unknown state.

✅ Define trait prefix (e.g., HW_ATTESTATION_*); Nova auto-deletes on startup.
✅ External service maintains traits at runtime.
✅ Flavors request/require traits for secure-node-only boot.
✅ Watcher may implement this via proxy service.
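To make the attestation trait lifecycle concrete, here is a rough in-memory sketch. It does not use the Placement API or any Nova code: the function names are hypothetical, HW_ATTESTATION_VERIFIED is an invented trait name under the example prefix, and plain Python sets stand in for a host's trait list.

```python
ATTESTATION_PREFIX = "HW_ATTESTATION_"  # example prefix from the discussion


def clear_attestation_traits(host_traits):
    """Drop attestation traits, as Nova would on startup (unknown state)."""
    return {t for t in host_traits if not t.startswith(ATTESTATION_PREFIX)}


def host_matches(host_traits, required_traits):
    """Return True when the host exposes every trait the flavor requires."""
    return set(required_traits) <= set(host_traits)


if __name__ == "__main__":
    # HW_ATTESTATION_VERIFIED is a hypothetical trait name.
    required = {"HW_ATTESTATION_VERIFIED"}
    traits = {"HW_CPU_X86_AVX2", "HW_ATTESTATION_VERIFIED"}

    # Compute restarts: the attestation state becomes unknown.
    traits = clear_attestation_traits(traits)
    print(host_matches(traits, required))

    # The external service re-attests the host and restores the trait.
    traits.add("HW_ATTESTATION_VERIFIED")
    print(host_matches(traits, required))
```

Because the trait is removed on every restart, a host that has not been re-attested is simply never a candidate for flavors that require it.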
#### Barbican Secret Handling for vTPM Live Migration ####

Nova creates Barbican secrets for 'deployment' TPM security mode using Nova service user credentials. Barbican defaults allow any service project member to decrypt, which is broader than intended.

✅ Leave current design; operator responsibility.
✅ Melwitt to add documentation details/warnings/clarity to vTPM docs.

**** Friday topics ****

Note: Friday sessions were limited (13:00-14:00 UTC only) due to TC and Board meeting conflicts.

#### Nova-Reviewers and Nova-Approvers Split ####

Sean proposed adopting the Ironic/SDK model with separate nova-reviewers (+2) and nova-approvers (+W) groups. Extensive discussion followed on review bandwidth, core expansion, and how to increase patch landing velocity.

The team debated multiple perspectives:
- Expanding the reviewer base with proven subsystem experts
- Aggressively adding full cores rather than creating a split
- Concerns about lowering the landing bar

✅ No final decision reached; the team broadly agrees on the need to expand review capacity.
✅ Core expansion and mentoring remain the preferred path forward.
✅ Further discussion needed, potentially in private follow-up.

#### Standardize Service-to-Service Communication ####

We skipped this topic due to time and Gmann constraints. We will probably revisit it in an upstream meeting.

#### Nova Ephemeral Volumes Using NetApp FlexGroup ####

The team discussed a proposal for using NetApp FlexGroup as a backend for Nova ephemeral volumes.

✅ There are major concerns about the proposal.
✅ The approach looks somewhat hacky and could introduce technical debt in the future. A proper driver would be a better approach.
✅ Significant concerns were also raised about CI coverage.
✅ The team is not comfortable including a vendor-specific backend. Vendor neutrality will always be the preferred approach.
✅ A new spec proposal addressing these concerns will be welcome and discussed.
#### Metadata/Tag Protection ####

The team discussed the proposal for protecting instance metadata and tags from unauthorized modifications.

✅ Globbing preferred over regexp for pattern matching.
✅ Sean to leave notes in the spec and continue to refine it.
✅ Adding a concrete use case to the spec would be helpful to define the proper solution.

If you've read this far, thank you!
If you spot any mistakes or missing points, please don't hesitate to let me know.

Best regards,
René.