Hi all,

As many of you may be aware, the OpenStack community holds a Project Team Gathering (PTG) event every release cycle. I'm happy to report that the Cyborg team held its 2026.2 virtual PTG session last week, and I would like to thank all who participated. Below is a summary of the topics covered, decisions made, and open questions where wider community input is welcome.

Full notes are at:
https://etherpad.opendev.org/p/r.bb485b94c3487349f2e36eb4e3b5a2ce

Testing
-------

A significant portion of the session was spent on the state of testing across the project. The conclusion is that coverage gaps are wide and need a structured effort to close.

Driver support levels: We agreed to adopt formal driver support levels: supported, experimental, deprecated, and removed. All current in-tree drivers except the fake driver will be marked experimental immediately. Drivers that remain experimental for more than one release will be downgraded to deprecated after a mailing-list callout for maintainers. Cyborg will emit a start-up warning for drivers that are not at the "supported" level. As testing and documentation are built out for existing drivers, they will graduate to supported over time. Exact details of the support levels will be captured in new project docs to set clear expectations for operators and developers. Once a driver's deprecation has been announced in a SLURP release, the driver can be removed in a following release.

Tempest plugin: A dedicated etherpad will be created to coordinate cyborg-tempest-plugin improvements. Current gaps include missing stable test IDs, microversion tests, and per-driver scenario tests (today only one scenario test exists and it is hard-coded to the fake driver). PCI driver testing may be possible using the nvmevirt simulator or a third-party CI via RDO Zuul.
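
As a rough illustration of the driver support levels and the start-up warning mentioned under "Driver support levels", the check could look something like the sketch below. None of this comes from the session notes or from existing Cyborg code: the SupportLevel enum, the check_drivers helper, and the driver names are hypothetical, and the real behaviour will be defined in the new project docs.

```python
import enum
import logging

LOG = logging.getLogger(__name__)


class SupportLevel(enum.Enum):
    # The four levels agreed at the PTG.
    SUPPORTED = "supported"
    EXPERIMENTAL = "experimental"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


def check_drivers(enabled, levels):
    """Log and return one warning per enabled driver that is not supported.

    Hypothetical helper: 'enabled' is a list of driver names and 'levels'
    maps each driver name to a SupportLevel.
    """
    warnings = []
    for name in enabled:
        level = levels[name]
        if level is not SupportLevel.SUPPORTED:
            msg = "Driver %s is at support level '%s'" % (name, level.value)
            LOG.warning(msg)
            warnings.append(msg)
    return warnings


# Illustrative driver names only; these are not real Cyborg drivers.
example_levels = {
    "driver_a": SupportLevel.SUPPORTED,
    "driver_b": SupportLevel.EXPERIMENTAL,
    "driver_c": SupportLevel.DEPRECATED,
}
messages = check_drivers(["driver_a", "driver_b", "driver_c"], example_levels)
```

In this sketch only driver_b and driver_c would produce a warning; a supported driver starts silently.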

Functional testing: Building out functional test infrastructure (end-to-end tests within a single process against SQLite, similar to Nova's functional tests) was agreed as an aspirational goal for this cycle. A blueprint will be created; whether a full spec is needed is TBD. Contributor documentation will accompany it.

python-cyborgclient: The v1 API code will be dropped this cycle; the v2 Python bindings and the freestanding cyborg CLI will be formally deprecated in favour of the OpenStack SDK and OSC plugin. New test coverage will be built only around the OSC and SDK code paths.

Opens: Whether temporary operator-supplied test reports remain acceptable long-term and what minimum evidence they should contain; whether per-driver tempest tests become a hard gate for new drivers.

Tech Debt
---------

Several cleanup and scalability items were agreed for this and upcoming cycles.

SRBAC: All phases will be implemented in 2026.2 (admin, manager, member, reader, and service roles). The new policies will become the default in 2027.1; legacy policies will be deprecated for removal in 2027.2. The exact details will be captured in a new spec.

Conductor decoupling: Placement interactions should be moved to the compute agent; simple API CRUD operations should bypass the conductor RPC entirely. The conductor is currently a global single point of failure and a scaling concern because Cyborg does not support cells or conductor groups for internal sharding of compute agents. This evolution will take time and thought, and as such may be deferred to a later cycle.

Quotas: The existing quota system is non-functional and will be replaced with Keystone unified limits. Cyborg must enforce quota on ARQ bind before device-profile on server create can be supported. The priority of this new capability is TBD and will partly depend on usage/operator feedback.

Image verification: The feature was never fully implemented and depends on Nova objects that were never ported.
We agreed to remove it and re-implement it from scratch with proper test coverage and documentation. That said, we may decide to fix it in place if that proves quick and back-portable.

OpenStack SDK: Cyborg will port its Glance interaction to the SDK, removing the python-glanceclient dependency.

API design: The device, attribute, and deployables APIs have overlapping responsibilities and should be consolidated into a single surface. PATCH with JSON Patch payloads should be replaced with PUT. OpenAPI request and response schemas should be introduced. This cycle we will produce a single design document. API evolution (microversion or v3) is targeted for a future release. The only concrete change landing this cycle is removing stale references to the removed v1 API from the API reference.

Documentation: New features must ship a release note and docs in the same series. New drivers must include a driver-specific admin doc and a cross-link to the support matrix. We also agreed to produce a service interaction doc covering how Cyborg, Placement, and Nova work together.

Opens: How to signal enabled/disabled device state to the Nova scheduler. Both reserved=total and placement traits have precedent, but they come with different trade-offs around admin reservations and per-resource granularity. Using reserved=total is likely the correct path, but we need to document why and how that works.

Nova Cross-Project
------------------

We met with the Nova team to align on work that requires changes in both projects.

VM lifecycle (resize, cold/live migration): Gaps for stateless devices will be investigated this cycle, with a PoC if time allows. The feature is targeted at 2027.1. Stateful device support (e.g. NVMe disk content transfer) is deferred to 2027.1+. For resize-revert, the expectation is that the source ARQ is held until the resize is confirmed and the destination ARQ is held until a revert is discarded via resize confirm.

Attachment handles: An mdev handle type is needed for time-sliced vGPU support (Nova spec already in review). Handle content will need to carry live_migratable and managed metadata that would normally come from nova.conf via the PCI devspec. Additional future types (nvme, block, usb, cxl) are scoped but not targeted this cycle.

Device-profile on server create and attach/detach: Both have a soft dependency on unified limits support landing in Cyborg first. Targeted at 2027.1+, after resize is supported.

Nova scheduler extensibility: Not discussed due to time. The MVP proposal is to expose provider summaries to existing filters and weighers via the host state object. Longer-term proposals for stevedore extension points and hook points were mentioned as nice-to-have operator quality-of-life improvements for out-of-tree use cases (Watcher policy-based scheduling, Cyborg NUMA filtering, telemetry-aware scheduling).

Opens: Whether a generic configurable traits weigher in Nova, or moving weighing logic to Placement, could satisfy the extensibility use cases without requiring out-of-tree hook infrastructure.

New Drivers
-----------

Two new drivers are moving forward this cycle, alongside early groundwork on a longer-term v2 driver framework.

Generic MDEV driver: The existing NVIDIA driver will be refactored to reuse it. First-party CI testing will use the mtty/mdpy kernel modules. The companion Nova spec for mdev attach-handle support is already in review.

NVMe driver: It will inherit from the generic PCI driver, use nvme-cli for device management and secure erase, and will not require the PCI or SSD driver to be enabled to function. Details will be in the spec.

Generic driver framework v2: We agreed to PoC the concepts incrementally in the new MDEV and NVMe drivers this cycle, factoring out PCI enumeration and introducing typed data classes in place of generic dicts. No spec this cycle; a formal spec targeting 2027.1 will follow if the PoC succeeds.
We also agreed to rename the whitelist config option to align with Nova's devspec so that shared keys behave consistently across both projects.

Placement traits and RP naming: The convention CUSTOM_<BUS>_<vendor_id>_<product_id> (aligning with Nova's PCI-in-placement work) was proposed and discussed, but no formal agreement was reached this cycle. Interchangeable VFs should be grouped into a single RP per NUMA node per device type rather than reported as individual RPs with an inventory of 1. A spec with an upgrade path is required before any of this lands. It is TBD whether this will be done as part of the v2 generic driver work or independently; no ETA is proposed for this work.

Opens: NUMA modelling in Placement requires agreement with Nova on the RP naming convention and UUID generation before Cyborg can create NUMA resource providers independently.

If you have questions or feedback on any of the above, please reply to this thread. Thanks to everyone who participated in the sessions.

regards
Seán

p.s. as with most of my longer prose to this list this has been translated from sean speak to english via llm assistance, enjoy the added punctuation and capital letters.