Open Stack

Thu Jun 11 11:04:12 UTC 2020

Hi all,
    Based on the Victoria PTG discussion [1], here's a stab at making some aspects of the Nova-Neutron-Cyborg interaction more concrete.

* Background: A smart NIC may expose a single 'device' that combines the accelerator and the NIC, or it may be a single PCI card with two (or more) separate accelerator and NIC components.

* What we said in the PTG: We should model the smart NIC as a single RP representing the combined accelerator/NIC for the first case. For the second case, we could have a hierarchy with separate RPs for the accelerator and the NICs, and a top-level resource-less RP which aggregates all the child RPs and combines their traits. (Correspondingly, Cyborg may represent it as a single object, which we call a Deployable, or as a hierarchy of such objects. There is already support in Cyborg for creating a tree of such objects, though it may need validation for this use case.)
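
To make the multi-component case concrete, here is a minimal sketch of such a hierarchy in plain Python. It is only an illustration, not Placement or Cyborg code: the ResourceProvider class, its fields and the RP names are hypothetical, and the resource classes and traits are borrowed from the device profile example in this mail.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    """Hypothetical stand-in for an RP; not a Placement or Cyborg class."""
    name: str
    inventory: dict = field(default_factory=dict)
    traits: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def combined_traits(self):
        """Return this RP's traits plus those of every descendant."""
        combined = set(self.traits)
        for child in self.children:
            combined |= child.combined_traits()
        return combined


# Child RPs hold the actual inventory (names are made up for illustration).
accel_rp = ResourceProvider(
    "smartnic0_fpga", {"FPGA": 1}, {"CUSTOM_FPGA_REGION_ID_FOO"})
nic_rp = ResourceProvider(
    "smartnic0_nic", {"CUSTOM_NIC_X": 1},
    {"CUSTOM_NIC_TRAIT_BAR", "CUSTOM_PHYSNET_VLAN3"})

# The top-level RP has no inventory of its own; it only groups the children.
top_rp = ResourceProvider("smartnic0", children=[accel_rp, nic_rp])

print(sorted(top_rp.combined_traits()))
```

The single-device case is then just one ResourceProvider with both inventories and no children.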

* Who creates these RPs? I suggest Cyborg create them in all cases, to keep it uniform. Neutron creates RPs today for the bandwidth provider. But if different services create RPs depending on which feature is enabled and whether it is a single-component or multi-component device, that can get complex and problematic. So, could we discuss the possibility of Neutron not creating the RP? The admin should not configure Neutron to handle such NICs.

* Ideally, the admin should be able to formulate the device profile in the same way, independent of whether it is a single-component or multi-component device. For that, the device profile must have a single resource group that includes the resources, traits and Cyborg properties for both the accelerator and the NIC. The device profile for a Neutron port will presumably have only one request group. So, the device profile would look something like this:

   { "name": "my-smartnic-dp",
     "groups": [{
         "resources:FPGA": "1",
         "resources:CUSTOM_NIC_X": "1",
         "trait:CUSTOM_FPGA_REGION_ID_FOO": "required",
         "trait:CUSTOM_NIC_TRAIT_BAR": "required",
         "trait:CUSTOM_PHYSNET_VLAN3": "required",
         "accel:bitstream_id": "3AFE"
     }]
   }

Having a single resource group for the resources/traits of both the accelerator and the NIC would ensure that a single RP provides all those resources, thus ensuring resource co-location in the same device. That single RP could be the top-level RP of a hierarchy. (If they were separate request groups, there would be no way to ensure that the resources come from a single RP, even if we set group_policy to "none".)
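
As a rough illustration of how such a single group could be taken apart and checked against one RP, here is a small Python sketch. It assumes only the key prefixes visible in the example ("resources:", "trait:", "accel:"). The helper names split_group and rp_satisfies are hypothetical, and this is not how Nova, Placement or Cyborg actually process the request.

```python
def split_group(group):
    """Split one device profile group into resources, traits and properties."""
    resources, traits, properties = {}, {}, {}
    for key, value in group.items():
        prefix, _, name = key.partition(":")
        if prefix == "resources":
            resources[name] = int(value)
        elif prefix == "trait":
            traits[name] = value
        elif prefix == "accel":
            properties[name] = value
        else:
            raise ValueError(f"unexpected key in group: {key!r}")
    return resources, traits, properties


def rp_satisfies(rp_inventory, rp_traits, resources, traits):
    """Check that ONE RP's view covers every resource and required trait."""
    has_resources = all(
        rp_inventory.get(rc, 0) >= amount for rc, amount in resources.items())
    required = {name for name, value in traits.items() if value == "required"}
    return has_resources and required <= set(rp_traits)


group = {
    "resources:FPGA": "1",
    "resources:CUSTOM_NIC_X": "1",
    "trait:CUSTOM_FPGA_REGION_ID_FOO": "required",
    "trait:CUSTOM_NIC_TRAIT_BAR": "required",
    "trait:CUSTOM_PHYSNET_VLAN3": "required",
    "accel:bitstream_id": "3AFE",
}
resources, traits, properties = split_group(group)

# Assumed view of a single RP representing the whole smart NIC.
smartnic_inventory = {"FPGA": 1, "CUSTOM_NIC_X": 1}
smartnic_traits = {
    "CUSTOM_FPGA_REGION_ID_FOO",
    "CUSTOM_NIC_TRAIT_BAR",
    "CUSTOM_PHYSNET_VLAN3",
}
print(rp_satisfies(smartnic_inventory, smartnic_traits, resources, traits))
```

Because everything is in one group, the check is made against one RP at a time, which is the co-location property described above.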

* During ARQ binding, Cyborg would still get a single RP as today. In the case of a multi-component device, Cyborg would translate that to the top-level Deployable object, and figure out what constituent components are present. For this scheme to work, it is important that the resource classes and traits for the accelerator RP and the NIC RP be totally disjoint (no overlapping resource classes or traits).
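
The disjointness requirement can be sketched as follows. This is an assumption-laden illustration, not Cyborg code: the components dict, its layout and the helper names check_disjoint and owner_of are all hypothetical. It only shows why overlapping names would make the mapping from a requested item to a component ambiguous.

```python
from itertools import combinations


def check_disjoint(components):
    """Raise ValueError if any two components share a resource class or trait."""
    for (name_a, a), (name_b, b) in combinations(components.items(), 2):
        for kind in ("resource_classes", "traits"):
            overlap = a[kind] & b[kind]
            if overlap:
                raise ValueError(
                    f"{name_a} and {name_b} share {kind}: {sorted(overlap)}")


def owner_of(item, components):
    """Return the name of the single component exposing the item, or None."""
    check_disjoint(components)
    for name, comp in components.items():
        if item in comp["resource_classes"] or item in comp["traits"]:
            return name
    return None


# Hypothetical description of the components under one top-level Deployable.
components = {
    "accelerator": {
        "resource_classes": {"FPGA"},
        "traits": {"CUSTOM_FPGA_REGION_ID_FOO"},
    },
    "nic": {
        "resource_classes": {"CUSTOM_NIC_X"},
        "traits": {"CUSTOM_NIC_TRAIT_BAR", "CUSTOM_PHYSNET_VLAN3"},
    },
}

print(owner_of("FPGA", components))
print(owner_of("CUSTOM_PHYSNET_VLAN3", components))
```

With disjoint sets, every requested resource class or trait maps to exactly one component. With an overlap, the lookup would have to guess.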

* We discussed the physnet trait at the PTG. My suggestion is to keep Cyborg out of this, and out of networking in general, if possible. 

[1] https://etherpad.opendev.org/p/nova-victoria-ptg "Cyborg-Nova" Lines 104-164

Regards,
Sundar

[cyborg][neutron][nova] Networking support in Cyborg