[placement][nova][ptg] resource provider affinity

Nadathur, Sundar sundar.nadathur at intel.com
Thu May 9 04:37:08 UTC 2019


On 5/8/2019 2:31 PM, Eric Fried wrote:
> Sundar-
>
>> I have a set of compute hosts, each with several NICs of type T.
>> Each NIC has a set of PFs: PF1, PF2, .... Each PF is a resource
>> provider, and each has a separate custom RC: CUSTOM_RC_PF1,
>> CUSTOM_RC_PF2, .... The VFs are inventories of the associated PF's
>> RC. Provider networks etc. are traits on that PF.
> It would be weird for the inventories to be called PF* if they're
> inventories of VF.
  I am focusing mainly on the concepts for now, not on the names.
> But mainly: why the custom resource classes?
This is as elaborate an example as I could cook up. In real life, we may
need some custom RCs, but maybe not one for each PF type.
> The way "resourceless RP" + "same_subtree" is designed to work is best
> explained if I model your use case with standard resource classes instead:
>
> CN
> |
> +---NIC1 (trait: I_AM_A_NIC)
> |     |
> |     +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4)
> |     |
> |     +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4)
> |
> +---NIC2 (trait: I_AM_A_NIC)
>        |
>        +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4)
>        |
>        +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4)
>
> Now if I say:
>
>   ?resources_T1=VF:1
>   &required_T1=CUSTOM_PHYSNET1
>   &resources_T2=VF:1
>   &required_T2=CUSTOM_PHYSNET2
>   &required_T3=I_AM_A_NIC
>   &same_subtree=','.join([suffix for suffix in suffixes
>                           if suffix.startswith('_T')])
>   (i.e. '_T1,_T2,_T3')
>
> ...then I'll get two candidates:
>
>   - {PF1_1: VF=1, PF1_2: VF=1} <== i.e. both from NIC1
>   - {PF2_1: VF=1, PF2_2: VF=1} <== i.e. both from NIC2
>
> ...and no candidates where one VF is from each NIC.
>
> IIUC this is how you wanted it.

Yes. The examples in the storyboard [1] for NUMA affinity use group
numbers. If that were recast to use named groups, and we wanted NUMA
affinity in addition to device colocation, would that not require a
different name than T? In short, to express two different
affinities/groupings, we would perhaps need names with two parts and
two separate same_subtree clauses. Just pointing out the implications.
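
To make the two-part naming idea concrete, here is a minimal Python
sketch that builds such a query string. Everything in it is
illustrative: the suffixes (_NIC_1, _NUMA_1, ...), the
CUSTOM_NUMA_NODE trait, and the helper names are made up for this
example, and it assumes the API would accept the same_subtree key more
than once, one per affinity set. It only assembles the string; it does
not talk to placement.

```python
from urllib.parse import urlencode


def suffixes_with_prefix(groups, prefix):
    """Return the group suffixes that start with the given prefix."""
    return [suffix for suffix in groups if suffix.startswith(prefix)]


def build_query(groups, affinity_prefixes):
    """Build a query string with one same_subtree clause per prefix.

    groups maps a suffix to {"resources": {rc: amount}, "required": [traits]};
    either key may be omitted (a group without resources is "resourceless").
    """
    params = []
    for suffix, spec in groups.items():
        resources = spec.get("resources", {})
        if resources:
            value = ",".join(f"{rc}:{amount}" for rc, amount in resources.items())
            params.append((f"resources{suffix}", value))
        required = spec.get("required", [])
        if required:
            params.append((f"required{suffix}", ",".join(required)))
    for prefix in affinity_prefixes:
        members = suffixes_with_prefix(groups, prefix)
        params.append(("same_subtree", ",".join(members)))
    # Keep ':' and ',' readable instead of percent-encoding them.
    return urlencode(params, safe=":,")


# Hypothetical request: two VFs colocated on one NIC, plus VCPU and
# memory colocated on one NUMA node. All names are assumptions.
GROUPS = {
    "_NIC_1": {"resources": {"VF": 1}, "required": ["CUSTOM_PHYSNET1"]},
    "_NIC_2": {"resources": {"VF": 1}, "required": ["CUSTOM_PHYSNET2"]},
    "_NIC_ROOT": {"required": ["I_AM_A_NIC"]},
    "_NUMA_1": {"resources": {"VCPU": 2}},
    "_NUMA_2": {"resources": {"MEMORY_MB": 1024}},
    "_NUMA_ROOT": {"required": ["CUSTOM_NUMA_NODE"]},
}

QUERY = build_query(GROUPS, ["_NIC_", "_NUMA_"])
print(QUERY)
```

The first part of each suffix names the affinity set and the second
part names the member, so each same_subtree clause can be derived by
prefix, much like the startswith('_T') expression in the quoted text.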

BTW, I noticed there is a standard RC for NIC VFs [2].

[1] https://storyboard.openstack.org/#!/story/2005575
[2] https://github.com/openstack/os-resource-classes/blob/master/os_resource_classes/__init__.py#L49
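
As a sanity check on the first model quoted above (the one with the
plain VF resource class), the toy Python sketch below enumerates
candidates for the _T1/_T2/_T3 request. It is not placement code. It
assumes that same_subtree means "one of the chosen providers is an
ancestor of, or identical to, all the others", and it ignores
group_policy and capacity already in use. The function and variable
names are made up for illustration.

```python
from itertools import product

# Provider tree from the quoted example: child -> parent (None = root).
PARENT = {
    "CN": None,
    "NIC1": "CN", "NIC2": "CN",
    "PF1_1": "NIC1", "PF1_2": "NIC1",
    "PF2_1": "NIC2", "PF2_2": "NIC2",
}
TRAITS = {
    "NIC1": {"I_AM_A_NIC"}, "NIC2": {"I_AM_A_NIC"},
    "PF1_1": {"CUSTOM_PHYSNET1"}, "PF1_2": {"CUSTOM_PHYSNET2"},
    "PF2_1": {"CUSTOM_PHYSNET1"}, "PF2_2": {"CUSTOM_PHYSNET2"},
}
INVENTORY = {rp: {"VF": 4} for rp in ("PF1_1", "PF1_2", "PF2_1", "PF2_2")}

# Request groups; _T3 is the resourceless group that matches the NIC.
GROUPS = {
    "_T1": {"resources": {"VF": 1}, "required": {"CUSTOM_PHYSNET1"}},
    "_T2": {"resources": {"VF": 1}, "required": {"CUSTOM_PHYSNET2"}},
    "_T3": {"resources": {}, "required": {"I_AM_A_NIC"}},
}


def lineage(rp):
    """Return rp together with all of its ancestors."""
    chain = set()
    while rp is not None:
        chain.add(rp)
        rp = PARENT[rp]
    return chain


def satisfies(rp, group):
    """True if rp has every required trait and enough of each resource."""
    has_traits = group["required"] <= TRAITS.get(rp, set())
    has_inventory = all(
        INVENTORY.get(rp, {}).get(rc, 0) >= amount
        for rc, amount in group["resources"].items()
    )
    return has_traits and has_inventory


def same_subtree(rps):
    """True if some provider in rps is an ancestor-or-self of all of them."""
    return any(all(top in lineage(rp) for rp in rps) for top in rps)


def candidates(groups, subtree_suffixes):
    """Enumerate distinct allocations, optionally filtered by same_subtree."""
    suffixes = list(groups)
    matches = [[rp for rp in PARENT if satisfies(rp, groups[s])] for s in suffixes]
    found = []
    for combo in product(*matches):
        chosen = dict(zip(suffixes, combo))
        if subtree_suffixes and not same_subtree(
            [chosen[s] for s in subtree_suffixes]
        ):
            continue
        alloc = {}
        for suffix, rp in chosen.items():
            for rc, amount in groups[suffix]["resources"].items():
                per_rp = alloc.setdefault(rp, {})
                per_rp[rc] = per_rp.get(rc, 0) + amount
        if alloc not in found:
            found.append(alloc)
    return found


WITH_AFFINITY = candidates(GROUPS, ["_T1", "_T2", "_T3"])
WITHOUT_AFFINITY = candidates(GROUPS, [])
```

Under these assumptions WITH_AFFINITY holds only the two same-NIC
allocations, while WITHOUT_AFFINITY also contains the two combinations
that take one VF from each NIC.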

> ==============
>
> With the custom resource classes, I'm having a hard time understanding
> the model. How unique are the _PF$N bits? Do they repeat (a) from one
> NIC to the next? (b) From one host to the next? (c) Never?
>
> The only thing that begins to make sense is (a), because (b) and (c)
> would lead to skittles. So assuming (a), the model would look something
> like:
Yes, (a) is what I had in mind.
> CN
> |
> +---NIC1 (trait: I_AM_A_NIC)
> |     |
> |     +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4)
> |     |
> |     +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4)
> |
> +---NIC2 (trait: I_AM_A_NIC)
>        |
>        +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4)
>        |
>        +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4)
>
> Now you could get the same result with (essentially) the same request as
> above:
>
>   ?resources_T1=CUSTOM_PF1_VF:1
>   &required_T1=CUSTOM_PHYSNET1
>   &resources_T2=CUSTOM_PF2_VF:1
>   &required_T2=CUSTOM_PHYSNET2
>   &required_T3=I_AM_A_NIC
>   &same_subtree=','.join([suffix for suffix in suffixes
>                           if suffix.startswith('_T')])
>   (i.e. '_T1,_T2,_T3')
>
> ==>
>
>   - {PF1_1: CUSTOM_PF1_VF=1, PF1_2: CUSTOM_PF2_VF=1}
>   - {PF2_1: CUSTOM_PF1_VF=1, PF2_2: CUSTOM_PF2_VF=1}
>
> ...except that in this model, PF$N corresponds to PHYSNET$N, so you
> wouldn't actually need the required_T$N=CUSTOM_PHYSNET$N to get the same
> result:
>
>   ?resources_T1=CUSTOM_PF1_VF:1
>   &resources_T2=CUSTOM_PF2_VF:1
>   &required_T3=I_AM_A_NIC
>   &same_subtree=','.join([suffix for suffix in suffixes
>                           if suffix.startswith('_T')])
>   (i.e. '_T1,_T2,_T3')
>
> ...because you're effectively encoding the physnet into the RC. Which is
> not good IMO.
>
> But either way...
>
>> Do I have to create a 'resourceless RP' for the NIC that contains
>> the individual PF RPs as child nodes?
>
> ...if you want to be able to request this kind of affinity, then yes,
> you do (unless there's some consumable resource on the NIC, in which
> case it's not resourceless, but the spirit is the same). This is exactly
> what these features are being designed for.

Great. Thank you very much for the detailed reply.

Regards,
Sundar
> Thanks,
> efried
> .
>


