On Tue, 2019-11-26 at 16:57 +0100, Bence Romsics wrote:
> Hi All,
>
> After gibi opened the neutron bug [1] I assigned it to myself and now I got around to summarizing my plan to fix it. Please comment either in a reply to this or in the bug report if you see problems, have questions, etc.
>
> (1) Extend the ovs and sriov agents' config with 'resource_provider_hypervisors', for example:
> ml2_conf.ini:
>
> [ovs]
> bridge_mappings = physnet0:br-physnet0,...
> resource_provider_bandwidths = br-physnet0:10000000:10000000,...
> resource_provider_hypervisors = physnet0:hypervisor0,... # this is new, values default to socket.gethostname() for each key in resource_provider_bandwidths

I'm guessing you are adding this for the ironic smart NIC use case, but I don't think this makes sense.
> sriov_agent.ini:
>
> [sriov_nic]
> physical_device_mappings = physnet1:ens5,...
> resource_provider_bandwidths = ens5:40000000:40000000,...
> resource_provider_hypervisors = physnet1:hypervisor1,... # this is new, defaults as above
> The values for resource_provider_hypervisors are opaque identifiers for neutron. Since each physical device can have its own hypervisor associated, possible features like ovs-superagent (for smartnics) could be supported.

It could be, but I think this is a rather poor way to facilitate that.
> Use of socket.gethostname() is only hardcoded as a default, so non-libvirt hypervisors are taken care of.
In the non-ironic smart NIC case, where OVS is deployed locally, there will be only one hypervisor listed for all physnets on the host. In the ironic case I don't think this makes sense either, as physnets span multiple hypervisors, and we don't use the physnet today as part of the lookup, so mapping physnets to hypervisors does not make sense there either.

What would make sense would be a mapping between CONF.host and the hypervisor hostname, but I honestly think we should not require a config option at all. If we simply report the hypervisor hostname by calling socket.gethostname(), that should be sufficient. The nova API will be extended so you can get the compute node UUID by using the value in CONF.host, so this new hypervisor hostname is only needed for backport reasons, to have a way to work around the lack of query support in the /os-hypervisors API in Stein and Train.

The other config option that might make sense would be the root resource provider UUID, but we don't want to require that nova is installed and the compute node started before you can generate the neutron agent conf, so I don't think that is a good approach.

The physnet is not a correct key to use for resource_provider_hypervisors. The requirement that CONF.host match on both nova and neutron is not libvirt specific: nova and neutron need to agree on the name of the host for the port binding extension to work correctly.
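For illustration, the defaulting behaviour proposed in (1) could be sketched roughly like this. This is a hypothetical helper, not actual neutron agent code, and it keys the mapping by the device names taken from resource_provider_bandwidths (whether the key should be the device, the physnet, or CONF.host is exactly what is under debate above):

```python
import socket


def parse_mapping(opt):
    """Parse a 'key1:value1,key2:value2' style option into a dict."""
    result = {}
    for pair in opt.split(','):
        if not pair.strip():
            continue
        key, _, value = pair.strip().partition(':')
        result[key] = value
    return result


def build_hypervisor_map(resource_provider_bandwidths,
                         resource_provider_hypervisors=''):
    """Return a device -> hypervisor-name mapping.

    Devices listed in resource_provider_bandwidths but absent from
    resource_provider_hypervisors default to socket.gethostname(),
    as proposed in (1).
    """
    explicit = parse_mapping(resource_provider_hypervisors)
    default = socket.gethostname()
    devices = [pair.split(':')[0]
               for pair in resource_provider_bandwidths.split(',')
               if pair.strip()]
    return {dev: explicit.get(dev, default) for dev in devices}
```

For example, `build_hypervisor_map('br-physnet0:10000000:10000000', 'br-physnet0:hypervisor0')` maps br-physnet0 to 'hypervisor0', while a device with no explicit entry falls back to the local hostname.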
> (2) Extend the report_state message's configurations field alike:
> {
>   'bridge_mappings': {'physnet0': 'br-physnet0'},
>   'resource_provider_bandwidths': {'br-physnet0': {'egress': 10000000, 'ingress': 10000000}},
>   'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'},
>   'resource_provider_inventory_defaults': {'allocation_ratio': 1.0, 'min_unit': 1, 'step_size': 1, 'reserved': 0}
> }

Again, I think this should be 'resource_provider_name': 'hypervisor0', or 'resource_provider_hypervisors': {"value set in CONF.host": 'hypervisor0', ...}.
> Do not touch the host field of the same message.
> Since we always treated the configurations field as free format, IMO changes to it should be backportable. Let me know if you think otherwise.

Yes, I think they should be backportable.
> (3) In neutron-server:
> report_state.host is used in binding as now - no change here.

+1
> report_state.configurations.resource_provider_hypervisors.PHYSDEV
> to be used in selecting the parent resource provider for the agent and physdev RP-tree. When not available in the message, still fall back to using report_state.host as today.

For a given host, all physnets or network interfaces will have the same hypervisor hostname. If we had a 'resource_provider_name' config we would only need to use it if its value did not match the existing host value, so we should just do:

name = report_state.configurations.get('resource_provider_name', report_state.host)
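The server-side fallback being discussed in (3) could be sketched as follows. This is a hypothetical function, not actual neutron-server code; it covers the per-device mapping variant with the plain report_state.host fallback:

```python
def parent_rp_name(report_state_host, configurations, device):
    """Pick the name used to look up the parent resource provider.

    Prefer the per-device hypervisor name from the agent's
    configurations dict; fall back to report_state.host, which
    is what neutron-server uses today.
    """
    hypervisors = configurations.get('resource_provider_hypervisors', {})
    return hypervisors.get(device, report_state_host)
```

With the payload from (2), `parent_rp_name('compute1', configurations, 'br-physnet0')` returns 'hypervisor0'; with an older agent that sends no such key, it returns 'compute1' unchanged.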
> (4) At the moment I don't see the need to use the proposed new nova API to query the hypervisors managed by a nova-compute, since as soon as it returns more than one hypervisor, neutron cannot do anything with the result.
It would only do that for ironic, and there is no current usage of placement via neutron with ironic. I don't think the config option proposed above is a better solution, so I think we need to continue to discuss this.
> Cheers,
> Bence