[nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set
Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we discussed a little bit, but without gibi; and then Matt and I discussed it more later [2], but without gibi *or* Sean. Since I don't know if there's a bug report I can comment on, I wanted to summarize here for now so I don't forget.
The Problem
===========

Neutron needs to know the compute node resource provider on which to hang the child providers for QoS bandwidth. Today it assumes CONF.host is the name of that provider.
That's wrong.
The name of the provider is `hypervisor_hostname`, which for libvirt [3] happens to match the *default* value of CONF.host [4].
Per the bug Sean describes, if you override CONF.host, neutron won't find the compute node provider, and things break.
The problem will be the same for any non-nova service wishing to discover the compute node RP -- e.g. cyborg for purposes of creating child providers for accelerators.
The Right Solution
==================

Neutron (and any $service) should look up the compute node provider by its UUID. That's returned by the /os-hypervisors APIs after microversion 2.53, e.g. [5], but, Catch-22, you can currently only filter those results on hypervisor_hostname. So you would have to e.g. GET /os-hypervisors/detail and then walk the list looking for service.host matching CONF.host. That's way heavy for your CERNs.
So the proposal moving forward is to add (in a new microversion) a ?service_host=XXX qparam to those APIs to let you filter down to just the one entry for your CONF.host. The UUID of that entry will also be the UUID of the compute node resource provider. (At that point you don't even need to ask Placement for that provider; you can just use that UUID directly in the APIs that create the child providers. Yay, you got your extra API call back.)
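For concreteness, a rough sketch of the lookup in Python (the HTTP plumbing is omitted; the response shape is assumed from the >=2.53 /os-hypervisors/detail API, and the service_host filter is the proposal above, not an existing parameter):

```python
# Sketch of today's "heavy" lookup. The response shape follows the >=2.53
# /os-hypervisors/detail API, where 'id' is the compute node UUID, which is
# also the placement resource provider UUID. Fetching the JSON is omitted;
# this just walks an already-parsed body.

def find_compute_rp_uuid(hypervisors, conf_host):
    """Walk the full hypervisor list for the entry matching our CONF.host."""
    for hyp in hypervisors:
        if hyp.get('service', {}).get('host') == conf_host:
            return hyp['id']
    return None

sample = {
    'hypervisors': [
        {'id': '1bb62a04-d85f-4c10-9041-ae312cd09bed',
         'hypervisor_hostname': 'compute-0.localdomain',
         'service': {'host': 'my-custom-conf-host'}},
    ]
}

# With the proposed ?service_host= filter the server would do this walk for
# us, and the returned 'id' could be used directly as the parent provider
# UUID in placement API calls.
rp_uuid = find_compute_rp_uuid(sample['hypervisors'], 'my-custom-conf-host')
```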
Now, that's not backportable, and this problem exists in stable releases (at least those that support QoS bandwidth). So we should totally do it, but we also need...
The Backportable Solution
=========================

Neutron should use `gethostname()` rather than CONF.host to discover the compute node resource provider.
I don't consider this a viable permanent solution because it is tightly coupled to knowing that hypervisor_hostname == `gethostname()`, which happens to be true for libvirt, but not necessarily for other drivers. We can get away with it for stable because we happen to know that we're only supporting QoS bandwidth via Placement for libvirt.
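For the stable branches the lookup then collapses to something this small (a sketch, valid only under the libvirt assumption just described):

```python
import socket

# Backportable sketch: under libvirt, hypervisor_hostname -- and therefore
# the compute node RP name in placement -- is the kernel hostname, so an
# agent can derive the RP name locally instead of trusting CONF.host.
def compute_rp_name():
    return socket.gethostname()
```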
Upgrade Concerns
================

Matt and I didn't nail down whether neutron and compute are allowed to be at different versions on a given host, or what those are allowed to be. But things should be sane if neutron (or any $service) logics like this in >=ussuri:
    if new_nova_microversion_available:
        do_the_os_hypervisors_thing()
    elif using_new_non_libvirt_feature:
        raise YouCantDoThisWithOldNova()
    else:
        do_the_gethostname_thing()
Action Summary
==============

If the above sounds reasonable, it would entail the following actions:

- Neutron(/Cyborg?): backportable patch to s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
- Neutron/Cyborg: master-only patch to do the logic described in `Upgrade Concerns`_ (though for now without the `elif` branch).
Thanks, efried
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2... [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2... [3] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab... [4] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab... [5] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-detail...
On 11/21/2019 5:28 PM, Eric Fried wrote:
So the proposal moving forward is to add (in a new microversion) a ?service_host=XXX qparam to those APIs to let you filter down to just the one entry for your CONF.host.
I couldn't find the nova bug for it (if there was one, but I could have sworn there was, even if only for docs), but at some point someone was asking for this kind of filter-by-compute-service-host API change for ironic nodes as well, because today you can only filter hypervisors by hypervisor_hostname, which for ironic nodes is the node UUID. But if you want to list all nodes managed by a given compute service, you need to filter on the compute service host, which is in the response but not a filter parameter in the API; that is what Eric is proposing here.
Point being, this would be useful even without the necessity for this nested provider wonkadoo problem.
-----Original Message-----
From: Eric Fried openstack@fried.cc
Sent: Thursday, November 21, 2019 3:28 PM
Action Summary
If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to
s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
- Neutron/Cyborg: master-only patch to do the logic described in `Upgrade
Concerns`_ (though for now without the `elif` branch).
Cyborg does use CONF.host today. We can use socket.gethostname() instead.
On a related note, the patch for ARQ binding [1] uses instance.host, which comes from CONF.host. I'll change it to instance.node, which comes from [2], which in turn comes from get_hostname() [3].
[1] https://review.opendev.org/#/c/631244/46/nova/compute/manager.py@2634 [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9137 [3] https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484...
Thanks, efried
Thanks & Regards, Sundar
On Fri, 2019-11-22 at 02:11 +0000, Nadathur, Sundar wrote:
-----Original Message-----
From: Eric Fried openstack@fried.cc
Sent: Thursday, November 21, 2019 3:28 PM

Action Summary
==============

If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to
s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
- Neutron/Cyborg: master-only patch to do the logic described in `Upgrade
Concerns`_ (though for now without the `elif` branch).
Cyborg does use CONF.host today. We can use socket.gethostname() instead.
On a related note, the patch for ARQ binding [1] uses instance.host, which comes from CONF.host. I'll change it to instance.node, which comes from [2], which in turn comes from get_hostname() [3].
For communication with neutron and cinder, nova uses CONF.host, so cyborg should still use CONF.host for accelerators; but for communicating with placement we use the hypervisor_hostname. So in the cyborg case we should change the default for CONF.host from socket.getfqdn() to socket.gethostname(), and simply change the get_root_provider function so that it uses the hypervisor API to determine the UUID of the RP.
[1] https://review.opendev.org/#/c/631244/46/nova/compute/manager.py@2634 [2] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L9137 [3] https://github.com/openstack/nova/blob/1cd5563f2dd2b218db2422397c8aab394d484...
Thanks, efried
Thanks & Regards, Sundar
On Thu, Nov 21, 2019 at 17:28, Eric Fried openstack@fried.cc wrote:
Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we discussed a little bit, but without gibi; and then Matt and I discussed it more later [2], but without gibi *or* Sean. Since I don't know if there's a bug report I can comment on, I wanted to summarize here for now so I don't forget.
The Problem
Neutron needs to know the compute node resource provider on which to hang the child providers for QoS bandwidth. Today it assumes CONF.host is the name of that provider.
That's wrong.
The name of the provider is `hypervisor_hostname`, which for libvirt [3] happens to match the *default* value of CONF.host [4].
Per the bug Sean describes, if you override CONF.host, neutron won't find the compute node provider, and things break.
The problem will be the same for any non-nova wishing to discover the compute node RP -- e.g. cyborg for purposes of creating child providers for accelerators.
As background, I'm pretty sure I made this mistake by looking at how nova and neutron do the port binding. There nova sends the instance.host (which comes from CONF.host) to neutron in the binding:host_id field of the neutron port. The difference between the binding and the inventory handling is that port.binding:host_id is used by neutron to figure out which service host runs the nova-compute service as well as the neutron agent, so there CONF.host is enough. Libvirt also runs on the same physical host as nova-compute and the neutron agent. BUT while for nova and neutron the hostname can be overwritten by setting CONF.host to an arbitrary string, for libvirt no such configuration can be applied: libvirt will always use Linux's gethostname call [6].
The Right Solution
Neutron (and any $service) should look up the compute node provider by its UUID. That's returned by the /os-hypervisors APIs after microversion 2.53, e.g. [5], but, Catch-22, you can currently only filter those results on hypervisor_hostname. So you would have to e.g. GET /os-hypervisors/detail and then walk the list looking for service.host matching CONF.host. That's way heavy for your CERNs.
So the proposal moving forward is to add (in a new microversion) a ?service_host=XXX qparam to those APIs to let you filter down to just the one entry for your CONF.host. The UUID of that entry will also be the UUID of the compute node resource provider. (At that point you don't even need to ask Placement for that provider; you can just use that UUID directly in the APIs that create the child providers. Yay, you got your extra API call back.)
The result of ?service_host=XXX cannot be a single compute node. It needs to be a list of compute nodes (due to ironic) and neutron needs to *assume* that in any case where neutron does this query there is no more than a single item in the returned list (e.g. neutron cannot mix with ironic).
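So a consumer would have to guard against multi-node results explicitly; a possible shape (a hypothetical helper, not existing neutron code):

```python
# Hypothetical guard for consumers of the proposed filtered API: the result
# is still a list, because one compute service can manage many (ironic)
# nodes; neutron has to refuse anything but exactly one entry.
def single_compute_rp_uuid(hypervisors):
    if len(hypervisors) != 1:
        raise ValueError('expected exactly 1 hypervisor for this service '
                         'host, got %d' % len(hypervisors))
    return hypervisors[0]['id']
```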
Now, that's not backportable, and this problem exists in stable releases (at least those that support QoS bandwidth). So we should totally do it, but we also need...
The Backportable Solution
Neutron should use `gethostname()` rather than CONF.host to discover the compute node resource provider.
In theory this sounds like a good solution, as it will align the nova + libvirt behavior with the neutron behavior regarding placement RP naming.
Where the whole thing goes sideways is the fact that neutron does the inventory handling up in the neutron server. Neutron does the compute RP lookup, child RP creation and inventory reporting based on the information the agents send up to the neutron server. Since in neutron there is no differentiation between the compute/agent service host and the hypervisor node, the agents send up a single hostname [7][9] based on the CONF.host configuration [8]. So adding a new piece of information to the agent report most probably means an RPC change between the neutron agents and the neutron server.
@Neutron, @Stable: Is such an RPC change backportable in neutron?
I don't consider this a viable permanent solution because it is tightly coupled to knowing that hypervisor_hostname == `gethostname()`, which happens to be true for libvirt, but not necessarily for other drivers. We can get away with it for stable because we happen to know that we're only supporting QoS bandwidth via Placement for libvirt.
Temporary workaround
====================
If you configure bandwidth in neutron then do not change the CONF.host config in your computes. I think we have to document that as a limitation here [10].
Upgrade Concerns
Matt and I didn't nail down whether neutron and compute are allowed to be at different versions on a given host, or what those are allowed to be. But things should be sane if neutron (or any $service) logics like this in >=ussuri:
    if new_nova_microversion_available:
        do_the_os_hypervisors_thing()
    elif using_new_non_libvirt_feature:
        raise YouCantDoThisWithOldNova()
    else:
        do_the_gethostname_thing()
The 'if' part looks OK to me and when we find a backportable solution, that needs to be in the 'else' branch.
Action Summary
If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to
s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
I guess that will be me, starting with a spec that outlines the new API.
- Neutron/Cyborg: master-only patch to do the logic described in
`Upgrade Concerns`_ (though for now without the `elif` branch).
Thanks, efried
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2... [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2... [3] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab... [4] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab... [5] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-detail...
@Sean: thanks for finding the bug! @Eric: thanks for the good write up!
Cheers, gibi
[6] https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_appl... [7] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86... [8] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86... [9] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86... [10] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html#limit...
On Fri, 2019-11-22 at 11:16 +0000, Balázs Gibizer wrote:
On Thu, Nov 21, 2019 at 17:28, Eric Fried openstack@fried.cc wrote:
Hi all. Sean mentioned a "downstream bug" in IRC today [1], which we discussed a little bit, but without gibi; and then Matt and I discussed it more later [2], but without gibi *or* Sean. Since I don't know if there's a bug report I can comment on, I wanted to summarize here for now so I don't forget.
The Problem
Neutron needs to know the compute node resource provider on which to hang the child providers for QoS bandwidth. Today it assumes CONF.host is the name of that provider.
That's wrong.
The name of the provider is `hypervisor_hostname`, which for libvirt [3] happens to match the *default* value of CONF.host [4].
Per the bug Sean describes, if you override CONF.host, neutron won't find the compute node provider, and things break.
The problem will be the same for any non-nova wishing to discover the compute node RP -- e.g. cyborg for purposes of creating child providers for accelerators.
As background, I'm pretty sure I made this mistake by looking at how nova and neutron do the port binding. There nova sends the instance.host (which comes from CONF.host) to neutron in the binding:host_id field of the neutron port. The difference between the binding and the inventory handling is that port.binding:host_id is used by neutron to figure out which service host runs the nova-compute service as well as the neutron agent, so there CONF.host is enough. Libvirt also runs on the same physical host as nova-compute and the neutron agent. BUT while for nova and neutron the hostname can be overwritten by setting CONF.host to an arbitrary string, for libvirt no such configuration can be applied: libvirt will always use Linux's gethostname call [6].
Right, so normally TripleO changes /etc/hostname to match the value it sets in CONF.host (we think via cloud-init; I try to avoid TripleO so I have not fully validated that). As a result, in a typical TripleO deployment this will work, because gethostname() and CONF.host happen to align. I don't think any other installer sets the CONF.host value by default; we even discussed removing it at the Dublin PTG. The reason we found this is that TripleO in standalone mode does not update the hostname but did set CONF.host, since in standalone mode you have to deploy the OS yourself and therefore there is no cloud-init step to update the hostname, so it's whatever you set in your install or get via DHCP.
The Right Solution
Neutron (and any $service) should look up the compute node provider by its UUID. That's returned by the /os-hypervisors APIs after microversion 2.53, e.g. [5], but, Catch-22, you can currently only filter those results on hypervisor_hostname. So you would have to e.g. GET /os-hypervisors/detail and then walk the list looking for service.host matching CONF.host. That's way heavy for your CERNs.
So the proposal moving forward is to add (in a new microversion) a ?service_host=XXX qparam to those APIs to let you filter down to just the one entry for your CONF.host. The UUID of that entry will also be the UUID of the compute node resource provider. (At that point you don't even need to ask Placement for that provider; you can just use that UUID directly in the APIs that create the child providers. Yay, you got your extra API call back.)
The result of ?service_host=XXX cannot be a single compute node. It needs to be a list of compute nodes (due to ironic) and neutron needs to *assume* that in any case where neutron does this query there is no more than a single item in the returned list (e.g. neutron cannot mix with ironic).
That is problematic, or will be in the future, since there are efforts to allow control of smartNICs via neutron for ironic. In the general case, however, yes. What I think we need to do is extend the agent report to contain a hypervisor_hostname field in addition to the host field, so that in the ironic case you can set the node UUID and in the non-ironic case you can set the hostname as returned by socket.gethostname().

That said, I don't know if we have to handle the ironic + smartNIC case currently, as I don't think minimum bandwidth works in that case or even really makes sense: you will get the entire NIC regardless.
Now, that's not backportable, and this problem exists in stable releases (at least those that support QoS bandwidth). So we should totally do it, but we also need...
The Backportable Solution
Neutron should use `gethostname()` rather than CONF.host to discover the compute node resource provider.
In theory this sounds like a good solution, as it will align the nova + libvirt behavior with the neutron behavior regarding placement RP naming.
Where the whole thing goes sideways is the fact that neutron does the inventory handling up in the neutron server. Neutron does the compute RP lookup, child RP creation and inventory reporting based on the information the agents send up to the neutron server. Since in neutron there is no differentiation between the compute/agent service host and the hypervisor node, the agents send up a single hostname [7][9] based on the CONF.host configuration [8]. So adding a new piece of information to the agent report most probably means an RPC change between the neutron agents and the neutron server.
@Neutron, @Stable: Is such an RPC change backportable in neutron?
Right, so I mentioned above that we would need to extend the agent report with a hypervisor_hostname, but that is only really needed for the ironic case, and currently neutron does not support minimum bandwidth with ironic.

Fortunately the agent report/agent state is an unversioned dictionary of random strings:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers...

or at least the configurations dict inside it is. I think the structure as a whole is largely unversioned, so we should be able to extend it without breaking anything if we need to. Again, it's up to the neutron/stable folks to determine, but I believe it would be fine, as additional fields would be ignored by the server and ml2 drivers.
I don't consider this a viable permanent solution because it is tightly coupled to knowing that hypervisor_hostname == `gethostname()`, which happens to be true for libvirt, but not necessarily for other drivers. We can get away with it for stable because we happen to know that we're only supporting QoS bandwidth via Placement for libvirt.
Temporary workaround
If you configure bandwidth in neutron then do not change the CONF.host config in your computes. I think we have to document that as a limitation here [10].
Well, don't change it without actually also changing the hostname in /etc/hostname or systemd, wherever that is handled these days. As long as it is set to the same value that will be returned by libvirt or your virt driver of choice, you will be fine. Granted, at that point the default value would also work, but TripleO is trying to force it to be the full FQDN for some reason.
Upgrade Concerns
Matt and I didn't nail down whether neutron and compute are allowed to be at different versions on a given host, or what those are allowed to be. But things should be sane if neutron (or any $service) logics like this in >=ussuri:
    if new_nova_microversion_available:
        do_the_os_hypervisors_thing()
    elif using_new_non_libvirt_feature:
        raise YouCantDoThisWithOldNova()
You could do the full hypervisor list in this case and cache the result so you only do it once per host. That would suck on startup, but it's an option.
    else:
        do_the_gethostname_thing()
The 'if' part looks OK to me and when we find a backportable solution, that needs to be in the 'else' branch.
Action Summary
If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to
s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
I guess that will be me, starting with a spec that outlines the new API.
- Neutron/Cyborg: master-only patch to do the logic described in
`Upgrade Concerns`_ (though for now without the `elif` branch).
Thanks, efried
[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
[3] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab...
[4] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab...
[5] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-detail...
@Sean: thanks for finding the bug! @Eric: thanks for the good write up!
Cheers, gibi
[6] https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_appl...
[7] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86...
[8] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86...
[9] https://github.com/openstack/neutron/blob/67b613b795416406fb4fab143b3ec9ba86...
[10] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html#limit...
On Fri, Nov 22, 2019 at 11:16, Balázs Gibizer balazs.gibizer@est.tech wrote:
On Thu, Nov 21, 2019 at 17:28, Eric Fried openstack@fried.cc wrote:
Action Summary
If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to
s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
I guess that will be me, starting with a spec that outlines the new API.
Opened a blueprint[1], pushed up a small spec[2], and a WIP implementation[3] for the new API.
Cheers, gibi
[1] https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-ho... [2] https://review.opendev.org/#/c/695716 [3] https://review.opendev.org/#/c/695708
On 11/22/2019 10:14 AM, Balázs Gibizer wrote:
Opened a blueprint[1], pushed up a small spec[2], and a WIP implementation[3] for the new API.
Cheers, gibi
[1] https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-ho... [2]https://review.opendev.org/#/c/695716
The contentious point on the spec seems to be that it currently proposes to keep the same (incorrect) behavior where, if filtering by hypervisor_hostname_pattern yields no results, the API returns a 404 error today when it should just return an empty list. The question in the spec is then, when we add service_host filtering, if there are no results do we:
1. 404 (that's what the spec proposes to be consistent with existing [odd] behavior)
2. Return an empty list and in the same microversion change that 404 -> empty response behavior even if not filtering by service_host (so just filtering by hypervisor_hostname_pattern).
From the review it sounds like most people (myself, Alex and Chris) think we should go with the latter and not perpetuate the former sin.
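From a client's perspective the two options differ only in how "no match" is surfaced; a hypothetical sketch (no real API behavior implied beyond what is described above):

```python
# Hypothetical client-side handling of the two options: today's behavior
# surfaces an empty filter result as HTTP 404; option 2 would return 200
# with an empty 'hypervisors' list in the same new microversion.
def hypervisors_from_response(status_code, body):
    if status_code == 404:
        return []  # option 1: no match reported as NotFound
    return body.get('hypervisors', [])
```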
On Mon, 2019-11-25 at 07:51 -0600, Matt Riedemann wrote:
On 11/22/2019 10:14 AM, Balázs Gibizer wrote:
Opened a blueprint[1], pushed up a small spec[2], and a WIP implementation[3] for the new API.
Cheers, gibi
[1] https://blueprints.launchpad.net/nova/+spec/filter-hypervisors-by-service-ho... [2]https://review.opendev.org/#/c/695716
The contentious point on the spec seems to be that it currently proposes to keep the same (incorrect) behavior where, if filtering by hypervisor_hostname_pattern yields no results, the API returns a 404 error today when it should just return an empty list. The question in the spec is then, when we add service_host filtering, if there are no results do we:
- 404 (that's what the spec proposes to be consistent with existing
[odd] behavior)
- Return an empty list and in the same microversion change that 404 ->
empty response behavior even if not filtering by service_host (so just filtering by hypervisor_hostname_pattern).
From the review it sounds like most people (myself, Alex and Chris) think we should go with the latter and not perpetuate the former sin.
For what it's worth, that would be my preference as well.
- Return an empty list and in the same microversion change that 404 ->
empty response behavior even if not filtering by service_host (so just filtering by hypervisor_hostname_pattern).
From the review it sounds like most people (myself, Alex and Chris) think we should go with the latter and not perpetuate the former sin.
For what it's worth, that would be my preference as well.
+1
Hi All,
After gibi opened the neutron bug [1] I assigned it to myself and now I got to summarize my plan to fix it. Please comment either in a reply to this or in the bug report if you see problems, have questions, etc.
(1) Extend ovs and sriov agents config with 'resource_provider_hypervisors' for example:
ml2_conf.ini:

    [ovs]
    bridge_mappings = physnet0:br-physnet0,...
    resource_provider_bandwidths = br-physnet0:10000000:10000000,...
    # this is new; values default to socket.gethostname() for each key in
    # resource_provider_bandwidths
    resource_provider_hypervisors = physnet0:hypervisor0,...
sriov_agent.ini:

    [sriov_nic]
    physical_device_mappings = physnet1:ens5,...
    resource_provider_bandwidths = ens5:40000000:40000000,...
    # this is new, defaults as above
    resource_provider_hypervisors = physnet1:hypervisor1,...
The values for resource_provider_hypervisors are opaque identifiers for neutron. Since each physical device can have its own hypervisor associated, possible features like an ovs super agent (for smartNICs) could be supported. The use of socket.gethostname() is only hardcoded as a default, so non-libvirt hypervisors are taken care of.
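A possible shape for parsing the new option (a hypothetical helper, not existing neutron code; whether the mapping ends up keyed by physnet or by physdev, the parsing is the same):

```python
import socket

# Hypothetical parser for the proposed option. Input mirrors the
# 'key:hypervisor,...' string shown above; any device that appears in
# resource_provider_bandwidths without an explicit hypervisor defaults to
# the local hostname, per the proposal.
def parse_rp_hypervisors(conf_value, bandwidth_devices):
    hypervisors = {}
    for pair in filter(None, (p.strip() for p in conf_value.split(','))):
        dev, _, hyp = pair.partition(':')
        hypervisors[dev] = hyp
    for dev in bandwidth_devices:
        hypervisors.setdefault(dev, socket.gethostname())
    return hypervisors

mapping = parse_rp_hypervisors('br-physnet0:hypervisor0',
                               ['br-physnet0', 'ens5'])
```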
(2) Extend the report_state message's configurations field like this:
    {'bridge_mappings': {'physnet0': 'br-physnet0'},
     'resource_provider_bandwidths': {'br-physnet0': {'egress': 10000000,
                                                      'ingress': 10000000}},
     'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'},
     'resource_provider_inventory_defaults': {'allocation_ratio': 1.0,
                                              'min_unit': 1,
                                              'step_size': 1,
                                              'reserved': 0}}
Do not touch the host field of the same message.
Since we always treated the configurations field as free format, IMO changes to it should be backportable. Let me know if you think otherwise.
(3) In neutron-server
report_state.host is used in binding as now - no change here.
report_state.configurations.resource_provider_hypervisors.PHYSDEV will be used in selecting the parent resource provider for the agent and physdev RP tree. When it is not available in the message, fall back to using report_state.host as today.
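That selection with its fallback might look roughly like this (hypothetical code; field names follow the report_state example in point (2)):

```python
# Hypothetical server-side selection: prefer the per-physdev hypervisor
# from the agent's configurations, falling back to the agent host as today.
def parent_rp_name(report_state, physdev):
    hypervisors = report_state.get('configurations', {}).get(
        'resource_provider_hypervisors', {})
    return hypervisors.get(physdev, report_state['host'])

state = {'host': 'compute-0',
         'configurations': {
             'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'}}}
```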
(4) At the moment I don't see the need to use the proposed new nova API to query the hypervisors managed by a nova-compute, since as soon as it returns more than one hypervisor neutron cannot do anything with the result.
Cheers, Bence
On 11/26/2019 9:57 AM, Bence Romsics wrote:
(4) At the moment I don't see the need to use the proposed new nova API to query hypervisors managed by a nova-compute since as soon as it returns 1+ hypervisors neutron cannot do anything with the result.
The API would only return more than one hypervisor for a given compute service host if that host is managing ironic nodes, which won't be the case if you're looking for nodes managed by a KVM node that supports QoS ports.
The point of the API change is to let neutron do everything automatically and avoid additional configuration, right? The configuration on the neutron side was meant to be a workaround.
On Tue, 2019-11-26 at 10:30 -0600, Matt Riedemann wrote:
On 11/26/2019 9:57 AM, Bence Romsics wrote:
(4) At the moment I don't see the need to use the proposed new nova API to query hypervisors managed by a nova-compute since as soon as it returns 1+ hypervisors neutron cannot do anything with the result.
The API would only return more than one hypervisor for a given compute service host if that host is managing ironic nodes, which won't be the case if you're looking for nodes managed by a KVM node that supports QoS ports.
The point of the API change is to let neutron do everything automatically and avoid additional configuration, right? The configuration on the neutron side was meant to be a workaround.
yes this ^
Hi,
Matt wrote:
The point of the API change is to let neutron do everything automatically and avoid additional configuration, right? The configuration on the neutron side was meant to be a workaround.
I had the following reasons to propose using configuration:
(1) Using a nova API from neutron-server means we'd have a startup ordering dependency: neutron would have to wait for nova to start up. So far it had to wait for placement, where we did not have an alternative. Here we do.
(2) We cannot use a new nova API for fixing this bug in stein and train. If the backportable fix works on master too, do we want to create a different fix for master?
Sean wrote:
(1) Extend ovs and sriov agents config with 'resource_provider_hypervisors' for example:
ml2_conf.ini [ovs] bridge_mappings = physnet0:br-physnet0,... resource_provider_bandwidths = br-physnet0:10000000:10000000,... resource_provider_hypervisors = physnet0:hypervisor0,... # this is
I'm guessing you are adding this for the ironic smartNIC use case, but I don't think this makes sense.
That's right. I made a mistake there. Thanks for catching it. I meant keying resource_provider_hypervisors by the physdevs (the values in mappings) not the physnets. For example:
resource_provider_hypervisors = br-physnet0:hypervisor0,...
I understand we don't have a use for this right now (neutron with ironic does not use placement), but if we don't allow the 1:N mapping here, then we prohibit its future use too.
Cheers, Bence
On Wed, 2019-11-27 at 15:28 +0100, Bence Romsics wrote:
Hi,
Matt wrote:
The point of the API change is to let neutron do everything automatically and avoid additional configuration, right? The configuration on the neutron side was meant to be a workaround.
I had the following reasons to propose using configuration:
(1) Using a nova API from neutron-server means we'd have a startup ordering dependency. Neutron would have to wait for nova to start up. So far it had to wait for placement - there we did not have an alternative. Here we have.
(2) We cannot use a new nova API for fixing this bug in stein and train. If the backportable fix works on master too, do we want to create a different fix for master?
Sean wrote:
(1) Extend ovs and sriov agents config with 'resource_provider_hypervisors' for example:
ml2_conf.ini [ovs] bridge_mappings = physnet0:br-physnet0,... resource_provider_bandwidths = br-physnet0:10000000:10000000,... resource_provider_hypervisors = physnet0:hypervisor0,... # this is
I'm guessing you are adding this for the ironic smart NIC use case, but I don't think this makes sense.
That's right. I made a mistake there. Thanks for catching it. I meant keying resource_provider_hypervisors by the physdevs (the values in mappings) not the physnets. For example:
resource_provider_hypervisors = br-physnet0:hypervisor0,...
this also won't work, as the same bridge name will exist on multiple hosts
I understand we don't have a use for this right now (neutron with ironic does not use placement), but if we don't allow the 1:N mapping here, then we prohibit its future use too.
Cheers, Bence
resource_provider_hypervisors = br-physnet0:hypervisor0,...
this also won't work, as the same bridge name will exist on multiple hosts
Of course the same bridge/NIC name can exist on multiple hosts. But each report_state message clearly belongs to a single agent, and the configurations field is persisted per agent, so there will never be a collision.
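The no-collision argument above can be sketched as follows (hypothetical names and an in-memory dict standing in for the neutron DB; the real persistence differs):

```python
# The server persists each agent's 'configurations' field keyed by the
# reporting agent, so the same bridge name reported from two different
# hosts lives in two separate records and cannot collide.
agent_configurations = {}

def persist_report_state(agent_host, configurations):
    # One record per agent; a later report from the same agent replaces it.
    agent_configurations[agent_host] = configurations

persist_report_state(
    'agent-on-compute-1',
    {'resource_provider_hypervisors': {'br-physnet0': 'hypervisor1'}})
persist_report_state(
    'agent-on-compute-2',
    {'resource_provider_hypervisors': {'br-physnet0': 'hypervisor2'}})
```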
On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote:
resource_provider_hypervisors = br-physnet0:hypervisor0,...
this also won't work, as the same bridge name will exist on multiple hosts
Of course the same bridge/NIC name can exist on multiple hosts. But each report_state message clearly belongs to a single agent, and the configurations field is persisted per agent, so there will never be a collision.
That is true in the non-ironic smart NIC case. In the ironic smart NIC case with the OVS super agent, which is the only case where there would be multiple hypervisors managed by the same agent, the agent will be remote.
So in the non-ironic case it does not need to be a list. In the smart NIC case it might need to be a list, but a mapping of bridge or physnet won't be unique, and an agent hostname (CONF.host) to hypervisor host mapping would be 1:N, so it's not clear how you would select from the N RPs if all you know from nova is the binding host, which is the service host, not the hypervisor hostname.
On Wed, Nov 27, 2019 at 17:03, Sean Mooney smooney@redhat.com wrote:
On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote:
resource_provider_hypervisors = br-physnet0:hypervisor0,...
this also won't work, as the same bridge name will exist on multiple hosts
Of course the same bridge/NIC name can exist on multiple hosts. But each report_state message clearly belongs to a single agent, and the configurations field is persisted per agent, so there will never be a collision.
That is true in the non-ironic smart NIC case. In the ironic smart NIC case with the OVS super agent, which is the only case where there would be multiple hypervisors managed by the same agent, the agent will be remote.
When you say "ironic smart nic case with the ovs super agent", do you refer to this abandoned spec [1]?
So in the non-ironic case it does not need to be a list. In the smart NIC case it might need to be a list
In that spec the author proposes [2] not to break the 1-1 mapping between OVS agent and remote OVS. So as far as I see there is no need for a list in this case either.
but a mapping of bridge or physnet won't be unique, and an agent hostname (CONF.host) to hypervisor host mapping would be 1:N, so it's not clear how you would select from the N RPs if all you know from nova is the binding host, which is the service host, not the hypervisor hostname.
Are we talking about a problem during binding here? This feels like a different problem from creating device RPs under the proper compute node RP.
Anyhow, my simple understanding is the following:
* a physical NIC or an OVS integration bridge always belongs to one single hypervisor, while a hypervisor might have more than one physical NIC or OVS bridge
* the identity (e.g. hypervisor hostname) of such a hypervisor is known at deployment time
* the neutron agent config can have a mapping between the device (NIC or OVS bridge) and the hypervisor identity, and this mapping can be sent up to the neutron server via RPC
* the neutron agent already sends up the service host name where the agent runs to the neutron server via RPC
* the neutron server, knowing the service host and the device -> hypervisor identity mapping, can find the compute node RP under which the device RP needs to be created
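The last step of this chain can be sketched as follows (invented names, not neutron code; `compute_rps_by_name` stands in for whatever placement lookup the server would do, keyed by RP name, which is the hypervisor hostname):

```python
# Compute node RPs as known from placement, keyed by RP name.
compute_rps_by_name = {
    'hypervisor0': 'rp-uuid-for-hypervisor0',
    'compute-1': 'rp-uuid-for-compute-1',
}

def parent_rp_uuid(device, device_hypervisors, service_host):
    """Pick the compute node RP under which to create a device RP.

    If the agent reported no hypervisor for the device, fall back to the
    service host, which equals the hypervisor hostname in the default
    libvirt case.
    """
    name = device_hypervisors.get(device, service_host)
    return compute_rps_by_name[name]
```

The fallback branch is what keeps pre-upgrade agents (which report no mapping) working unchanged.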
@Sean: Where does my line of reasoning break from your perspective?
Cheers, gibi
[1] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst
[2] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst@28
On Thu, 2019-11-28 at 08:54 +0000, Balázs Gibizer wrote:
On Wed, Nov 27, 2019 at 17:03, Sean Mooney smooney@redhat.com wrote:
On Wed, 2019-11-27 at 16:20 +0100, Bence Romsics wrote:
resource_provider_hypervisors = br-physnet0:hypervisor0,...
this also won't work, as the same bridge name will exist on multiple hosts
Of course the same bridge/NIC name can exist on multiple hosts. But each report_state message clearly belongs to a single agent, and the configurations field is persisted per agent, so there will never be a collision.
That is true in the non-ironic smart NIC case. In the ironic smart NIC case with the OVS super agent, which is the only case where there would be multiple hypervisors managed by the same agent, the agent will be remote.
When you say "ironic smart nic case with the ovs super agent", do you refer to this abandoned spec [1]?
Yes, and also https://review.opendev.org/#/c/595512/2. There are two related abandoned specs that came up in train, but I don't think either is progressing anymore.
So in the non-ironic case it does not need to be a list. In the smart NIC case it might need to be a list
In that spec the author proposes [2] not to break the 1-1 mapping between OVS agent and remote OVS. So as far as I see there is no need for a list in this case either.
I agree, although there was an expressed desire to allow the agent to manage multiple hosts. It's been a while, but I believe that is what https://review.opendev.org/#/c/595512/2/specs/stein/scalable-ovs-agent.rst covered.
but a mapping of bridge or physnet won't be unique, and an agent hostname (CONF.host) to hypervisor host mapping would be 1:N, so it's not clear how you would select from the N RPs if all you know from nova is the binding host, which is the service host, not the hypervisor hostname.
Are we talking about a problem during binding here?
Yes, it would be a problem during binding, as we just pass the service host as the binding host, so we would also need to add a binding:hypervisor_host if we wanted port binding to work in that case. Otherwise we would be changing the meaning of binding:host in ml2/ovs. Currently it refers to the service host, which is shared between nova and neutron for both the compute and networking agents. In the agentless case it is used more like the hypervisor hostname. ODL, and I think OVN, add info to the agents table in neutron even though they don't have agents, to allow per-host configuration to be expressed; the binding host is used to select that.
Anyway, in the RP case you have a similar problem. Today ODL and OVN do not support minimum bandwidth; in the future, if they add it, they would have to create an RP per host based on the info in the agents table. If ml2/ovs were extended to have a 1:N mapping between the neutron OVS agent and multiple hosts, the service host set in CONF.host would map to the host the agent is running on, not the host the VM is being booted on, and you would need some additional mapping, the same way the ironic driver works. In any case, https://review.opendev.org/#/c/595512 is also abandoned, so I don't think we should try to cater for that case now, especially since we want to backport this to stein. If we wanted to support 1:N mappings in the OVS agent and not require changes in nova, we would actually want to change CONF.host to be a list and have all the bandwidth provider config be keyed off of the service host. You could do this in a number of ways that are not important right now, like dynamic config, but using the device or physnet is not a good way to approach this.
This feels like a different problem from creating device RPs under the proper compute node RP.
Anyhow my simple understanding is the following:
- a physical NIC or an OVS integration bridge always belongs to one
single hypervisor, while a hypervisor might have more than one physical NIC or OVS bridge
For the most part, yes. There are ways to make that not true when dealing with PCIe over RDMA and other non-production composable infrastructure stuff, but from an OpenStack point of view, and with relevance to stein and train, you are correct.
- the identity (e.g. hypervisor hostname) of such hypervisor is known
at deployment time
Yes, and it's often set by the deployment tool, although not always, as it can also be set via DHCP or manually.
- the neutron agent config can have a mapping between the device (NIC
or OVS bridge) and the hypervisor identity and this mapping can be sent up to the neutron server via RPC
Yes, although I don't think that is required. I think we just need to pass back the hypervisor host name and no other information that is not currently in the agent report.
- the neutron agent already sends up the service host name where the
agent runs to the neutron server via RPC.
Yes, and neutron uses that for the same use case that nova does: to determine which host the agent runs on and match it to the host set in the binding_details:host field we set when binding the port.
- the neutron server knowing the service host and the device ->
hypervisor identity mapping can find the compute node RP under which the device RP needs to be created.
You don't need the device to hypervisor mapping. In the non-SR-IOV case you don't typically have a device that the port is associated with, just a physnet, which is None in the case of tunneled ports. So in OVS or Linux bridge it is more typical for the port to be associated with a segmentation type that is associated with a bridge that may have an interface attached, but it is only loosely associated with a device.
@Sean: Where does my line of reasoning break from your perspective?
Your reasoning that it does not need to be a list? If that is your assertion, I agree completely: it should not be a list. The one case where it could break is if the agent starts to manage multiple hosts the same way the ironic agent does. However, to support that, nova would have to change the info it sets in the port binding: we would have to set the hypervisor host name instead of the service host name. That would be a big change and would require a new API extension in my view, so I don't think we should consider it now.
Cheers, gibi
[1] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst
[2] https://review.opendev.org/#/c/595402/5/specs/stein/remote-ovs-agent.rst@28
On Tue, 2019-11-26 at 16:57 +0100, Bence Romsics wrote:
Hi All,
After gibi opened the neutron bug [1] I assigned it to myself and now I got to summarize my plan to fix it. Please comment either in a reply to this or in the bug report if you see problems, have questions, etc.
(1) Extend ovs and sriov agents config with 'resource_provider_hypervisors' for example:
ml2_conf.ini
[ovs]
bridge_mappings = physnet0:br-physnet0,...
resource_provider_bandwidths = br-physnet0:10000000:10000000,...
resource_provider_hypervisors = physnet0:hypervisor0,...  # this is
I'm guessing you are adding this for the ironic smart NIC use case, but I don't think this makes sense.
In the non-ironic smart NIC case, where OVS is deployed locally, there will be only one hypervisor listed for all physnets on the host.
In the ironic case I don't think it makes sense either, as physnets span multiple hypervisors, and we don't use the physnet today as part of the lookup, so even in this case mapping physnets to a list of hypervisors does not make sense. What would make sense would be a mapping between CONF.host and the hypervisor hostname, but I honestly think we should not require a config option at all. If we simply report the hypervisor_hostname by calling socket.gethostname(), that should be sufficient. The nova API will be extended so you can get the compute node UUID by using the value in CONF.host, so this new hypervisor hostname is only for backport reasons, to have a way to work around the lack of query support in the /os-hypervisors API in stein and train.
The other config option that might make sense would be the root resource provider UUID, but we don't want to require that nova is installed and the compute node started before you can generate the neutron agent conf, so I don't think that is a good approach.
new, values default to socket.gethostname() for each key in resource_provider_bandwidths
sriov_agent.ini
[sriov_nic]
physical_device_mappings = physnet1:ens5,...
resource_provider_bandwidths = ens5:40000000:40000000,...
resource_provider_hypervisors = physnet1:hypervisor1,...  # this is new, defaults as above
The values for resource_provider_hypervisors are opaque identifiers for neutron. Since each physical device can have its own hypervisor associated, possible future features like ovs-superagent (for smart NICs) could be supported.
It could be, but I think this is a rather poor way to facilitate that. The physnet is not the correct key to use for resource_provider_hypervisors.
Use of socket.gethostname() is only hardcoded as a default, so non-libvirt hypervisors are taken care of.
The requirement that CONF.host match on both nova and neutron is not libvirt-specific. Nova and neutron need to agree on the name of the host for the port binding extension to work correctly.
(2) Extend the report_state message's configurations field alike:
{
  'bridge_mappings': {'physnet0': 'br-physnet0'},
  'resource_provider_bandwidths': {'br-physnet0': {'egress': 10000000, 'ingress': 10000000}},
  'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'},
Again, I think this should be 'resource_provider_name': 'hypervisor0', or 'resource_provider_hypervisors': {"value set in CONF.host": 'hypervisor0', ...},
  'resource_provider_inventory_defaults': {'allocation_ratio': 1.0, 'min_unit': 1, 'step_size': 1, 'reserved': 0}
}
Do not touch the host field of the same message.
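A minimal sketch of how an agent could assemble this payload under the proposal above (`build_configurations` is an invented name, and socket.gethostname() stands in as the default hypervisor name, as discussed in this thread):

```python
import socket

def build_configurations(bridge_mappings, bandwidths, hypervisors=None):
    """Assemble the proposed report_state 'configurations' field.

    'hypervisors' maps physdev -> hypervisor hostname; physdevs missing
    from it default to the local hostname.
    """
    hypervisors = hypervisors or {}
    default_hv = socket.gethostname()
    return {
        'bridge_mappings': bridge_mappings,
        'resource_provider_bandwidths': bandwidths,
        'resource_provider_hypervisors': {
            dev: hypervisors.get(dev, default_hv) for dev in bandwidths
        },
        'resource_provider_inventory_defaults': {
            'allocation_ratio': 1.0, 'min_unit': 1,
            'step_size': 1, 'reserved': 0,
        },
    }
```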
Since we always treated the configurations field as free format, IMO changes to it should be backportable. Let me know if you think otherwise.
Yes, I think they should be backportable.
(3) In neutron-server
report_state.host is used in binding as now - no change here.
+1
report_state.configurations.resource_provider_hypervisors.PHYSDEV
For a given host, all physnets or network interfaces will have the same hypervisor hostname.
If we had a resource_provider_name config, we would only need to use it if its value did not match the existing host value, so we should just do: name = report_state.configurations.get('resource_provider_name', report_state.host)
to be used in selecting the parent resource provider for the agent and physdev RP-tree. When not available in the message, still fall back to using report_state.host as today.
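The selection with fallback described in point (3) can be sketched as follows (hypothetical names; report_state is a plain dict here, unlike the real agent report structure):

```python
def parent_provider_name(report_state, physdev):
    """Pick the name of the compute node RP for a physdev's RP subtree.

    Falls back to report_state['host'] when the (pre-upgrade) agent did
    not report a hypervisor for the device, preserving today's behavior.
    """
    hypervisors = report_state.get('configurations', {}).get(
        'resource_provider_hypervisors', {})
    return hypervisors.get(physdev, report_state['host'])
```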
(4) At the moment I don't see the need to use the proposed new nova API to query hypervisors managed by a nova-compute, since as soon as it returns more than one hypervisor, neutron cannot do anything with the result.
It would only do that for ironic, and there is no current usage of placement via neutron with ironic. I don't think the config option proposed above is a better solution, so I think we need to continue to discuss this.
Cheers, Bence
participants (6)
-
Balázs Gibizer
-
Bence Romsics
-
Eric Fried
-
Matt Riedemann
-
Nadathur, Sundar
-
Sean Mooney