Hi all.

Sean mentioned a "downstream bug" in IRC today [1], which we discussed
a little bit, but without gibi; and then Matt and I discussed it more
later [2], but without gibi *or* Sean. Since I don't know if there's a
bug report I can comment on, I wanted to summarize here for now so I
don't forget.

The Problem
===========
Neutron needs to know the compute node resource provider on which to
hang the child providers for QoS bandwidth. Today it assumes CONF.host
is the name of that provider. That's wrong. The name of the provider is
`hypervisor_hostname`, which for libvirt [3] happens to match the
*default* value of CONF.host [4]. Per the bug Sean describes, if you
override CONF.host, neutron won't find the compute node provider, and
things break.

The problem will be the same for any non-nova service wishing to
discover the compute node RP -- e.g. cyborg for purposes of creating
child providers for accelerators.

The Right Solution
==================
Neutron (and any $service) should look up the compute node provider by
its UUID. That's returned by the /os-hypervisors APIs starting with
microversion 2.53, e.g. [5], but, Catch-22, you can currently only
filter those results on hypervisor_hostname. So you would have to e.g.
GET /os-hypervisors/detail and then walk the list looking for
service.host matching CONF.host. That's way heavy for your CERNs.

So the proposal moving forward is to add (in a new microversion) a
?service_host=XXX qparam to those APIs to let you filter down to just
the one entry for your CONF.host. The UUID of that entry will also be
the UUID of the compute node resource provider.

(At that point you don't even need to ask Placement for that provider;
you can just use that UUID directly in the APIs that create the child
providers. Yay, you got your extra API call back.)

Now, that's not backportable, and this problem exists in stable
releases (at least those that support QoS bandwidth). So we should
totally do it, but we also need...

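(As an aside, the client-side walk described above, i.e. list every
hypervisor and match service.host against CONF.host, could be sketched
roughly like this. It assumes the decoded JSON body of
GET /os-hypervisors/detail at microversion >= 2.53, where each entry's
`id` is the UUID. The function name and the sample data are made up for
illustration.)

```python
def find_compute_node_uuid(hypervisors, service_host):
    """Return the UUID of the one hypervisor whose service.host matches.

    `hypervisors` is the list under the "hypervisors" key of a decoded
    GET /os-hypervisors/detail response (microversion >= 2.53).
    """
    matches = [
        hv["id"]
        for hv in hypervisors
        if hv.get("service", {}).get("host") == service_host
    ]
    # This sketch treats anything other than exactly one match as an error.
    if len(matches) != 1:
        raise LookupError(
            "expected one hypervisor for service host %r, found %d"
            % (service_host, len(matches))
        )
    return matches[0]


# Hypothetical response body, trimmed to the fields used here.
SAMPLE_BODY = {
    "hypervisors": [
        {
            "id": "b1e43b5f-eec1-44e0-9f10-7b4945c0226d",
            "hypervisor_hostname": "cn1.example.org",
            "service": {"host": "overridden-host-1"},
        },
        {
            "id": "3d0a5e6c-0a5b-4d0e-8f0a-2f9f0d1c7a11",
            "hypervisor_hostname": "cn2.example.org",
            "service": {"host": "overridden-host-2"},
        },
    ]
}

# Stand-in for CONF.host on the node doing the lookup.
conf_host = "overridden-host-2"
rp_uuid = find_compute_node_uuid(SAMPLE_BODY["hypervisors"], conf_host)
```

The cost is the point: the whole list comes over the wire just to pick
out one entry, which is what the proposed filter would avoid.
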
The Backportable Solution
=========================
Neutron should use `gethostname()` rather than CONF.host to discover
the compute node resource provider. I don't consider this a viable
permanent solution because it is tightly coupled to knowing that
hypervisor_hostname == `gethostname()`, which happens to be true for
libvirt, but not necessarily for other drivers. We can get away with it
for stable because we happen to know that we're only supporting QoS
bandwidth via Placement for libvirt.

Upgrade Concerns
================
Matt and I didn't nail down whether neutron and compute are allowed to
be at different versions on a given host, or what those versions are
allowed to be. But things should be sane if neutron (or any $service)
uses logic like this in >=ussuri:

    if new_nova_microversion_available:
        do_the_os_hypervisors_thing()
    elif using_new_non_libvirt_feature:
        raise YouCantDoThisWithOldNova()
    else:
        do_the_gethostname_thing()

Action Summary
==============
If the above sounds reasonable, it would entail the following actions:

- Neutron(/Cyborg?): backportable patch to
  s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
- Neutron/Cyborg: master-only patch to do the logic described in
  `Upgrade Concerns`_ (though for now without the `elif` branch).

Thanks,
efried

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2...
[3] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab...
[4] https://opendev.org/openstack/nova/src/commit/1cd5563f2dd2b218db2422397c8aab...
[5] https://docs.openstack.org/api-ref/compute/?expanded=list-hypervisors-detail...
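
P.S. A rough, runnable take on the decision logic from the Upgrade
Concerns section, in case it helps the discussion. It only builds the
request that a service would send; it does not talk to any API. All the
names here (compute_node_lookup, OldNovaError, the flag arguments) are
hypothetical, and ?service_host= is the filter proposed in this mail,
not something Nova supports today. The fallback assumes the Placement
provider name equals gethostname(), which per the above only holds for
libvirt.

```python
import socket
from urllib.parse import urlencode


class OldNovaError(Exception):
    """The caller needs the proposed service_host filter but Nova lacks it."""


def compute_node_lookup(conf_host, has_service_host_filter,
                        needs_non_libvirt_feature=False, hostname=None):
    """Return (service, request path) for finding the compute node provider.

    conf_host: the value of CONF.host on this node.
    has_service_host_filter: True if Nova offers the proposed microversion.
    needs_non_libvirt_feature: True if the gethostname() shortcut is unsafe.
    hostname: override for socket.gethostname(), mainly for testing.
    """
    if has_service_host_filter:
        # Proposed query parameter; the matching entry's UUID would also be
        # the UUID of the compute node resource provider.
        query = urlencode({"service_host": conf_host})
        return ("nova", "/os-hypervisors/detail?" + query)
    if needs_non_libvirt_feature:
        raise OldNovaError(
            "cannot locate the compute node provider without the "
            "service_host filter on a non-libvirt driver"
        )
    # Backportable fallback: look the provider up in Placement by name,
    # assuming hypervisor_hostname == gethostname() (libvirt only).
    name = hostname if hostname is not None else socket.gethostname()
    return ("placement", "/resource_providers?" + urlencode({"name": name}))


new_style = compute_node_lookup("overridden-host", True)
old_style = compute_node_lookup("overridden-host", False,
                                hostname="cn1.example.org")
```

Note that CONF.host is only used on the new-microversion path; the
fallback deliberately ignores it, which is the whole point of the
backportable fix.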