[nova][neutron][cyborg] Bandwidth (and accel) providers are broken if CONF.host is set
openstack at fried.cc
Thu Nov 21 23:28:12 UTC 2019
Hi all. Sean mentioned a "downstream bug" in IRC today, which we
discussed a little bit, but without gibi; and then Matt and I discussed
it more later, but without gibi *or* Sean. Since I don't know if
there's a bug report I can comment on, I wanted to summarize here for
now so I don't forget.
Neutron needs to know the compute node resource provider on which to
hang the child providers for QoS bandwidth. Today it assumes CONF.host
is the name of that provider.
The name of the provider is `hypervisor_hostname`, which for libvirt
happens to match the *default* value of CONF.host.
Per the bug Sean describes, if you override CONF.host, neutron won't
find the compute node provider, and things break.
The problem will be the same for any non-nova service wishing to discover
the compute node RP -- e.g. cyborg for purposes of creating child providers.
The Right Solution
Neutron (and any $service) should look up the compute node provider by
its UUID. That's returned by the /os-hypervisors APIs as of microversion
2.53, but, Catch-22, you can currently only filter those
results on hypervisor_hostname. So you would have to e.g. GET
/os-hypervisors/detail and then walk the list looking for service.host
matching CONF.host. That's way heavy for your CERNs.
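
To make the cost of that workaround concrete, here is a minimal sketch of the
"walk the list" step. It assumes the body is the parsed JSON of
GET /os-hypervisors/detail at microversion >= 2.53 (where each entry's `id` is
a UUID), and it ignores pagination. The function name and the sample payload
are made up for illustration.

```python
from typing import Optional


def find_compute_node_uuid(hypervisors_body: dict,
                           service_host: str) -> Optional[str]:
    """Return the `id` of the first hypervisor whose service.host matches.

    Assumes `hypervisors_body` came from GET /os-hypervisors/detail at
    microversion >= 2.53, so `id` is the compute node UUID.
    Returns None when no entry matches.
    """
    for hyp in hypervisors_body.get("hypervisors", []):
        if hyp.get("service", {}).get("host") == service_host:
            return hyp["id"]
    return None


# Hand-written sample, trimmed to the fields used above (UUIDs are made up).
SAMPLE_BODY = {
    "hypervisors": [
        {"id": "1bb62a04-c576-402c-8147-9e89757a09e3",
         "hypervisor_hostname": "node1.example.com",
         "service": {"host": "node1.example.com"}},
        {"id": "b1e43b5f-eec1-44e0-9f10-7b4945c0226d",
         "hypervisor_hostname": "node2.example.com",
         "service": {"host": "custom-host"}},  # CONF.host was overridden
    ]
}

uuid_for_custom_host = find_compute_node_uuid(SAMPLE_BODY, "custom-host")
```

Every caller has to download and scan the full hypervisor list to find a
single entry, which is the part that scales badly in large deployments.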
So the proposal moving forward is to add (in a new microversion) a
?service_host=XXX qparam to those APIs to let you filter down to just
the one entry for your CONF.host. The UUID of that entry will also be
the UUID of the compute node resource provider. (At that point you don't
even need to ask Placement for that provider; you can just use that UUID
directly in the APIs that create the child providers. Yay, you got your
extra API call back.)
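
As a rough sketch of what a caller would build under this proposal: note that
`service_host` is only the *proposed* query parameter and does not exist in
any released microversion, and both helper names are hypothetical. The child
provider body uses `parent_provider_uuid`, which Placement accepts on
POST /resource_providers from microversion 1.14.

```python
from urllib.parse import urlencode


def hypervisors_by_service_host_url(service_host: str) -> str:
    """Build the request path for the PROPOSED ?service_host= filter.

    This parameter is only a proposal in this thread; treat it as an
    assumption, not an existing nova API.
    """
    return "/os-hypervisors?" + urlencode({"service_host": service_host})


def child_provider_body(name: str, compute_node_uuid: str) -> dict:
    """Body for Placement POST /resource_providers (microversion >= 1.14).

    The compute node UUID from os-hypervisors is used directly as the
    parent, so no extra Placement lookup is needed.
    """
    return {"name": name, "parent_provider_uuid": compute_node_uuid}


url = hypervisors_by_service_host_url("custom-host")
body = child_provider_body("custom-host:child",
                           "b1e43b5f-eec1-44e0-9f10-7b4945c0226d")
```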
Now, that's not backportable, and this problem exists in stable releases
(at least those that support QoS bandwidth). So we should totally do it,
but we also need...
The Backportable Solution
Neutron should use `gethostname()` rather than CONF.host to discover the
compute node resource provider.
I don't consider this a viable permanent solution because it is tightly
coupled to knowing that hypervisor_hostname == `gethostname()`, which
happens to be true for libvirt, but not necessarily for other drivers.
We can get away with it for stable because we happen to know that we're
only supporting QoS bandwidth via Placement for libvirt.
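
A minimal sketch of the stable-branch lookup, assuming the agent finds the
compute node provider by name through Placement's
GET /resource_providers?name=... filter. The helper name is hypothetical, and
the optional argument exists only so the example can be exercised without
depending on the local hostname.

```python
import socket
from typing import Optional
from urllib.parse import urlencode


def compute_rp_lookup_path(hostname: Optional[str] = None) -> str:
    """Build the Placement path used to find the compute node provider.

    Uses socket.gethostname() instead of CONF.host, which only works
    because hypervisor_hostname == gethostname() for libvirt.
    """
    name = hostname if hostname is not None else socket.gethostname()
    return "/resource_providers?" + urlencode({"name": name})


path = compute_rp_lookup_path("node1.example.com")
```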
Upgrade Concerns
Matt and I didn't nail down whether neutron and compute are allowed to
be at different versions on a given host, or what those versions are
allowed to be. But things should be sane if neutron (or any $service)
follows logic like this in >=ussuri:
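
One way such version-dependent logic could be shaped, offered as a hedged
sketch and not as the exact pseudocode from this thread. The function name,
both flags, and the error message are assumptions made up for illustration,
and `service_host` is still only the proposed query parameter.

```python
import socket
from typing import Dict, Tuple


def choose_lookup(nova_has_service_host_filter: bool,
                  hostname_fallback_ok: bool,
                  conf_host: str) -> Tuple[str, Dict[str, str]]:
    """Pick how to discover the compute node resource provider.

    Returns (api, query_params). Both flags are hypothetical inputs the
    caller would have to determine on its own.
    """
    if nova_has_service_host_filter:
        # The Right Solution: ask nova, filtered by CONF.host (proposed API).
        return ("os-hypervisors", {"service_host": conf_host})
    elif hostname_fallback_ok:
        # The Backportable Solution: rely on
        # hypervisor_hostname == gethostname(), true for libvirt.
        return ("placement", {"name": socket.gethostname()})
    raise RuntimeError("cannot discover the compute node resource provider")


api, params = choose_lookup(True, False, "custom-host")
```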
If the above sounds reasonable, it would entail the following actions:
- Neutron(/Cyborg?): backportable patch to s/CONF.host/socket.gethostname()/
- Nova: GET /os-hypervisors*?service_host=X in a new microversion.
- Neutron/Cyborg: master-only patch to do the logic described in
`Upgrade Concerns`_ (though for now without the `elif` branch).