Hi Takashi,

On Thu, 10 Jun 2021 at 15:06, Takashi Kajinami <tkajinam@redhat.com> wrote:
Hi All,
I've been working on bug 1926693[1], and I'm not sure which solution we would consider reasonable. Ideally I would raise this topic in the team meeting, but because of the timezone gap and the complicated background, I'd like to gather some feedback on the mailing list first.
[1] https://bugs.launchpad.net/neutron/+bug/1926693
TL;DR Which one (or ones) would be a reasonable solution for this issue?
(1) https://review.opendev.org/c/openstack/neutron/+/763563
(2) https://review.opendev.org/c/openstack/neutron/+/788893
(3) Implement something different
The issue I reported in the bug is that there is an inconsistency between nova and neutron about the way to determine a hypervisor name. Currently neutron uses socket.gethostname() (which always returns shortname) to determine a hypervisor name to search the corresponding resource provider.

socket.gethostname() can return either the FQDN or the short name: https://docs.python.org/3/library/socket.html#socket.gethostname. I've seen cases where it switched from short name to FQDN, but I'm not sure of the root cause; perhaps a DHCP lease setting a hostname/domainname.

Thanks,
Ollie

On the other hand, nova uses libvirt's getHostname function (if the libvirt driver is used), which returns a canonical name. The canonical name can be a short name or an FQDN (*1), and if an FQDN is used then neutron and nova never agree.
(*1) IMO this is likely to happen in real deployments. For example, TripleO uses FQDN for canonical names.
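To make the mismatch concrete, here is a minimal Python sketch, not taken from neutron or nova. It prints what socket.gethostname() and socket.getfqdn() report on the local machine and shows why an exact string comparison between a short name and an FQDN fails. The helper names short_name and names_agree are hypothetical, and libvirt's getHostname is not called here.

```python
import socket


def short_name(hostname: str) -> str:
    """Return the part of a hostname before the first dot."""
    return hostname.split(".", 1)[0]


def names_agree(a: str, b: str) -> bool:
    """Exact (case-insensitive) comparison, i.e. no short/FQDN tolerance."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    # Whatever the OS has configured: may be a short name or an FQDN.
    plain = socket.gethostname()
    # Resolver-based attempt to obtain a fully qualified name.
    fqdn = socket.getfqdn()
    print(f"socket.gethostname(): {plain}")
    print(f"socket.getfqdn():     {fqdn}")
    print(f"exact match:          {names_agree(plain, fqdn)}")
    print(f"short names match:    {short_name(plain) == short_name(fqdn)}")
```

If the two sides of a deployment pick different forms (for example compute0 versus compute0.mydomain), the exact match is false even though both names refer to the same host.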
Neutron already provides the resource_provider_hypervisors option to override the hypervisor name used. However, because this option takes a map from interface to hypervisor, setting it requires a very redundant description, especially when a compute node has multiple interfaces/bridges. The following example shows how redundant the current requirement is.
~~~
[OVS]
resource_provider_bandwidths=br-data1:1024:1024,br-data2:1024:1024,\
br-data3:1024:1024,br-data4:1024:1024
resource_provider_hypervisors=br-data1:compute0.mydomain,br-data2:\
compute0.mydomain,br-data3:compute0.mydomain,br-data4:compute0.mydomain
~~~
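As an illustration of that redundancy, the following Python sketch generates the per-bridge mapping from a bridge list and a single hypervisor name. The function build_hypervisors_option is hypothetical and is not part of neutron; it only reproduces the string format used in the example above.

```python
def build_hypervisors_option(bridges, hypervisor):
    """Repeat the same hypervisor name for every bridge, as the map requires."""
    return ",".join(f"{bridge}:{hypervisor}" for bridge in bridges)


bridges = ["br-data1", "br-data2", "br-data3", "br-data4"]
value = build_hypervisors_option(bridges, "compute0.mydomain")
print(f"resource_provider_hypervisors={value}")
```

The only real input is one hostname, yet it has to be written once per bridge.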
I've submitted a change that proposes a new single parameter to override the base hypervisor name, but it is currently -2ed, mainly because I had not analysed the root cause of the mismatch when I proposed it.
(1) https://review.opendev.org/c/openstack/neutron/+/763563
On the other hand, I submitted a different change to neutron which implements logic to get a hypervisor name in a way that is fully compatible with libvirt. While this would save users from having to override hypervisor names at all, I'm aware that it might break other virt drivers which depend on different logic to generate a hypervisor name. IMO the patch is still useful, considering the libvirt driver is probably the most popular option now, but I'm not fully aware of the impact on the other drivers, especially because I don't know which virt drivers support the minimum QoS feature now.
(2) https://review.opendev.org/c/openstack/neutron/+/788893/
In the review of (2), Sean mentioned implementing logic to determine an appropriate resource provider (3) even if there is a mismatch in the host name format, but I'm not sure how I would implement that, tbh.
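One possible shape for (3), purely as a hedged sketch: this is not what was proposed in the review and it does not use the placement API. It works on a plain list of resource provider names, and the function names are hypothetical. It prefers an exact match and tolerates a mismatch only when one side is a short name and the match is unambiguous.

```python
from typing import Iterable, Optional


def _matches(hostname: str, provider: str) -> bool:
    """True if both names plausibly refer to the same host."""
    a, b = hostname.lower(), provider.lower()
    if a == b:
        return True
    # Two different FQDNs are different hosts; never match on short name alone.
    if "." in a and "." in b:
        return False
    return a.split(".", 1)[0] == b.split(".", 1)[0]


def find_provider(hostname: str, provider_names: Iterable[str]) -> Optional[str]:
    """Return the single matching provider name, or None if absent or ambiguous."""
    names = list(provider_names)
    exact = [n for n in names if n.lower() == hostname.lower()]
    if exact:
        return exact[0]
    candidates = [n for n in names if _matches(hostname, n)]
    # Refuse to guess when several providers share the same short name.
    return candidates[0] if len(candidates) == 1 else None
```

Returning None on ambiguity is a deliberate choice in this sketch: two hosts in different domains can share a short name, and silently picking one of them would be worse than failing.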
My current thought is to merge (1) as a quick solution first, and then discuss whether we should merge (2), but I'd like to ask for some feedback about this plan (for example, that we should NOT merge (2)).
I'd appreciate your thoughts about this $topic.
Thank you,
Takashi