Thank you all for your additional thoughts.
Since I have not received strong objections to the two existing patches[1][2], I updated them to resolve the conflicts between them.
[1] https://review.opendev.org/c/openstack/neutron/+/763563
[2] https://review.opendev.org/c/openstack/neutron/+/788893
I made the patch that adds a default hypervisor name the base one, because it doesn't change behavior and would be "safe" for backports. So far we have received positive feedback about fixing compatibility with libvirt (in master), but I'll create a backport of that change as well to ask for feedback about the benefits and risks of backporting it.
I think the strategy is now clear from this feedback, but please feel free to share your thoughts in this thread or on the above patches.
> if we want to "fix" this in neutron then neutron should either try looking up the RP using the host name and then fall back to using the fqdn or we should look at using the hypervisor api as we discussed a few years ago when this last came up
> http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html
I feel like this discussion would be a good chance to revisit the requirement for a basic client implementation for placement (or an abstraction layer like castellan). Currently each component, such as nova, neutron, and cyborg(?), has its own placement client implementation (and its own logic to query resource providers), but IMO it would be more efficient to maintain a common client implementation instead.
> for many deployments that do not set the fqdn as the canonical host name in /etc/hosts, the current default behavior works out of the box. whatever solution we take, we need to ensure that no existing deployment is affected by the change, which means we cannot default to only using the fqdn or similar, as that would be an upgrade breakage. so we have to maintain the current behavior by default and enhance neutron to either fall back to the fqdn if the hostname-based lookup fails, or use the new config introduced by takashi's patch where the fqdn is used as the server canonical hostname.
Thank you for pointing this out. To be clear, the behavior change I proposed[2] doesn't break any deployment with libvirt but would break deployments with non-libvirt drivers. This point should be considered when reviewing that change. So far most of the feedback I have received is that fixing compatibility with libvirt is preferred because it's the "default" option, but please share your thoughts on the patch.
On Mon, Jun 14, 2021 at 7:30 PM Sean Mooney <smooney@redhat.com> wrote:
On Fri, Jun 11, 2021 at 8:48 PM Oliver Walsh <owalsh@redhat.com> wrote:
Hi Takashi,
On Thu, 10 Jun 2021 at 15:06, Takashi Kajinami <tkajinam@redhat.com> wrote:
Hi All,
I've been working on bug 1926693[1], and I am unsure which solution would be reasonable. Ideally I would bring this topic to the team meeting, but because of the timezone gap and the complicated background, I'd like to gather some feedback on the ML first.
[1] https://bugs.launchpad.net/neutron/+bug/1926693
TL;DR Which one (or ones) would be a reasonable solution for this issue?
(1) https://review.opendev.org/c/openstack/neutron/+/763563
(2) https://review.opendev.org/c/openstack/neutron/+/788893
(3) Implement something different
The issue I reported in the bug is that there is an inconsistency between nova and neutron about the way to determine a hypervisor name. Currently neutron uses socket.gethostname() (which always returns shortname)
socket.gethostname() can return fqdn or shortname - https://docs.python.org/3/library/socket.html#socket.gethostname.
You are correct and my statement was not accurate. socket.gethostname() returns whatever the gethostname system call returns, and because gethostname/sethostname accept both an FQDN and a short name, socket.gethostname() can return either one.
However, the root problem is that this logic is not exactly the same as the logic used in each virt driver. Of course we could require people to use the "correct" format for the canonical name as well as for the hostname, but fixing this problem in neutron would be much more helpful, considering the impact of forcing users to "fix" their hostname/canonical name formatting at this point.
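To illustrate the point about socket.gethostname(), here is a minimal sketch (not code from either patch). The helper describe_hostname is a hypothetical name introduced only for this example, and socket.getfqdn() is shown only as one resolver-based alternative; it is not necessarily the same name that a given virt driver reports.

```python
import socket


def describe_hostname(name):
    """Classify a host name as 'fqdn' if it contains a dot, else 'short'.

    Hypothetical helper for illustration only.
    """
    return "fqdn" if "." in name else "short"


if __name__ == "__main__":
    # Whatever was stored via sethostname(); may be short or an FQDN.
    kernel_name = socket.gethostname()
    # Resolver-based lookup; may differ from the name above.
    resolved_name = socket.getfqdn()
    print(kernel_name, describe_hostname(kernel_name))
    print(resolved_name, describe_hostname(resolved_name))
```

On a node where the two printed names differ, any component that picks one of them can disagree with a component that picks the other.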
On Sat, 2021-06-12 at 00:46 +0900, Takashi Kajinami wrote:
this is not really something that can be fixed in neutron. we can either create a common function in oslo.utils or placement-lib that we can use in nova, neutron and all other projects, or we can use the config option.
if we want to "fix" this in neutron then neutron should either try looking up the RP using the host name and then fall back to using the fqdn or we should look at using the hypervisor api as we discussed a few years ago when this last came up
http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044....
i don't think neutron should know anything about hypervisors, so i would just proceed with the new config option that takashi has proposed, but i would not implement Rodolfo's solution of adding a hypervisor_type.
just as nova has no awareness of the neutron backend and tries to treat all of them the same, neutron should remain hypervisor independent, and we should look to provide common code that can be reused to identify the RP in a separate lib as a longer term solution.
for many deployments that do not set the fqdn as the canonical host name in /etc/hosts, the current default behavior works out of the box. whatever solution we take, we need to ensure that no existing deployment is affected by the change, which means we cannot default to only using the fqdn or similar, as that would be an upgrade breakage. so we have to maintain the current behavior by default and enhance neutron to either fall back to the fqdn if the hostname-based lookup fails, or use the new config introduced by takashi's patch where the fqdn is used as the server canonical hostname.
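The fallback order described above could look roughly like the following. This is only a sketch under stated assumptions: candidate_names and find_resource_provider are hypothetical names, and lookup stands in for whatever call queries placement by name; none of this is taken from neutron or from either patch.

```python
def candidate_names(hostname, fqdn, override=None):
    """Return the names to try, in order, without duplicates.

    An explicit override (e.g. from a config option) wins outright;
    otherwise the current default (hostname) is tried before the fqdn.
    """
    if override:
        return [override]
    names = [hostname]
    if fqdn and fqdn != hostname:
        names.append(fqdn)
    return names


def find_resource_provider(lookup, hostname, fqdn, override=None):
    """Try each candidate name with lookup(); return the first hit or None.

    lookup is an assumed callable: name -> resource provider or None.
    """
    for name in candidate_names(hostname, fqdn, override):
        provider = lookup(name)
        if provider is not None:
            return provider
    return None


if __name__ == "__main__":
    # Stand-in for placement: only the FQDN is registered.
    providers = {"compute0.mydomain": "rp-for-compute0"}
    print(find_resource_provider(providers.get, "compute0", "compute0.mydomain"))
```

Because the hostname is tried first, a deployment where the current default already works would keep resolving the same resource provider.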
I've seen cases where it switched from short to fqdn but I'm not sure of the root cause - DHCP lease setting a hostname/domainname perhaps.
Thanks, Ollie
to determine a hypervisor name to search the corresponding resource provider. On the other hand, nova uses libvirt's getHostname function (if libvirt driver is used) which returns a canonical name. Canonical name can be shortname or FQDN (*1) and if FQDN is used then neutron and nova never agree.
(*1) IMO this is likely to happen in real deployments. For example, TripleO uses FQDN for canonical names.
Neutron already provides the resource_provider_hypervisors option to override the hypervisor name used. However, because this option accepts a map between interfaces and hypervisors, setting it requires a very redundant description, especially when a compute node has multiple interfaces/bridges. The following example shows how redundant the current requirement is.
~~~
[OVS]
resource_provider_bandwidths=br-data1:1024:1024,br-data2:1024:1024,\
br-data3:1024:1024,br-data4:1024:1024
resource_provider_hypervisors=br-data1:compute0.mydomain,br-data2:\
compute0.mydomain,br-data3:compute0.mydomain,br-data4:compute0.mydomain
~~~
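To show what a single default would save, here is a rough sketch of expanding one default hypervisor name into the per-bridge map above. build_hypervisor_map and its parameters are hypothetical names for illustration and are not the implementation in any patch.

```python
def build_hypervisor_map(bridges, default_hypervisor, overrides=None):
    """Map every bridge to a hypervisor name.

    Bridges listed in overrides keep their explicit value; all others
    get default_hypervisor. Hypothetical helper for illustration.
    """
    overrides = overrides or {}
    return {
        bridge: overrides.get(bridge, default_hypervisor)
        for bridge in bridges
    }


if __name__ == "__main__":
    bridges = ["br-data1", "br-data2", "br-data3", "br-data4"]
    for bridge, hypervisor in build_hypervisor_map(
            bridges, "compute0.mydomain").items():
        print(f"{bridge}:{hypervisor}")
```

With one default value the four repeated entries collapse into a single setting, while a per-bridge override remains possible.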
I've submitted a change to propose a new single parameter to override the base hypervisor name but this is currently -2ed, mainly because I lacked analysis about the root cause of mismatch when I proposed this. (1) https://review.opendev.org/c/openstack/neutron/+/763563
On the other hand, I submitted a different change to neutron which implements the logic to get a hypervisor name which is fully compatible with libvirt. While this would save users from even overriding hypervisor names, I'm aware that this might break the other virt driver which depends on a different logic to generate a hypervisor name. IMO the patch is still useful considering the libvirt driver would be the most popular option now, but I'm not fully aware of the impact on the other drivers, especially because I don't know which virt driver would support the minimum QoS feature now. (2) https://review.opendev.org/c/openstack/neutron/+/788893/
In the review of (2), Sean mentioned implementing a logic to determine an appropriate resource provider(3) even if there is a mismatch about host name format, but I'm not sure how I would implement that, tbh.
My current thought is to merge (1) as a quick solution first, and discuss whether we should merge (2), but I'd like to ask for some feedback about this plan (like we should NOT merge (2)).
I'd appreciate your thoughts about this $topic.
Thank you, Takashi