[neutron][nova][placement] bug 1926693: What would be the reasonable solution?

Takashi Kajinami tkajinam at redhat.com
Tue Jun 15 00:17:57 UTC 2021


Thank you all for your additional thoughts.

Because I haven't received any strong objections to the two existing
patches[1][2], I have updated them to resolve the conflicts between them.
  [1] https://review.opendev.org/c/openstack/neutron/+/763563
  [2] https://review.opendev.org/c/openstack/neutron/+/788893

I made the patch that adds a default hypervisor name option [1] the base
one, because it doesn't change behavior and should be "safe" to backport.
So far we have received positive feedback about fixing compatibility with
libvirt (in master) [2], but I'll create a backport of that change as well
to ask for feedback on its benefits and risks as a backport.

I think the strategy is now clear given this feedback, but please feel free
to share your thoughts in this thread or on the patches above.
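For anyone unfamiliar with the redundancy that [1] addresses, here is a
minimal Python sketch of the idea behind a single base hypervisor-name
option: every bridge listed in resource_provider_bandwidths that has no
explicit entry in resource_provider_hypervisors falls back to one default
name, and to socket.gethostname() when no default is set. The helper names
(parse_mapping, expand_hypervisor_map) and the default_hypervisor parameter
are hypothetical and are not neutron's actual code; the option-string
format follows the [OVS] example quoted later in this thread.

```python
import socket
from typing import Dict, Optional


def parse_mapping(value: str) -> Dict[str, str]:
    """Parse 'key1:val1,key2:val2' into a dict, splitting on the first ':' only."""
    result: Dict[str, str] = {}
    for item in filter(None, (part.strip() for part in value.split(","))):
        key, _, val = item.partition(":")
        result[key] = val
    return result


def expand_hypervisor_map(bandwidths: str,
                          hypervisors: str = "",
                          default_hypervisor: Optional[str] = None) -> Dict[str, str]:
    """Return a bridge -> hypervisor-name map (hypothetical helper).

    Explicit entries from `hypervisors` win; every other bridge named in
    `bandwidths` gets `default_hypervisor`, or socket.gethostname() when no
    default is configured (the current behavior described in this thread).
    """
    base = default_hypervisor or socket.gethostname()
    explicit = parse_mapping(hypervisors)
    bridges = parse_mapping(bandwidths).keys()
    return {bridge: explicit.get(bridge, base) for bridge in bridges}


# One default value replaces a per-bridge list of identical names.
print(expand_hypervisor_map("br-data1:1024:1024,br-data2:1024:1024",
                            default_hypervisor="compute0.mydomain"))
# {'br-data1': 'compute0.mydomain', 'br-data2': 'compute0.mydomain'}
```

Explicit per-bridge entries still take precedence in this sketch, so an
existing resource_provider_hypervisors setting would keep working unchanged.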

> If we want to "fix" this in neutron then neutron should either try
> looking up the RP using the host name and then fall back to using the
> fqdn, or we should look at using the hypervisor api as we discussed a few
> years ago when this last came up
>
> http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html

I feel this discussion is a good chance to revisit the need for a basic
client implementation for placement (or an abstraction layer like castellan).
Currently each component, such as nova, neutron and cyborg(?), has its own
placement client implementation (and its own logic to query resource
providers), but IMO it would be more efficient to maintain a common client
implementation instead.
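As a rough illustration of what such a shared helper could contain, here is
a minimal Python sketch of the fallback lookup Sean describes: try the
current host name first, then the FQDN. The names candidate_names,
find_compute_provider and Lookup are hypothetical; the lookup callable is
assumed to wrap whatever placement query a project already uses (for example
GET /resource_providers?name=...) and to return None when nothing matches.
An in-memory dict stands in for placement in the usage lines.

```python
import socket
from typing import Callable, Iterable, List, Optional

# Hypothetical: takes a resource provider name and returns that provider
# (e.g. one decoded entry from placement) or None if no provider matches.
Lookup = Callable[[str], Optional[dict]]


def candidate_names(hostname: Optional[str] = None,
                    fqdn: Optional[str] = None) -> List[str]:
    """Ordered, de-duplicated names to try: the host name first, then the FQDN."""
    names = [hostname or socket.gethostname(), fqdn or socket.getfqdn()]
    ordered: List[str] = []
    for name in names:
        if name and name not in ordered:
            ordered.append(name)
    return ordered


def find_compute_provider(lookup: Lookup, names: Iterable[str]) -> Optional[dict]:
    """Return the first resource provider found under any candidate name."""
    for name in names:
        provider = lookup(name)
        if provider is not None:
            return provider
    return None


# Usage with a dict standing in for placement (illustrative data only).
fake_placement = {"compute0.mydomain": {"name": "compute0.mydomain"}}
rp = find_compute_provider(fake_placement.get,
                           candidate_names("compute0", "compute0.mydomain"))
print(rp)  # {'name': 'compute0.mydomain'}
```

Trying the host name first keeps today's result for deployments where it
already matches, which is the "no existing deployment is affected"
requirement quoted below; the FQDN is only consulted when that lookup fails.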

> For many deployments that do not set the fqdn as the canonical host name
> in /etc/hosts the current default behavior works out of the box.
> Whatever solution we take, we need to ensure that no existing deployment
> is affected by the change, which means we cannot default to only using
> the fqdn or similar, as that would be an upgrade breakage. So we have
> to maintain the current behavior by default and enhance neutron to
> either fall back to the fqdn if the hostname-based lookup fails or use
> the new config introduced by Takashi's patch, where the fqdn is used as
> the server canonical hostname.

Thank you for pointing this out. To be clear, the behavior change I
proposed in [2] doesn't break any deployment using libvirt, but it would
break deployments using non-libvirt drivers. This point should be
considered when reviewing that change. So far, most of the feedback I
received prefers fixing compatibility with libvirt, as it's the "default"
option, but please share your thoughts on the patch.
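To make "compatible with libvirt" concrete, here is a minimal Python sketch
of the hostname logic as I understand libvirt's virGetHostname() (an
assumption worth checking against the libvirt source; this is not the code
in [2]): if gethostname() already contains a dot, use it as-is; otherwise
ask getaddrinfo() for the canonical name with AI_CANONNAME, falling back to
the plain hostname when resolution fails or the result starts with
"localhost". libvirt's AI_CANONIDN flag is omitted, and the resolve
parameter exists only so the sketch can be exercised without DNS.

```python
import socket
from typing import Callable, Optional


def libvirt_like_hostname(hostname: Optional[str] = None,
                          resolve: Callable = socket.getaddrinfo) -> str:
    """Approximate libvirt's hypervisor hostname logic (an assumption).

    1. Start from gethostname(); if it already contains a dot, treat it as
       fully qualified and return it unchanged.
    2. Otherwise resolve it with AI_CANONNAME and return the canonical name,
       unless resolution fails or the canonical name starts with "localhost",
       in which case the plain hostname is returned.
    """
    hostname = hostname or socket.gethostname()
    if "." in hostname:
        return hostname
    try:
        infos = resolve(hostname, None, socket.AF_UNSPEC, 0, 0,
                        socket.AI_CANONNAME)
    except OSError:  # socket.gaierror is a subclass of OSError
        return hostname
    # With AI_CANONNAME, the canonical name is reported in the first entry.
    canonname = infos[0][3] if infos else ""
    if not canonname or canonname.startswith("localhost"):
        return hostname
    return canonname
```

Called with no arguments, the function uses the system resolver. It only
disagrees with plain socket.gethostname() when gethostname() returns a short
name that resolves to an FQDN canonical name, which is the TripleO case
described further down the thread.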


On Mon, Jun 14, 2021 at 7:30 PM Sean Mooney <smooney at redhat.com> wrote:

> On Sat, 2021-06-12 at 00:46 +0900, Takashi Kajinami wrote:
> > On Fri, Jun 11, 2021 at 8:48 PM Oliver Walsh <owalsh at redhat.com> wrote:
> > > Hi Takashi,
> > >
> > > On Thu, 10 Jun 2021 at 15:06, Takashi Kajinami <tkajinam at redhat.com>
> > > wrote:
> > > > Hi All,
> > > >
> > > >
> > > > I've been working on bug 1926693[1], and am lost about which
> > > > solutions we would consider reasonable. Ideally I'd bring this topic
> > > > up in the team meeting, but because of the timezone gap and the
> > > > complicated background, I'd like to gather some feedback on the ML
> > > > first.
> > > >
> > > > [1] https://bugs.launchpad.net/neutron/+bug/1926693
> > > >
> > > > TL;DR
> > > >  Which one (or ones) would be a reasonable solution for this issue?
> > > >   (1) https://review.opendev.org/c/openstack/neutron/+/763563
> > > >   (2) https://review.opendev.org/c/openstack/neutron/+/788893
> > > >   (3) Implement something different
> > > >
> > > > The issue I reported in the bug is that there is an inconsistency
> > > > between nova and neutron in the way they determine a hypervisor name.
> > > > Currently neutron uses socket.gethostname() (which always returns
> > > > shortname)
> > > >
> > >
> > >
> > > socket.gethostname() can return fqdn or shortname -
> > > https://docs.python.org/3/library/socket.html#socket.gethostname.
> > >
> >
> > You are correct and my statement was not accurate.
> > socket.gethostname() returns whatever the gethostname system call
> > returns, and since gethostname/sethostname accept both an FQDN and a
> > short name, socket.gethostname() can return either an FQDN or a short
> > name.
> >
> > However, the root problem is that this logic is not exactly the same as
> > the logic used in each virt driver. Of course we could require people to
> > use the "correct" format for the canonical name as well as the
> > "hostname", but fixing this problem in neutron would be much more
> > helpful, considering the impact of forcing users to "fix" their
> > hostname/canonical name formatting at this point.
> This is not really something that can be fixed in neutron.
> We can either create a common function in oslo.utils or placement-lib
> that we can use in nova, neutron and all other projects, or we can use
> the config option.
>
> If we want to "fix" this in neutron then neutron should either try
> looking up the RP using the host name and then fall back to using the
> fqdn, or we should look at using the hypervisor api as we discussed a few
> years ago when this last came up
>
> http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html
>
> I don't think neutron should know anything about hypervisors, so I would
> just proceed with the new config option that Takashi has proposed, but I
> would not implement Rodolfo's solution of adding a hypervisor_type.
>
> Just as nova has no awareness of the neutron backend and tries to treat
> all of them the same, neutron should remain hypervisor independent, and
> we should look to provide common code that can be reused to identify the
> RP in a separate lib as a longer term solution.
>
> For many deployments that do not set the fqdn as the canonical host name
> in /etc/hosts the current default behavior works out of the box.
> Whatever solution we take, we need to ensure that no existing deployment
> is affected by the change, which means we cannot default to only using
> the fqdn or similar, as that would be an upgrade breakage. So we have
> to maintain the current behavior by default and enhance neutron to
> either fall back to the fqdn if the hostname-based lookup fails or use
> the new config introduced by Takashi's patch, where the fqdn is used as
> the server canonical hostname.
> >
> > > I've seen cases where it switched from short to fqdn but I'm not sure
> > > of the root cause - DHCP lease setting a hostname/domainname perhaps.
> > >
> > > Thanks,
> > > Ollie
> > >
> > > > to determine a hypervisor name when searching for the corresponding
> > > > resource provider. On the other hand, nova uses libvirt's getHostname
> > > > function (if the libvirt driver is used), which returns a canonical
> > > > name. The canonical name can be a short name or an FQDN (*1), and if
> > > > the FQDN is used then neutron and nova never agree.
> > > >
> > > > (*1)
> > > > IMO this is likely to happen in real deployments. For example,
> > > > TripleO uses FQDNs for canonical names.
> > > >
> > > >
> > > > Neutron already provides the resource_provider_hypervisors option to
> > > > override the hypervisor name used. However, because this option
> > > > accepts a map between interface and hypervisor, setting it requires a
> > > > very redundant description, especially when a compute node has
> > > > multiple interfaces/bridges. The following example shows how
> > > > redundant the current requirement is.
> > > > ~~~
> > > > [OVS]
> > > > resource_provider_bandwidths=br-data1:1024:1024,br-data2:1024:1024,br-data3:1024:1024,br-data4:1024:1024
> > > > resource_provider_hypervisors=br-data1:compute0.mydomain,br-data2:compute0.mydomain,br-data3:compute0.mydomain,br-data4:compute0.mydomain
> > > > ~~~
> > > >
> > > > I've submitted a change proposing a new single parameter to override
> > > > the base hypervisor name, but it is currently -2ed, mainly because I
> > > > had not analyzed the root cause of the mismatch when I proposed it.
> > > >  (1) https://review.opendev.org/c/openstack/neutron/+/763563
> > > >
> > > >
> > > > On the other hand, I submitted a different change to neutron which
> > > > implements logic to get a hypervisor name that is fully compatible
> > > > with libvirt. While this would save users from even having to
> > > > override hypervisor names, I'm aware that it might break other virt
> > > > drivers that depend on different logic to generate a hypervisor
> > > > name. IMO the patch is still useful, considering the libvirt driver
> > > > is probably the most popular option now, but I'm not fully aware of
> > > > the impact on the other drivers, especially because I don't know
> > > > which virt drivers support the minimum bandwidth QoS feature now.
> > > >  (2) https://review.opendev.org/c/openstack/neutron/+/788893/
> > > >
> > > >
> > > > In the review of (2), Sean mentioned implementing logic to determine
> > > > an appropriate resource provider (3) even if there is a mismatch in
> > > > the host name format, but tbh I'm not sure how I would implement that.
> > > >
> > > >
> > > > My current thought is to merge (1) as a quick solution first, and
> > > > then discuss whether we should merge (2), but I'd like to ask for
> > > > some feedback about this plan (for example, that we should NOT merge
> > > > (2)).
> > > >
> > > > I'd appreciate your thoughts on this $topic.
> > > >
> > > > Thank you,
> > > > Takashi
>
>
>