[neutron][nova][placement] bug 1926693: What would be the reasonable solution ?

Sean Mooney smooney at redhat.com
Tue Jun 15 10:38:43 UTC 2021


On Tue, 2021-06-15 at 09:17 +0900, Takashi Kajinami wrote:
> Thank you all for your additional thoughts.
> 
> Because I've not received very strong objections about the existing two
> patches[1][2], I updated these patches to resolve conflicts between them.
>   [1] https://review.opendev.org/c/openstack/neutron/+/763563
> 
>   [2] https://review.opendev.org/c/openstack/neutron/+/788893
>  
> I made the patch to add a default hypervisor name the base one because it
> doesn't change behavior and would be "safe" for backports. So far we have
> received positive feedback about fixing compatibility with libvirt (in
> master) but I'll create a backport of that change as well to ask for some
> feedback about its benefit and risk for backport.
> 
> I think the strategy is now clear with this feedback but please feel free
> to put your thoughts in this thread or the above patches.
> 
> > if we want to "fix" this in neutron then neutron should either try
> > looking up the RP using the host name and then fall back to using the
> > fqdn or we should look at using the hypervisor api as we discussed a few
> > years ago when this last came up
> > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html
> 
> I feel like this discussion would be a good chance to revisit the
> requirement of a basic client implementation for placement (or an
> abstraction layer like castellan). Currently each component like nova,
> neutron, and cyborg(?) has its own placement client implementation (and
> logic to query resource providers) but IMO it is more efficient if we can
> maintain a common client implementation instead.
it may be useful in the form of a placement-lib.
this is not something that could have been addressed in a common client,
however, as for example ironic or other clustered drivers have 1 compute
service but multiple resource providers per compute service,
so we can't always assume 1:1 mappings. that's why we can't use conf.HOST in
the general case, although we could have used it for libvirt.
> 
> > for many deployments that do not set the fqdn as the canonical host name
> > in /etc/hosts the current default behavior works out of the box.
> > whatever solution we take we need to ensure that no existing deployment
> > is affected by the change, which means we cannot default to only using
> > the fqdn or similar as that would be an upgrade breakage, so we have
> > to maintain the current behavior by default and enhance neutron to
> > either fall back to the fqdn if the hostname based lookup fails or use
> > the new config introduced by takashi's patch where the fqdn is used as
> > the server canonical hostname.
> Thank you for pointing this out. To be clear, the behavior change I
> proposed[2] doesn't break any deployment with libvirt but would break
> deployments with non-libvirt drivers. This point should be considered when
> reviewing that change. So far most of the feedback I received is that it is
> preferred to fix compatibility with libvirt as it's the "default" option,
> but please share your thoughts on the patch.
ok, there are 3 names that are likely to be used:
the hostname, the fqdn, and the value of conf.HOST.
conf.HOST defaults to the hostname.
if we are to enhance the default behavior i think we should just
implement a fallback behavior
which would check all 3 values if they are distinct,
i.e. lookup by hostname, if that fails lookup by fqdn, if that fails
lookup by conf.HOST if and only if it is not the same as the hostname (its
default value) or the fqdn.
it would be unusual for conf.HOST to not match the hostname or fqdn
but it does happen, for example if you are running multiple
virt drivers on the same host, as when you deploy say libvirt and ironic on
the same host, or when you use the fake driver for scale testing.
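
a rough sketch of that fallback order, purely illustrative: lookup_rp is a
hypothetical callable standing in for whatever queries placement by name
(returning None on a miss), and the function names are made up here, they are
not taken from neutron or from either patch.

```python
import socket


def candidate_names(hostname, fqdn, conf_host):
    """Return the distinct names to try, in lookup order.

    Duplicates are dropped, so conf_host is only tried when it differs
    from both the hostname and the fqdn.
    """
    names = []
    for name in (hostname, fqdn, conf_host):
        if name and name not in names:
            names.append(name)
    return names


def find_resource_provider(lookup_rp, hostname=None, fqdn=None, conf_host=None):
    """Try hostname, then fqdn, then conf_host; return the first match.

    lookup_rp is an assumed callable: name -> resource provider or None.
    conf_host is whatever the deployment configured as its host value.
    """
    hostname = hostname or socket.gethostname()
    fqdn = fqdn or socket.getfqdn()
    for name in candidate_names(hostname, fqdn, conf_host):
        rp = lookup_rp(name)
        if rp is not None:
            return rp
    return None
```

with this shape a deployment whose RP is registered under the short hostname
still resolves on the first lookup, so the current default behavior is kept;
the extra lookups only happen on a miss.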

> 
> 
> On Mon, Jun 14, 2021 at 7:30 PM Sean Mooney <smooney at redhat.com> wrote:
> > On Sat, 2021-06-12 at 00:46 +0900, Takashi Kajinami wrote:
> > > On Fri, Jun 11, 2021 at 8:48 PM Oliver Walsh <owalsh at redhat.com> wrote:
> > > > Hi Takashi,
> > > > 
> > > > On Thu, 10 Jun 2021 at 15:06, Takashi Kajinami <tkajinam at redhat.com> wrote:
> > > > > Hi All,
> > > > > 
> > > > > 
> > > > > I've been working on bug 1926693[1], and am lost about the
> > > > > reasonable solutions we expect. Ideally I'd need to bring this topic
> > > > > to the team meeting but because of the timezone gap and complicated
> > > > > background, I'd like to gather some feedback on the ml first.
> > > > > 
> > > > > [1] https://bugs.launchpad.net/neutron/+bug/1926693
> > > > > 
> > > > > TL;DR
> > > > >  Which one (or ones) would be reasonable solutions for this issue?
> > > > >   (1) https://review.opendev.org/c/openstack/neutron/+/763563
> > > > >   (2) https://review.opendev.org/c/openstack/neutron/+/788893
> > > > >   (3) Implement something different
> > > > > 
> > > > > The issue I reported in the bug is that there is an inconsistency
> > > > > between nova and neutron about the way to determine a hypervisor
> > > > > name. Currently neutron uses socket.gethostname() (which always
> > > > > returns shortname)
> > > > > 
> > > > 
> > > > 
> > > > socket.gethostname() can return fqdn or shortname -
> > > > https://docs.python.org/3/library/socket.html#socket.gethostname.
> > > > 
> > > 
> > > You are correct and my statement was not accurate.
> > > socket.gethostname() returns what is returned by the gethostname system
> > > call, and because gethostname/sethostname accept both FQDN and short
> > > name, socket.gethostname() can return either FQDN or short name.
> > >
> > > However the root problem is that this logic is not completely the same
> > > as the ones used in each virt driver. Of course we can require people to
> > > use the "correct" format for the canonical name as well as "hostname",
> > > but fixing this problem in neutron would be much more helpful
> > > considering the effect caused by forcing users to "fix"
> > > hostname/canonical name formatting at this point.
> > this is not really something that can be fixed in neutron.
> > we can either create a common function in oslo.utils or placement-lib
> > that we can use in nova, neutron and all other projects or we can use
> > the config option.
> > 
> > if we want to "fix" this in neutron then neutron should either try
> > looking up the RP using the host name and then fall back to using the
> > fqdn or we should look at using the hypervisor api as we discussed a few
> > years ago when this last came up
> > http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html
> > 
> > i don't think neutron should know anything about hypervisors so i would
> > just proceed with the new config option that takashi has proposed but i
> > would not implement Rodolfo's solution of adding a hypervisor_type.
> > 
> > just as nova has no awareness of the neutron backend and tries to treat
> > all of them the same, neutron should remain hypervisor independent and we
> > should look to provide common code that can be reused to identify the
> > RP in a separate lib as a longer term solution.
> > 
> > for many deployments that do not set the fqdn as the canonical host name
> > in /etc/hosts the current default behavior works out of the box.
> > whatever solution we take we need to ensure that no existing deployment
> > is affected by the change, which means we cannot default to only using
> > the fqdn or similar as that would be an upgrade breakage, so we have
> > to maintain the current behavior by default and enhance neutron to
> > either fall back to the fqdn if the hostname based lookup fails or use
> > the new config introduced by takashi's patch where the fqdn is used as
> > the server canonical hostname.
> > >  
> > > > I've seen cases where it switched from short to fqdn but I'm not sure
> > > > of the root cause - DHCP lease setting a hostname/domainname perhaps.
> > > > 
> > > > Thanks,
> > > > Ollie
> > > > 
> > > > > to determine a hypervisor name to search the corresponding resource
> > > > > provider.
> > > > > On the other hand, nova uses libvirt's getHostname function (if the
> > > > > libvirt driver is used) which returns a canonical name. The canonical
> > > > > name can be shortname or FQDN (*1) and if FQDN is used then neutron
> > > > > and nova never agree.
> > > > > 
> > > > > (*1)
> > > > > IMO this is likely to happen in real deployments. For example,
> > > > > TripleO uses FQDN for canonical names.
> > > > > 
> > > > > 
> > > > > Neutron already provides the resource_provider_hypervisors option
> > > > > to override a hypervisor name used. However because this option
> > > > > accepts a map between interface and hypervisor, setting this
> > > > > parameter requires a very redundant description, especially when a
> > > > > compute node has multiple interfaces/bridges. The following example
> > > > > shows how redundant the current requirement is.
> > > > > ~~~
> > > > > [OVS]
> > > > > resource_provider_bandwidths=br-data1:1024:1024,br-data2:1024:1024,\
> > > > > br-data3:1024:1024,br-data4:1024:1024
> > > > > resource_provider_hypervisors=br-data1:compute0.mydomain,br-data2:\
> > > > > compute0.mydomain,br-data3:compute0.mydomain,br-data4:compute0.mydomain
> > > > > ~~~
> > > > > 
> > > > > I've submitted a change to propose a new single parameter to
> > > > > override the base hypervisor name but this is currently -2ed, mainly
> > > > > because I lacked analysis about the root cause of the mismatch when
> > > > > I proposed this.
> > > > >  (1) https://review.opendev.org/c/openstack/neutron/+/763563
> > > > > 
> > > > > 
> > > > > On the other hand, I submitted a different change to neutron which
> > > > > implements the logic to get a hypervisor name which is fully
> > > > > compatible with libvirt. While this would save users from even
> > > > > overriding hypervisor names, I'm aware that this might break the
> > > > > other virt drivers which depend on a different logic to generate a
> > > > > hypervisor name. IMO the patch is still useful considering the
> > > > > libvirt driver would be the most popular option now, but I'm not
> > > > > fully aware of the impact on the other drivers, especially because I
> > > > > don't know which virt drivers would support the minimum QoS feature
> > > > > now.
> > > > >  (2) https://review.opendev.org/c/openstack/neutron/+/788893/
> > > > > 
> > > > > 
> > > > > In the review of (2), Sean mentioned implementing a logic to
> > > > > determine an appropriate resource provider(3) even if there is a
> > > > > mismatch about host name format, but I'm not sure how I would
> > > > > implement that, tbh.
> > > > > 
> > > > > 
> > > > > My current thought is to merge (1) as a quick solution first, and
> > > > > discuss whether we should merge (2), but I'd like to ask for some
> > > > > feedback about this plan (like we should NOT merge (2)).
> > > > > 
> > > > > I'd appreciate your thoughts about this $topic.
> > > > > 
> > > > > Thank you,
> > > > > Takashi
> > 
> > 
