[neutron][openstack-ansible] Instances can only connect to provider-net via tenant-net but not directly
Dmitriy Rabotyagov
noonedeadpunk at ya.ru
Wed Nov 25 13:31:12 UTC 2020
Hi Oliver.
I think there could be quite a few reasons for this behavior, and at the moment I don't have enough information to say which one applies here.
First of all, I'd suggest checking `openstack network agent list` and ensuring that the linuxbridge (lxb) agent for the compute host is present, up and healthy.
Next, I'd check the nova-scheduler log (via journalctl, for example), as it might offer more insight.
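For example, something along these lines (the agent type, host and unit names are only examples and may differ in your deployment):

  openstack network agent list --agent-type linux-bridge
  # on the controller and on the affected compute host respectively:
  journalctl -u nova-scheduler --since "1 hour ago" | grep -i error
  journalctl -u neutron-linuxbridge-agent --since "1 hour ago" | grep -i error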
Floating IP traffic for tenant networks goes through the network nodes, where neutron-l3-agent runs, while a direct connection requires that the interface for the flat (provider) network be present on the compute host and be able to be added to a bridge by the lxb agent.
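The 300-second wait in your traceback is nova timing out on the network-vif-plugged event from neutron (the vif_plugging_timeout default), which usually means the port never got wired on the compute host. A rough way to check that on the compute node (the interface name and config path below are only examples; they depend on how your provider network is mapped):

  grep physical_interface_mappings /etc/neutron/plugins/ml2/linuxbridge_agent.ini
  ip link show <provider-interface>   # must exist and be UP on the compute host
  ip link show type bridge            # after a boot attempt, look for a brq<net-id> bridge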
25.11.2020, 15:08, "Oliver Wenz" <oliver.wenz at dhbw-mannheim.de>:
> Hi,
> Starting instances that are connected directly to a provider network
> results in an error in my Ussuri cloud. However, when I connect
> instances to a tenant network instead, I can associate floating IPs
> from this provider network and everything works fine.
> Neither the nova-compute nor the neutron-linuxbridge-agent service
> shows any errors; I only get an error in the instance status:
>
> 'code': 500, 'created': '2020-11-25T12:05:42Z', 'message': 'Build of
> instance 274e0a7d-fb33-430a-986e-74fceae6a36d aborted: Failed to
> allocate the network(s), not rescheduling.', 'details': 'Traceback (most
> recent call last):
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
> line 6549, in _create_domain_and_network
> post_xml_callback=post_xml_callback)
> File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
> next(self.gen)
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/compute/manager.py",
> line 513, in wait_for_instance_event
> actual_event = event.wait()
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/eventlet/event.py",
> line 125, in wait
> result = hub.switch()
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/eventlet/hubs/hub.py",
> line 298, in switch
> return self.greenlet.switch()
> eventlet.timeout.Timeout: 300 seconds
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/compute/manager.py",
> line 2378, in _build_and_run_instance
> accel_info=accel_info)
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
> line 3683, in spawn
> cleanup_instance_disks=created_disks)
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
> line 6572, in _create_domain_and_network
> raise exception.VirtualInterfaceCreateException()
> nova.exception.VirtualInterfaceCreateException: Virtual Interface
> creation failed
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/compute/manager.py",
> line 2200, in _do_build_and_run_instance
> filter_properties, request_spec, accel_uuids)
> File
> "/openstack/venvs/nova-21.1.0/lib/python3.6/site-packages/nova/compute/manager.py",
> line 2444, in _build_and_run_instance
> reason=msg)
> nova.exception.BuildAbortException: Build of instance
> 274e0a7d-fb33-430a-986e-74fceae6a36d aborted: Failed to allocate the
> network(s), not rescheduling.
> '
>
> Any ideas what could cause this?
>
> Kind regards,
> Oliver
>
> On 2020-11-25 13:00, openstack-discuss-request at lists.openstack.org wrote:
>> Today's Topics:
>>
>> 1. Re:
>> [nova][tripleo][rpm-packaging][kolla][puppet][debian][osa] Nova
>> enforces that no DB credentials are allowed for the nova-compute
>> service (Balázs Gibizer)
>> 2. Re: [ironic] [infra] Making Glean work with IPA for static IP
>> assignment (Dmitry Tantsur)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 25 Nov 2020 11:13:23 +0100
>> From: Balázs Gibizer <balazs.gibizer at est.tech>
>> To: Thomas Goirand <zigo at debian.org>
>> Cc: openstack maillist <openstack-discuss at lists.openstack.org>
>> Subject: Re:
>> [nova][tripleo][rpm-packaging][kolla][puppet][debian][osa] Nova
>> enforces that no DB credentials are allowed for the nova-compute
>> service
>> Message-ID: <BEKCKQ.PYGQ12VO6AF23 at est.tech>
>> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>>
>> On Mon, Nov 23, 2020 at 13:47, Thomas Goirand <zigo at debian.org> wrote:
>>> On 11/23/20 11:31 AM, Balázs Gibizer wrote:
>>>> It is still a security problem if nova-compute ignores the config as
>>>> the config still exists on the hypervisor node (in some deployment
>>>> scenarios)
>>>
>>> Let's say we apply the patch you're proposing, and that nova-compute
>>> isn't loaded with the db credentials anymore, because they're in a
>>> separate file which nova-compute doesn't load.
>>>
>>> In such a scenario, /etc/nova/nova-db.conf could still be present with
>>> db credentials filled in. So the patch you're proposing is still not
>>> effective against wrong configuration of nova-compute hosts.
>>
>> Obviously we cannot prevent the deployer from storing the DB creds on a
>> compute host, as we cannot detect that in general. But we can detect it
>> if they are stored in the config that nova-compute reads. I don't see
>> why we should not make sure to tell the deployer not to do that, as it
>> is generally considered unsafe.
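>> Just to illustrate the idea (the file names follow the proposal in this
>> thread; the exact layout is of course up to the deployment tooling):
>>
>>   /etc/nova/nova.conf        - read by every service, no DB credentials
>>   /etc/nova/nova-db.conf     - only where the main DB is needed:
>>       [database]
>>       connection = mysql+pymysql://nova:<secret>@<db-host>/nova
>>   /etc/nova/nova-api-db.conf - only where the API DB is needed:
>>       [api_database]
>>       connection = mysql+pymysql://nova:<secret>@<db-host>/nova_api
>>
>>   # conductor/api then get the extra file(s) on the command line, e.g.
>>   nova-conductor --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-db.conf
>>   # while nova-compute is started with /etc/nova/nova.conf only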
>>
>>>> From the nova-compute perspective we might be able to replace the
>>>> [api_database]connection dependency with some hack, e.g. putting the
>>>> service name into the global CONF object at the start of the service
>>>> binary and depending on that instead of another part of the config.
>>>> But I feel pretty bad about this hack.
>>>
>>> Because of the above, I very much think it'd be the best way to go,
>>> but I understand your point of view. Going with the /etc/nova/nova-db.conf
>>> and nova-api-db.conf approach is probably good anyway.
>>>
>>> As for the nova-conductor thing, I would very much prefer a clean and
>>> explicit "superconductor=true" directive, possibly with some checks to
>>> display big warnings in the nova-conductor.log file in case of a wrong
>>> configuration. If we don't have that, then at least things must be
>>> extensively documented, because it's really not obvious what's going on.
>>
>> I agree that superconductor=true would be a more explicit config option
>> than [api_database]connection. However, this would also force deployers
>> to use a separate config file for nova-compute, as neither
>> superconductor=true nor superconductor=false (meaning it is a cell
>> conductor) makes sense there.
>>
>>> Cheers,
>>>
>>> Thomas Goirand (zigo)
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 25 Nov 2020 11:54:13 +0100
>> From: Dmitry Tantsur <dtantsur at redhat.com>
>> To: Ian Wienand <iwienand at redhat.com>
>> Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
>> Subject: Re: [ironic] [infra] Making Glean work with IPA for static IP
>> assignment
>> Message-ID:
>> <CACNgkFwVhMMxVRK2PkFEkHORRwY4wY49g7G3CpzYwaFzC27Bjw at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> Thank you for your input!
>>
>> On Wed, Nov 25, 2020 at 3:09 AM Ian Wienand <iwienand at redhat.com> wrote:
>>
>>> On Tue, Nov 24, 2020 at 11:54:55AM +0100, Dmitry Tantsur wrote:
>>>> The problem is, I cannot make Glean work with any ramdisk I build. The
>>>> crux of the problem seems to be that NetworkManager (used by default in
>>>> RHEL, CentOS, Fedora and Debian at least) starts very early, creates the
>>>> default connection and ignores whatever files Glean happens to write
>>>> afterwards. On Debian, running `systemctl restart networking` actually
>>>> helped to pick up the new configuration, but I'm not sure we want to do
>>>> that in Glean. I haven't been able to make NetworkManager pick up the
>>>> changes on RH systems so far.
>>>
>>> So we do use NetworkManager in the OpenDev images, and we do not see
>>> NetworkManager starting before glean.
>>
>> Okay, thanks for confirming. Maybe it's related to how IPA is built? It's
>> not exactly a normal image after all, although it's pretty close to one.
>>
>>> The way it should work is that simple-init in dib installs glean into
>>> the image. That runs the glean install script (with the --use-nm
>>> argument if DIB_SIMPLE_INIT_NETWORKMANAGER is set, which is the default
>>> on centos/fedora), which installs two things: udev rules and a systemd
>>> handler.
>>
>> I have checked that these are installed, but I don't know how to verify a
>> udev rule.
>>
>>> The udev is pretty simple [1] and should add a "Wants" for each net
>>> device; e.g. eth1 would match and create a Wants glean@eth1.service,
>>> which then matches [2], which should write out the ifcfg config file.
>>> After this, NetworkManager should start, notice the config file for
>>> the interface and bring it up.
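>>> (One way to confirm the rule actually fires for a device, in case it
>>> helps; the interface name is just an example:
>>>   udevadm test /sys/class/net/enp1s0 2>&1 | grep -i glean
>>>   udevadm info /sys/class/net/enp1s0 | grep SYSTEMD_WANTS
>>> The second command should list the glean@ unit among the device's
>>> wants.)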
>>
>> Yeah, I definitely see logging from NetworkManager DHCP before this service
>> is run (i.e. before the output from Glean).
>>
>>>> Do you maybe have any hints how to proceed? I'd be curious to know how
>>>> static IP assignment works in the infra setup. Do you have images with
>>>> NetworkManager there? Do you use the simple-init element?
>>>
>>> As noted, yes, we use this. Really only in two contexts: it's Rackspace
>>> that doesn't have DHCP, so we have to set up the interface statically
>>> from the configdrive data. Other clouds all provide DHCP, which is
>>> used when there's no configdrive data.
>>>
>>> Here is a systemd-analyze from one of our Centos nodes if it helps:
>>
>>> graphical.target @18.403s
>>> └─multi-user.target @18.403s
>>>   └─unbound.service @5.467s +12.934s
>>>     └─network.target @5.454s
>>>       └─NetworkManager.service @5.339s +112ms
>>>         └─network-pre.target @5.334s
>>>           └─glean@ens3.service @4.227s +1.102s
>>>             └─basic.target @4.167s
>>>               └─sockets.target @4.166s
>>>                 └─iscsiuio.socket @4.165s
>>>                   └─sysinit.target @4.153s
>>>                     └─systemd-udev-settle.service @1.905s +2.245s
>>>                       └─systemd-udev-trigger.service @1.242s +659ms
>>>                         └─systemd-udevd-control.socket @1.239s
>>>                           └─system.slice
>>
>> # systemd-analyze critical-chain
>> multi-user.target @2min 6.301s
>> └─tuned.service @1min 32.273s +34.024s
>>   └─network.target @1min 31.590s
>>     └─network-pre.target @1min 31.579s
>>       └─glean@enp1s0.service @36.594s +54.952s
>>         └─system-glean.slice @36.493s
>>           └─system.slice @4.083s
>>             └─-.slice @4.080s
>>
>> # systemd-analyze critical-chain NetworkManager.service
>> NetworkManager.service +9.287s
>> └─network-pre.target @1min 31.579s
>>   └─glean@enp1s0.service @36.594s +54.952s
>>     └─system-glean.slice @36.493s
>>       └─system.slice @4.083s
>>         └─-.slice @4.080s
>>
>> # cat /etc/sysconfig/network-scripts/ifcfg-enp1s0
>> # Automatically generated, do not edit
>> DEVICE=enp1s0
>> BOOTPROTO=static
>> HWADDR=52:54:00:1f:79:7e
>> IPADDR=192.168.122.42
>> NETMASK=255.255.255.0
>> ONBOOT=yes
>> NM_CONTROLLED=yes
>>
>> # ip addr
>> ...
>> 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state
>> UP group default qlen 1000
>> link/ether 52:54:00:1f:79:7e brd ff:ff:ff:ff:ff:ff
>> inet 192.168.122.77/24 brd 192.168.122.255 scope global dynamic
>> noprefixroute enp1s0
>> valid_lft 42957sec preferred_lft 42957sec
>> inet6 fe80::f182:7fb4:7a39:eb7b/64 scope link noprefixroute
>> valid_lft forever preferred_lft forever
>>
>>> At a guess, I feel like the udev bits are probably not happening
>>> correctly in your case? That's important to get the glean@<interface>
>>> service into the chain to pre-create the config file.
>>
>> It seems that the ordering is correct and the interface service is
>> executed, but the IP address is nonetheless wrong.
>>
>> Can it be related to how long glean takes to run in my case (54 seconds vs
>> 1 second in your case)?
>>
>> Dmitry
>>
>>> -i
>>>
>>> [1]
>>> https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-udev.rules
>>> [2]
>>> https://opendev.org/opendev/glean/src/branch/master/glean/init/glean-nm@.service
--
Kind Regards,
Dmitriy Rabotyagov