Hi,

just one more thing to check: whenever I had trouble with the metadata service, it was usually AppArmor blocking access. For testing purposes (or if you're behind a firewall anyway), you could try disabling all the security-related daemons to see if that helps. If you don't have AppArmor enabled, do you see any errors in the neutron logs?

Quoting Tobias McNulty <tobias@caktusgroup.com>:
As an update, I tried the non-HWE kernel with the same result. Could it be a hardware/driver issue with the 10G NICs? It's so repeatable. I'll look into finding some other hardware to test with.
Has anyone else experienced such a complete failure with cloud-init and/or security groups, and do you have any advice on how I might continue to debug this?
Many thanks, Tobias
On Sat, Nov 12, 2022 at 12:12 PM Tobias McNulty <tobias@caktusgroup.com> wrote:
Hi,
I'm attempting to use Kolla Ansible 14.6.0 to deploy OpenStack Yoga on a small 3-node Ubuntu 20.04 cluster. The nodes have 128 GB RAM each, dual Xeon processors, and dual 10G Intel NICs. The NICs are connected to access ports on a 10G switch with separate VLANs for the local and external networks.
All the playbooks run cleanly, but cloud-init is failing in the Ubuntu 20.04 and 22.04 VMs I attempt to boot. The VM images are unmodified from https://cloud-images.ubuntu.com/, and cloud-init works fine if I mount a second volume with user-data. The error is a timeout attempting to reach 169.254.169.254. This occurs both when booting a VM in an internal routed network and directly in an external network.
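For anyone trying to reproduce the timeout from inside a guest without waiting on cloud-init, a minimal Python sketch along these lines may help. It assumes the standard OpenStack metadata path /openstack/latest/meta_data.json; the function name probe_metadata and its opener parameter are hypothetical, added only for illustration and so the probe can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

# Link-local metadata address from the error above; the path is the usual
# OpenStack metadata document and is an assumption about this deployment.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def probe_metadata(url=METADATA_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Issue one GET against the metadata URL and return (reachable, detail)."""
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError) as exc:
        # Covers timeouts, refused connections, and HTTP error statuses.
        return False, f"{type(exc).__name__}: {exc}"
    try:
        data = json.loads(body)
    except ValueError:
        return True, "reachable, but response was not JSON"
    return True, f"reachable, instance uuid={data.get('uuid', 'unknown')}"
```

Calling probe_metadata() from a console session in the VM returns a (reachable, detail) pair. A False result with a timeout in the detail would match the cloud-init failure described above, while a True result would suggest the problem lies in cloud-init's own configuration rather than in the network path.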
I tried various neutron plugin agents (ovn, linuxbridge, and openvswitch both with and without firewall_driver = openvswitch <https://docs.openstack.org/kolla-ansible/latest/reference/networking/neutron.html#openvswitch-ml2-ovs>), starting from a clean install of the entire OS each time, all with the same result. Running tcpdump looking for 169.254.169.254 shows nothing. As a possible clue, the virtual NICs are unable to pass any traffic (e.g., to reach an external DHCP server) unless I completely disable port security on the interface, even if the associated security group is wide open. But disabling port security does not fix cloud-init (not to mention I don't really want to disable port security).
Are there any additional requirements related to deploying OpenStack with Kolla on Ubuntu 20.04?
This is a fairly vanilla configuration using the multinode inventory as a starting point. I tried to follow the Quick Start <https://docs.openstack.org/kolla-ansible/yoga/user/quickstart.html> as closely as possible; the only material difference I see is that I'm using the same 3 nodes for control + compute. I am using MAAS, so it's easy to get a clean OS install on all three nodes ahead of each attempt.

I plan to try again with the standard (non-HWE) kernel just in case, but otherwise I am running out of ideas. In case they offer any additional clues, here are my globals.yml and inventory file, along with the playbook I'm using to configure the network, images, VMs, etc., after bootstrapping the cluster:
https://gist.github.com/tobiasmcnulty/7dbbdbc67abc08cbb013bf5983852ed6
Thank you in advance for any advice!
Cheers, Tobias