Kolla Ansible on Ubuntu 20.04 - cloud-init & other network issues

Tobias McNulty tobias at caktusgroup.com
Sat Nov 12 17:12:39 UTC 2022


Hi,

I'm attempting to use Kolla Ansible 14.6.0 to deploy OpenStack Yoga on a
small 3-node Ubuntu 20.04 cluster. The nodes have 128 GB RAM each, dual
Xeon processors, and dual 10G Intel NICs. The NICs are connected to access
ports on a 10G switch with separate VLANs for the local and external
networks.

All the playbooks run cleanly, but cloud-init is failing in the
Ubuntu 20.04 and 22.04 VMs I attempt to boot. The VM images are unmodified
from https://cloud-images.ubuntu.com/, and cloud-init works fine if I mount
a second volume with user-data. The error is a timeout attempting to
reach 169.254.169.254. This occurs both when booting a VM in an internal
routed network and directly in an external network.
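
As a rough illustration, the request that times out can be reproduced by hand from inside a VM (for example, one booted with user-data on a second volume) using a short Python probe. This is only a sketch under assumptions: the helper names `metadata_url` and `probe` are made up for illustration, and the path is the standard OpenStack metadata path, which may not be the exact URL cloud-init tries first in these images.

```python
import socket
import urllib.error
import urllib.request

# Link-local address that cloud-init contacts for instance metadata.
METADATA_HOST = "169.254.169.254"


def metadata_url(path="openstack/latest/meta_data.json"):
    """Build a metadata URL; the default path is an assumption (OpenStack format)."""
    return f"http://{METADATA_HOST}/{path.lstrip('/')}"


def probe(url, timeout=5.0):
    """Return (reachable, detail) for a single HTTP GET against `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path works.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, socket.timeout, OSError) as exc:
        # Timeouts, unreachable routes, and refused connections all land here.
        return False, str(exc)
```

Calling `probe(metadata_url())` inside an affected VM should return `False` with a timeout message if the symptom matches the one described above; `metadata_url("latest/meta-data/")` targets the EC2-compatible path instead.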

I tried various Neutron plugin agents (ovn, linuxbridge, and openvswitch
both with and without firewall_driver = openvswitch
<https://docs.openstack.org/kolla-ansible/latest/reference/networking/neutron.html#openvswitch-ml2-ovs>),
each time starting from a clean install of the entire OS, and all with the
same result. Running tcpdump looking for 169.254.169.254 shows nothing. As a
possible clue, the virtual NICs are unable to pass any traffic (e.g., to
reach an external DHCP server) unless I completely disable port security on
the interface (even if the associated security group is wide open). But
disabling port security does not fix cloud-init (not to mention I don't
really want to disable port security).

Are there any additional requirements related to deploying OpenStack with
Kolla on Ubuntu 20.04?

This is a fairly vanilla configuration using the multinode inventory as a
starting point. I tried to follow the Quick Start
<https://docs.openstack.org/kolla-ansible/yoga/user/quickstart.html> as
closely as possible; the only material difference I see is that I'm using
the same 3 nodes for control + compute. I am using MAAS so it's easy to get
a clean OS install on all three nodes ahead of each attempt. I plan to try
again with the standard (non-HWE) kernel just in case, but otherwise I am
running out of ideas. In case they offer any additional clues, here are my
globals.yml and inventory file, along with the playbook I'm using to
configure the network, images, VMs, etc., after bootstrapping the cluster:

https://gist.github.com/tobiasmcnulty/7dbbdbc67abc08cbb013bf5983852ed6

Thank you in advance for any advice!

Cheers,
Tobias