Hey Sarah,

Your containers are using the same IP range as your physical network (both are on 10.0.1.x), and that's what's causing the chaos: when something tries to reach a container (like your repo server), traffic gets routed to the wrong place, the health check fails, and HAProxy returns 503 Service Unavailable.

Basically the IPs are clashing, and traffic is getting tangled up.

Leave your physical network alone and shift the containers to a different IP range (something like 192.168.x.x) instead. That way everything stays in its own lane, the conflict disappears, and you're good to go.
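
If this is an openstack-ansible deployment (it looks like one, given the repo server and haproxy roles), the container addresses usually come from two places, so the change would look roughly like the sketch below -- file paths and variable names assume a stock deployment, so adjust to whatever your setup actually uses:

    # /etc/openstack_deploy/openstack_user_config.yml
    # management network the containers' ens-style interfaces are assigned
    # from -- move it off 10.0.1.x
    cidr_networks:
      container: 192.168.10.0/24

    # /etc/openstack_deploy/user_variables.yml
    # lxcbr0 (the containers' eth0) -- only change this if that range also
    # clashes with something on your side
    lxc_net_address: 192.168.20.1
    lxc_net_netmask: 255.255.255.0
    lxc_net_dhcp_range: 192.168.20.2,192.168.20.253

After changing the CIDRs you'll want to destroy and rebuild the affected containers (and re-run the setup playbooks) rather than trying to re-address them in place.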


Best regards,
Kerem ÇELİKER


On Mon, Mar 17, 2025 at 20:23 Sarah Thompson <plodger@gmail.com> wrote:
Hi all,

I've got much further now and am tracking down networking issues. I'm pretty sure everything is installing correctly, but I'm seeing a systematic issue.

My test network that I'm running the openstack instances on is 10.0.1.x -- the VMs all have fixed IP addresses, can talk to each other, etc, nothing weird going on. 

haproxy is installing and running, but throwing 503 errors. Digging into this, it seems that there are some issues with the network configs of at least some of the LXC containers. The one I'm seeing that's preventing Ansible from completing the infrastructure setup is the repo server container. If I attach to the container, the keepalive comes back correctly. Externally to the container, the HTTP connection is rejected.
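
(Concretely, the check I mean is along these lines, with <port> being whatever the haproxy config lists for the repo backend:

    # from the haproxy node -- connection refused, so haproxy marks the backend down
    curl -I http://10.0.1.38:<port>/
    # from inside the container (lxc-attach -n <repo-container-name>) -- responds fine
    curl -I http://localhost:<port>/
)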

Looking into the reasons, it looks like there are 3 networks visible from inside the container:

ens18: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.1.38  netmask 255.255.255.0  broadcast 10.0.1.255
        inet6 fe80::216:3eff:feab:ab45  prefixlen 64  scopeid 0x20<link>
        ether 00:16:3e:ab:ab:45  txqueuelen 1000  (Ethernet)
        RX packets 96  bytes 6952 (6.9 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 16  bytes 1236 (1.2 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.3.113  netmask 255.255.255.0  broadcast 10.0.3.255
        inet6 fe80::216:3eff:fe45:64d6  prefixlen 64  scopeid 0x20<link>
        ether 00:16:3e:45:64:d6  txqueuelen 1000  (Ethernet)
        RX packets 351  bytes 425285 (425.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 263  bytes 19114 (19.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2939  bytes 208134 (208.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2939  bytes 208134 (208.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The 10.0.1.38 address seems to be the problem. I think this is an internally routed subnet, not the actual physical subnet (note the lack of a gateway address, and 10.0.1.38 is definitely not being allocated by the DHCP server). Looking at some other containers, I'm seeing 10.0.2.x and 10.0.3.x there, so this is clearly being allocated on the fly either by lxc or the container(s).
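
(If it helps anyone reproduce this, the lxc side of things can be inspected on the host with something like the following, assuming the standard dnsmasq-backed lxc bridge:

    ip addr show lxcbr0                 # which subnet the lxc bridge itself is on
    grep -v '^#' /etc/default/lxc-net   # LXC_ADDR / LXC_NETWORK / LXC_DHCP_RANGE
    lxc-ls -f                           # addresses lxc thinks each container has
)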

TL;DR: The internal 10.0.1.x is clashing with the physical 10.0.1.x network, which is almost certainly why the keepalive is failing.

Does anyone have any idea how to fix the configuration to use some other CIDR block for this? I'd like to avoid the extreme pain of remapping my physical network (that's a much more complicated problem than a few test nodes, unfortunately!).

Thank you in advance,
Sarah Thompson

--
[s]