[openstack-ansible] [yoga] utility_container failure

James Denton james.denton at rackspace.com
Thu Aug 18 02:38:09 UTC 2022


Hello,

> Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host"

This could mean the hosts file doesn't have an entry for that container. The Ansible inventory appears to have the corresponding entries, though, so you're probably fine there. From 'infra1', you can run 'lxc-attach -n infra1_repo_container-20deb465' to attach to the container directly and run the same commands mentioned earlier. For the infra2 container, connect from infra2 with 'lxc-attach'.
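
For example, from infra1 (a rough sketch, using the container name from your inventory):

# lxc-attach -n infra1_repo_container-20deb465
# systemctl status glusterd.service
# journalctl -xe -u glusterd.service
# exit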

Can you confirm whether this is Yoga or master? Also, are you running with Rocky Linux 8.6 (as a previous thread indicates)? To be honest, I have not tested that yet and am not sure of the gotchas.

James Denton
Rackspace Private Cloud

From: Father Vlasie <fv at spots.edu>
Date: Wednesday, August 17, 2022 at 9:18 PM
To: James Denton <james.denton at rackspace.com>
Cc: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
Subject: Re: [openstack-ansible] [yoga] utility_container failure


Hello,

> On Aug 17, 2022, at 5:18 PM, James Denton <james.denton at rackspace.com> wrote:
>
> Hello,
>
> My recommendation is to try running these commands from the deploy node and see what the output is (or maybe try running the playbooks in verbose mode with -vvv):

Here is the output from "setup-infrastructure.yml -vvv": https://paste.opendev.org/show/bCGUOb177z2oC5P3nR5Z/

> # ssh infra1_repo_container-20deb465
> # systemctl status glusterd.service
> # journalctl -xe -u glusterd.service
> # exit
>
> ^^ Might also consider restarting glusterd and checking the journal to see if there’s an error.

Strangely I get "ssh: connect to host infra1_repo_container-20deb465 port 22: No route to host"

> # ssh infra2_repo_container-6cd61edd
> # systemctl reload-or-restart $(systemd-escape -p --suffix="mount" "/var/www/repo")
> # systemctl status var-www-repo.mount
> # journalctl -xe
> # exit
>

A similar error for this too "ssh: connect to host infra2_repo_container-6cd61edd port 22: Network is unreachable"

> The issue may be obvious. Maybe not. If you can ship that output to paste.openstack.org we might be able to diagnose.

Here is the verbose output for the glusterfs error: https://paste.openstack.org/show/bw0qIhUzuZ1de0qjKzfK/

>
> The mountpoint command will return 0 if /var/www/repo is a mountpoint, and 1 if it is not a mountpoint. Looks like it is probably failing due to a previous task (i.e., it is not being mounted). Understanding why glusterfs is failing may be key here.
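>
> For reference, a quick manual check from inside the repo container (sketch):
>
> # mountpoint -q /var/www/repo; echo $?
>
> ^^ Prints 0 if /var/www/repo is mounted, 1 if it is not.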
>
> > I have destroyed all of my containers and I am running setup-hosts again
>
> Can you describe what you did here? Simply destroy the LXC containers or did you wipe the inventory, too?

I used the command: openstack-ansible lxc-containers-destroy.yml

I answered yes to both prompts, confirming removal of the containers and the container data.

Thank you once again!

FV

>
> Thanks,
> James Denton
> Rackspace Private Cloud
>
> From: Father Vlasie <fv at spots.edu>
> Date: Wednesday, August 17, 2022 at 5:22 PM
> To: James Denton <james.denton at rackspace.com>
> Cc: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
> Subject: Re: [openstack-ansible] [yoga] utility_container failure
>
>
>
> Hello again!
>
> I have completed the run of setup-hosts successfully.
>
> However I am still seeing errors when running setup-infrastructure:
>
> ------
>
> TASK [openstack.osa.glusterfs : Start glusterfs server] **********************************************************************
> fatal: [infra1_repo_container-20deb465]: FAILED! => {"changed": false, "msg": "Unable to start service glusterd: Job for glusterd.service failed because the control process exited with error code.\nSee \"systemctl status glusterd.service\" and \"journalctl -xe\" for details.\n"}
>
> ------
>
> TASK [systemd_mount : Set the state of the mount] ****************************************************************************
> fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.021452", "end": "2022-08-17 18:17:37.172187", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:17:37.150735", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []}
>
> ------
>
> fatal: [infra2_repo_container-6cd61edd]: FAILED! => {"attempts": 5, "changed": false, "cmd": ["mountpoint", "-q", "/var/www/repo"], "delta": "0:00:00.002310", "end": "2022-08-17 18:18:04.297940", "msg": "non-zero return code", "rc": 1, "start": "2022-08-17 18:18:04.295630", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
>
> ------
>
> infra1_repo_container-20deb465 : ok=30   changed=2    unreachable=0    failed=1    skipped=14   rescued=0    ignored=0
> infra2_repo_container-6cd61edd : ok=66   changed=6    unreachable=0    failed=2    skipped=22   rescued=1    ignored=0
> infra3_repo_container-7ca5db88 : ok=64   changed=6    unreachable=0    failed=2    skipped=22   rescued=1    ignored=0
>
> ------
>
> Again any help is much appreciated!
>
> Thank you,
>
> FV
>
> > On Aug 17, 2022, at 2:16 PM, Father Vlasie <fv at spots.edu> wrote:
> >
> > Hello,
> >
> > I am very appreciative of your help!
> >
> > I think my interface setup might be questionable.
> >
> > I did not realise that the nodes need to talk to each other on the external IP. I thought that was only for communication with entities external to the cluster.
> >
> > My bond0 is associated with br-vlan so I put the external IP there and set br-vlan as the external interface in user_variables.
> >
> > The nodes can now ping each other on the external network.
> >
> > This is how I have user_variables configured:
> >
> > ———
> >
> > haproxy_keepalived_external_vip_cidr: "192.168.2.9/26"
> > haproxy_keepalived_internal_vip_cidr: "192.168.3.9/32"
> > haproxy_keepalived_external_interface: br-vlan
> > haproxy_keepalived_internal_interface: br-mgmt
> > haproxy_bind_external_lb_vip_address: 192.168.2.9
> > haproxy_bind_internal_lb_vip_address: 192.168.3.9
> >
> > ———
> >
> > My IP addresses are configured thusly (one sample from each node type):
> >
> > ———
> >
> > infra1
> >    bond0->br-vlan 192.168.2.13
> >    br-mgmt 192.168.3.13
> >    br-vxlan 192.168.30.13
> >    br-storage
> >
> > compute1
> >    br-vlan
> >    br-mgmt 192.168.3.16
> >    br-vxlan 192.168.30.16
> >    br-storage 192.168.20.16
> >
> > log1
> >    br-vlan
> >    br-mgmt 192.168.3.19
> >    br-vxlan
> >    br-storage
> >
> > ———
> >
> > I have destroyed all of my containers and I am running setup-hosts again.
> >
> > Here’s to hoping it all turns out this time!
> >
> > Very gratefully,
> >
> > FV
> >
> >> On Aug 16, 2022, at 7:31 PM, James Denton <james.denton at rackspace.com> wrote:
> >>
> >> Hello,
> >>
> >>>> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0?
> >>
> >> This will likely be the bond0 interface and not the individual bond member. However, the interface defined here will ultimately depend on the networking of that host, and should be an external facing one (i.e. the interface with the default gateway).
> >>
> >> In many environments, you’ll have something like this (or using 2 bonds, but same idea):
> >>
> >>      • bond0 (192.168.100.5/24 gw 192.168.100.1)
> >>              • em49
> >>              • em50
> >>      • br-mgmt (172.29.236.5/22)
> >>              • bond0.236
> >>      • br-vxlan (172.29.240.5/22)
> >>              • bond0.240
> >>      • br-storage (172.29.244.5/22)
> >>              • bond0.244
> >>
> >> In this example, bond0 has the management IP 192.168.100.5 and br-mgmt is the “container” bridge with an IP configured from the ‘container’ network (see cidr_networks in openstack_user_config.yml). FYI: LXC containers will automatically be assigned IPs from the ‘container’ network outside of the ‘used_ips’ range(s). The infra host will communicate with the containers via this br-mgmt interface.
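> >>
> >> For illustration, the matching cidr_networks section of openstack_user_config.yml might look roughly like this (a sketch using the example addressing above, not necessarily your environment):
> >>
> >> cidr_networks:
> >>   container: 172.29.236.0/22
> >>   tunnel: 172.29.240.0/22
> >>   storage: 172.29.244.0/22
> >> used_ips:
> >>   - "172.29.236.1,172.29.236.50"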
> >>
> >> I’m using FQDNs for the VIPs, which are specified in openstack_user_config.yml here:
> >>
> >> global_overrides:
> >>  internal_lb_vip_address: internalapi.openstack.rackspace.lab
> >>  external_lb_vip_address: publicapi.openstack.rackspace.lab
> >>
> >> To avoid DNS resolution issues internally (or rather, to ensure the IP is configured in the config files and not the domain name) I’ll override with the IP and hard set the preferred interface(s):
> >>
> >> haproxy_keepalived_external_vip_cidr: "192.168.100.10/32"
> >> haproxy_keepalived_internal_vip_cidr: "172.29.236.10/32"
> >> haproxy_keepalived_external_interface: bond0
> >> haproxy_keepalived_internal_interface: br-mgmt
> >> haproxy_bind_external_lb_vip_address: 192.168.100.10
> >> haproxy_bind_internal_lb_vip_address: 172.29.236.10
> >>
> >> With the above configuration, keepalived will manage two VIPs - one external and one internal, and endpoints will have the FQDN rather than IP.
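> >>
> >> To sanity check that the VIPs landed where expected once keepalived is up, something like this (using the example addresses above):
> >>
> >> # ip addr show bond0 | grep 192.168.100.10
> >> # ip addr show br-mgmt | grep 172.29.236.10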
> >>
> >>>> Curl shows "503 Service Unavailable No server is available to handle this request”
> >>
> >> Hard to say without seeing logs why this is happening, but I will assume that keepalived is having issues binding the IP to the interface. You might find the reason in syslog or ‘journalctl -xe -f -u keepalived’.
> >>
> >>>> Running "systemctl status var-www-repo.mount” gives an output of “Unit var-www-repo.mount could not be found."
> >>
> >> You might try running ‘umount /var/www/repo’ and re-run the repo-install.yml playbook (or setup-infrastructure.yml).
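> >>
> >> Roughly (the umount inside the repo container, the playbook from the deploy host):
> >>
> >> # umount /var/www/repo
> >> # openstack-ansible repo-install.yml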
> >>
> >> Hope that helps!
> >>
> >> James Denton
> >> Rackspace Private Cloud
> >>
> >> From: Father Vlasie <fv at spots.edu>
> >> Date: Tuesday, August 16, 2022 at 4:31 PM
> >> To: James Denton <james.denton at rackspace.com>
> >> Cc: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
> >> Subject: Re: [openstack-ansible] [yoga] utility_container failure
> >>
> >>
> >>
> >> Hello,
> >>
> >> Thank you very much for the reply!
> >>
> >> haproxy and keepalived both show status active on infra1 (my primary node).
> >>
> >> Curl shows "503 Service Unavailable No server is available to handle this request”
> >>
> >> (Also the URL is http not https….)
> >>
> >> If I am using bonding on the infra nodes, should the haproxy_keepalived_external_interface be the device name (enp1s0) or bond0?
> >>
> >> Earlier in the output I find the following error (showing for all 3 infra nodes):
> >>
> >> ------------
> >>
> >> TASK [systemd_mount : Set the state of the mount] *****************************************************************************************************************************************
> >> fatal: [infra3_repo_container-7ca5db88]: FAILED! => {"changed": false, "cmd": "systemctl reload-or-restart $(systemd-escape -p --suffix=\"mount\" \"/var/www/repo\")", "delta": "0:00:00.022275", "end": "2022-08-16 14:16:34.926861", "msg": "non-zero return code", "rc": 1, "start": "2022-08-16 14:16:34.904586", "stderr": "Job for var-www-repo.mount failed.\nSee \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details.", "stderr_lines": ["Job for var-www-repo.mount failed.", "See \"systemctl status var-www-repo.mount\" and \"journalctl -xe\" for details."], "stdout": "", "stdout_lines": []}
> >>
> >> ——————
> >>
> >> Running "systemctl status var-www-repo.mount” gives an output of “Unit var-www-repo.mount could not be found."
> >>
> >> Thank you again!
> >>
> >> Father Vlasie
> >>
> >>> On Aug 16, 2022, at 6:32 AM, James Denton <james.denton at rackspace.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> That error means the repo server at 192.168.3.9:8181 is unavailable. The repo server sits behind haproxy, which should be listening on 192.168.3.9 port 8181 on the active (primary) node. You can verify this by issuing a ‘curl -v https://192.168.3.9:8181/’. You might check the haproxy service status and/or keepalived status to ensure they are operating properly. If the IP cannot be bound to the correct interface, keepalived may not start.
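> >>>
> >>> For example (a quick sketch, run on the primary node):
> >>>
> >>> # systemctl status haproxy
> >>> # systemctl status keepalived
> >>> # journalctl -xe -u keepalived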
> >>>
> >>> James Denton
> >>> Rackspace Private Cloud
> >>>
> >>> From: Father Vlasie <fv at spots.edu>
> >>> Date: Tuesday, August 16, 2022 at 7:38 AM
> >>> To: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
> >>> Subject: [openstack-ansible] [yoga] utility_container failure
> >>>
> >>>
> >>>
> >>> Hello everyone,
> >>>
> >>> I have happily progressed to the second step of running the playbooks, namely "openstack-ansible setup-infrastructure.yml"
> >>>
> >>> Everything looks good except for just one error which is mystifying me:
> >>>
> >>> ----------------
> >>>
> >>> TASK [Get list of repo packages] **********************************************************************************************************************************************************
> >>> fatal: [infra1_utility_container-5ec32cb5]: FAILED! => {"changed": false, "content": "", "elapsed": 30, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error timed out>", "redirected": false, "status": -1, "url": "http://192.168.3.9:8181/constraints/upper_constraints_cached.txt"}
> >>>
> >>> ----------------
> >>>
> >>> 192.168.3.9 is the IP listed in user_variables.yml under haproxy_keepalived_internal_vip_cidr
> >>>
> >>> Any help or pointers would be very much appreciated!
> >>>
> >>> Thank you,
> >>>
> >>> Father Vlasie
> >>>
> >>
> >
>