Hi,
Thanks for the details.
The MariaDB/Galera healthcheck runs on port 9200, and that check service may not be functioning. You can verify the healthcheck configuration in the /etc/haproxy/haproxy.cfg file. In the Galera container, the file /etc/systemd/system/mariadbcheck.socket
has the details, including the "allow" list. It might be worth looking at that to ensure the haproxy node IP is allowed.
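For example, a quick check from the haproxy node against the galera container IP shown in your lxc-info output (adjust as needed):

# telnet 172.29.238.177 9200

A healthy check normally accepts the connection and returns an HTTP-style response with the node status; a hang or refusal would line up with the "Layer4 timeout" in your haproxy log. Inside the container, "systemctl status mariadbcheck.socket" should confirm the socket is listening on 9200.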
--
James Denton
Principal Architect
Rackspace Private Cloud - OpenStack
james.denton@rackspace.com
From: jmarcelo.alencar@gmail.com <jmarcelo.alencar@gmail.com>
Date: Friday, January 20, 2023 at 9:20 AM
To: James Denton <james.denton@rackspace.com>, openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
Subject: Re: [openstack-ansible] Installing OpenStack with Ansible fails during Keystone playbook on TASK openstack.osa.db_setup
Hi James Denton,
Thanks for your quick response!!!
So, as far as I understand, running "openstack-ansible
setup-openstack.yml" starts a Keystone installation TASK that
connects to HAProxy, which in turn forwards the connection to the
galera container. The machine targethost01 runs both the containers
and HAProxy. From deploymenthost, there is some connectivity to HAProxy:
root@deploymenthost:/opt/openstack-ansible/playbooks# telnet 172.29.236.101 3306
Trying 172.29.236.101...
Connected to 172.29.236.101.
Escape character is '^]'.
Connection closed by foreign host.
It appears that HAProxy is listening, but cannot provide a proper
reply, so the connection closes. Following your suggestion, on
targethost01, HAProxy is running, but complains about no galera
backend:
root@targethost01:~# systemctl status haproxy.service
● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-01-20 11:35:40 -03; 33min ago
       Docs: man:haproxy(1)
             file:/usr/share/doc/haproxy/configuration.txt.gz
    Process: 276870 ExecStartPre=/usr/sbin/haproxy -Ws -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
   Main PID: 276873 (haproxy)
      Tasks: 5 (limit: 8192)
     Memory: 13.1M
        CPU: 2.165s
     CGroup: /system.slice/haproxy.service
             ├─276873 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock
             └─276875 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock
Jan 20 11:35:48 targethost01 haproxy[276875]: Server nova_console-back/targethost01_nova_api_container-56e92564 is DOWN, reason: Layer4 connection problem, info: "Conn>
Jan 20 11:35:48 targethost01 haproxy[276875]: backend nova_console-back has no server available!
Jan 20 11:35:49 targethost01 haproxy[276875]: [WARNING] (276875) : Server placement-back/targethost01_placement_container-90ccebb6 is DOWN, reason: Layer4 connection >
Jan 20 11:35:49 targethost01 haproxy[276875]: Server placement-back/targethost01_placement_container-90ccebb6 is DOWN, reason: Layer4 connection problem, info: "Connec>
Jan 20 11:35:49 targethost01 haproxy[276875]: [ALERT] (276875) : backend 'placement-back' has no server available!
Jan 20 11:35:49 targethost01 haproxy[276875]: backend placement-back has no server available!
Jan 20 11:35:53 targethost01 haproxy[276875]: [WARNING] (276875) : Server galera-back/targethost01_galera_container-5aa8474a is DOWN, reason: Layer4 timeout, check du>
Jan 20 11:35:53 targethost01 haproxy[276875]: [ALERT] (276875) : backend 'galera-back' has no server available!
Jan 20 11:35:53 targethost01 haproxy[276875]: Server galera-back/targethost01_galera_container-5aa8474a is DOWN, reason: Layer4 timeout, check duration: 12001ms. 0 act>
Jan 20 11:35:53 targethost01 haproxy[276875]: backend galera-back has no server available!
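(As a side note: if a stats socket is configured in /etc/haproxy/haproxy.cfg, the backend states can also be dumped directly, e.g.:

# echo "show stat" | socat stdio /run/haproxy.sock

where the socket path must match the "stats socket" line in the config.)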
It also warns about the other services, but since they are not
installed yet, I believe that is expected behavior. But galera should
have a functional backend, right? The container is running:
root@targethost01:~# lxc-ls
targethost01_cinder_api_container-b7ec9bdd
targethost01_galera_container-5aa8474a
targethost01_glance_container-b3ce5a33
targethost01_heat_api_container-57ec2a00
targethost01_horizon_container-c99d168e
targethost01_keystone_container-76e9b31b
targethost01_memcached_container-8edca03c
targethost01_neutron_server_container-fba7cb77
targethost01_nova_api_container-56e92564
targethost01_placement_container-90ccebb6
targethost01_rabbit_mq_container-2e5c5470
targethost01_repo_container-00531c23
targethost01_utility_container-dc05dc90
targethost01_zookeeper_container-294429e8 ubuntu-22-amd64
root@targethost01:~# lxc-info targethost01_galera_container-5aa8474a
Name: targethost01_galera_container-5aa8474a
State: RUNNING
PID: 102446
IP: 10.0.3.53
IP: 172.29.238.177
Link: 5aa8474a_eth0
TX bytes: 811.30 KiB
RX bytes: 57.49 MiB
Total bytes: 58.28 MiB
Link: 5aa8474a_eth1
TX bytes: 84.35 KiB
RX bytes: 1.06 MiB
Total bytes: 1.14 MiB
I can establish a connection and the server waits for a password:
root@targethost01:~# telnet 172.29.238.177 3306
Trying 172.29.238.177...
Connected to 172.29.238.177.
Escape character is '^]'.
u
5.5.5-10.6.10-MariaDB-1:10.6.10+maria~ubu2204-log:8PmS7Y:W'Yn=#6%Vbjmcmysql_native_password
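A more complete test would probably be a real client login, e.g. with the galera root password from /etc/openstack_deploy/user_secrets.yml (the galera_root_password variable), assuming a mariadb/mysql client is installed on the host:

# mysql -h 172.29.238.177 -P 3306 -u root -p -e 'status'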
Any hints?
Best regards.
On Fri, Jan 20, 2023 at 11:18 AM James Denton
<james.denton@rackspace.com> wrote:
>
> Hi –
>
>
>
> The ansible command to test the DB hits the Galera container directly, while the Ansible playbooks are likely using the VIP managed by HAProxy. I suspect that HAProxy has not started properly or is otherwise not serving traffic directed toward the internal_lb_vip_address.
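>
> For example (hypothetical client commands; substitute real credentials):
>
> # mysql -h <galera_container_ip> -u root -p   <- direct, like the ad-hoc ansible test
> # mysql -h 172.29.236.101 -u root -p          <- through the HAProxy VIP, like the playbooks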
>
>
>
> My suggestion at the moment is to check out the logs on the haproxy node to see if it’s working properly, and try testing connectivity from the deploy node via 172.29.236.101:3306. The haproxy logs will likely provide some insight here.
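>
> For example, on the haproxy node and the deploy node respectively, something like:
>
> # journalctl -u haproxy.service --since today | grep -i galera
> # nc -vz 172.29.236.101 3306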
>
>
>
> --
>
> James Denton
>
> Principal Architect
>
> Rackspace Private Cloud - OpenStack
>
> james.denton@rackspace.com
>
>
>
> From: jmarcelo.alencar@gmail.com <jmarcelo.alencar@gmail.com>
> Date: Friday, January 20, 2023 at 6:45 AM
> To: openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>
> Subject: [openstack-ansible] Installing OpenStack with Ansible fails during Keystone playbook on TASK openstack.osa.db_setup
>
>
>
> Hello Community,
>
> I am trying to create a two-machine deployment following the
> OpenStack-Ansible Deployment Guide
> (https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/).
> The two machines are named targethost01 and targethost02, and I am
> running Ansible from deploymenthost. Each machine has a 4-core CPU, 8
> GB of RAM, and a 240 GB SSD. I am using Ubuntu 22.04.1 LTS.
>
> The machine targethost01 has the following network configuration:
>
> network:
>   version: 2
>   ethernets:
>     enp5s0:
>       dhcp4: true
>     enp6s0: {}
>     enp7s0: {}
>     enp8s0: {}
>     enp9s0: {}
>   vlans:
>     vlan.10:
>       id: 10
>       link: enp6s0
>       addresses: [ ]
>     vlan.20:
>       id: 20
>       link: enp7s0
>       addresses: [ ]
>     vlan.30:
>       id: 30
>       link: enp8s0
>       addresses: [ ]
>     vlan.40:
>       id: 40
>       link: enp9s0
>       addresses: [ ]
>   bridges:
>     br-mgmt:
>       addresses: [ 172.29.236.101/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.10
>     br-storage:
>       addresses: [ 172.29.244.101/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.20
>     br-vlan:
>       addresses: []
>       mtu: 1500
>       interfaces:
>         - vlan.30
>     br-vxlan:
>       addresses: [ 172.29.240.101/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.40
>
>
> And targethost02 has the following network configuration:
>
>
> network:
>   version: 2
>   ethernets:
>     enp5s0:
>       dhcp4: true
>     enp6s0: {}
>     enp7s0: {}
>     enp8s0: {}
>     enp9s0: {}
>   vlans:
>     vlan.10:
>       id: 10
>       link: enp6s0
>       addresses: [ ]
>     vlan.20:
>       id: 20
>       link: enp7s0
>       addresses: [ ]
>     vlan.30:
>       id: 30
>       link: enp8s0
>       addresses: [ ]
>     vlan.40:
>       id: 40
>       link: enp9s0
>       addresses: [ ]
>   bridges:
>     br-mgmt:
>       addresses: [ 172.29.236.102/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.10
>     br-storage:
>       addresses: [ 172.29.244.102/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.20
>     br-vlan:
>       addresses: []
>       mtu: 1500
>       interfaces:
>         - vlan.30
>     br-vxlan:
>       addresses: [ 172.29.240.102/22 ]
>       mtu: 1500
>       interfaces:
>         - vlan.40
>
>
> On the deploymenthost, /etc/openstack_deploy/openstack_user_config.yml
> has the following:
>
>
> ---
> cidr_networks:
>   container: 172.29.236.0/22
>   tunnel: 172.29.240.0/22
>   storage: 172.29.244.0/22
> used_ips:
>   - 172.29.236.1
>   - "172.29.236.100,172.29.236.200"
>   - "172.29.240.100,172.29.240.200"
>   - "172.29.244.100,172.29.244.200"
> global_overrides:
>   internal_lb_vip_address: 172.29.236.101
>   external_lb_vip_address: "{{ bootstrap_host_public_address | default(ansible_facts['default_ipv4']['address']) }}"
>   management_bridge: "br-mgmt"
>   provider_networks:
>     - network:
>         group_binds:
>           - all_containers
>           - hosts
>         type: "raw"
>         container_bridge: "br-mgmt"
>         container_interface: "eth1"
>         container_type: "veth"
>         ip_from_q: "container"
>         is_container_address: true
>     - network:
>         group_binds:
>           - glance_api
>           - cinder_api
>           - cinder_volume
>           - nova_compute
>         type: "raw"
>         container_bridge: "br-storage"
>         container_type: "veth"
>         container_interface: "eth2"
>         container_mtu: "9000"
>         ip_from_q: "storage"
>     - network:
>         group_binds:
>           - neutron_linuxbridge_agent
>         container_bridge: "br-vxlan"
>         container_type: "veth"
>         container_interface: "eth10"
>         container_mtu: "9000"
>         ip_from_q: "tunnel"
>         type: "vxlan"
>         range: "1:1000"
>         net_name: "vxlan"
>     - network:
>         group_binds:
>           - neutron_linuxbridge_agent
>         container_bridge: "br-vlan"
>         container_type: "veth"
>         container_interface: "eth11"
>         type: "vlan"
>         range: "101:200,301:400"
>         net_name: "vlan"
>     - network:
>         group_binds:
>           - neutron_linuxbridge_agent
>         container_bridge: "br-vlan"
>         container_type: "veth"
>         container_interface: "eth12"
>         host_bind_override: "eth12"
>         type: "flat"
>         net_name: "flat"
> shared-infra_hosts:
>   targethost01:
>     ip: 172.29.236.101
> repo-infra_hosts:
>   targethost01:
>     ip: 172.29.236.101
> coordination_hosts:
>   targethost01:
>     ip: 172.29.236.101
> os-infra_hosts:
>   targethost01:
>     ip: 172.29.236.101
> identity_hosts:
>   targethost01:
>     ip: 172.29.236.101
> network_hosts:
>   targethost01:
>     ip: 172.29.236.101
> compute_hosts:
>   targethost01:
>     ip: 172.29.236.101
>   targethost02:
>     ip: 172.29.236.102
> storage-infra_hosts:
>   targethost01:
>     ip: 172.29.236.101
> storage_hosts:
>   targethost01:
>     ip: 172.29.236.101
>
>
> Also on the deploymenthost, /etc/openstack_deploy/conf.d/haproxy.yml
> has the following:
>
>
> haproxy_hosts:
>   targethost01:
>     ip: 172.29.236.101
>
>
> At the Run Playbooks step of the guide, the following two Ansible
> commands return with unreachable=0 failed=0:
>
> # openstack-ansible setup-hosts.yml
> # openstack-ansible setup-infrastructure.yml
>
> And verifying the database also returns no error:
>
>
> root@deploymenthost:/opt/openstack-ansible/playbooks# ansible galera_container -m shell \
>     -a "mysql -h localhost -e 'show status like \"%wsrep_cluster_%\";'"
> Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml "
> [WARNING]: Unable to parse /etc/openstack_deploy/inventory.ini as an inventory source
> targethost01_galera_container-5aa8474a | CHANGED | rc=0 >>
> Variable_name                Value
> wsrep_cluster_weight         1
> wsrep_cluster_capabilities
> wsrep_cluster_conf_id        1
> wsrep_cluster_size           1
> wsrep_cluster_state_uuid     e7a0c332-97fe-11ed-b0d4-26b30049826d
> wsrep_cluster_status         Primary
>
>
> But when I execute openstack-ansible setup-openstack.yml, I get this:
>
>
> TASK [os_keystone : Fact for apache module mod_auth_openidc to be installed] ***
> ok: [targethost01_keystone_container-76e9b31b]
> TASK [include_role : openstack.osa.db_setup] ***********************************
> TASK [openstack.osa.db_setup : Create database for service] ********************
> failed: [targethost01_keystone_container-76e9b31b -> targethost01_utility_container-dc05dc90(172.29.238.59)] (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
> fatal: [targethost01_keystone_container-76e9b31b -> {{ _oslodb_setup_host }}]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
> PLAY RECAP *********************************************************************
> targethost01_keystone_container-76e9b31b : ok=33   changed=0    unreachable=0    failed=1    skipped=8    rescued=0    ignored=0
> targethost01_utility_container-dc05dc90 : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
> EXIT NOTICE [Playbook execution failure] **************************************
> ===============================================================================
>
>
> First, how can I disable the "censored" output? I wonder if an
> uncensored run could give me more clues. Second, it appears to be a
> problem creating the database (keystone db sync?). How can I test
> database access from inside the LXC containers? I tried logging into
> one of the containers and pinging the host's IP, and it works, so they
> have connectivity. I set up the passwords with:
>
> # cd /opt/openstack-ansible
> # ./scripts/pw-token-gen.py --file /etc/openstack_deploy/user_secrets.yml
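> (For the in-container test, I imagine attaching to the utility container
> and pointing the mysql client at the VIP would work, e.g.:
>
> # lxc-attach -n targethost01_utility_container-dc05dc90
> # mysql -h 172.29.236.101 -u root -p
>
> but I am not sure if that is the right approach.)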
>
>
> Any help?
>
> Best Regards.
>
>
>
>
> --
> __________________________________
>
> João Marcelo Uchôa de Alencar
> jmarcelo.alencar(at)gmail.com
> __________________________________
--
__________________________________
João Marcelo Uchôa de Alencar
jmarcelo.alencar(at)gmail.com
__________________________________