Dear all,
unfortunately we are facing a problem while setting up OpenStack (described at the end of this mail).
Last year we had no issues with this configuration.
I found out why the playbook stops and have a temporary workaround,
but I don't know the root cause; maybe you can help me with some advice.
Thank you in advance!
Have a sunny weekend and best regards,
Philipp
Where setup-openstack.yml stops:
TASK [os_keystone : Check current state of Keystone DB] **************************************************************************************************************************************************************************************
fatal: [infra1_keystone_container-01c233df]: FAILED! => {"changed": true, "cmd": ["/openstack/venvs/keystone-21.2.3.dev4/bin/keystone-manage", "db_sync", "--check"], "delta": "0:01:42.166790", "end": "2021-02-11 13:01:06.282388", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 12:59:24.115598", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Reason:
The task fails because HAProxy considers the service unavailable, which in turn is caused by the health check against port 9200.
If I test the cluster-check service manually (binding the source address with telnet -b), I can see that requests coming from the external_vip are dropped before any HTTP response:
root@bc1bl11:/home/ubuntu# telnet -b <internal_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40
Percona XtraDB Cluster Node is synced.
Connection closed by foreign host.
root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
Connection closed by foreign host.
Workaround:
After manually modifying the service configuration and adding the external_vip to the whitelist, everything works and the OSA playbook succeeds as well:
root@infra1-galera-container-492e1206:/# cat /etc/xinetd.d/mysqlchk
# default: on
# description: mysqlchk
# Ansible managed
service mysqlchk
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    port           = 9200
    wait           = no
    user           = nobody
    server         = /usr/local/bin/clustercheck
    log_on_failure += USERID
    only_from      = 192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1
    per_source     = UNLIMITED
}
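To apply the change I restarted xinetd in the galera container and repeated the check bound to the external_vip; this time the status line comes back (I am assuming the standard Ubuntu service name here):

root@infra1-galera-container-492e1206:/# systemctl restart xinetd
root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Percona XtraDB Cluster Node is synced.

Since the file says "Ansible managed", I expect this edit to be overwritten on the next playbook run; if I read the galera_server role correctly, the persistent knob would be galera_monitoring_allowed_source in user_variables.yml, but please correct me if that is the wrong variable.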
Question:
I am now wondering why HAProxy uses the external_vip as the source address when checking the MySQL service, and why I am facing this problem only now, because last year everything was fine with our configuration.
The only thing we changed: we moved the external_vip from the NIC to the bridge in the netplan config, so the external_vip is now in the same network as the internal_vip.
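Schematically, the change looks roughly like this (bridge and NIC names simplified, prefix lengths invented for the sketch; our real config differs in the details):

network:
  version: 2
  ethernets:
    eno1: {}
  bridges:
    br-vip:
      interfaces: [eno1]
      addresses:
        - <internal_vip>/24
        - <external_vip>/24   # used to live on the NIC itself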
Here is also a snippet of our HAProxy config:
root@bc1bl11:/home/ubuntu# cat /etc/haproxy/haproxy.cfg
# Ansible managed
global
    log /dev/log local0
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    stats socket /var/run/haproxy.stat level admin mode 600
    ssl-default-bind-options force-tlsv12
    tune.ssl.default-dh-param 2048

defaults
    log global
    option dontlognull
    option redispatch
    option forceclose
    retries 3
    timeout client 50s
    timeout connect 10s
    timeout http-request 5s
    timeout server 50s
    maxconn 4096
…
frontend galera-front-1
    bind 192.168.110.211:3306
    option tcplog
    timeout client 5000s
    acl white_list src 127.0.0.1/8 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
    tcp-request content accept if white_list
    tcp-request content reject
    mode tcp
    default_backend galera-back

backend galera-back
    mode tcp
    balance leastconn
    timeout server 5000s
    stick store-request src
    stick-table type ip size 256k expire 30m
    option tcplog
    option httpchk HEAD / HTTP/1.0\r\nUser-agent:\ osa-haproxy-healthcheck
    # server infra1_galera_container-492e1206 192.168.110.235:3306
    server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1
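My guess at the mechanism: since no "source" is set anywhere in this config, the kernel's normal source-address selection decides which local address the health check leaves from, and now that both VIPs sit in the same subnet on the same bridge it apparently picks the external_vip. If that is right, pinning the check source on the server line might be a cleaner fix than whitelisting the external_vip; a sketch using the HAProxy "source" keyword, untested on our setup:

backend galera-back
    ...
    # force outgoing connections (including health checks) to leave via the internal VIP
    server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1 source <internal_vip>

Does that sound plausible, or is there an OSA variable intended for this?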