[openstack-ansible] Check current state of Keystone DB: HAProxy galera-clustercheck fails if external_vip is used
Hi Philipp,

This sounds to me like a routing issue: you are trying to reach the internal network through the default gateway (or your external_vip is in the same subnet as the internal_vip?).

If your internal and external VIPs are in the same network, you can simply override the variable `galera_monitoring_allowed_source` [1] and set it to the list of IPs from which you expect cluster-status checks.

[1] https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/...

12.02.2021, 14:37, "Philipp Wörner" <philipp.woerner@dhbw-mannheim.de>:
Dear all,
unfortunately we are facing a problem while setting up OpenStack (described at the end of this mail).
Last year we had no issue with our configuration.
I found out why the playbook stops and have a temporary workaround.
But I don't know the root cause; maybe you can help me with some advice.
Thank you in advance!
Have a sunny weekend and best regards,
Philipp
Where setup-openstack.yml stops:
TASK [os_keystone : Check current state of Keystone DB] **************************************************************************************************************************************************************************************
fatal: [infra1_keystone_container-01c233df]: FAILED! => {"changed": true, "cmd": ["/openstack/venvs/keystone-21.2.3.dev4/bin/keystone-manage", "db_sync", "--check"], "delta": "0:01:42.166790", "end": "2021-02-11 13:01:06.282388", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 12:59:24.115598", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Reason:
This happens because HAProxy considers the service unavailable, due to the health check against port 9200 failing.
If I test the cluster check service manually, I can see that connections from the external_vip aren't allowed.
root@bc1bl11:/home/ubuntu# telnet -b <internal_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40
Percona XtraDB Cluster Node is synced.
Connection closed by foreign host.
root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
Connection closed by foreign host.
Workaround:
After manually modifying the service configuration and adding the external_vip to the whitelist, everything works and the OSA playbook succeeds as well:
root@infra1-galera-container-492e1206:/# cat /etc/xinetd.d/mysqlchk
# default: on
# description: mysqlchk
# Ansible managed
service mysqlchk
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    port           = 9200
    wait           = no
    user           = nobody
    server         = /usr/local/bin/clustercheck
    log_on_failure += USERID
    only_from      = 192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1
    per_source     = UNLIMITED
}
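Since this file is Ansible-managed, a hand edit will be overwritten on the next playbook run. A sketch of how the same whitelist could be set through openstack-ansible instead: the role templates `only_from` from the `galera_monitoring_allowed_source` variable, so an override in `user_variables.yml` (the IP list below just mirrors the manual edit above; adapt it to your deployment) should persist the workaround:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch)
galera_monitoring_allowed_source: "192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1"
```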
Question:
I am now wondering why HAProxy uses the external_vip to check the MySQL service, and why I am facing this problem only now, since last year everything was fine with our configuration.
We just moved the external_vip from the NIC to the bridge in the netplan config, so the external_vip is now in the same network as the internal_vip.
Here is also a snippet of our HAProxy config:
root@bc1bl11:/home/ubuntu# cat /etc/haproxy/haproxy.cfg
# Ansible managed
global
    log /dev/log local0
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    stats socket /var/run/haproxy.stat level admin mode 600
    ssl-default-bind-options force-tlsv12
    tune.ssl.default-dh-param 2048

defaults
    log global
    option dontlognull
    option redispatch
    option forceclose
    retries 3
    timeout client 50s
    timeout connect 10s
    timeout http-request 5s
    timeout server 50s
    maxconn 4096

…

frontend galera-front-1
    bind 192.168.110.211:3306
    option tcplog
    timeout client 5000s
    acl white_list src 127.0.0.1/8 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
    tcp-request content accept if white_list
    tcp-request content reject
    mode tcp
    default_backend galera-back

backend galera-back
    mode tcp
    balance leastconn
    timeout server 5000s
    stick store-request src
    stick-table type ip size 256k expire 30m
    option tcplog
    option httpchk HEAD / HTTP/1.0\r\nUser-agent:\ osa-haproxy-healthcheck
    # server infra1_galera_container-492e1206 192.168.110.235:3306
    server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1
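An alternative knob at the HAProxy level, not discussed in the thread, would be to pin the source address HAProxy uses for outgoing backend connections (health checks included) with the `source` keyword. A sketch, assuming the checks should always leave from the internal VIP shown above:

```
backend galera-back
    # force connections to the Galera nodes to originate from the internal VIP
    source 192.168.110.211
    ...
```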
--
Kind Regards,
Dmitriy Rabotyagov
participants (2)
- Dmitriy Rabotyagov
- Philipp Wörner