Dear all,
unfortunately we are facing a problem while setting up OpenStack (described at the end of this mail).
Last year we had no issues with this configuration.
I found out why the playbook stops and have a temporary workaround,
but I don't know the root cause; maybe you can help me with some advice.
Thank you in advance!
Have a sunny weekend and best regards,
Philipp
Where setup-openstack.yml stops:
TASK [os_keystone : Check current state of Keystone DB] **************************************************************************************************************************************************************************************
fatal: [infra1_keystone_container-01c233df]: FAILED! => {"changed": true, "cmd": ["/openstack/venvs/keystone-21.2.3.dev4/bin/keystone-manage", "db_sync", "--check"], "delta": "0:01:42.166790", "end": "2021-02-11 13:01:06.282388", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 12:59:24.115598", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
Reason:
The task fails because HAProxy considers the service unavailable, which in turn is caused by the health check against port 9200.
If I test the cluster-check service manually (binding the source address with telnet -b), I can see that requests coming from the external_vip are dropped before any HTTP response:
root@bc1bl11:/home/ubuntu# telnet -b <internal_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40
Percona XtraDB Cluster Node is synced.
Connection closed by foreign host.
root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Escape character is '^]'.
Connection closed by foreign host.
Workaround:
After manually modifying the service configuration and adding the external_vip to the whitelist, everything works and the OSA playbook succeeds as well:
root@infra1-galera-container-492e1206:/# cat /etc/xinetd.d/mysqlchk
# default: on
# description: mysqlchk
# Ansible managed
service mysqlchk
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    port           = 9200
    wait           = no
    user           = nobody
    server         = /usr/local/bin/clustercheck
    log_on_failure += USERID
    only_from      = 192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1
    per_source     = UNLIMITED
}
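To apply the change I restarted xinetd in the galera container and repeated the check bound to the external_vip; this time the status line comes back (I am assuming the standard Ubuntu service name here):

root@infra1-galera-container-492e1206:/# systemctl restart xinetd
root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
Trying 192.168.110.235...
Connected to 192.168.110.235.
Percona XtraDB Cluster Node is synced.

Since the file says "Ansible managed", I expect this edit to be overwritten on the next playbook run; if I read the galera_server role correctly, the persistent knob would be galera_monitoring_allowed_source in user_variables.yml, but please correct me if that is the wrong variable.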
Question:
I am now wondering why HAProxy uses the external_vip as the source address when checking the MySQL service, and why I am facing this problem only now, because last year everything was fine with our configuration.
The only thing we changed: we moved the external_vip from the NIC to the bridge in the netplan config, so the external_vip is now in the same network as the internal_vip.
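Schematically, the change looks roughly like this (bridge and NIC names simplified, prefix lengths invented for the sketch; our real config differs in the details):

network:
  version: 2
  ethernets:
    eno1: {}
  bridges:
    br-vip:
      interfaces: [eno1]
      addresses:
        - <internal_vip>/24
        - <external_vip>/24   # used to live on the NIC itself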
Here is also a snippet of our HAProxy config:
root@bc1bl11:/home/ubuntu# cat /etc/haproxy/haproxy.cfg
# Ansible managed
global
    log /dev/log local0
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    maxconn 4096
    stats socket /var/run/haproxy.stat level admin mode 600
    ssl-default-bind-options force-tlsv12
    tune.ssl.default-dh-param 2048

defaults
    log global
    option dontlognull
    option redispatch
    option forceclose
    retries 3
    timeout client 50s
    timeout connect 10s
    timeout http-request 5s
    timeout server 50s
    maxconn 4096
…
frontend galera-front-1
    bind 192.168.110.211:3306
    option tcplog
    timeout client 5000s
    acl white_list src 127.0.0.1/8 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
    tcp-request content accept if white_list
    tcp-request content reject
    mode tcp
    default_backend galera-back

backend galera-back
    mode tcp
    balance leastconn
    timeout server 5000s
    stick store-request src
    stick-table type ip size 256k expire 30m
    option tcplog
    option httpchk HEAD / HTTP/1.0\r\nUser-agent:\ osa-haproxy-healthcheck
    # server infra1_galera_container-492e1206 192.168.110.235:3306
    server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1
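My guess at the mechanism: since no "source" is set anywhere in this config, the kernel's normal source-address selection decides which local address the health check leaves from, and now that both VIPs sit in the same subnet on the same bridge it apparently picks the external_vip. If that is right, pinning the check source on the server line might be a cleaner fix than whitelisting the external_vip; a sketch using the HAProxy "source" keyword, untested on our setup:

backend galera-back
    ...
    # force outgoing connections (including health checks) to leave via the internal VIP
    server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1 source <internal_vip>

Does that sound plausible, or is there an OSA variable intended for this?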