[openstack-ansible] Check current state of Keystone DB: HAProxy galera-clustercheck fails if external_vip is used
Dmitriy Rabotyagov
noonedeadpunk at ya.ru
Fri Feb 12 17:37:03 UTC 2021
Hi Philipp,
This sounds to me like a routing issue, where you're trying to reach the internal network through the default gateway (or is your external_vip in the same subnet as the internal_vip?).
In case your internal and external VIPs are in the same network, you can simply override the variable `galera_monitoring_allowed_source` [1] and set it to the list of source IPs from which you expect cluster-status checks to arrive.
[1] https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/galera_all.yml#L30-L39
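
In case it helps, here is a minimal sketch of such an override (the file path is the usual OSA deployer location; the addresses are placeholders you would replace with the sources you actually expect, e.g. your internal VIP, external VIP and haproxy host IPs):

    # /etc/openstack_deploy/user_variables.yml
    # Space-delimited whitelist of sources allowed to query the
    # clustercheck service on port 9200 (placeholder addresses):
    galera_monitoring_allowed_source: "192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1"

After setting it you would re-run the galera playbooks (e.g. `openstack-ansible galera-install.yml`) so that the xinetd config gets re-templated.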
12.02.2021, 14:37, "Philipp Wörner" <philipp.woerner at dhbw-mannheim.de>:
> Dear all,
>
> unfortunately we are facing a problem while setting up OpenStack (described at the end of the mail).
>
> Last year we had no issue with our configuration.
>
> I found out why the playbook stops and have a temporary workaround.
>
> But I don’t know the root cause, maybe you can help me with some advice.
>
> Thank you in advance!
>
> Have a sunny weekend and best regards,
>
> Philipp
>
> Where setup-openstack.yml stops:
>
> TASK [os_keystone : Check current state of Keystone DB] **************************************************************************************************************************************************************************************
>
> fatal: [infra1_keystone_container-01c233df]: FAILED! => {"changed": true, "cmd": ["/openstack/venvs/keystone-21.2.3.dev4/bin/keystone-manage", "db_sync", "--check"], "delta": "0:01:42.166790", "end": "2021-02-11 13:01:06.282388", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 12:59:24.115598", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
>
> Reason:
>
> This is caused by HAProxy considering the service unavailable, due to the failing health check against port 9200.
>
> If I test the cluster-check service manually, I see that the external_vip isn't allowed:
>
> root@bc1bl11:/home/ubuntu# telnet -b <internal_vip> 192.168.110.235 9200
> Trying 192.168.110.235...
> Connected to 192.168.110.235.
> Escape character is '^]'.
> HTTP/1.1 200 OK
> Content-Type: text/plain
> Connection: close
> Content-Length: 40
>
> Percona XtraDB Cluster Node is synced.
> Connection closed by foreign host.
>
> root@bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
> Trying 192.168.110.235...
> Connected to 192.168.110.235.
> Escape character is '^]'.
> Connection closed by foreign host.
>
> Workaround:
>
> After manually modifying the service configuration and adding the external_vip to the whitelist, everything works and the OSA playbook succeeds as well:
>
> root@infra1-galera-container-492e1206:/# cat /etc/xinetd.d/mysqlchk
> # default: on
> # description: mysqlchk
> # Ansible managed
> service mysqlchk
> {
>     disable        = no
>     flags          = REUSE
>     socket_type    = stream
>     port           = 9200
>     wait           = no
>     user           = nobody
>     server         = /usr/local/bin/clustercheck
>     log_on_failure += USERID
>     only_from      = 192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1
>     per_source     = UNLIMITED
> }
>
> Question:
>
> I am wondering now why haproxy uses the external_vip to check the mysql service, and why I am facing this problem only now… because last year everything was fine with our configuration.
>
> We just moved the external_vip from the NIC to the bridge in the netplan config, and the external_vip is now in the same network as the internal_vip.
>
> Here is also a snippet of our haproxy config:
>
> root@bc1bl11:/home/ubuntu# cat /etc/haproxy/haproxy.cfg
> # Ansible managed
> global
>     log /dev/log local0
>     chroot /var/lib/haproxy
>     user haproxy
>     group haproxy
>     daemon
>     maxconn 4096
>     stats socket /var/run/haproxy.stat level admin mode 600
>     ssl-default-bind-options force-tlsv12
>     tune.ssl.default-dh-param 2048
>
> defaults
>     log global
>     option dontlognull
>     option redispatch
>     option forceclose
>     retries 3
>     timeout client 50s
>     timeout connect 10s
>     timeout http-request 5s
>     timeout server 50s
>     maxconn 4096
>
> …
>
> frontend galera-front-1
>     bind 192.168.110.211:3306
>     option tcplog
>     timeout client 5000s
>     acl white_list src 127.0.0.1/8 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
>     tcp-request content accept if white_list
>     tcp-request content reject
>     mode tcp
>     default_backend galera-back
>
> backend galera-back
>     mode tcp
>     balance leastconn
>     timeout server 5000s
>     stick store-request src
>     stick-table type ip size 256k expire 30m
>     option tcplog
>     option httpchk HEAD / HTTP/1.0\r\nUser-agent:\ osa-haproxy-healthcheck
>     # server infra1_galera_container-492e1206 192.168.110.235:3306
>     server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1
--
Kind Regards,
Dmitriy Rabotyagov