[openstack-ansible] Check current state of Keystone DB: HAProxy galera-clustercheck fails if external_vip is used

Dmitriy Rabotyagov noonedeadpunk at ya.ru
Fri Feb 12 17:37:03 UTC 2021


Hi Philipp,

This sounds to me like a routing issue, where you're trying to reach the internal network through the default gateway (or your external_vip is in the same subnet as the internal_vip?).
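To illustrate the same-subnet case, here is a minimal sketch; the /24 prefix is an assumption, and the addresses are only illustrative (taken from the ones appearing later in the thread, not confirmed values):

```python
# Check whether two VIPs fall into the same subnet -- if they do, the kernel
# may pick an unexpected source address for locally originated health checks.
# NOTE: the /24 prefix and both addresses are illustrative assumptions.
import ipaddress

internal_vip = ipaddress.ip_interface("192.168.110.211/24")
external_vip = ipaddress.ip_interface("192.168.110.200/24")

same_subnet = internal_vip.network == external_vip.network
print(same_subnet)  # True -> both VIPs share 192.168.110.0/24
```

If this prints True, whitelisting both VIPs on the clustercheck side is the straightforward fix.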

In case your internal and external VIPs are in the same network, you can just override the variable `galera_monitoring_allowed_source` [1] and set it to the list of source IPs from which you expect cluster-status checks.

[1] https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/galera_all.yml#L30-L39
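A minimal override sketch for `/etc/openstack_deploy/user_variables.yml` — the value is assumed here to be a space-separated string matching xinetd's `only_from` syntax (check the default in the galera_all.yml linked above), and the IPs below are examples only:

```yaml
# /etc/openstack_deploy/user_variables.yml
# Allow cluster-status checks on port 9200 from these source IPs.
# Example values only -- replace with the addresses HAProxy actually
# connects from, including the external_vip if needed.
galera_monitoring_allowed_source: "192.168.110.200 192.168.110.211 192.168.110.235 127.0.0.1"
```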

12.02.2021, 14:37, "Philipp Wörner" <philipp.woerner at dhbw-mannheim.de>:
> Dear all,
>
> unfortunately we are facing a problem while setting up OpenStack (described at the end of the mail).
>
> Last year we had no issue with our configuration.
>
> I found out why the playbook stops and have a temporary workaround.
>
> But I don’t know the root cause, maybe you can help me with some advice.
>
> Thank you in advance!
>
> Have a sunny weekend and best regards,
>
> Philipp
>
> Where setup-openstack.yml stops:
>
> TASK [os_keystone : Check current state of Keystone DB] **************************************************************************************************************************************************************************************
>
> fatal: [infra1_keystone_container-01c233df]: FAILED! => {"changed": true, "cmd": ["/openstack/venvs/keystone-21.2.3.dev4/bin/keystone-manage", "db_sync", "--check"], "delta": "0:01:42.166790", "end": "2021-02-11 13:01:06.282388", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2021-02-11 12:59:24.115598", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
>
> Reason:
>
> This is caused by HAProxy considering the service unavailable, because the health check against port 9200 fails.
>
> If I test the clustercheck service manually, I see that the external_vip isn't allowed as a source:
>
> root at bc1bl11:/home/ubuntu# telnet -b <internal_vip> 192.168.110.235 9200
> Trying 192.168.110.235...
> Connected to 192.168.110.235.
> Escape character is '^]'.
> HTTP/1.1 200 OK
> Content-Type: text/plain
> Connection: close
> Content-Length: 40
>
> Percona XtraDB Cluster Node is synced.
> Connection closed by foreign host.
>
> root at bc1bl11:/home/ubuntu# telnet -b <external_vip> 192.168.110.235 9200
> Trying 192.168.110.235...
> Connected to 192.168.110.235.
> Escape character is '^]'.
> Connection closed by foreign host.
>
> Workaround:
>
> After manually modifying the service configuration and adding the external_vip to the whitelist, everything works and the OSA playbook succeeds as well:
>
> root at infra1-galera-container-492e1206:/# cat /etc/xinetd.d/mysqlchk
> # default: on
> # description: mysqlchk
> # Ansible managed
> service mysqlchk
> {
>         disable = no
>         flags = REUSE
>         socket_type = stream
>         port = 9200
>         wait = no
>         user = nobody
>         server = /usr/local/bin/clustercheck
>         log_on_failure += USERID
>         only_from = 192.168.110.200 192.168.110.235 192.168.110.211 127.0.0.1
>         per_source = UNLIMITED
> }
>
> Question:
>
> I am now wondering why HAProxy uses the external_vip as the source address for checking the MySQL service, and why I am facing this problem only now, because last year everything was fine with our configuration.
>
> We just moved the external_vip from the NIC to the bridge in the netplan config, and the external_vip is now in the same network as the internal_vip.
>
> Here is also a snippet of our haproxy config:
>
> root at bc1bl11:/home/ubuntu# cat /etc/haproxy/haproxy.cfg
> # Ansible managed
> global
>         log /dev/log local0
>         chroot /var/lib/haproxy
>         user haproxy
>         group haproxy
>         daemon
>         maxconn 4096
>         stats socket /var/run/haproxy.stat level admin mode 600
>         ssl-default-bind-options force-tlsv12
>         tune.ssl.default-dh-param 2048
>
> defaults
>         log global
>         option dontlognull
>         option redispatch
>         option forceclose
>         retries 3
>         timeout client 50s
>         timeout connect 10s
>         timeout http-request 5s
>         timeout server 50s
>         maxconn 4096
>
> frontend galera-front-1
>     bind 192.168.110.211:3306
>     option tcplog
>     timeout client 5000s
>     acl white_list src 127.0.0.1/8 192.168.0.0/16 172.16.0.0/12 10.0.0.0/8
>     tcp-request content accept if white_list
>     tcp-request content reject
>     mode tcp
>     default_backend galera-back
>
> backend galera-back
>     mode tcp
>     balance leastconn
>     timeout server 5000s
>     stick store-request src
>     stick-table type ip size 256k expire 30m
>     option tcplog
>     option httpchk HEAD / HTTP/1.0\r\nUser-agent:\ osa-haproxy-healthcheck
>     # server infra1_galera_container-492e1206 192.168.110.235:3306
>     server infra1_galera_container-492e1206 192.168.110.235:3306 check port 9200 inter 12000 rise 1 fall 1


-- 
Kind Regards,
Dmitriy Rabotyagov


