Re: Antelope Magnum: clusters flipping health status between HEALTHY and UNHEALTHY

5 Jun 2024

On 3/6/2024 4:29 am, Michel Jouvin wrote:
...
Hi,
I have made a little progress on the flipping health status. When a cluster 
becomes unhealthy, the curl command returns:
-----
$ curl --insecure https://157.136.248.202:6443/healthz
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
... (all others ok)
-----
It lasts a few minutes and then the affected clusters become healthy 
again. It seems to happen to several clusters at the same time. Sometimes 
a cluster remains healthy for only a few seconds or minutes before 
becoming unhealthy again, and so on. The cluster (kubectl) tends to be 
unresponsive when transitioning from one state to the other, but curl 
always responds. It looks like something becomes unresponsive for 
some time.
Any suggestion welcome!
Michel

Is this a cluster with multiple control plane nodes? If so, you may want 
to check the etcd logs: etcd requires quite low latency. The etcd 
website will tell you more.
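
As a rough way to look for latency trouble in the etcd logs, here is a minimal sketch that counts a few warnings etcd is known to log when disk or network is slow. The function name is made up for illustration, and the journald unit name "etcd" in the usage comment is an assumption about how the control plane node is set up; adjust it to however etcd runs on your nodes.

```shell
#!/bin/sh
# Count common etcd latency warnings in log text read from stdin.
# etcd_latency_warnings is a hypothetical helper name, not part of etcd or Magnum.
etcd_latency_warnings() {
    input=$(cat)
    for pattern in \
        "slow fdatasync" \
        "apply request took too long" \
        "failed to send out heartbeat on time"
    do
        # grep -c prints 0 and exits non-zero when nothing matches; ignore the status.
        count=$(printf '%s\n' "$input" | grep -c -- "$pattern" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}

# Example (assumes etcd logs to a journald unit named "etcd"):
#   sudo journalctl -u etcd --since "1 hour ago" | etcd_latency_warnings
```

Non-zero counts around the times the status flips would point towards slow disk or network on the control plane nodes; zero counts do not rule it out.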

You can also take a look at the kube-apiserver logs while the status is 
transitioning. You can check both kube-apiserver and etcd by SSH-ing to the 
control plane nodes as core@<fip>.
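
To know which log window to look at, it may help to record when the health status changes. The sketch below polls the same healthz endpoint used in the curl command above and prints a timestamp each time the set of failing checks changes. The function names, the 5-second interval and the `?verbose` query are illustrative choices, not something Magnum provides.

```shell
#!/bin/sh
# Print the names of failing checks from healthz output read from stdin.
# Failing lines look like "[-]etcd failed: reason withheld".
failed_checks() {
    sed -n 's/^\[-\]\([^ ]*\) failed.*/\1/p'
}

# Poll a healthz URL and print a UTC timestamp whenever the result changes.
# watch_healthz is a hypothetical helper; it runs until interrupted.
watch_healthz() {
    url=$1
    interval=${2:-5}
    previous="unknown"
    while :; do
        if body=$(curl --insecure --silent --max-time 5 "$url"); then
            current=$(printf '%s\n' "$body" | failed_checks | tr '\n' ' ')
            current=${current:-none}
        else
            # curl itself failed (timeout, connection refused, ...)
            current="unreachable"
        fi
        if [ "$current" != "$previous" ]; then
            printf '%s failing: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$current"
            previous=$current
        fi
        sleep "$interval"
    done
}

# Example:
#   watch_healthz "https://157.136.248.202:6443/healthz?verbose" 5
```

The printed timestamps can then be matched against the kube-apiserver and etcd logs on the control plane nodes.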

Regards,
Jake

Jake Yip