On 3/6/2024 4:29 am, Michel Jouvin wrote:
Hi,
I progressed a little bit on the flipping health status. When a cluster becomes unhealthy, the curl command returns:
-----
$ curl --insecure https://157.136.248.202:6443/healthz
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
... (all others ok)
-----
It lasts a few minutes and then the affected clusters become healthy again. It seems to happen on several clusters at the same time. Sometimes a cluster remains healthy for only a few seconds or minutes and then becomes unhealthy again, and so on. The cluster (kubectl) tends to be unresponsive when transitioning from one state to the other, but curl always responds. It looks like something becomes unresponsive for some time.
Any suggestion welcome!
Michel
Is this a cluster with multiple control plane nodes? If so, you may want to check the etcd logs: etcd needs quite low latency to stay healthy, and the etcd website will tell you more.

You can also take a look at the kube-apiserver logs when things are transitioning. You can check both kube-apiserver and etcd by SSH-ing to the control plane nodes, using core@<fip>.

Regards,
Jake
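To line up the kube-apiserver and etcd logs with the moments the status flips, it may help to timestamp each transition from the outside. Below is a rough sketch only, not something from the thread: it polls the same /healthz endpoint as the curl command above (skipping certificate verification, like --insecure) and prints a line whenever the set of failing checks changes. The function names, the 5-second interval and the use of ?verbose are my own assumptions; adjust the URL to your cluster.

```python
import ssl
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def failed_checks(healthz_body: str) -> list[str]:
    """Return the names of checks marked "[-]" in verbose healthz output."""
    failed = []
    for line in healthz_body.splitlines():
        line = line.strip()
        if line.startswith("[-]"):
            # "[-]etcd failed: reason withheld" -> "etcd"
            failed.append(line[3:].split()[0])
    return failed


def fetch_healthz(url: str, timeout: float = 5.0) -> str:
    """Fetch the healthz body without verifying the certificate (like curl --insecure)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        # An unhealthy API server answers with an error status; the body still lists the checks.
        return err.read().decode()


def watch(url: str, interval: float = 5.0) -> None:
    """Print a UTC-timestamped line each time the set of failing checks changes."""
    previous = None
    while True:
        try:
            current = failed_checks(fetch_healthz(url))
        except OSError as exc:
            # Connection refused, timeout, etc.: record it as its own state.
            current = [f"unreachable: {exc}"]
        if current != previous:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(stamp, "healthy" if not current else f"failing: {current}")
            previous = current
        time.sleep(interval)


if __name__ == "__main__":
    # Address taken from the curl command in the original message.
    watch("https://157.136.248.202:6443/healthz?verbose")
```

The timestamps are printed in UTC so they can be compared with the log entries on the control plane nodes, assuming those are in UTC as well.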