Hi,

I have made a little progress on the flipping health status. When a cluster becomes unhealthy, the curl command returns:

-----
$ curl --insecure  https://157.136.248.202:6443/healthz
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
... (all others ok)
-----

It lasts a few minutes, and then the affected clusters become healthy again. It seems to happen to several clusters at the same time. Sometimes a cluster remains healthy for only a few seconds or minutes before becoming unhealthy again, and so on. The cluster (kubectl) tends to be unresponsive when transitioning from one state to the other, whereas curl always responds. It looks as if something becomes unresponsive for a while.
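
To pin down when the flips happen and whether several clusters change state together, the /healthz output could be parsed and only the state changes recorded. Below is a minimal Python sketch, assuming the output format shown above; the names failed_checks and state_changes are hypothetical helpers, not part of any existing tool.

```python
import re

# Matches lines such as "[+]ping ok" or "[-]etcd failed: reason withheld".
CHECK_LINE = re.compile(r"^\[([+-])\](\S+)\s+(.*)$")


def failed_checks(healthz_body):
    """Return the names of the checks reported as failed ("[-]")."""
    failed = []
    for line in healthz_body.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match and match.group(1) == "-":
            failed.append(match.group(2))
    return failed


def state_changes(samples):
    """Given (timestamp, healthz_body) pairs, yield (timestamp, failed) on each change."""
    previous = None
    for timestamp, body in samples:
        current = failed_checks(body)
        if current != previous:
            yield timestamp, current
        previous = current
```

Feeding it timestamped samples from each cluster would give one short list of transitions per cluster, which is easier to compare across clusters than the raw output.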

Any suggestion welcome!

Michel

On 31/05/2024 at 18:43, Michel Jouvin wrote:
Hi Jake,

Thanks! Yes, I am using the k8s_fedora_coreos_v1 driver. I'll check with curl as you suggested. Does it make sense that a check (or check status) could be updated when another cluster is created or deleted? I could hardly believe it, but there seems to be some "correlation"...

Best regards, 

Michel

On 31 May 2024 at 18:29:59, Jake Yip <jake.yip@ardc.edu.au> wrote:

On 1/6/2024 2:08 am, Michel Jouvin wrote:
Contrary to what I said initially, although creating or deleting a
cluster seems to trigger an update of the health state of other
clusters, it doesn't seem to be the cause. I have seen the state
changing quite regularly on a test cloud with no activity, and I'm
really wondering what could cause this. I don't see anything in the
OpenStack config/logs to explain it. A network issue?

Michel

On 31/05/2024 at 16:32, Michel Jouvin wrote:
A side question: what runs the health status check, and is
there a way to force it to run again?

Michel


Hi Jouvin,

You have not indicated which driver you are using. Is it the
k8s_fedora_coreos_v1 driver?

If so, health checks are done in a periodic loop by the conductors. They
need to be able to poll the /healthz endpoint of your Kubernetes API
server. You can check with curl -k https://<API_IP>:6443/healthz, where
https://<API_IP>:6443 is the server in your kubeconfig.
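
To watch the endpoint over time rather than with one-off curl calls, a small polling script can help. The following is a minimal Python sketch and not the conductor's actual code: fetch_healthz and poll are hypothetical names, the interval and round count are arbitrary, and certificate verification is disabled to mirror curl -k, which is only reasonable on a test cloud.

```python
import ssl
import time
import urllib.error
import urllib.request


def fetch_healthz(base_url, timeout=5.0):
    """Return (http_status, body) for <base_url>/healthz?verbose, or (None, error text)."""
    # Same effect as curl -k: skip certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = base_url.rstrip("/") + "/healthz?verbose"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx answers; the body is still readable.
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, timeout, TLS failure, and similar.
        return None, str(err)


def poll(base_urls, interval=10.0, rounds=6):
    """Print one timestamped line per cluster per round."""
    for _ in range(rounds):
        for base_url in base_urls:
            status, _body = fetch_healthz(base_url)
            print(time.strftime("%Y-%m-%dT%H:%M:%S"), base_url, status)
        time.sleep(interval)
```

Calling poll(["https://<API_IP>:6443"]) with one entry per cluster, from the host running the conductor, would show whether the endpoint is reachable from there and whether several clusters fail in the same round.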

- Jake