Contrary to what I said initially, even though creating or deleting a cluster seems to trigger an update of the health state of the other clusters, this doesn't seem to be the cause. I have seen the health state changing quite regularly on a test cloud with no activity, and I'm really wondering what could cause this. I don't see anything in the OpenStack configuration or logs to explain it. A network issue?

Michel

On 31/05/2024 at 16:32, Michel Jouvin wrote:
A side question: what runs the health status check, and is there a way to force it to run again?
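So far I have only been watching the reported value with something like the following (the cluster name "k8s-test" is just an example):

    # Health status as reported for one cluster, with the reason field
    openstack coe cluster show k8s-test -c health_status -c health_status_reason

    # Quick overview across all clusters in the project
    openstack coe cluster list -c name -c status -c health_status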
Michel
On 31/05/2024 at 16:05, Michel Jouvin wrote:
Hi,
I have been playing with Magnum (Antelope), creating a lot of clusters from the same template while varying the number of masters and/or nodes, with ~10 clusters started in the same project. I recently observed that cluster configurations that used to work well are no longer working (generally CREATE_FAILED during the master deployment). I have not found any evidence of the cause while digging through various logs, but I observed that, with the current list of active clusters, if I add a new one (even with a configuration as minimal as 1 master and 1 node), not only does its creation fail, but the other running clusters (or at least some of them) also become unhealthy. If I delete the failed cluster, the other clusters return to the healthy state. It seems very reproducible (I did it more than 10 times), but I still don't see any message in the logs that could help identify the cause.
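For the record, this is roughly how I have been digging so far, going from the cluster to the underlying Heat stack (the cluster name "k8s-test" is just an example):

    # Status reported by Magnum and the ID of the Heat stack behind the cluster
    openstack coe cluster show k8s-test -c stack_id -c status -c status_reason

    # Failed resources in the (nested) Heat stacks for that cluster
    openstack stack failures list --long <stack_id>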
I have the feeling (but I may be wrong) that it is related to either an insufficient project quota or an insufficient limit in one of the OpenStack services. Any idea of a possible cause, or any advice on where to look for more information?
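On my side, these are the kinds of checks I was planning to run in case they point at something obvious (they assume the CLI plugins for the relevant services are installed, and the project name is just an example):

    # Compute/network/volume quotas for the project
    openstack quota show magnum-tests

    # Current absolute usage against the compute limits (run with the project scoped in the environment)
    openstack limits show --absolute

    # Octavia quotas, in case load balancers are created for the cluster API
    openstack loadbalancer quota show magnum-tests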
Thanks in advance.
Michel