[Victoria][magnum][octavia] ingress-controller health degraded
Hi everyone,

We have the following problem and are trying to identify its root cause.

We deployed an ingress and an ingress-controller (using the following deployment file: https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44....). The ingress-controller deployment succeeds with 1 replica of the ingress-controller pod, and the Octavia load balancer is created successfully, pointing at the NodePorts published on each node. At that stage the LoadBalancers screen showed only 1 member as healthy/ONLINE.

I then increased the replicas to 3. The LoadBalancers screen in Horizon now reports the service as DEGRADED, and only the Kubernetes worker nodes that actually have an ingress-controller pod scheduled on them are reported as ONLINE. This is not the behaviour of a standard deployment, where the NodePort forwards to the ClusterIP:port of the internal service, so once a single pod is up the NodePorts on all nodes respond when queried. The controller pods are currently spread as follows (kubectl get pods -o wide):

    NAME                                        READY   STATUS    RESTARTS   AGE   IP            NODE                                NOMINATED NODE   READINESS GATES
    ingress-nginx-controller-74fd5565fb-d86h9   1/1     Running   0          14h   10.100.3.13   k8s-c1-prod-2-klctfd24lze6-node-1   <none>           <none>
    ingress-nginx-controller-74fd5565fb-h9985   1/1     Running   0          15h   10.100.1.8    k8s-c1-prod-2-klctfd24lze6-node-0   <none>           <none>
    ingress-nginx-controller-74fd5565fb-qkddq   1/1     Running   0          15h   10.100.1.7    k8s-c1-prod-2-klctfd24lze6-node-0   <none>           <none>

The following shows the status of the members in the pool with replicas=3:

    | id                                   | name            | project_id | provisioning_status | address       | protocol_port | operating_status | weight |
    | 834750fe-e43e-408d-abc3-aad3dcde0fdb | member_0_node-0 | id         | ACTIVE              | 192.168.1.75  | 32054         | ONLINE           | 1      |
    | 1ddffd80-acae-40b3-a2de-19be0a69a039 | member_0_node-2 | id         | ACTIVE              | 192.168.1.90  | 32054         | ERROR            | 1      |
    | d4e4baa4-0a69-4775-8ea0-165a207f11ae | member_0_node-1 | id         | ACTIVE              | 192.168.1.148 | 32054         | ONLINE           | 1      |

In fact, to get the deployment spread across all 3 nodes I had to keep increasing the replica count until every node had at least one instance of the ingress controller running on it (in this case that meant 5 replicas).

I do not believe this is an Octavia issue: the health check is a TCP check against the NodePort exposed by Kubernetes, and if no ingress-controller pod is running on a node, the port check on that node fails. I added the [octavia] tag to the subject mainly to get input confirming whether this is the expected behaviour on the Octavia side.

I would expect all members of the pool to report healthy: querying the ClusterIP from any worker node on ports 80 and 443 always succeeds, but the same query against the NodePort does not (the exact checks are reproduced below for reference).

Thanks in advance
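For reference, the connectivity checks described above look roughly like this (a sketch: <cluster-ip> is a placeholder for the controller Service's ClusterIP, which I have not pasted here; 192.168.1.90 and 32054 are node-2's address and the NodePort taken from the member listing):

    # From any worker node, hitting the Service ClusterIP always succeeds
    curl -s -o /dev/null -w '%{http_code}\n' http://<cluster-ip>:80/
    curl -sk -o /dev/null -w '%{http_code}\n' https://<cluster-ip>:443/

    # A plain TCP check against the NodePort, mirroring what the Octavia
    # health monitor does; on node-2 (no controller pod) this one fails
    nc -zv 192.168.1.90 32054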
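The member listing above comes from the Octavia CLI (the pool and health-monitor IDs below are placeholders). On the Kubernetes side, one thing I still want to verify is the Service's externalTrafficPolicy, since a value of Local makes kube-proxy answer the NodePort only on nodes that actually run a controller pod, which would match what I am seeing (namespace and Service name are assumed from the v0.44 manifest):

    # Octavia view: member status and the health monitor doing the TCP check
    openstack loadbalancer member list <pool-id>
    openstack loadbalancer healthmonitor show <healthmonitor-id>

    # Kubernetes view: if this prints "Local", NodePort traffic is only
    # served by nodes hosting an ingress-nginx-controller pod
    kubectl -n ingress-nginx get svc ingress-nginx-controller \
        -o jsonpath='{.spec.externalTrafficPolicy}'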
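Also, rather than raising the replica count to 5 just to cover every node, I will probably try pod anti-affinity to force at most one controller per node, along these lines (a sketch; the label is assumed from the v0.44 manifest):

    # Snippet for the Deployment's spec.template.spec; assumes the
    # app.kubernetes.io/name=ingress-nginx label used by the v0.44 manifest
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
          topologyKey: kubernetes.io/hostname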
Luke Camilleri