[Victoria][magnum][octavia] ingress-controller health degraded
Luke Camilleri
luke.camilleri at zylacomputing.com
Fri Jun 4 15:16:18 UTC 2021
Hi Everyone, we have the following problem and are trying to identify the
main cause.

We deployed an ingress and an ingress-controller using the following
deployment file:
https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml

The ingress-controller deployment succeeds with 1 replica of the
ingress-controller pod, and the Octavia load balancer is created
successfully, pointing to the NodePorts published on each node. At this
stage only 1 member was shown as healthy/online in the Load Balancers
screen.
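For completeness, the controller was rolled out with a plain apply of that
manifest, nothing customised; roughly:

    kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/cloud/deploy.yaml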
I then increased the replicas to 3. In the Load Balancers screen in
Horizon the service is now reported as degraded, and only the Kubernetes
worker nodes that have one or more ingress-controller pods running on them
are reported as online. This is not the behaviour of a standard deployment,
where the NodePort forwards to the ClusterIP:Port of the internal service,
so as soon as a single pod is up the NodePorts on all nodes show as up when
queried:
ingress-nginx-controller-74fd5565fb-d86h9   1/1   Running   0   14h   10.100.3.13   k8s-c1-prod-2-klctfd24lze6-node-1   <none>   <none>
ingress-nginx-controller-74fd5565fb-h9985   1/1   Running   0   15h   10.100.1.8    k8s-c1-prod-2-klctfd24lze6-node-0   <none>   <none>
ingress-nginx-controller-74fd5565fb-qkddq   1/1   Running   0   15h   10.100.1.7    k8s-c1-prod-2-klctfd24lze6-node-0   <none>   <none>
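The scale-up itself was just the usual replica change on the controller
deployment; assuming the default names from that manifest, it amounts to
something like:

    kubectl -n ingress-nginx scale deployment ingress-nginx-controller --replicas=3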
The below shows the status of the members in the pool with 3 replicas:

+--------------------------------------+-----------------+------------+---------------------+---------------+---------------+------------------+--------+
| id                                   | name            | project_id | provisioning_status | address       | protocol_port | operating_status | weight |
+--------------------------------------+-----------------+------------+---------------------+---------------+---------------+------------------+--------+
| 834750fe-e43e-408d-abc3-aad3dcde0fdb | member_0_node-0 | id         | ACTIVE              | 192.168.1.75  | 32054         | ONLINE           | 1      |
| 1ddffd80-acae-40b3-a2de-19be0a69a039 | member_0_node-2 | id         | ACTIVE              | 192.168.1.90  | 32054         | ERROR            | 1      |
| d4e4baa4-0a69-4775-8ea0-165a207f11ae | member_0_node-1 | id         | ACTIVE              | 192.168.1.148 | 32054         | ONLINE           | 1      |
+--------------------------------------+-----------------+------------+---------------------+---------------+---------------+------------------+--------+
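For reference, the same member status can be pulled with the Octavia CLI
(the pool ID below is a placeholder for this listener's pool):

    openstack loadbalancer member list <pool-id>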
In fact, to have the deployment spread across all 3 nodes, I had to keep
increasing the replica count until every node was running at least one
instance of the ingress controller (in this case 5 replicas).
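As a side note, instead of over-provisioning replicas to force the spread,
a preferred pod anti-affinity rule on the controller deployment should give
the same result; a minimal sketch, assuming the default labels from the
v0.44.0 manifest:

    spec:
      template:
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  topologyKey: kubernetes.io/hostname
                  labelSelector:
                    matchLabels:
                      app.kubernetes.io/name: ingress-nginx
                      app.kubernetes.io/component: controller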
I do not believe this is an Octavia issue, since the health check is a TCP
check against the NodePort exposed by Kubernetes, and if no
ingress-controller pod is running on a node the port check against that
node fails. I added the octavia tag mainly to get some input that may
confirm whether this is the correct behaviour on Octavia's side.

I was expecting all members of the pool to report healthy, since I can
query the ClusterIP from any worker node on ports 80 and 443 and the
outcome is always successful, but the same check fails when using the
NodePort.
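To illustrate the kind of check I mean (the ClusterIP and node IP are
placeholders, and 32054 is the NodePort from the pool above):

    # succeeds from any worker node:
    curl -I http://<cluster-ip>:80/
    # fails on any node that is not running an ingress-controller pod:
    curl -I http://<node-ip>:32054/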
Thanks in advance