apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache-deployment
spec:
  selector:
    matchLabels:
      app: php-apache
  replicas: 2
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache-service
  labels:
    app: php-apache
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    app: php-apache
  type: LoadBalancer
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
  namespace: default
  labels:
    service: php-apache-service
spec:
  minReplicas: 2
  maxReplicas: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30 # in percent
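For reference, this is roughly how I apply and verify the manifests above (assuming they are all saved in one file called php-apache.yaml; the file name is just an example):

# Apply the Deployment, Service and HPA
kubectl apply -f php-apache.yaml

# The TARGETS column should show a CPU percentage, not <unknown>
kubectl get hpa php-apache-hpa

# The EXTERNAL-IP of this service is what the load generator hits
kubectl get svc php-apache-service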
This is my load program:
kubectl run -i --tty load-generator-1 --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://ip_load_balancer; done"
Here is the output from my cluster before the test:
[kube8@cdndeployer ~]$ kubectl get pod
NAME                                     READY   STATUS    RESTARTS   AGE
php-apache-deployment-5b65bbc75c-95k6k   1/1     Running   0          24m
php-apache-deployment-5b65bbc75c-mv5h6   1/1     Running   0          24m

[kube8@cdndeployer ~]$ kubectl get hpa
NAME             REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache-deployment   0%/30%    2         15        2          24m

[kube8@cdndeployer ~]$ kubectl get svc
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
kubernetes           ClusterIP      10.254.0.1    <none>         443/TCP        13h
php-apache-service   LoadBalancer   10.254.3.54   xx.xx.xx.213   80:31763/TCP   25m
When I apply the load:
1 - The autoscaler creates new pods, then some of them get stuck in the Pending state:
[kube8@cdndeployer ~]$ kubectl get hpa
NAME             REFERENCE                          TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache-hpa   Deployment/php-apache-deployment   155%/30%   2         15        4          27m

[kube8@cdndeployer ~]$ kubectl get pod
NAME                                     READY   STATUS    RESTARTS   AGE
load-generator-1                         1/1     Running   0          97s
load-generator-2                         1/1     Running   0          94s
php-apache-deployment-5b65bbc75c-95k6k   1/1     Running   0          28m
php-apache-deployment-5b65bbc75c-cjkwk   0/1     Pending   0          33s
php-apache-deployment-5b65bbc75c-cn5rt   0/1     Pending   0          33s
php-apache-deployment-5b65bbc75c-cxctx   0/1     Pending   0          48s
php-apache-deployment-5b65bbc75c-fffnc   1/1     Running   0          64s
php-apache-deployment-5b65bbc75c-hbfw8   0/1     Pending   0          33s
php-apache-deployment-5b65bbc75c-l8496   1/1     Running   0          48s
php-apache-deployment-5b65bbc75c-mv5h6   1/1     Running   0          28m
php-apache-deployment-5b65bbc75c-qddrb   1/1     Running   0          48s
php-apache-deployment-5b65bbc75c-dd5r5   0/1     Pending   0          48s
php-apache-deployment-5b65bbc75c-tr65j   1/1     Running   0          64s
2 - The cluster is unable to create more pods/workers, and I get this error message from the pending pods:
kubectl describe pod php-apache-deployment-5b65bbc75c-dd5r5

Name:           php-apache-deployment-5b65bbc75c-dd5r5
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=php-apache
                pod-template-hash=5b65bbc75c
Annotations:    kubernetes.io/psp: magnum.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/php-apache-deployment-5b65bbc75c
Containers:
  php-apache:
    Image:      k8s.gcr.io/hpa-example
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:  500m
    Requests:
      cpu:        200m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fsgh (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-4fsgh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4fsgh
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   2m48s  default-scheduler   0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Warning  FailedScheduling   2m48s  default-scheduler   0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
  Normal   NotTriggerScaleUp  2m42s  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 in backoff after failed scale-up
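Since the scheduler reports "2 Insufficient cpu" (the two master nodes are excluded by their taint), I assume the two worker nodes simply have no requestable CPU left for another 200m pod. This is roughly how that can be checked (the node name is a placeholder from my cluster):

# Show how much CPU is already requested vs. allocatable on a worker node
kubectl describe node <worker-node-name> | grep -A 8 "Allocated resources"

# With metrics-server installed, actual usage can be compared as well
kubectl top nodes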
I have this error message from the cluster-autoscaler pod cluster-autoscaler-f4bd5f674-b9692:
I1123 00:50:27.714801       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 12.709µs
I1123 00:51:34.181145       1 scale_up.go:658] Scale-up: setting group default-worker size to 3
W1123 00:51:34.381953       1 clusterstate.go:281] Disabling scale-up for node group default-worker until 2021-11-23 00:56:34.180840351 +0000 UTC m=+47174.376164120; errorClass=Other; errorCode=cloudProviderError
E1123 00:51:34.382081       1 static_autoscaler.go:415] Failed to scale up: failed to increase node group size: could not check current nodegroup size: could not get cluster: Get https://dash.cdn.domaine.tld:9511/v1/clusters/b4a6b3eb-fcf3-416f-b740-11a083d4b896: dial tcp: lookup dash.cdn.domaine.tld on 10.254.0.10:53: no such host
W1123 00:51:44.392523       1 scale_up.go:383] Node group default-worker is not ready for scaleup - backoff
W1123 00:51:54.410273       1 scale_up.go:383] Node group default-worker is not ready for scaleup - backoff
W1123 00:52:04.422128       1 scale_up.go:383] Node group default-worker is not ready for scaleup - backoff
W1123 00:52:14.434278       1 scale_up.go:383] Node group default-worker is not ready for scaleup - backoff
W1123 00:52:24.442480       1 scale_up.go:383] Node group default-worker is not ready for scaleup - backoff
I1123 00:52:27.715019       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
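The failing call is just a DNS lookup of dash.cdn.domaine.tld against the cluster DNS (10.254.0.10). A minimal way to reproduce it from inside the cluster would be something like this (the pod name dns-test is arbitrary, and the label selector assumes the usual k8s-app=kube-dns label on the CoreDNS pods):

# Resolve the Magnum endpoint through the cluster DNS from a throwaway pod
kubectl run -i --tty dns-test --rm --image=busybox --restart=Never -- nslookup dash.cdn.domaine.tld

# Check that all CoreDNS/kube-dns replicas are up
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide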
I did some tests on the DNS service:
kubectl get svc -A
NAMESPACE     NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                  AGE
default       kubernetes                  ClusterIP      10.254.0.1       <none>         443/TCP                  13h
default       php-apache-service          LoadBalancer   10.254.3.54      xx.xx.xx.213   80:31763/TCP             19m
kube-system   dashboard-metrics-scraper   ClusterIP      10.254.19.191    <none>         8000/TCP                 13h
kube-system   kube-dns                    ClusterIP      10.254.0.10      <none>         53/UDP,53/TCP,9153/TCP   13h
kube-system   kubernetes-dashboard        ClusterIP      10.254.132.17    <none>         443/TCP                  13h
kube-system   magnum-metrics-server       ClusterIP      10.254.235.147   <none>         443/TCP                  13h
I have noticed this behaviour with the Horizon URL: sometimes the DNS service resolves it and sometimes it does not.
[root@k8multiclustercalico-ve5t6uuoo245-master-0 ~]# dig @10.254.0.10 dash.cdn.domaine.tld

; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc33 <<>> @10.254.0.10 dash.cdn.domaine.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 5646
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dash.cdn.domaine.tld.          IN      A

;; AUTHORITY SECTION:
cdn.domaine.tld.        30      IN      SOA     cdn.domaine.tld. root.cdn.domaine.tld. 2021100900 604800 86400 2419200 604800

;; Query time: 84 msec
;; SERVER: 10.254.0.10#53(10.254.0.10)
;; WHEN: Tue Nov 23 01:08:03 UTC 2021
;; MSG SIZE  rcvd: 12
Two seconds later:
[root@k8multiclustercalico-ve5t6uuoo245-master-0 ~]# dig @10.254.0.10 dash.cdn.domaine.tld

; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc33 <<>> @10.254.0.10 dash.cdn.domaine.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7653
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;dash.cdn.domaine.tld.          IN      A

;; ANSWER SECTION:
dash.cdn.domaine.tld.   30      IN      A       xx.xx.xx.129

;; Query time: 2 msec
;; SERVER: 10.254.0.10#53(10.254.0.10)
;; WHEN: Tue Nov 23 01:08:21 UTC 2021
;; MSG SIZE  rcvd: 81
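Because 10.254.0.10 is a ClusterIP that load-balances across the CoreDNS replicas, I suspect the intermittent NXDOMAIN could come from a single misbehaving replica. A way to check that assumption would be to query each replica's pod IP directly (the pod IPs below are placeholders):

# List the DNS pod IPs behind the kube-dns service
kubectl -n kube-system get endpoints kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Query each replica directly instead of the service IP
dig @<coredns-pod-ip-1> dash.cdn.domaine.tld
dig @<coredns-pod-ip-2> dash.cdn.domaine.tld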
In the log of the kube-dns-autoscaler pod I have this:

kubectl logs kube-dns-autoscaler-75859754fd-q8z4w -n kube-system
E1122 20:56:09.944449       1 autoscaler_server.go:120] Update failure: the server could not find the requested resource
E1122 20:56:19.945294       1 autoscaler_server.go:120] Update failure: the server could not find the requested resource
E1122 20:56:29.944245       1 autoscaler_server.go:120] Update failure: the server could not find the requested resource
E1122 20:56:39.946346       1 autoscaler_server.go:120] Update failure: the server could not find the requested resource
E1122 20:56:49.944693       1 autoscaler_server.go:120] Update failure: the server could not find the requested resource
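If I read this error correctly, the kube-dns-autoscaler (cluster-proportional-autoscaler) cannot find the resource it is supposed to scale. I am not sure this is related to the intermittent resolution, but to check what it points at, something like this should show its scale target (usually passed as a --target=<kind>/<name> flag):

# Show the autoscaler's flags, including --target
kubectl -n kube-system get deployment kube-dns-autoscaler -o yaml | grep target

# Verify that the referenced Deployment actually exists under that name
kubectl -n kube-system get deployments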