<div dir="ltr"><div>Hi,</div><div><br></div><div>I have a new kolla-ansible deployment with Wallaby.</div><div>I have created a Kubernetes cluster using Calico (Flannel didn't work for me).</div><div><br></div><div>I set up an autoscaling test to see whether it works:</div><div>- pod autoscaling is working.</div><div>- worker-node autoscaling is not working.</div><div><br></div><div>This is my deployment file: <b>cat php-apache.yaml</b></div><div>
<p>
<font face="Courier New, Courier, monospace">apiVersion: apps/v1<br>
kind: Deployment<br>
metadata:<br>
  name: php-apache-deployment<br>
spec:<br>
  selector:<br>
    matchLabels:<br>
      app: php-apache<br>
  replicas: 2<br>
  template:<br>
    metadata:<br>
      labels:<br>
        app: php-apache<br>
    spec:<br>
      containers:<br>
      - name: php-apache<br>
        image: <a href="http://k8s.gcr.io/hpa-example" target="_blank">k8s.gcr.io/hpa-example</a><br>
        ports:<br>
        - containerPort: 80<br>
        resources:<br>
          limits:<br>
            cpu: 500m<br>
          requests:<br>
            cpu: 200m<br>
---<br>
apiVersion: v1<br>
kind: Service<br>
metadata:<br>
  name: php-apache-service<br>
  labels:<br>
    app: php-apache<br>
spec:<br>
  ports:<br>
  - port: 80<br>
    targetPort: 80<br>
    protocol: TCP<br>
  selector:<br>
    app: php-apache<br>
  type: LoadBalancer</font></p>
<p><br>
</p>This is my HPA file: <b>cat php-apache-hpa.yaml</b><br><p><b>
</b><font face="Courier New, Courier, monospace">apiVersion: autoscaling/v2beta2<br>
kind: HorizontalPodAutoscaler<br>
metadata:<br>
  name: php-apache-hpa<br>
  namespace: default<br>
  labels:<br>
    service: php-apache-service<br>
spec:<br>
  minReplicas: 2<br>
  maxReplicas: 30<br>
  scaleTargetRef:<br>
    apiVersion: apps/v1<br>
    kind: Deployment<br>
    name: php-apache-deployment<br>
  metrics:<br>
  - type: Resource<br>
    resource:<br>
      name: cpu<br>
      target:<br>
        type: Utilization<br>
        averageUtilization: 30 # percentage</font></p>
<p><br>
</p>
<p>This is my load generator:</p>
<p><font face="Courier New, Courier, monospace">kubectl run -i --tty
load-generator-1 --rm --image=busybox --restart=Never -- /bin/sh
-c "while sleep 0.01; do wget -q -O- <a href="http://ip_load_balancer" target="_blank">http://ip_load_balancer</a>;
done"</font><br>
</p>
<p><br>
</p>
<p>Here is the state of my cluster before the test:</p>
<p><font face="Courier New, Courier, monospace">[kube8@cdndeployer
~]$ kubectl get pod<br>
NAME READY STATUS
RESTARTS AGE<br>
php-apache-deployment-5b65bbc75c-95k6k 1/1 Running
0 24m<br>
php-apache-deployment-5b65bbc75c-mv5h6 1/1 Running
0 24m</font></p>
<p><font face="Courier New, Courier, monospace">[kube8@cdndeployer
~]$ kubectl get hpa<br>
NAME REFERENCE TARGETS
MINPODS MAXPODS REPLICAS AGE<br>
php-apache-hpa Deployment/php-apache-deployment <b>0%/30%</b>
2 15 2 24m</font></p>
<p><font face="Courier New, Courier, monospace">[kube8@cdndeployer
~]$ kubectl get svc<br>
NAME TYPE CLUSTER-IP
EXTERNAL-IP PORT(S) AGE<br>
kubernetes ClusterIP 10.254.0.1
<none> 443/TCP 13h<br>
php-apache-service LoadBalancer 10.254.3.54 <b>xx.xx.xx.213</b>
80:31763/TCP 25m</font><br>
</p>
<p><br>
</p><p>When I apply the load:<br></p>
<p>1 - The pod autoscaler creates new pods, but some of them get stuck in the <b>Pending</b> state:<br></p>
<p><font face="Courier New, Courier, monospace">[kube8@cdndeployer
~]$ kubectl get hpa<br>
NAME REFERENCE TARGETS
MINPODS MAXPODS REPLICAS AGE<br>
php-apache-hpa Deployment/php-apache-deployment <b>155%/30%</b>
2 15 4 27m</font><b><br>
</b></p>
<p><font face="Courier New, Courier, monospace">[kube8@cdndeployer
~]$ kubectl get pod<br>
NAME READY STATUS
RESTARTS AGE<br>
load-generator-1 1/1 Running
0 97s<br>
load-generator-2 1/1 Running
0 94s<br>
php-apache-deployment-5b65bbc75c-95k6k 1/1 Running
0 28m<br>
<b>php-apache-deployment-5b65bbc75c-cjkwk 0/1 Pending
0 33s</b><b><br>
</b><b>php-apache-deployment-5b65bbc75c-cn5rt 0/1
Pending 0 33s</b><b><br>
</b><b>php-apache-deployment-5b65bbc75c-cxctx 0/1
Pending 0 48s</b><br>
php-apache-deployment-5b65bbc75c-fffnc 1/1 Running
0 64s<br>
php-apache-deployment-5b65bbc75c-hbfw8 0/1 Pending
0 33s<br>
php-apache-deployment-5b65bbc75c-l8496 1/1 Running
0 48s<br>
php-apache-deployment-5b65bbc75c-mv5h6 1/1 Running
0 28m<br>
php-apache-deployment-5b65bbc75c-qddrb 1/1 Running
0 48s<br>
php-apache-deployment-5b65bbc75c-</font><font face="Courier New,
Courier, monospace"><font face="Courier New, Courier, monospace">dd5r5</font>
0/1 Pending 0 48s<br>
php-apache-deployment-5b65bbc75c-tr65j 1/1 Running
0 64s</font><b><br>
</b></p>
<p>2 - The cluster is unable to create more pods/workers, and I get this error message from the pending pods:<br></p>
<p><font face="Courier New, Courier, monospace">kubectl describe pod
php-apache-deployment-5b65bbc75c-dd5r5<br>
Name: php-apache-deployment-5b65bbc75c-dd5r5<br>
Namespace: default<br>
Priority: 0<br>
Node: <none><br>
Labels: app=php-apache<br>
pod-template-hash=5b65bbc75c<br>
Annotations: <a href="http://kubernetes.io/psp" target="_blank">kubernetes.io/psp</a>: magnum.privileged<br>
<b>Status: Pending</b><br>
IP:<br>
IPs: <none><br>
Controlled By: ReplicaSet/php-apache-deployment-5b65bbc75c<br>
Containers:<br>
php-apache:<br>
Image: <a href="http://k8s.gcr.io/hpa-example" target="_blank">k8s.gcr.io/hpa-example</a><br>
Port: 80/TCP<br>
Host Port: 0/TCP<br>
Limits:<br>
cpu: 500m<br>
Requests:<br>
cpu: 200m<br>
Environment: <none><br>
Mounts:<br>
/var/run/secrets/<a href="http://kubernetes.io/serviceaccount" target="_blank">kubernetes.io/serviceaccount</a> from
default-token-4fsgh (ro)<br>
Conditions:<br>
Type Status<br>
PodScheduled False<br>
Volumes:<br>
default-token-4fsgh:<br>
Type: Secret (a volume populated by a Secret)<br>
SecretName: default-token-4fsgh<br>
Optional: false<br>
QoS Class: Burstable<br>
Node-Selectors: <none><br>
Tolerations: <a href="http://node.kubernetes.io/not-ready:NoExecute" target="_blank">node.kubernetes.io/not-ready:NoExecute</a> for 300s<br>
<a href="http://node.kubernetes.io/unreachable:NoExecute" target="_blank">node.kubernetes.io/unreachable:NoExecute</a> for
300s<br>
Events:<br>
Type Reason Age From Message<br>
---- ------ ---- ---- -------<br>
<b> Warning FailedScheduling 2m48s default-scheduler 0/4
nodes are available: 2 Insufficient cpu, 2 node(s) had taint
{<a href="http://node-role.kubernetes.io/master" target="_blank">node-role.kubernetes.io/master</a>: }, that the pod didn't
tolerate.</b><b><br>
</b><b> Warning FailedScheduling 2m48s default-scheduler
0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had
taint {<a href="http://node-role.kubernetes.io/master" target="_blank">node-role.kubernetes.io/master</a>: }, that the pod didn't
tolerate.</b><b><br>
</b><b> Normal NotTriggerScaleUp 2m42s cluster-autoscaler
pod didn't trigger scale-up (it wouldn't fit if a new node is
added): 1 in backoff after failed scale-u</b><b>p</b></font><br>
</p>
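<p>The "Insufficient cpu" part at least looks like simple arithmetic: each replica requests 200m CPU, the two masters are tainted away, and a worker with 1 vCPU allocatable (an assumption on my part; I haven't checked the worker flavor) can only hold about five such pods. A quick sanity check of that math:</p>

```shell
# Back-of-the-envelope scheduling check.
# ASSUMPTION: each worker has roughly 1000m allocatable CPU (flavor not verified).
allocatable_m=1000   # assumed allocatable CPU per worker, in millicores
request_m=200        # resources.requests.cpu from the Deployment
workers=2            # the 2 non-master nodes from "0/4 nodes are available"
echo "max php-apache pods cluster-wide: $(( workers * allocatable_m / request_m ))"
```

So under that assumption the scheduler runs out of room at about ten pods, which matches when the Pending pods start appearing; the autoscaler is supposed to add workers at that point, but fails as shown above.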
<p><br>
</p>
<p><br>
</p>
<p>I get these error messages from the autoscaler pod <b>cluster-autoscaler</b>-f4bd5f674-b9692:</p>
<p><font face="Courier New, Courier, monospace">I1123
00:50:27.714801 1 node_instances_cache.go:168] Refresh
cloud provider node instances cache finished, refresh took
12.709µs<br>
I1123 00:51:34.181145 1 scale_up.go:658] Scale-up: setting
group default-worker size to 3<br>
<b>W1123 00:51:34.381953 1 clusterstate.go:281] Disabling
scale-up for node group default-worker until 2021-11-23
00:56:34.180840351 +0000 UTC m=+47174.376164120;
errorClass=Other; errorCode=cloudProviderError</b><b><br>
</b><b>E1123 00:51:34.382081 1 static_autoscaler.go:415]
Failed to scale up: failed to increase node group size: could
not check current nodegroup size: could not get cluster: Get
<a href="https://dash.cdn.cerist.dz:9511/v1/clusters/b4a6b3eb-fcf3-416f-b740-11a083d4b896" target="_blank">https://dash.cdn.domaine.tld:9511/v1/clusters/b4a6b3eb-fcf3-416f-b740-11a083d4b896</a>:
dial tcp: lookup <a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.domaine.</a>tld on <a href="http://10.254.0.10:53" target="_blank">10.254.0.10:53</a>: no such
host</b><br>
W1123 00:51:44.392523 1 scale_up.go:383] Node group
default-worker is not ready for scaleup - backoff<br>
W1123 00:51:54.410273 1 scale_up.go:383] Node group
default-worker is not ready for scaleup - backoff<br>
W1123 00:52:04.422128 1 scale_up.go:383] Node group
default-worker is not ready for scaleup - backoff<br>
W1123 00:52:14.434278 1 scale_up.go:383] Node group
default-worker is not ready for scaleup - backoff<br>
W1123 00:52:24.442480 1 scale_up.go:383] Node group
default-worker is not ready for scaleup - backoff<br>
I1123 00:52:27.715019 1 node_instances_cache.go:156] Start
refreshing cloud provider node instances cache</font><br>
</p>
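<p>So the autoscaler fails because it cannot resolve the Magnum API hostname dash.cdn.domaine.tld through the cluster DNS at 10.254.0.10. From what I've read (not yet tried, and this assumes Magnum's cluster DNS is CoreDNS), one temporary workaround would be to pin that hostname in the Corefile with a hosts entry, using the A record the successful dig below returns:</p>

```
.:53 {
    errors
    health
    # Static entry so dash.cdn.domaine.tld always resolves in-cluster;
    # fallthrough keeps every other name on the normal resolution path.
    hosts {
        xx.xx.xx.129 dash.cdn.domaine.tld
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa
    forward . /etc/resolv.conf
    cache 30
}
```

<p>That would only mask the problem, though; I'd rather understand why resolution is intermittent in the first place.</p>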
<p>I ran some tests against the DNS service:</p>
<p>kubectl get svc -A<br>
NAMESPACE NAME TYPE
CLUSTER-IP EXTERNAL-IP PORT(S) AGE<br>
default kubernetes ClusterIP
10.254.0.1 <none> 443/TCP
13h<br>
default php-apache-service LoadBalancer
10.254.3.54 xx.xx.xx.213 80:31763/TCP 19m<br>
kube-system dashboard-metrics-scraper ClusterIP
10.254.19.191 <none> 8000/TCP
13h<br>
<b>kube-system kube-dns ClusterIP
10.254.0.10 <none> 53/UDP,53/TCP,9153/TCP
13h</b><br>
kube-system kubernetes-dashboard ClusterIP
10.254.132.17 <none> 443/TCP
13h<br>
kube-system magnum-metrics-server ClusterIP
10.254.235.147 <none> 443/TCP
13h</p>
<p><br>
</p>
<p>I have also noticed this behaviour with the Horizon URL: sometimes the DNS pod resolves it, sometimes it does not!<br></p>
<p><font face="Courier New, Courier, monospace">[root@k8multiclustercalico-ve5t6uuoo245-master-0
~]# <b>dig @<a href="http://10.254.0.10" target="_blank">10.254.0.10</a> <a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld</b><br>
<br>
; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc33
<<>> @<a href="http://10.254.0.10" target="_blank">10.254.0.10</a> <a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld<br>
; (1 server found)<br>
;; global options: +cmd<br>
;; Got answer:<br>
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id:
5646<br>
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1,
ADDITIONAL: 1<br>
<br>
;; OPT PSEUDOSECTION:<br>
; EDNS: version: 0, flags:; udp: 4096<br>
;; QUESTION SECTION:<br>
;<a href="http://dash.cdn.cerist.dz" target="_blank">dash.cd</a>n.domaine.tld. IN A<br>
<br>
;; AUTHORITY SECTION:<br>
<b><a href="http://cdn.cerist.dz" target="_blank">cdn.</a>domaine.tld. 30 IN SOA
<a href="http://cdn.cerist.dz" target="_blank">cdn.</a>domaine.tld. <a href="http://root.cdn.cerist.dz" target="_blank">root.cdn.</a>domaine.tld. 2021100900 604800 86400
2419200 604800</b><br>
<br>
;; Query time: 84 msec<br>
;; SERVER: 10.254.0.10#53(10.254.0.10)<br>
;; WHEN: Tue Nov 23 01:08:03 UTC 2021<br>
;; MSG SIZE rcvd: 12</font></p>
<p><br>
</p>
<p>Two seconds later:<br></p>
<p><font face="Courier New, Courier, monospace">[root@k8multiclustercalico-ve5t6uuoo245-master-0
~]# <b>dig @<a href="http://10.254.0.10" target="_blank">10.254.0.10</a> <a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld</b><br>
<br>
; <<>> DiG 9.11.28-RedHat-9.11.28-1.fc33
<<>> @<a href="http://10.254.0.10" target="_blank">10.254.0.10</a> <a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld<br>
; (1 server found)<br>
;; global options: +cmd<br>
;; Got answer:<br>
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id:
7653<br>
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0,
ADDITIONAL: 1<br>
<br>
;; OPT PSEUDOSECTION:<br>
; EDNS: version: 0, flags:; udp: 4096<br>
;; QUESTION SECTION:<br>
;<a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld. IN A<br>
<br>
;; ANSWER SECTION:<br>
<b><a href="http://dash.cdn.cerist.dz" target="_blank">dash.cdn.</a>domaine.tld. 30 IN A xx.xx.xx.129</b><br>
<br>
;; Query time: 2 msec<br>
;; SERVER: 10.254.0.10#53(10.254.0.10)<br>
;; WHEN: Tue Nov 23 01:08:21 UTC 2021<br>
;; MSG SIZE rcvd: 81</font><br>
</p>
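<p>To see how often resolution actually fails, I plan to tally the dig status line over repeated probes. A small sketch (the tally helper is mine; the in-cluster loop is commented out since it needs the 10.254.0.10 resolver, so here it is applied to the two headers captured above):</p>

```shell
# Tally DNS response status codes (NXDOMAIN vs NOERROR) from dig output.
tally() { grep -o 'status: [A-Z]*' | sort | uniq -c; }

# In the cluster, one would probe repeatedly, e.g.:
#   for i in $(seq 1 20); do dig @10.254.0.10 dash.cdn.domaine.tld +noall +comments; done | tally

# Applied to the two responses captured above:
printf '%s\n' \
  ';; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 5646' \
  ';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7653' | tally
```

<p>My current guess is that one of the upstream resolvers CoreDNS forwards to does not know this zone, which would explain the NXDOMAIN/NOERROR flip-flop, but I don't know how to confirm that.</p>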
<p><br>
</p>
<p>In the logs of the DNS autoscaler pod I have this:<br></p>
<p><font face="Courier New, Courier, monospace"> kubectl logs <b>kube-dns-autoscaler</b>-75859754fd-q8z4w
-n kube-system<br>
</font></p>
<p><b><font face="Courier New, Courier, monospace">E1122
20:56:09.944449 1 autoscaler_server.go:120] Update
failure: the server could not find the requested resource<br>
E1122 20:56:19.945294 1 autoscaler_server.go:120] Update
failure: the server could not find the requested resource<br>
E1122 20:56:29.944245 1 autoscaler_server.go:120] Update
failure: the server could not find the requested resource<br>
E1122 20:56:39.946346 1 autoscaler_server.go:120] Update
failure: the server could not find the requested resource</font></b><b><br>
</b><font face="Courier New, Courier, monospace"><b>E1122
20:56:49.944693 1 autoscaler_server.go:120] Update
failure: the server could not find the requested resource</b></font></p>
</div><div><br></div><div>I don't have much experience with Kubernetes yet; could someone help me debug this?<br></div><div><br></div><div>Regards.<br></div></div>