<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Eric,</p>
<p>That issue looks familiar to me. There are a few questions I'd
like to check before answering whether you should upgrade to Train.</p>
<p>1. Are you using the default etcd version, v3.2.7?</p>
<p>2. Did you try to reproduce this with DevStack, using the Fedora
CoreOS driver? The etcd version there would be 3.2.26.<br>
</p>
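<p>For question 1, here is a minimal sketch (a hypothetical helper, not part of Magnum) that checks whether an etcd version string is older than 3.2.26. It assumes you have already read the version on master-0, for example from the "etcd Version:" line etcd logs at startup, and that "sort -V" (GNU coreutils) is available. A plain string comparison would wrongly treat 3.2.7 as newer than 3.2.26, which is why version sorting is used.</p>

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the given etcd
# version is older than 3.2.26, the version assumed for the Fedora CoreOS
# driver. "sort -V" orders 3.2.7 before 3.2.26; string comparison would not.
etcd_older_than_3_2_26() {
    v="${1#v}"                        # strip an optional leading "v": v3.2.7 -> 3.2.7
    [ "$v" != "3.2.26" ] || return 1  # equal versions are not older
    lowest=$(printf '%s\n%s\n' "$v" "3.2.26" | sort -V | head -n 1)
    [ "$lowest" = "$v" ]              # older if it sorts first
}
```

<p>Example: "etcd_older_than_3_2_26 v3.2.7" succeeds, while "etcd_older_than_3_2_26 3.2.26" fails.</p>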
<p>I ask because I saw the same error when I used Fedora Atomic
with etcd v3.2.7, and I can't reproduce it with Fedora CoreOS
and etcd v3.2.26.</p>
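<p>To confirm whether the error still reproduces after switching etcd versions, a rough sketch like this (again a hypothetical helper) scans etcd log lines for the "wal: sync duration of ..." warnings quoted below and prints each duration above a threshold, 1s by default to match etcd's "expected less than 1s" message. It assumes the log text is piped in, for example from "journalctl -u etcd --no-pager" on a master.</p>

```shell
#!/bin/sh
# Hypothetical helper: read etcd log lines on stdin and print every
# "sync duration of N.NNNs" value greater than a threshold in seconds
# (default 1), followed by a summary count.
slow_wal_syncs() {
    threshold="${1:-1}"
    awk -v t="$threshold" '
        match($0, /sync duration of [0-9.]+s/) {
            # Skip the 17-char prefix "sync duration of " and drop the trailing "s".
            d = substr($0, RSTART + 17, RLENGTH - 18)
            if (d + 0 > t + 0) {
                print d
                n++
            }
        }
        END { print "slow syncs:", n + 0 }
    '
}
```

<p>Run against the three log lines quoted below, it prints the two durations (1.696804699 and 2.249722223) followed by "slow syncs: 2"; the read-index timeout line is ignored.</p>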
<p><br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 12/01/20 6:44 AM, Eric K. Miller
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator" content="Microsoft Word 14 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We are using the following coe cluster
template and cluster create commands on an OpenStack Stein
installation that installs Magnum 8.2.0 Kolla containers
installed by Kolla-Ansible 8.0.1:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">openstack coe cluster template create \<o:p></o:p></p>
<p class="MsoNormal"> --image
Fedora-AtomicHost-29-20191126.0.x86_64_raw \<o:p></o:p></p>
<p class="MsoNormal"> --keypair userkey \<o:p></o:p></p>
<p class="MsoNormal"> --external-network ext-net \<o:p></o:p></p>
<p class="MsoNormal"> --dns-nameserver 1.1.1.1 \<o:p></o:p></p>
<p class="MsoNormal"> --master-flavor c5sd.4xlarge \<o:p></o:p></p>
<p class="MsoNormal"> --flavor m5sd.4xlarge \<o:p></o:p></p>
<p class="MsoNormal"> --coe kubernetes \<o:p></o:p></p>
<p class="MsoNormal"> --network-driver flannel \<o:p></o:p></p>
<p class="MsoNormal"> --volume-driver cinder \<o:p></o:p></p>
<p class="MsoNormal"> --docker-storage-driver overlay2 \<o:p></o:p></p>
<p class="MsoNormal"> --docker-volume-size 100 \<o:p></o:p></p>
<p class="MsoNormal"> --registry-enabled \<o:p></o:p></p>
<p class="MsoNormal"> --master-lb-enabled \<o:p></o:p></p>
<p class="MsoNormal"> --floating-ip-disabled \<o:p></o:p></p>
<p class="MsoNormal"> --fixed-network
KubernetesProjectNetwork001 \<o:p></o:p></p>
<p class="MsoNormal"> --fixed-subnet KubernetesProjectSubnet001
\<o:p></o:p></p>
<p class="MsoNormal"> --labels
kube_tag=v1.15.7,cloud_provider_tag=v1.15.0,heat_container_agent_tag=stein-dev,master_lb_floating_ip_enabled=true
\<o:p></o:p></p>
<p class="MsoNormal">
k8s-cluster-template-1.15.7-production-private<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">openstack coe cluster create \<o:p></o:p></p>
<p class="MsoNormal"> --cluster-template
k8s-cluster-template-1.15.7-production-private \<o:p></o:p></p>
<p class="MsoNormal"> --keypair userkey \<o:p></o:p></p>
<p class="MsoNormal"> --master-count 3 \<o:p></o:p></p>
<p class="MsoNormal"> --node-count 3 \<o:p></o:p></p>
<p class="MsoNormal"> k8s-cluster001<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The deploy process works perfectly,
however, the cluster health status flips between healthy and
unhealthy. The unhealthy status indicates that etcd has an
issue.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When logged into master-0 (out of 3, as
configured above), "systemctl status etcd" shows the stdout
from etcd, which shows:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Jan 11 17:27:36
k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:
2020-01-11 17:27:36.548453 W | etcdserver: timed out waiting
for read index response<o:p></o:p></p>
<p class="MsoNormal">Jan 11 17:28:02
k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:
2020-01-11 17:28:02.960977 W | wal: sync duration of
1.696804699s, expected less than 1s<o:p></o:p></p>
<p class="MsoNormal">Jan 11 17:28:31
k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:
2020-01-11 17:28:31.292753 W | wal: sync duration of
2.249722223s, expected less than 1s<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We also see:<o:p></o:p></p>
<p class="MsoNormal">Jan 11 17:40:39
k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:
2020-01-11 17:40:39.132459 I | etcdserver/api/v3rpc: grpc:
Server.processUnaryRPC failed to write status: stream error:
code = DeadlineExceeded desc = "context deadline exceeded"<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We initially used relatively small flavors,
but increased these to something very large to be sure
resources were not constrained in any way. "top" reported no
CPU nor memory contention on any nodes in either case.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Multiple clusters have been deployed, and
they all have this issue, including empty clusters that were
just deployed.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I see a very large number of reports of
similar issues with etcd, but discussions lead to disk
performance, which can't be the cause here, not only because
persistent storage for etcd isn't configured in Magnum, but
also the disks are "very" fast in this environment. Looking
at "vmstat -D" from within master-0, the number of writes is
minimal. Ceilometer logs about 15 to 20 write IOPS for this
VM in Gnocchi.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Any ideas?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We are finalizing procedures to upgrade to
Train, so we wanted to be sure that we weren't running into
some common issue with Stein that would immediately be solved
with Train. If so, we will simply proceed with the upgrade
and avoid diagnosing this issue further.<o:p></o:p></p>
<p class="MsoNormal"><br>
Thanks!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Eric<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Cheers & Best regards,
Feilong Wang (王飞龙)
Head of R&D
Catalyst Cloud - Cloud Native New Zealand
--------------------------------------------------------------------------
Tel: +64-48032246
Email: <a class="moz-txt-link-abbreviated" href="mailto:flwang@catalyst.net.nz">flwang@catalyst.net.nz</a>
Level 6, Catalyst House, 150 Willis Street, Wellington
-------------------------------------------------------------------------- </pre>
</body>
</html>