<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi Eric,</p>

    <p>That issue looks familiar to me. I have a couple of questions

      before I can say whether you should upgrade to Train.</p>

    <p>1. Are you using the default etcd version, v3.2.7?</p>

    <p>2. Did you try to reproduce this with devstack, using the Fedora

      CoreOS driver? In that setup the etcd version could be 3.2.26.</p>

    <p>I ask because I saw the same error when I used Fedora Atomic

      with etcd v3.2.7, and I can't reproduce it with Fedora CoreOS

      and etcd 3.2.26.</p>
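    <p>If you want to compare the two setups, here is a rough sketch (my
      own, not something Magnum or etcd ships) that pulls the WAL sync
      durations out of etcd log lines like the ones you quoted. The
      function name and the sample lines are only for illustration, and
      it assumes the duration is always printed in plain seconds, as in
      your output.</p>

<pre>
```python
import re

# Matches etcd warnings of the form quoted in the original report, e.g.
#   "... W | wal: sync duration of 1.696804699s, expected less than 1s"
# Assumption: the duration is printed in plain seconds (no "ms" or "1m2s").
WAL_SYNC = re.compile(r"wal: sync duration of ([0-9.]+)s, expected less than")


def slow_wal_syncs(lines):
    """Return the WAL sync durations (in seconds) found in etcd log lines."""
    durations = []
    for line in lines:
        match = WAL_SYNC.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


# Sample input: the message parts of the log lines from the report.
sample = [
    "2020-01-11 17:27:36.548453 W | etcdserver: timed out waiting for read index response",
    "2020-01-11 17:28:02.960977 W | wal: sync duration of 1.696804699s, expected less than 1s",
    "2020-01-11 17:28:31.292753 W | wal: sync duration of 2.249722223s, expected less than 1s",
]

durations = slow_wal_syncs(sample)
print("slow WAL syncs:", len(durations))
print("worst sync (s):", max(durations))
```
</pre>

    <p>Running the same count against logs from a Fedora Atomic + etcd
      v3.2.7 cluster and a Fedora CoreOS + etcd 3.2.26 cluster would show
      whether the warnings really go away with the newer combination.</p>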


    <div class="moz-cite-prefix">On 12/01/20 6:44 AM, Eric K. Miller

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:046E9C0290DD9149B106B72FC9156BEA0477170E@gmsxchsvr01.thecreation.com">


      <style><!--

/* Font Definitions */

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"Calibri","sans-serif";

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri","sans-serif";}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

      <div class="WordSection1">

        <p class="MsoNormal">Hi,<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">We are using the following coe cluster

          template and cluster create commands on an OpenStack Stein

          installation that installs Magnum 8.2.0 Kolla containers

          installed by Kolla-Ansible 8.0.1:<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">openstack coe cluster template create \<o:p></o:p></p>

        <p class="MsoNormal">  --image

          Fedora-AtomicHost-29-20191126.0.x86_64_raw \<o:p></o:p></p>

        <p class="MsoNormal">  --keypair userkey \<o:p></o:p></p>

        <p class="MsoNormal">  --external-network ext-net \<o:p></o:p></p>

        <p class="MsoNormal">  --dns-nameserver 1.1.1.1 \<o:p></o:p></p>

        <p class="MsoNormal">  --master-flavor c5sd.4xlarge \<o:p></o:p></p>

        <p class="MsoNormal">  --flavor m5sd.4xlarge \<o:p></o:p></p>

        <p class="MsoNormal">  --coe kubernetes \<o:p></o:p></p>

        <p class="MsoNormal">  --network-driver flannel \<o:p></o:p></p>

        <p class="MsoNormal">  --volume-driver cinder \<o:p></o:p></p>

        <p class="MsoNormal">  --docker-storage-driver overlay2 \<o:p></o:p></p>

        <p class="MsoNormal">  --docker-volume-size 100 \<o:p></o:p></p>

        <p class="MsoNormal">  --registry-enabled \<o:p></o:p></p>

        <p class="MsoNormal">  --master-lb-enabled \<o:p></o:p></p>

        <p class="MsoNormal">  --floating-ip-disabled \<o:p></o:p></p>

        <p class="MsoNormal">  --fixed-network

          KubernetesProjectNetwork001 \<o:p></o:p></p>

        <p class="MsoNormal">  --fixed-subnet KubernetesProjectSubnet001

          \<o:p></o:p></p>

        <p class="MsoNormal">  --labels

kube_tag=v1.15.7,cloud_provider_tag=v1.15.0,heat_container_agent_tag=stein-dev,master_lb_floating_ip_enabled=true

          \<o:p></o:p></p>

        <p class="MsoNormal"> 

          k8s-cluster-template-1.15.7-production-private<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">openstack coe cluster create \<o:p></o:p></p>

        <p class="MsoNormal">  --cluster-template

          k8s-cluster-template-1.15.7-production-private \<o:p></o:p></p>

        <p class="MsoNormal">  --keypair userkey \<o:p></o:p></p>

        <p class="MsoNormal">  --master-count 3 \<o:p></o:p></p>

        <p class="MsoNormal">  --node-count 3 \<o:p></o:p></p>

        <p class="MsoNormal">  k8s-cluster001<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">The deploy process works perfectly,

          however, the cluster health status flips between healthy and

          unhealthy.  The unhealthy status indicates that etcd has an

          issue.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">When logged into master-0 (out of 3, as

          configured above), "systemctl status etcd" shows the stdout

          from etcd, which shows:<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Jan 11 17:27:36

          k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:

          2020-01-11 17:27:36.548453 W | etcdserver: timed out waiting

          for read index response<o:p></o:p></p>

        <p class="MsoNormal">Jan 11 17:28:02

          k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:

          2020-01-11 17:28:02.960977 W | wal: sync duration of

          1.696804699s, expected less than 1s<o:p></o:p></p>

        <p class="MsoNormal">Jan 11 17:28:31

          k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:

          2020-01-11 17:28:31.292753 W | wal: sync duration of

          2.249722223s, expected less than 1s<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">We also see:<o:p></o:p></p>

        <p class="MsoNormal">Jan 11 17:40:39

          k8s-cluster001-4effrc2irvjq-master-0.novalocal runc[2725]:

          2020-01-11 17:40:39.132459 I | etcdserver/api/v3rpc: grpc:

          Server.processUnaryRPC failed to write status: stream error:

          code = DeadlineExceeded desc = "context deadline exceeded"<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">We initially used relatively small flavors,

          but increased these to something very large to be sure

          resources were not constrained in any way.  "top" reported no

          CPU nor memory contention on any nodes in either case.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Multiple clusters have been deployed, and

          they all have this issue, including empty clusters that were

          just deployed.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">I see a very large number of reports of

          similar issues with etcd, but discussions lead to disk

          performance, which can't be the cause here, not only because

          persistent storage for etcd isn't configured in Magnum, but

          also the disks are "very" fast in this environment.  Looking

          at "vmstat -D" from within master-0, the number of writes is

          minimal.  Ceilometer logs about 15 to 20 write IOPS for this

          VM in Gnocchi.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Any ideas?<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">We are finalizing procedures to upgrade to

          Train, so we wanted to be sure that we weren't running into

          some common issue with Stein that would immediately be solved

          with Train.  If so, we will simply proceed with the upgrade

          and avoid diagnosing this issue further.<o:p></o:p></p>

        <p class="MsoNormal"><br>

          Thanks!<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Eric<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

      </div>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Cheers &amp; Best regards,

Feilong Wang (王飞龙)

Head of R&amp;D

Catalyst Cloud - Cloud Native New Zealand

--------------------------------------------------------------------------

Tel: +64-48032246

Email: <a class="moz-txt-link-abbreviated" href="mailto:flwang@catalyst.net.nz">flwang@catalyst.net.nz</a>

Level 6, Catalyst House, 150 Willis Street, Wellington

-------------------------------------------------------------------------- </pre>

  </body>

</html>