<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Eric,</p>
<p>Thanks for sharing the article. As for the etcd volumes, you can
disable them by not setting the etcd_volume_size label. Just
FYI.<br>
</p>
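<p>A minimal sketch of what leaving the label unset can look like when the label string for a cluster template is assembled in Python. The helper names and the example label values are hypothetical; only etcd_volume_size comes from this thread, and the comma-separated key=value form is assumed to match what the Magnum client accepts for its labels option.</p>

```python
def build_labels_argument(labels):
    """Render a dict as a comma-separated key=value string."""
    return ",".join("%s=%s" % (key, value) for key, value in sorted(labels.items()))


def without_etcd_volume(labels):
    """Return a copy of the labels with etcd_volume_size left unset."""
    return {k: v for k, v in labels.items() if k != "etcd_volume_size"}


# Hypothetical starting labels; the values are placeholders for illustration.
template_labels = {"etcd_volume_size": 10, "kube_tag": "v1.15.7"}
print(build_labels_argument(without_etcd_volume(template_labels)))
```

<p>The resulting string no longer mentions etcd_volume_size, so a template created from it should not request a separate etcd volume.</p>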
<p><br>
</p>
<div class="moz-cite-prefix">On 17/01/20 6:00 AM, Eric K. Miller
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:046E9C0290DD9149B106B72FC9156BEA04771749@gmsxchsvr01.thecreation.com">
<style><!--
/* Font Definitions */
@font-face
{font-family:"MS Gothic";
panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
{font-family:MingLiU;
panose-1:2 2 5 9 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
@font-face
{font-family:"\@MS Gothic";
panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
{font-family:"\@MingLiU";
panose-1:2 2 5 9 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
h2
{mso-style-priority:9;
mso-style-link:"Heading 2 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:18.0pt;
font-family:"Times New Roman","serif";
color:windowtext;
font-weight:bold;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
color:windowtext;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri","sans-serif";}
span.Heading2Char
{mso-style-name:"Heading 2 Char";
mso-style-priority:9;
mso-style-link:"Heading 2";
font-weight:bold;}
span.mw-headline
{mso-style-name:mw-headline;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Hi
Feilong,<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Before
I was able to use the benchmark tool you mentioned, we saw
some other slowdowns with Ceph (all flash). It appears that
something must have crashed somewhere, since we had to
restart a couple of things, after which etcd has been
performing fine and Magnum has reported no more health
issues.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">So,
it looks like it wasn't etcd-related after all.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">However,
while researching, I found that etcd calls fsync on every write
(so it guarantees a write cache flush for each write), which
apparently creates havoc with some SSDs, where the SSD
performs a full flush of multiple caches. This
article explains it a LOT better: <a
href="https://yourcmc.ru/wiki/Ceph_performance"
moz-do-not-send="true">https://yourcmc.ru/wiki/Ceph_performance</a>
(scroll to the "Drive cache is slowing you down" section)<o:p></o:p></span></p>
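<p class="MsoNormal">A rough way to see this effect on a given disk is to time fsync after each small write, which loosely mimics a write-ahead log pattern. This is only a sketch: the function names, the 2048-byte block size, and the 200-write count are assumptions for illustration, and it is not the etcd benchmark tool mentioned elsewhere in this thread.</p>

```python
import os
import tempfile
import time


def measure_fsync_latencies(directory, writes=200, block_size=2048):
    """Append small blocks to a scratch file, calling fsync after each one.

    Returns the per-write fsync durations in seconds, sorted ascending.
    """
    payload = b"\0" * block_size
    durations = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the data through to stable storage
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return sorted(durations)


def percentile(sorted_values, fraction):
    """Simple index-based percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    # The scratch directory here is only a placeholder location.
    with tempfile.TemporaryDirectory() as scratch:
        samples = measure_fsync_latencies(scratch)
    print("p50 fsync: %.3f ms" % (percentile(samples, 0.50) * 1000))
    print("p99 fsync: %.3f ms" % (percentile(samples, 0.99) * 1000))
```

<p class="MsoNormal">To get a meaningful number, the directory passed in should sit on the same disk that holds the etcd data, since a temporary directory may live on a different or memory-backed filesystem. As far as I know, the etcd documentation suggests keeping the 99th percentile of WAL fsync under roughly 10 ms.</p>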
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">It
seems that the optimal configuration for etcd would be to
use local drives in each node and be sure that the write
cache is disabled in the SSDs, as opposed to using Ceph
volumes, which already add network latency and can add
even more latency to each sync due to Ceph's
replication.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">Eric<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in
0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:'Tahoma','sans-serif';color:windowtext">From:</span></b><span
style="font-size:10.0pt;font-family:'Tahoma','sans-serif';color:windowtext">
feilong [<a class="moz-txt-link-freetext" href="mailto:feilong@catalyst.net.nz">mailto:feilong@catalyst.net.nz</a>] <br>
<b>Sent:</b> Wednesday, January 15, 2020 2:36 PM<br>
<b>To:</b> Eric K. Miller;
<a class="moz-txt-link-abbreviated" href="mailto:openstack-discuss@lists.openstack.org">openstack-discuss@lists.openstack.org</a><br>
<b>Cc:</b> Spyros Trigazis<br>
<b>Subject:</b> Re: [magnum][kolla] etcd wal sync
duration issue<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p>Hi Eric,<o:p></o:p></p>
<p>If you're using SSDs, then I think the IO performance
should be OK. You can use this <a
href="https://github.com/etcd-io/etcd/tree/master/tools/benchmark"
moz-do-not-send="true">https://github.com/etcd-io/etcd/tree/master/tools/benchmark</a>
to verify whether that's the root cause. Meanwhile, you
can review the config of the etcd cluster deployed by Magnum.
I'm not an expert on etcd, so TBH I can't see anything wrong
with the config. Most of the settings are just the
defaults.<o:p></o:p></p>
<p>As for the etcd image, it's built from <a
href="https://github.com/projectatomic/atomic-system-containers/tree/master/etcd"
moz-do-not-send="true">https://github.com/projectatomic/atomic-system-containers/tree/master/etcd</a>
or you can refer to CERN's repo <a
href="https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/"
moz-do-not-send="true">https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/</a><o:p></o:p></p>
<p><b>Spyros</b>, any comments?<o:p></o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 14/01/20 10:52 AM, Eric K. Miller
wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<pre>Hi Feilong,<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Thanks for responding! I am, indeed, using the default v3.2.7 version for etcd, which is the only available image.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments). I did see a number of people indicating similar issues with etcd versions in the 3.3.x range, so I didn't think of it being an etcd issue, but then again most issues seem to be a result of people using HDDs and not SSDs, which makes sense.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Interesting that you saw the same issue, though. We haven't tried Fedora CoreOS, but I think we would need Train for this.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored. I can see this becoming an issue quickly without super-low-latency network and storage. We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which are extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Do you know who maintains the etcd images for Magnum here? Is there an easy way to create a newer image?<o:p></o:p></pre>
<pre><a href="https://hub.docker.com/r/openstackmagnum/etcd/tags/" moz-do-not-send="true">https://hub.docker.com/r/openstackmagnum/etcd/tags/</a><o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Eric<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre>From: Feilong Wang [<a href="mailto:feilong@catalyst.net.nz" moz-do-not-send="true">mailto:feilong@catalyst.net.nz</a>] <o:p></o:p></pre>
<pre>Sent: Monday, January 13, 2020 3:39 PM<o:p></o:p></pre>
<pre>To: <a href="mailto:openstack-discuss@lists.openstack.org" moz-do-not-send="true">openstack-discuss@lists.openstack.org</a><o:p></o:p></pre>
<pre>Subject: Re: [magnum][kolla] etcd wal sync duration issue<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre>Hi Eric,<o:p></o:p></pre>
<pre>That issue looks familiar to me. There are some questions I'd like to check before answering whether you should upgrade to Train.<o:p></o:p></pre>
<pre>1. Are you using the default v3.2.7 version of etcd?<o:p></o:p></pre>
<pre>2. Did you try to reproduce this with DevStack, using the Fedora CoreOS driver? The etcd version could be 3.2.26.<o:p></o:p></pre>
<pre>I asked the above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7, and I can't reproduce it with Fedora CoreOS + etcd 3.2.26.<o:p></o:p></pre>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
</blockquote>
<pre>-- <o:p></o:p></pre>
<pre>Cheers &amp; Best regards,<o:p></o:p></pre>
<pre>Feilong Wang (<span style="font-family:'MS Gothic'">王</span><span style="font-family:MingLiU">飞龙</span>)<o:p></o:p></pre>
<pre>------------------------------------------------------<o:p></o:p></pre>
<pre>Senior Cloud Software Engineer<o:p></o:p></pre>
<pre>Tel: +64-48032246<o:p></o:p></pre>
<pre>Email: <a href="mailto:flwang@catalyst.net.nz" moz-do-not-send="true">flwang@catalyst.net.nz</a><o:p></o:p></pre>
<pre>Catalyst IT Limited<o:p></o:p></pre>
<pre>Level 6, Catalyst House, 150 Willis Street, Wellington<o:p></o:p></pre>
<pre>------------------------------------------------------ <o:p></o:p></pre>
</div>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Cheers &amp; Best regards,
Feilong Wang (王飞龙)
Head of R&amp;D
Catalyst Cloud - Cloud Native New Zealand
--------------------------------------------------------------------------
Tel: +64-48032246
Email: <a class="moz-txt-link-abbreviated" href="mailto:flwang@catalyst.net.nz">flwang@catalyst.net.nz</a>
Level 6, Catalyst House, 150 Willis Street, Wellington
-------------------------------------------------------------------------- </pre>
</body>
</html>