[magnum][kolla] etcd wal sync duration issue

feilong feilong at catalyst.net.nz
Wed Jan 15 20:36:19 UTC 2020


Hi Eric,

If you're using SSDs, then I think the IO performance should be OK. You
can use https://github.com/etcd-io/etcd/tree/master/tools/benchmark
to verify whether that is the root cause. Meanwhile, you can review
the config of the etcd cluster deployed by Magnum. I'm not an expert on
etcd, so TBH I can't see anything wrong with the config. Most of the
settings are just defaults.
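
For a quick sanity check of the disk alone, before setting up the
benchmark tool, something like the rough Python sketch below could be
run on a master node. It is not part of etcd or Magnum; the function
names, block size, and write count are made up for illustration. It
appends small blocks to a scratch file and times an fsync after each
one, which only approximates the WAL write pattern (etcd syncs its WAL
in its own way, so treat the numbers as indicative). As far as I know,
the etcd docs suggest the 99th percentile of WAL fsync duration should
stay under roughly 10 ms.

```python
import math
import os
import sys
import tempfile
import time


def measure_fsync_latencies(directory, writes=200, block_size=2048):
    """Append small blocks to a scratch file in `directory`, fsync after
    each write, and return the fsync durations in seconds.

    `writes` and `block_size` are illustrative defaults, not etcd settings.
    """
    payload = b"\0" * block_size
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to stable storage
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return durations


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Pass a directory on the same filesystem as the etcd data directory;
    # falls back to the system temp directory if none is given.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = measure_fsync_latencies(target)
    print("fsync p50: %.3f ms" % (percentile(samples, 50) * 1000))
    print("fsync p99: %.3f ms" % (percentile(samples, 99) * 1000))
```

If the p99 here is already high on the Ceph-backed volume, storage is
the likely culprit; if it is low, the network path between the etcd
members is worth a closer look.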

As for the etcd image, it's built from
https://github.com/projectatomic/atomic-system-containers/tree/master/etcd
or you can refer to CERN's repo:
https://gitlab.cern.ch/cloud/atomic-system-containers/blob/cern-qa/etcd/

*Spyros*, any comments?


On 14/01/20 10:52 AM, Eric K. Miller wrote:
> Hi Feilong,
>
> Thanks for responding!  I am, indeed, using the default v3.2.7 version for etcd, which is the only available image.
>
> I did not try to reproduce with any other driver (we have never used DevStack, honestly, only Kolla-Ansible deployments).  I did see a number of people indicating similar issues with etcd versions in the 3.3.x range, so I didn't think of it being an etcd issue, but then again most issues seem to be a result of people using HDDs and not SSDs, which makes sense.
>
> Interesting that you saw the same issue, though.  We haven't tried Fedora CoreOS, but I think we would need Train for this.
>
> Everything I read about etcd indicates that it is extremely latency sensitive, due to the fact that it replicates all changes to all nodes and sends an fsync to Linux each time, so data is always guaranteed to be stored.  I can see this becoming an issue quickly without super-low-latency network and storage.  We are using Ceph-based SSD volumes for the Kubernetes Master node disks, which is extremely fast (likely 10x or better than anything people recommend for etcd), but network latency is always going to be higher with VMs on OpenStack with DVR than bare metal with VLANs due to all of the abstractions.
>
> Do you know who maintains the etcd images for Magnum here?  Is there an easy way to create a newer image?
> https://hub.docker.com/r/openstackmagnum/etcd/tags/
>
> Eric
>
>
>
> From: Feilong Wang [mailto:feilong at catalyst.net.nz] 
> Sent: Monday, January 13, 2020 3:39 PM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: [magnum][kolla] etcd wal sync duration issue
>
> Hi Eric,
> That issue looks familiar to me. There are some questions I'd like to check before answering whether you should upgrade to Train.
> 1. Are you using the default v3.2.7 version of etcd?
> 2. Did you try to reproduce this with DevStack, using the Fedora CoreOS driver? The etcd version there could be 3.2.26.
> I asked the above questions because I saw the same error when I used Fedora Atomic with etcd v3.2.7, and I can't reproduce it with Fedora CoreOS + etcd 3.2.26.
>
>
-- 
Cheers & Best regards,
Feilong Wang (王飞龙)
------------------------------------------------------
Senior Cloud Software Engineer
Tel: +64-48032246
Email: flwang at catalyst.net.nz
Catalyst IT Limited
Level 6, Catalyst House, 150 Willis Street, Wellington
------------------------------------------------------ 
