[openstack-dev] [Zun][Infra] Random internet disconnection in the gate

Jeremy Stanley fungi at yuggoth.org
Mon Nov 7 21:26:37 UTC 2016


On 2016-11-07 16:13:04 -0500 (-0500), Hongbin Lu wrote:
> I am working on the Zun project and we have been seeing random failures in
> the gate. The error is below:
> 
>     2016-10-16 00:30:49.359 | ++ /opt/stack/new/zun/devstack/lib/zun:install_etcd_server:316 :   curl -L https://github.com/coreos/etcd/releases/download/v3.0.7/etcd-v3.0.7-linux-amd64.tar.gz -o /opt/stack/new/zun/etcd/etcd-v3.0.7-linux-amd64.tar.gz
>     ....
>     curl: (7) Failed to connect to github.com port 443: Connection timed out

Yes, connectivity to github.com can be iffy even on the best of
days, so relying on it within CI jobs is always a bit problematic.
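
As a stopgap, a job can at least retry a flaky download a few times
before giving up. The following is only a sketch, not something taken
from the Zun DevStack plugin: the `retry` helper name, the attempt
count and the delay are all assumptions for illustration.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, otherwise the
# exit status of the last failed attempt.
retry() {
    local attempts=$1 delay=$2 n=1 rc=0
    shift 2
    while true; do
        "$@" && return 0
        rc=$?
        if [ "$n" -ge "$attempts" ]; then
            return "$rc"
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example use (performs a real download, so it is left commented out):
# retry 3 5 curl -fL --connect-timeout 30 -o "$dest" "$url"
```

This only papers over short outages; it does nothing if the network
path stays down for the whole job.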

> Searching logstash with the query (message:"Failed to connect to
> github.com port 443: Connection timed out"), it looks like all the failures
> occurred on "ubuntu-*-osic-cloud1-*" nodes. Is that related to anything
> specific to the osic cloud?

Since osic-cloud1 provides the majority of our job capacity these
days, it's just as likely the sample size is too small to show the
error impacting other providers as well. It's also possible the IPv4
NAT for OSIC is overloaded, given that those job nodes only have
global addresses for IPv6 and I don't see any AAAA records for
github.com.
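
To check for IPv6 reachability of a host from a job node, something
like the sketch below could be used. It assumes glibc's `getent`,
whose `ahostsv6` database typically reports IPv4-mapped addresses
(`::ffff:a.b.c.d`) for names that only have A records, so those are
filtered out. Both function names are made up for illustration.

```shell
# Read `getent ahostsv6`-style lines ("ADDRESS TYPE [NAME]") on stdin and
# print the distinct addresses that are not IPv4-mapped (::ffff:a.b.c.d).
native_ipv6_addrs() {
    awk 'tolower($1) !~ /^::ffff:/ { print $1 }' | sort -u
}

# Succeed if the host resolves to at least one native IPv6 address.
has_native_ipv6() {
    local host=$1
    [ -n "$(getent ahostsv6 "$host" | native_ipv6_addrs)" ]
}

# Example use (needs working DNS, so it is left commented out):
# has_native_ipv6 github.com || echo "IPv4 only, traffic will go via NAT"
```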

Regardless, there is a plan to implement provider-local mirroring of
arbitrary file dependencies which would allow jobs to consume those
without connecting over the wider Internet. In this case, etcd
tarballs might be something well suited to this, or you could
consider trying to use Ubuntu's etcd package (I'm unsure what you
have DevStack doing with etcd so it's hard to know if this is a
viable alternative for you).
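
Once such a mirror exists, a job could try it first and fall back to
the upstream URL. This is a rough sketch under stated assumptions: the
function names and the mirror URL in the usage comment are
hypothetical, and no particular mirror layout is implied.

```shell
# Fetch a single URL ($1) into a destination file ($2).
fetch_one() {
    curl -fsSL --connect-timeout 30 -o "$2" "$1"
}

# Try each candidate URL in order, stopping at the first one that works.
# Usage: fetch_with_fallback DEST URL [URL...]
fetch_with_fallback() {
    local dest=$1 url
    shift
    for url in "$@"; do
        if fetch_one "$url" "$dest"; then
            return 0
        fi
        echo "fetch failed: $url" >&2
    done
    return 1
}

# Example use (hypothetical mirror host, left commented out):
# fetch_with_fallback "$dest" \
#     "http://mirror.example.invalid/etcd/etcd-v3.0.7-linux-amd64.tar.gz" \
#     "https://github.com/coreos/etcd/releases/download/v3.0.7/etcd-v3.0.7-linux-amd64.tar.gz"
```

Listing the mirror first keeps the common case off the wider Internet
while still letting the job run where no mirror is available.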
-- 
Jeremy Stanley


