[openstack-dev] [tripleo] container jobs are unstable

Flavio Percoco flavio at redhat.com
Mon Mar 27 12:00:56 UTC 2017


On 23/03/17 16:24 +0100, Martin André wrote:
>On Wed, Mar 22, 2017 at 2:20 PM, Dan Prince <dprince at redhat.com> wrote:
>> On Wed, 2017-03-22 at 13:35 +0100, Flavio Percoco wrote:
>>> On 22/03/17 13:32 +0100, Flavio Percoco wrote:
>>> > On 21/03/17 23:15 -0400, Emilien Macchi wrote:
>>> > > Hey,
>>> > >
>>> > > I've noticed that container jobs look pretty unstable lately; to me,
>>> > > it sounds like a timeout:
>>> > > http://logs.openstack.org/19/447319/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/bca496a/console.html#_2017-03-22_00_08_55_358973
>>> >
>>> > There are different hypotheses on what is going on here. Some patches have
>>> > landed to improve the write performance on containers by using hostpath
>>> > mounts, but we think the real slowness is coming from the image downloads.
>>> >
>>> > That said, this is still under investigation, and the containers squad
>>> > will report back as soon as there are new findings.
>>>
>>> Also, to be more precise, Martin André is looking into this. He also
>>> fixed the gate in the last 2 weeks.
>>
>> I spoke w/ Martin on IRC. He seems to think this is the cause of some
>> of the failures:
>>
>> http://logs.openstack.org/32/446432/1/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/543bc80/logs/oooq/overcloud-controller-0/var/log/extra/docker/containers/heat_engine/log/heat/heat-engine.log.txt.gz#_2017-03-21_20_26_29_697
>>
>>
>> Looks like Heat isn't able to create Nova instances in the overcloud
>> due to "Host 'overcloud-novacompute-0' is not mapped to any cell". This
>> means our cells initialization code for containers may not be quite
>> right... or there is a race somewhere.
>
>Here are some findings. I've looked at the time measurements from CI for
>https://review.openstack.org/#/c/448533/, which provided the most
>recent results (all times in minutes):
>
>* gate-tripleo-ci-centos-7-ovb-ha [1]
>    undercloud install: 23
>    overcloud deploy: 72
>    total time: 125
>* gate-tripleo-ci-centos-7-ovb-nonha [2]
>    undercloud install: 25
>    overcloud deploy: 48
>    total time: 122
>* gate-tripleo-ci-centos-7-ovb-updates [3]
>    undercloud install: 24
>    overcloud deploy: 57
>    total time: 152
>* gate-tripleo-ci-centos-7-ovb-containers-oooq-nv [4]
>    undercloud install: 28
>    overcloud deploy: 48
>    total time: 165 (timeout)
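
To make the comparison easier to see, here is a small sketch that subtracts the two listed steps from each job's total. It assumes the figures above are minutes; the short job names and the "other" label are only illustrative and do not come from the CI tooling:

```python
# (undercloud install, overcloud deploy, total time) copied from the list above.
# Assumption: all values are in minutes.
jobs = {
    "ovb-ha": (23, 72, 125),
    "ovb-nonha": (25, 48, 122),
    "ovb-updates": (24, 57, 152),
    "ovb-containers-oooq-nv": (28, 48, 165),
}

# Time spent outside undercloud install and overcloud deploy, per job.
other = {
    name: total - undercloud - overcloud
    for name, (undercloud, overcloud, total) in jobs.items()
}

for name, minutes in other.items():
    print(f"{name}: {minutes} min outside install/deploy")
```

By this rough measure the containers job spends the most time outside the two main steps, which is where the items discussed below come in.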
>
>Looking at the undercloud install and overcloud deploy times, the most
>time-consuming tasks, the containers job isn't doing that badly compared
>to other OVB jobs. But looking closer I could see that:
>- the containers job pulls docker images from dockerhub; this process
>takes roughly 18 min.

I think we can optimize this a bit by making the script that populates the local
registry in the overcloud job run its pulls in parallel. The docker daemon can
handle multiple concurrent pulls without problems.
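
A minimal sketch of what parallel pulls could look like, assuming the list of images is known up front and the `docker` CLI is on the PATH. The `docker_pull` and `pull_all` helpers and the worker count are hypothetical; this is not the script the CI job actually uses:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def docker_pull(image):
    """Pull one image with the docker CLI and return its exit code."""
    return subprocess.run(["docker", "pull", image]).returncode


def pull_all(images, pull=docker_pull, workers=4):
    """Pull images concurrently and return the ones that failed.

    `pull` is injectable so the logic can be tried without a docker daemon.
    The default of 4 workers is an arbitrary choice for illustration.
    """
    images = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input list.
        codes = list(pool.map(pull, images))
    return [image for image, code in zip(images, codes) if code != 0]
```

An empty list back from `pull_all(...)` means every pull exited with 0; anything else is the set of images worth retrying before pushing to the local registry.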

>- the overcloud validate task takes 10 min more than it should because
>of the bug Dan mentioned (a fix is in the queue at
>https://review.openstack.org/#/c/448575/)

+A

>- the postci takes a long time with quickstart, 13 min (4 min alone
>spent on docker log collection) whereas it takes only 3 min when using
>tripleo.sh

mmh, does this have anything to do with ansible being in between? Or is that
time specifically for the part that gets the logs?

>
>Adding all these numbers, we're at about 40 min of additional time for
>the oooq containers job, which is enough to cross the CI job limit.
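
For reference, one reading of how the three items above add up. It assumes the postci overhead is the difference between the quickstart and tripleo.sh figures; the variable names are only illustrative:

```python
# Extra minutes reported above for the oooq containers job.
image_pull = 18        # pulling docker images from dockerhub
validate_bug = 10      # overcloud validate slowed down by the cell mapping bug
postci_extra = 13 - 3  # postci with quickstart vs. with tripleo.sh (assumed delta)

overhead = image_pull + validate_bug + postci_extra
print(overhead)  # 38, in line with "about 40 min"
```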
>
>There is certainly a lot of room for optimization here and there and
>I'll explore how we can speed up the containers CI job over the next
>weeks.

Thanks a lot for the update. The time breakdown is fantastic.
Flavio

>
>Martin
>
>[1] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-ha/d2c1b16/
>[2] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-nonha/d6df760/
>[3] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/3b1f795/
>[4] http://logs.openstack.org/33/448533/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq-nv/b816f20/
>
>> Dan
>>
>>>
>>> Flavio
>>>
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- 
@flaper87
Flavio Percoco