[openstack-dev] [tripleo] glance backend: replace swift by file in CI

Erno Kuvaja ekuvaja at redhat.com
Tue Jun 28 11:37:23 UTC 2016


TL;DR

It makes absolute sense to run the file backend on the single-node undercloud in CI.

A few more comments inline.

On Mon, Jun 27, 2016 at 8:49 PM, Emilien Macchi <emilien at redhat.com> wrote:
> On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard <clay.gerrard at gmail.com> wrote:
>> There's probably some minimal gain in cross compatibility testing to
>> sticking with the status quo.  The Swift API is old and stable, but I
>> believe there was some bug in recent history where some return value in
>> swiftclient changed from an iterable to a generator or something and some
>> aggressive non-duck type checking broke something somewhere....
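To illustrate the kind of breakage described above, here is a minimal Python sketch. The functions are hypothetical and are not taken from swiftclient: a strict isinstance(..., list) check rejects a generator even though it yields the same items, while code that only iterates keeps working.

```python
from typing import Iterable, Iterator, List


def list_objects_old() -> List[str]:
    # Hypothetical "old" client behaviour: returns a concrete list.
    return ["image-1", "image-2"]


def list_objects_new() -> Iterator[str]:
    # Hypothetical "new" client behaviour: same items, but as a generator.
    yield "image-1"
    yield "image-2"


def count_strict(result) -> int:
    # Aggressive non-duck type check: only accepts a real list.
    if not isinstance(result, list):
        raise TypeError("expected a list, got %s" % type(result).__name__)
    return len(result)


def count_duck(result: Iterable[str]) -> int:
    # Duck-typed version: works with any iterable, list or generator.
    return sum(1 for _ in result)


if __name__ == "__main__":
    print(count_strict(list_objects_old()))
    print(count_duck(list_objects_new()))
    try:
        count_strict(list_objects_new())
    except TypeError as exc:
        print("strict check broke:", exc)
```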
>>
>> I find that bug report sorta interesting; the reported memory pressure
>> there doesn't make sense.  Maybe there's some non-essential
>> middleware configured on that proxy that's causing the workers to
>> bloat up like that?
>
> Swift proxy pipeline:
> pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
> formpost authtoken keystone staticweb proxy-logging proxy-server

Some things I do not think we benefit from having there if we still want
to experiment with Swift on the undercloud:
staticweb - do we need containers presented as web pages?
tempurl - I'd assume we can expect users to access the needed objects
with their own credentials.
formpost - we likely don't need HTTP form uploads instead of PUT calls
either.
ratelimit - Have we had even a single case where something went crazy,
ratelimit saved us, and the tests still didn't fail?
healthcheck - likely not used, but also really lightweight, so it
shouldn't make any difference.

cache - Memcache is likely the thing that kills us.
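A minimal sketch of what trimming the pipeline above could look like, assuming we simply drop the filters questioned in the list. trim_pipeline and DEFAULT_OPTIONAL are hypothetical helpers, not part of Swift, and whether the proxy still behaves well without them (especially without cache) is an assumption to verify on the undercloud.

```python
from typing import Iterable

# Filters questioned above; healthcheck is kept because it is lightweight.
DEFAULT_OPTIONAL = ("staticweb", "tempurl", "formpost", "ratelimit")


def trim_pipeline(pipeline: str, optional: Iterable[str] = DEFAULT_OPTIONAL) -> str:
    """Return the pipeline string with the optional filters removed."""
    filters = pipeline.split()
    # proxy-server is the app itself and must remain the last element.
    if not filters or filters[-1] != "proxy-server":
        raise ValueError("pipeline must end with proxy-server")
    drop = set(optional)
    return " ".join(f for f in filters if f not in drop)


current = ("catch_errors healthcheck cache ratelimit bulk tempurl "
           "formpost authtoken keystone staticweb proxy-logging proxy-server")
print(trim_pipeline(current))
# Also dropping memcache, to test the "cache kills us" theory:
print(trim_pipeline(current, DEFAULT_OPTIONAL + ("cache",)))
```

The first call keeps cache so its effect can be measured separately from the other filters.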

>
> Thanks for your help,
>
>> -clayg
>>
>> On Mon, Jun 27, 2016 at 12:30 PM, Emilien Macchi <emilien at redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> Today we're re-investigating a CI failure that we had multiple times [1]:
>>> Swift memory usage grows until it is OOM-killed.
>>>
>>> The scope of this thread is our CI, not production
>>> environments.
>>> Indeed, our CI runs with limited resources, whereas production
>>> environments should not hit this problem.
>>>
>>> After some investigation on #tripleo, we found that this scenario
>>> has recently been happening almost every time:
>>>
>>> * The undercloud is deployed; Glance and Swift are running. Glance is
>>> configured with the Swift backend to store images.
>>> * TripleO CI uploads the overcloud image into Glance; the image is
>>> successfully uploaded.
>>> * When the overcloud starts deploying, some nodes randomly fail to
>>> deploy because the undercloud OOM-kills swift-proxy-server while it is
>>> still sending the overcloud image requested by the Glance API. Swift
>>> fails, Glance fails, and the overcloud deployment fails with "No valid
>>> hosts found".
>>>
>>> It's likely due to performance issues in our CI, and there is nothing
>>> we can do except add more resources or reduce the number of
>>> environments, which we won't do at this time given our recent
>>> improvements to our CI (more RAM, SSDs, etc.).

So streamlining and optimizing Swift for a small environment has
already been tried?

Another thing that comes to mind based on recent discussions: what is
the core count on our CI undercloud (uc) node? Are all the services
deployed there with their default worker values? It might be sensible
(even for production use) to limit the number of workers our services
spawn on the all-in-one (AIO) undercloud, as that tends to have a huge
impact on memory consumption.
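A rough sketch of why the worker count matters, assuming each service starts about one worker per core by default and that memory grows roughly linearly with workers. capped_workers, estimate_worker_memory_mb and the per-worker sizes in the usage below are hypothetical placeholders, not measurements from our CI.

```python
import os
from typing import Dict, Optional


def capped_workers(cap: int, cores: Optional[int] = None) -> int:
    """Worker count limited to `cap` instead of one worker per core.

    `cap` is a hypothetical tuning knob; real services expose their own
    per-service worker options.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, min(cores, cap))


def estimate_worker_memory_mb(workers: Dict[str, int],
                              rss_mb: Dict[str, float]) -> float:
    """Rough total: sum over services of (workers x per-worker memory)."""
    return sum(count * rss_mb[name] for name, count in workers.items())


# Placeholder per-worker sizes, for illustration only.
rss = {"glance-api": 100.0, "swift-proxy-server": 100.0}
per_core = {name: 8 for name in rss}                     # 8-core node, 1 per core
capped = {name: capped_workers(2, cores=8) for name in rss}
print(estimate_worker_memory_mb(per_core, rss))
print(estimate_worker_memory_mb(capped, rss))
```

With these placeholder numbers, capping at two workers per service cuts the estimate by a factor of four; the real saving depends on actual per-worker memory.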

- Erno "jokke_" Kuvaja
>>>
>>> As a first iteration, I propose [2] that we stop using Swift as a
>>> backend for Glance. Since our undercloud is currently single-node, I
>>> see zero value in using Swift to store the overcloud image.
>>> If there is value, then we can add an option to choose whether or not
>>> to use it (and set it to False in our CI to use the file backend, which
>>> won't lead to OOM).
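A minimal sketch of such a toggle, assuming it simply switches the [glance_store] section of glance-api.conf. stores, default_store and filesystem_store_datadir are glance_store options; the use_swift flag, the glance_store_section helper and the datadir path are illustrative assumptions, and a real Swift-backed setup needs further Swift credential settings omitted here.

```python
import configparser
import io


def glance_store_section(use_swift: bool) -> str:
    """Render a [glance_store] section for a hypothetical use_swift toggle."""
    config = configparser.ConfigParser()
    if use_swift:
        # Swift credential/container options are intentionally left out.
        config["glance_store"] = {"stores": "swift", "default_store": "swift"}
    else:
        # File backend: images are stored on the undercloud's local disk.
        config["glance_store"] = {
            "stores": "file",
            "default_store": "file",
            "filesystem_store_datadir": "/var/lib/glance/images/",
        }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


# In CI the toggle would be False, selecting the file backend.
print(glance_store_section(use_swift=False))
```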
>>>
>>> Note: on the overcloud, we currently support the file, swift and rbd
>>> backends, which you can easily select during deployment.
>>>
>>> [1] https://bugs.launchpad.net/tripleo/+bug/1595916
>>> [2] https://review.openstack.org/#/c/334555/
>>> --
>>> Emilien Macchi
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> --
> Emilien Macchi


