[openstack-dev] [tripleo] glance backend: replace swift by file in CI

Dmitry Tantsur dtantsur at redhat.com
Wed Jun 29 12:59:45 UTC 2016


On 06/28/2016 01:37 PM, Erno Kuvaja wrote:
> TL;DR
>
> It makes absolute sense to run the file backend on a single-node undercloud in CI.
>
> A few more comments inline.
>
> On Mon, Jun 27, 2016 at 8:49 PM, Emilien Macchi <emilien at redhat.com> wrote:
>> On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard <clay.gerrard at gmail.com> wrote:
>>> There's probably some minimal gain in cross-compatibility testing from
>>> sticking with the status quo.  The Swift API is old and stable, but I
>>> believe there was some bug in recent history where some return value in
>>> swiftclient changed from an iterable to a generator or something and some
>>> aggressive non-duck type checking broke something somewhere...
>>>
>>> I find that bug report sorta interesting; the reported memory pressure
>>> there doesn't make sense.  Maybe there's some non-essential middleware
>>> configured on that proxy that's causing the workers to bloat up like
>>> that?
>>
>> Swift proxy pipeline:
>> pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
>> formpost authtoken keystone staticweb proxy-logging proxy-server
>
> Some things I do not think we benefit from having there if we still
> want to experiment with swift in the undercloud:

I hope we're not removing it completely...

> staticweb - do we need containers being presented as webpages?
> tempurl - I'd assume we can expect the user to have access to the needed
> objects with their own credentials.

Please leave tempurl there; we need it to support the agent_* family of
ironic drivers.

> formpost - likely we do not need HTTP forms instead of PUT calls either.
> ratelimit - Have we had a single time where something went crazy,
> ratelimit saved us, and the tests still did not fail?
> healthcheck - not likely used, but also really lightweight so
> shouldn't make any difference
>
> cache - Memcache is likely the thing that kills us.
>
>>
>> Thanks for your help,
>>
>>> -clayg
>>>
>>> On Mon, Jun 27, 2016 at 12:30 PM, Emilien Macchi <emilien at redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Today we're re-investigating a CI failure that we had multiple times [1]:
>>>> Swift memory usage grows until it is OOM-killed.
>>>>
>>>> The scope of this thread is our CI, not production
>>>> environments.
>>>> Indeed, our CI is running with limited resources, while production
>>>> environments should not hit this problem.
>>>>
>>>> After some investigation on #tripleo, we found out that this scenario
>>>> has recently been happening almost every time:
>>>>
>>>> * undercloud is deployed, glance and swift are running. Glance is
>>>> configured with the Swift backend to store images.
>>>> * tripleo CI uploads the overcloud image into Glance; the image is
>>>> successfully uploaded.
>>>> * when overcloud starts deploying, some nodes randomly fail to deploy
>>>> because the undercloud OOM-kills swift-proxy-server while it is still
>>>> sending the overcloud image requested by Glance API. Swift fails,
>>>> Glance fails, overcloud deployment fails with a "No valid hosts
>>>> found".
>>>>
>>>> It's likely due to performance issues in our CI, and there is nothing
>>>> we can do but add more resources or reduce the number of
>>>> environments, something we won't do at this time, given the recent
>>>> improvements in our CI (more RAM, SSD, etc).
>
> So streamlining and optimizing swift for a small environment was
> already tried?
>
> Another thing that comes to my mind based on the discussions lately.
> What is the core count on our CI uc node? Are all the services
> deployed there with their default worker values? Might be sensible
> (even for production use) to limit the number of workers our services
> spin up in an aio undercloud, as that tends to have a huge impact on
> memory consumption.
>
> - Erno "jokke_" Kuvaja
>>>>
>>>> As a first iteration, I propose [2] that we stop using Swift as a
>>>> backend for Glance. Indeed, our undercloud is currently single-node, and
>>>> I see zero value in using Swift to store the overcloud image.
>>>> If there is value, then we can add an option for whether or not to
>>>> use it (and set it to False in our CI to use the file backend, which
>>>> won't lead to OOM).
>>>>
>>>> Note: on the overcloud, we currently support file, swift and rbd
>>>> backends, which you can easily select during your deployment.
>>>>
>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1595916
>>>> [2] https://review.openstack.org/#/c/334555/
>>>> --
>>>> Emilien Macchi
>>>>
>>>> __________________________________________________________________________
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>>
>> --
>> Emilien Macchi
>
>
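
For reference, the middleware discussion above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the pipeline string is the one quoted in the thread, the set of
middleware to drop is just the ones questioned above (not a tested
recommendation), and trim_pipeline is a hypothetical helper, not part of
swift or tripleo. tempurl is kept because the agent_* ironic drivers
need it, and cache is left alone since nobody confirmed it is safe to
remove.

```python
# Pipeline as quoted in the thread (swift proxy-server.conf, [pipeline:main]).
PIPELINE = (
    "catch_errors healthcheck cache ratelimit bulk tempurl "
    "formpost authtoken keystone staticweb proxy-logging proxy-server"
)

# Middleware questioned in the thread for a CI undercloud.
# Assumption: this set only mirrors the discussion; it has not been tested.
CANDIDATES_TO_DROP = {"staticweb", "formpost", "ratelimit"}

# Middleware that must stay regardless (agent_* ironic drivers use tempurl).
MUST_KEEP = {"tempurl"}


def trim_pipeline(pipeline, drop, keep):
    """Return the pipeline with `drop` entries removed, never removing `keep`.

    Hypothetical helper: the relative order of the remaining entries is
    preserved, which matters for a paste pipeline.
    """
    remove = set(drop) - set(keep)
    return " ".join(name for name in pipeline.split() if name not in remove)


trimmed = trim_pipeline(PIPELINE, CANDIDATES_TO_DROP, MUST_KEEP)
print("pipeline =", trimmed)
```

Listing tempurl in MUST_KEEP means it survives even if someone later
adds it to the drop set by mistake.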



