[openstack-dev] [TripleO][CI] Memory shortage in HA jobs, please increase it

Giulio Fidente gfidente at redhat.com
Fri Aug 19 10:33:18 UTC 2016


On 08/19/2016 12:12 PM, Erno Kuvaja wrote:
> On Fri, Aug 19, 2016 at 10:53 AM, Hugh Brock <hbrock at redhat.com> wrote:
>> On Fri, Aug 19, 2016 at 11:41 AM, Derek Higgins <derekh at redhat.com> wrote:
>>> On 19 August 2016 at 00:07, Sagi Shnaidman <sshnaidm at redhat.com> wrote:
>>>> Hi,
>>>>
>>>> we have a problem again with not enough memory in HA jobs; all of them
>>>> constantly fail in CI: http://status-tripleoci.rhcloud.com/
>>>
>>> Have we any idea why we need more memory all of a sudden? For months
>>> the overcloud nodes have had 5G of RAM, then last week[1] we bumped it
>>> to 5.5G, and now we need it bumped to 6G.
>>>
>>> If a new service has been added that is needed on the overcloud, then
>>> bumping to 6G is expected and probably the correct answer, but I'd like
>>> to see us avoid blindly increasing the resources each time we see
>>> out-of-memory errors without investigating whether a regression is
>>> causing something to start hogging memory.
>>>
>>> Sorry if it seems like I'm being picky about this (I seem to resist
>>> these bumps every time they come up) but there are two good reasons to
>>> avoid this if possible:
>>> o At peak we are currently configured to run 75 simultaneous jobs
>>> (although we probably don't reach that at the moment), and each HA job
>>> has 5 baremetal nodes, so bumping from 5G to 6G increases the amount
>>> of RAM CI can use at peak by 375G
>>> o When we bump the RAM usage of baremetal nodes from 5G to 6G, what
>>> we're actually doing is increasing the minimum requirements for
>>> developers from 28G (or whatever the number is now) to 32G
>>>
>>> So before we bump the number, can we just check first whether it's
>>> justified? I've watched this number increase from 2G since we started
>>> running tripleo-ci.
>>>
>>> thanks,
>>> Derek.
>>>
>>> [1] - https://review.openstack.org/#/c/353655/
>>
>> Wondering if it makes sense to enable any but the most basic overcloud
>> services in TripleO CI. The idea of using some type of on-demand job
>> for services other than the ones needed for the ping test has been
>> proposed elsewhere -- maybe this should be our default mode for
>> TripleO CI. Thoughts?
>>
>> --Hugh
>
> The problem with periodic jobs is that the results are a bit hidden and
> only 1 to 2 people care about them, when they happen to have time. OTOH,
> if I understand correctly, we don't test the services even now, just that
> their deployment goes through without failures.

We do some testing of the overcloud in the gate jobs: we actually deploy
a heat stack in the overcloud [1], creating a volume-based nova guest
(backed by Ceph for the HA job), set up some routing, and ping it (in
network isolation!)

[1] https://github.com/openstack-infra/tripleo-ci/blob/master/templates/tenantvm_floatingip.yaml
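
For reference, the peak-RAM arithmetic in Derek's message quoted above can be
sketched as follows. This is only an illustration: the figures (75 jobs, 5
nodes per HA job, 5G to 6G) are taken from that message, the function and
constant names are made up for this example, and it assumes the worst case in
which every job running at peak is an HA job.

```python
# Figures quoted from Derek's message; the names are illustrative only.
PEAK_JOBS = 75         # simultaneous jobs CI is configured to run at peak
NODES_PER_HA_JOB = 5   # baremetal nodes in each HA job
OLD_RAM_GB = 5         # per-node RAM before the bump
NEW_RAM_GB = 6         # per-node RAM after the proposed bump


def extra_peak_ram_gb(jobs, nodes_per_job, old_gb, new_gb):
    """Extra RAM (in GB) used at peak when per-node RAM goes from old_gb to new_gb.

    Assumes every one of the `jobs` is an HA job, so this is an upper bound.
    """
    return jobs * nodes_per_job * (new_gb - old_gb)


if __name__ == "__main__":
    extra = extra_peak_ram_gb(PEAK_JOBS, NODES_PER_HA_JOB, OLD_RAM_GB, NEW_RAM_GB)
    print(f"Extra RAM at peak: {extra}G")
```

With the quoted figures this gives the 375G mentioned in the thread; passing
5.5 as the old value shows what the step from 5.5G to 6G costs on its own.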
-- 
Giulio Fidente
GPG KEY: 08D733BA | IRC: gfidente


