[openstack-dev] [TripleO][CI] Memory shortage in HA jobs, please increase it
ekuvaja at redhat.com
Fri Aug 19 10:12:57 UTC 2016
On Fri, Aug 19, 2016 at 10:53 AM, Hugh Brock <hbrock at redhat.com> wrote:
> On Fri, Aug 19, 2016 at 11:41 AM, Derek Higgins <derekh at redhat.com> wrote:
>> On 19 August 2016 at 00:07, Sagi Shnaidman <sshnaidm at redhat.com> wrote:
>>> we again have a problem with not enough memory in HA jobs; all of them
>>> constantly fail in CI: http://status-tripleoci.rhcloud.com/
>> Have we any idea why we need more memory all of a sudden? For months
>> the overcloud nodes have had 5G of RAM, then last week we bumped it
>> to 5.5G, and now we need it bumped to 6G.
>> If a new service has been added that is needed on the overcloud then
>> bumping to 6G is expected and probably the correct answer but I'd like
>> to see us avoiding blindly increasing the resources each time we see
>> out of memory errors without investigating if there was a regression
>> causing something to start hogging memory.
>> Sorry if it seems like I'm being picky about this (I seem to resist
>> these bumps every time they come up) but there are two good reasons to
>> avoid this if possible:
>> o At peak we are currently configured to run 75 simultaneous jobs
>> (although we probably don't reach that at the moment), and each HA job
>> has 5 baremetal nodes, so bumping from 5G to 6G increases the amount
>> of RAM CI can use at peak by 375G.
>> o When we bump the RAM usage of baremetal nodes from 5G to 6G, what
>> we're actually doing is increasing the minimum requirements for
>> developers from 28G (or whatever the number is now) to 32G.
>> So before we bump the number, can we just check first if it's justified,
>> as I've watched this number increase from 2G since we started running
>>  - https://review.openstack.org/#/c/353655/
> Wondering if it makes sense to enable any but the most basic overcloud
> services in TripleO CI. The idea of using some type of on-demand job
> for services other than the ones needed for the ping test has been
> proposed elsewhere -- maybe this should be our default mode for
> TripleO CI. Thoughts?
The problem with periodic jobs is that the results are a bit hidden, and
only one or two people care about them, when they happen to have time.
OTOH, if I understand correctly, we don't test the services even now, just
that their deployment goes through without failures. Likely the best option
would be to test a different subset of services depending on which files
the change touches (as discussed yesterday or earlier this week; I can't
remember where, so I have no reference for that discussion, sorry).
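
To make the "subset of services per change" idea a bit more concrete, here
is a rough sketch of how such a selection could work. Everything in it is
an assumption for illustration: the file patterns, the service names, the
"ping-test" baseline and the services_to_test() helper are hypothetical and
do not correspond to anything that exists in TripleO CI today.

```python
from fnmatch import fnmatch

# Hypothetical rules: file pattern -> services whose deployment should be
# exercised when a matching file changes. Patterns and names are made up.
SERVICE_RULES = [
    ("puppet/services/ceph*", {"ceph"}),
    ("puppet/services/pacemaker*", {"pacemaker"}),
    ("puppet/services/swift*", {"swift"}),
]

# Hypothetical baseline that every job deploys (enough for the ping test).
BASELINE = {"ping-test"}


def services_to_test(changed_files):
    """Return the baseline plus any services matched by the changed files."""
    selected = set(BASELINE)
    for path in changed_files:
        for pattern, services in SERVICE_RULES:
            if fnmatch(path, pattern):
                selected |= services
    return selected


if __name__ == "__main__":
    print(sorted(services_to_test(["puppet/services/ceph-mon.yaml"])))
```

A change that touches no listed pattern would only get the baseline, which
is where the memory saving would come from.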
In general I'm with Derek on this: we should not just blindly throw in
more resources without understanding why we need to do so.
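
For reference, the figures quoted in this thread can be reproduced with a
quick back-of-the-envelope calculation. This is only a sketch: the function
and constant names are made up here, it assumes the worst case where every
simultaneous job is an HA job, and it treats 1G as 1024 MB for the flavor
size.

```python
# Numbers taken from the thread.
MAX_SIMULTANEOUS_JOBS = 75  # configured peak number of CI jobs
NODES_PER_HA_JOB = 5        # baremetal nodes per HA job
OLD_RAM_GB = 5              # RAM per node before the bump
NEW_RAM_GB = 6              # proposed RAM per node


def extra_peak_ram_gb(jobs, nodes_per_job, old_gb, new_gb):
    """Extra RAM needed at peak, assuming every job is an HA job."""
    return jobs * nodes_per_job * (new_gb - old_gb)


def flavor_ram_mb(ram_gb):
    """Flavor RAM in MB, assuming 1G means 1024 MB."""
    return ram_gb * 1024


if __name__ == "__main__":
    print(extra_peak_ram_gb(MAX_SIMULTANEOUS_JOBS, NODES_PER_HA_JOB,
                            OLD_RAM_GB, NEW_RAM_GB))  # 375
    print(flavor_ram_mb(NEW_RAM_GB))                  # 6144
```

That matches the 375G quoted for the CI cloud and the 6144 proposed for the
flavor on rh1.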
>>> I've created a patch that will increase it, but we need to increase it
>>> right now on rh1.
>>> I can't do it now because, unfortunately, I won't be able to watch whether
>>> it works and no problems appear.
>>> TripleO CI cloud admins, please increase the memory for the baremetal
>>> flavor on rh1 tomorrow (to 6144?).
>>>  https://review.openstack.org/#/c/357532/
>>> Best regards
>>> Sagi Shnaidman
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> , , | Hugh Brock, hbrock at redhat.com
> )-_"""_-( | Director of Engineering, OpenStack Management
> ./ o\ /o \. | TripleO: Install, configure, and scale OpenStack.
> . \__/ \__/ . | http://rdoproject.org, http://tripleo.org
> ... V ... |
> ... - - - ... | "I know that you believe you understand what you
> . - - . | think I said, but I'm not sure you realize that what
> `-.....-´ | you heard is not what I meant." --Robert McCloskey