[openstack-dev] [TripleO][CI] Need more undercloud resources

Emilien Macchi emilien at redhat.com
Wed Aug 24 18:43:05 UTC 2016


On Wed, Aug 24, 2016 at 2:11 PM, James Slagle <james.slagle at gmail.com> wrote:
> The latest recurring problem that is failing a lot of the nonha ssl
> jobs in tripleo-ci is:
>
> https://bugs.launchpad.net/tripleo/+bug/1616144
> tripleo-ci: nonha jobs failing with Unable to establish connection to
> https://192.0.2.2:13004/v1/a90407df1e7f4f80a38a1b1671ced2ff/stacks/overcloud/f9f6f712-8e89-4ea9-a34b-6084dc74b5c1
>
> This error happens while polling for events from the overcloud stack
> by tripleoclient.
>
> I can reproduce this error very easily locally by deploying with an
> ssl undercloud with 6GB ram and 2 vcpus. If I don't enable swap,
> something gets OOM killed. If I do enable swap, swap gets used (< 1GB)
> and then I hit this error almost every time.
>
> The stack keeps deploying but the client has died, so the job fails.
> My investigation so far has only pointed out that it's the swap
> allocation that is delaying things enough to cause the failure.
>
> We do not see this error in the ha job even though it deploys more
> nodes. As of now, my only suspicion is that the overhead of the
> initial SSL connections is causing the error.
>
> If I test with 6GB ram and 4 vcpus I can't reproduce the error,
> although much more swap is used due to the increased number of default
> workers for each API service.
>
> However, I suggest we just raise the undercloud specs in our jobs to
> 8GB ram and 4 vcpus. These seem reasonable to me because those are the
> default specs used by infra in all of their devstack single and
> multinode jobs spawned on all their other cloud providers. Our own
> multinode undercloud/overcloud job and our undercloud-only job are
> running on instances of this size.
>
> Yes, this is just sidestepping the problem by throwing more resources
> at it. The reality is that we do not prioritize working on optimizing
> for speed/performance/resources. We prioritize feature work that
> indirectly (or maybe it's directly?) makes everything slower,
> especially at this point in the development cycle.
>
> We should therefore expect to have to continue to provide more and
> more resources to our CI jobs until we prioritize optimizing them to
> run with less.
>
> Let me know if there is any disagreement on making these changes. If
> there isn't, I'll apply them in the next day or so. If there are any
> other ideas on how to address this particular bug for some immediate
> short term relief, please let me know.

For the short term, +1 for extending the flavor and adding the required RAM.
For the long term, I'm working on extending our CI jobs to cover multiple
scenarios with fewer services installed on each. I hope this will help
every job consume fewer resources. Any help is welcome.

> --
> -- James Slagle
> --
>



-- 
Emilien Macchi


