[openstack-dev] [TripleO][CI] Need more undercloud resources
emilien at redhat.com
Wed Aug 24 18:43:05 UTC 2016
On Wed, Aug 24, 2016 at 2:11 PM, James Slagle <james.slagle at gmail.com> wrote:
> The latest recurring problem that is failing a lot of the nonha ssl
> jobs in tripleo-ci is:
> tripleo-ci: nonha jobs failing with Unable to establish connection to
> This error happens while polling for events from the overcloud stack
> by tripleoclient.
> I can reproduce this error very easily locally by deploying with an
> ssl undercloud with 6GB ram and 2 vcpus. If I don't enable swap,
> something gets OOM killed. If I do enable swap, swap gets used (< 1GB)
> and then I hit this error almost every time.
> The stack keeps deploying but the client has died, so the job fails.
> My investigation so far has only pointed out that it's the swap
> allocation that is delaying things enough to cause the failure.
> We do not see this error in the ha job even though it deploys more
> nodes. As of now, my only suspect is that it's the overhead of the
> initial SSL connections causing the error.
> If I test with 6GB ram and 4 vcpus I can't reproduce the error,
> although much more swap is used due to the increased number of default
> workers for each API service.
> However, I suggest we just raise the undercloud specs in our jobs to
> 8GB ram and 4 vcpus. These seem reasonable to me because those are the
> default specs used by infra in all of their devstack single and
> multinode jobs spawned on all their other cloud providers. Our own
> multinode job for the undercloud/overcloud and undercloud only job are
> running on instances of these sizes.
> Yes, this is just sidestepping the problem by throwing more resources
> at it. The reality is that we do not prioritize working on optimizing
> for speed/performance/resources. We prioritize feature work that
> indirectly (or maybe it's directly?) makes everything slower,
> especially at this point in the development cycle.
> We should therefore expect to have to continue to provide more and
> more resources to our CI jobs until we prioritize optimizing them to
> run with less.
> Let me know if there is any disagreement on making these changes. If
> there isn't, I'll apply them in the next day or so. If there are any
> other ideas on how to address this particular bug for some immediate
> short term relief, please let me know.
For the short term, +1 to extending the flavor and adding the required RAM.
For the long term, I'm working on extending our CI jobs to cover multiple
scenarios with fewer services installed on each. I hope this will help
each job consume fewer resources. Any help is welcome.
> -- James Slagle