[openstack-dev] [Heat] convergence rally test results (so far)
Angus Salkeld
asalkeld at mirantis.com
Tue Sep 1 23:53:04 UTC 2015
On Tue, Sep 1, 2015 at 10:45 PM Steven Hardy <shardy at redhat.com> wrote:
> On Fri, Aug 28, 2015 at 01:35:52AM +0000, Angus Salkeld wrote:
> > Hi
> > I have been running some rally tests against convergence and our existing
> > implementation to compare.
> > So far I have done the following:
> > 1. defined a template with a resource group:
> > https://github.com/asalkeld/convergence-rally/blob/master/templates/resource_group_test_resource.yaml.template
> > 2. the inner resource looks like this:
> > https://github.com/asalkeld/convergence-rally/blob/master/templates/server_with_volume.yaml.template
> > (it uses TestResource to attempt to be a reasonable simulation of a
> > server+volume+floatingip)
> > 3. defined a rally job:
> > https://github.com/asalkeld/convergence-rally/blob/master/increasing_resources.yaml
> > that creates X resources, then updates to X*2, then deletes them.
> > 4. I then ran the above with/without convergence and with 2, 4 and 8
> > heat-engines
> > Here are the results compared:
> >
> > https://docs.google.com/spreadsheets/d/12kRtPsmZBl_y78aw684PTBg3op1ftUYsAEqXBtT800A/edit?usp=sharing
> > Some notes on the results so far:
> > * convergence with only 2 engines does suffer from RPC overload (it
> > gets message timeouts on larger templates). I wonder if this is the
> > problem in our convergence gate...
> > * convergence does very well with a reasonable number of engines
> > running.
> > * delete is slightly slower on convergence
> > Still to test:
> > * the above, but measure memory usage
> > * many small templates (run concurrently)
>
> So, I tried running my many-small-templates here with convergence enabled:
>
> https://bugs.launchpad.net/heat/+bug/1489548
>
> In heat.conf I set:
>
> max_resources_per_stack = -1
> convergence_engine = true
>
> Most other settings (particularly RPC and DB settings) are defaults.
>
> Without convergence (but with max_resources_per_stack disabled) I see the
> time to create a ResourceGroup of 400 nested stacks (each containing one
> RandomString resource) is about 2.5 minutes (core i7 laptop w/SSD, 4 heat
> workers, i.e. the default for a 4-core machine).
>
> With convergence enabled, I see these errors from sqlalchemy:
>
> File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 652, in
> _checkout\n fairy = _ConnectionRecord.checkout(pool)\n', u' File
> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 444, in
> checkout\n rec = pool._do_get()\n', u' File
> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 980, in
> _do_get\n (self.size(), self.overflow(), self._timeout))\n',
> u'TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection
> timed out, timeout 30\n'].
>
> I assume this means we're loading the DB much more in the convergence case
> and overflowing the QueuePool?
>
Yeah, looks like it.
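
One quick way to confirm might be to bump the oslo.db pool limits in
heat.conf (just a band-aid for the extra load, not a fix, and the values
below are only illustrative):

  [database]
  # the traceback above shows the effective defaults (size 5, overflow 10)
  max_pool_size = 20
  max_overflow = 40
  pool_timeout = 60
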
>
> This seems to happen when the RPC call from the ResourceGroup tries to
> create some of the 400 nested stacks.
>
> Interestingly, after this error the parent stack moves to CREATE_FAILED,
> but the engine remains (very) busy, to the point of being only partially
> responsive, so it looks like maybe the cancel-on-fail isn't working (I'm
> assuming it isn't error_wait_time, because the parent stack has been marked
> FAILED and I'm pretty sure it's been more than 240s).
>
> I'll dig a bit deeper when I get time, but for now you might like to try
> the stress test too. It's a bit of a synthetic test, but it turns out to
> be a reasonable proxy for some performance issues we observed when creating
> large-ish TripleO deployments (which also create a large number of nested
> stacks concurrently).
>
Thanks a lot for testing, Steve! I'll file two bugs for the issues you have raised:
1. limit the number of resource actions running in parallel, maybe based on
the number of cores (rough sketch of the idea below)
2. the cancel-on-fail error
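
For the first one, something along these lines is what I have in mind (a
rough sketch only, not actual Heat code; the function name, the semaphore
and the sizing multiplier are all made up for illustration):

  # Hypothetical throttle: cap how many resource actions one heat-engine
  # works on concurrently, so a flood of check_resource messages can't
  # open an unbounded number of DB sessions at once.
  import multiprocessing

  from eventlet import semaphore  # heat-engine already runs on eventlet

  # illustrative sizing: a small multiple of the core count
  _ACTION_LIMIT = semaphore.Semaphore(multiprocessing.cpu_count() * 2)

  def run_resource_action(action_fn, *args, **kwargs):
      # acquire the shared limit around the DB/RPC-heavy part of the action
      with _ACTION_LIMIT:
          return action_fn(*args, **kwargs)
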
-Angus
>
> Steve
>