[openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

Steven Hardy shardy at redhat.com
Mon Aug 11 18:40:07 UTC 2014

On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
> Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
> > On 11/08/14 10:46, Clint Byrum wrote:
> > > Right now we're stuck with an update that just doesn't work. It isn't
> > > just about update-failure-recovery, which is coming along nicely, but
> > > it is also about the lack of signals to control rebuild, poor support
> > > for addressing machines as groups, and unacceptable performance in
> > > large stacks.
> > 
> > Are there blueprints/bugs filed for all of these issues?
> > 
> Convergence addresses the poor performance for large stacks in general.
> We also have this:
> https://bugs.launchpad.net/heat/+bug/1306743
> Which shows how slow metadata access can get. I have worked on patches
> but haven't been able to complete them. We made big strides but we are
> at a point where 40 nodes polling Heat every 30s is too much for one CPU
> to handle. When we scaled Heat out onto more CPUs on one box by forking
> we ran into eventlet issues. We also ran into issues because even with
> many processes we can only use one to resolve templates for a single
> stack during update, which was also excessively slow.

Related to this, and to a discussion we had recently at the TripleO meetup,
is this spec I raised today:


It's following up on the idea that we could potentially address (or at
least mitigate, pending the fully convergence-ified heat) some of these
scalability concerns, if TripleO moves from the one-giant-template model
to a more modular nested-stack/provider model (e.g. what Tomas has been
working on).

I've not got into enough detail on that yet to be sure if it's achievable
for Juno, but it seems initially to be complex-but-doable.

I'd welcome feedback on that idea and how it may fit in with the more
granular convergence-engine model.

Can you link to the bug for the eventlet/forking issues please?  I thought
that since bug #1321303 was fixed, multiple engines and multiple workers
should work OK, and obviously that being true is a precondition to expending
significant effort on the nested stack decoupling plan above.

