[openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

Clint Byrum clint at fewbar.com
Mon Aug 11 18:49:42 UTC 2014


Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
> On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
> > Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
> > > On 11/08/14 10:46, Clint Byrum wrote:
> > > > Right now we're stuck with an update that just doesn't work. It isn't
> > > > just about update-failure-recovery, which is coming along nicely, but
> > > > it is also about the lack of signals to control rebuild, poor support
> > > > for addressing machines as groups, and unacceptable performance in
> > > > large stacks.
> > > 
> > > Are there blueprints/bugs filed for all of these issues?
> > > 
> > 
> > Convergence addresses the poor performance for large stacks in general.
> > We also have this:
> > 
> > https://bugs.launchpad.net/heat/+bug/1306743
> > 
> > Which shows how slow metadata access can get. I have worked on patches
> > but haven't been able to complete them. We made big strides but we are
> > at a point where 40 nodes polling Heat every 30s is too much for one CPU
> > to handle. When we scaled Heat out onto more CPUs on one box by forking
> > we ran into eventlet issues. We also ran into issues because even with
> > many processes we can only use one to resolve templates for a single
> > stack during update, which was also excessively slow.
> 
> Related to this, and a discussion we had recently at the TripleO meetup is
> this spec I raised today:
> 
> https://review.openstack.org/#/c/113296/
> 
> It's following up on the idea that we could potentially address (or at
> least mitigate, pending the fully convergence-ified heat) some of these
> scalability concerns, if TripleO moves from the one-giant-template model
> to a more modular nested-stack/provider model (e.g. what Tomas has been
> working on).
> 
> I've not got into enough detail on that yet to be sure if it's achievable
> for Juno, but it seems initially to be complex-but-doable.
> 
> I'd welcome feedback on that idea and how it may fit in with the more
> granular convergence-engine model.
> 
> Can you link to the eventlet/forking issues bug please?  I thought since
> bug #1321303 was fixed that multiple engines and multiple workers should
> work OK, and obviously that being true is a precondition to expending
> significant effort on the nested stack decoupling plan above.
> 

That was the issue. So we fixed that bug, but we never un-reverted
the patch that forks enough engines to use up all the CPUs on a box
by default. That would likely help a lot with metadata access speed
(we could manually do it in TripleO but we tend to push defaults. :)
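
As a rough back-of-the-envelope sketch of why more engines should help:
the 40 nodes / 30s figures come from the thread above, but the function
names are hypothetical and the even spread of polls across workers is
an assumption, not something measured. This is not Heat code.

```python
import os


def poll_rate(nodes, interval_s):
    """Metadata polls per second arriving at Heat from `nodes` servers."""
    return nodes / interval_s


def per_worker_rate(nodes, interval_s, workers):
    """Polls per second each engine worker sees, ASSUMING an even spread."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return poll_rate(nodes, interval_s) / workers


def default_workers():
    """One worker per CPU, falling back to 1 if the count is unknown."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    nodes, interval_s = 40, 30  # figures quoted in the thread
    print("total polls/s:", poll_rate(nodes, interval_s))
    workers = default_workers()
    print("workers:", workers)
    print("polls/s per worker:", per_worker_rate(nodes, interval_s, workers))
```

If one CPU is saturated at 40 polls per 30s, each poll is costing
something on the order of 0.75 CPU-seconds, so spreading the load over
every core only buys a linear improvement; the per-request cost still
needs the fixes discussed in the bug.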


