Open Stack

Mon Aug 11 20:35:44 UTC 2014

On 11/08/14 14:49, Clint Byrum wrote:
> Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
>> On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
>>> Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
>>>> On 11/08/14 10:46, Clint Byrum wrote:
>>>>> Right now we're stuck with an update that just doesn't work. It isn't
>>>>> just about update-failure-recovery, which is coming along nicely, but
>>>>> it is also about the lack of signals to control rebuild, poor support
>>>>> for addressing machines as groups, and unacceptable performance in
>>>>> large stacks.
>>>>
>>>> Are there blueprints/bugs filed for all of these issues?
>>>>
>>>
>>> Convergence addresses the poor performance for large stacks in general.
>>> We also have this:
>>>
>>> https://bugs.launchpad.net/heat/+bug/1306743
>>>
>>> Which shows how slow metadata access can get. I have worked on patches
>>> but haven't been able to complete them. We made big strides but we are
>>> at a point where 40 nodes polling Heat every 30s is too much for one CPU

This sounds like the same figure I heard at the design summit; did the 
DB call optimisation work that Steve Baker did immediately after that 
not have any effect?

>>> to handle. When we scaled Heat out onto more CPUs on one box by forking
>>> we ran into eventlet issues. We also ran into issues because even with
>>> many processes we can only use one to resolve templates for a single
>>> stack during update, which was also excessively slow.
>>
>> Related to this, and a discussion we had recently at the TripleO meetup is
>> this spec I raised today:
>>
>> https://review.openstack.org/#/c/113296/
>>
>> It's following up on the idea that we could potentially address (or at
>> least mitigate, pending the fully convergence-ified heat) some of these
>> scalability concerns, if TripleO moves from the one-giant-template model
>> to a more modular nested-stack/provider model (e.g. what Tomas has been
>> working on)
>>
>> I've not got into enough detail on that yet to be sure if it's achievable
>> for Juno, but it seems initially to be complex-but-doable.
>>
>> I'd welcome feedback on that idea and how it may fit in with the more
>> granular convergence-engine model.
>>
>> Can you link to the eventlet/forking issues bug please?  I thought since
>> bug #1321303 was fixed that multiple engines and multiple workers should
>> work OK, and obviously that being true is a precondition to expending
>> significant effort on the nested stack decoupling plan above.
>>
>
> That was the issue. So we fixed that bug, but we never un-reverted
> the patch that forks enough engines to use up all the CPUs on a box
> by default. That would likely help a lot with metadata access speed
> (we could manually do it in TripleO but we tend to push defaults. :)

Right, and we decided we wouldn't because it's wrong to do that to 
people by default. In some cases the optimal running configuration for 
TripleO will differ from the friendliest out-of-the-box configuration 
for Heat users in general, and in those cases - of which this is one - 
TripleO will need to specify the configuration.
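
To illustrate what "specify the configuration" could look like on the
deployment side, here is a minimal sketch that picks a worker count from
the CPU count and renders it as a heat.conf fragment. It assumes the
option is named num_engine_workers in the [DEFAULT] section (check this
against the Heat release you deploy); the function names and the
"reserved" parameter are hypothetical and are not taken from TripleO or
Heat.

```python
import configparser
import io
import os


def engine_worker_count(cpu_count=None, reserved=0):
    """Return how many heat-engine workers to run, never fewer than one.

    `reserved` is a hypothetical knob for leaving CPUs to other services.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - reserved)


def render_heat_conf(workers):
    """Render a heat.conf fragment that sets the worker count explicitly.

    Assumption: the option is `num_engine_workers` under [DEFAULT].
    """
    conf = configparser.ConfigParser()
    conf["DEFAULT"] = {"num_engine_workers": str(workers)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # The deployer opts in to one worker per CPU; Heat's default is untouched.
    print(render_heat_conf(engine_worker_count()))
```

The point of the sketch is only that the choice is made by the deployer
and written down explicitly, so the out-of-the-box default stays
conservative for everyone else.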

cheers,
Zane.

[openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO