Open Stack

Mon Aug 11 22:58:36 UTC 2014

On 12/08/14 06:20, Clint Byrum wrote:
> Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
>> On 11/08/14 10:46, Clint Byrum wrote:
>>> Right now we're stuck with an update that just doesn't work. It isn't
>>> just about update-failure-recovery, which is coming along nicely, but
>>> it is also about the lack of signals to control rebuild, poor support
>>> for addressing machines as groups, and unacceptable performance in
>>> large stacks.
>> Are there blueprints/bugs filed for all of these issues?
>>
> Convergence addresses the poor performance for large stacks in general.
> We also have this:
>
> https://bugs.launchpad.net/heat/+bug/1306743
>
> Which shows how slow metadata access can get. I have worked on patches
> but haven't been able to complete them. We made big strides but we are
> at a point where 40 nodes polling Heat every 30s is too much for one CPU
> to handle. When we scaled Heat out onto more CPUs on one box by forking
> we ran into eventlet issues. We also ran into issues because even with
> many processes we can only use one to resolve templates for a single
> stack during update, which was also excessively slow.
>
> We haven't been able to come back around to those yet, but you can see
> where this has turned into a bit of a rat hole of optimization.

> action-aware-sw-config is sort of what we want for rebuild. We
> collaborated with the trove devs on how to also address it for resize
> a while back but I have lost track of that work as it has taken a back
> seat to more pressing issues.

We were discussing offloading metadata polling to a tempURL swift
object; that would certainly deal with scaling metadata polling.

This could also help with an out-of-band Ansible workflow.
Anything (e.g. Ansible) could push changed data to the swift object as well.
And if you wanted to ensure that heat didn't overwrite that during an
accidental heat stack-update then you could configure os-collect-config
to poll from 2 swift objects, one for heat and one for manual updates.
The manual object could take precedence over the heat one for metadata
merging, which could give you a nice fine-grained override mechanism.
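
To make the precedence idea concrete, here is a rough sketch of the merge
I have in mind. This is not os-collect-config's actual merge code; the
function name, the recursive deep-merge semantics and the sample metadata
keys are all assumptions made up for illustration.

```python
import copy


def merge_metadata(base, override):
    """Return a new dict where values from `override` win over `base`.

    Nested dicts are merged recursively, so an override can replace a
    single key without clobbering its siblings. (Hypothetical helper,
    not part of os-collect-config.)
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Contents of the two swift objects, as already fetched and parsed.
# The keys below are invented sample data.
heat_metadata = {
    "ntp": {"servers": ["0.pool.example.org"]},
    "nova": {"debug": False, "workers": 4},
}
manual_metadata = {
    "nova": {"debug": True},
}

# Heat's object is the base; the manual object is applied last so it wins.
effective = merge_metadata(heat_metadata, manual_metadata)
```

With this ordering a later heat stack-update can rewrite the heat object
freely, and the manual override of nova.debug still survives the merge.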

> Addressing groups is a general problem that I've had a hard time
> articulating in the past. Tomas Sedovic has done a good job with this
> TripleO spec, but I don't know that we've asked for an explicit change
> in a bug or spec in Heat just yet:
>
> https://review.openstack.org/#/c/97939/
>
> There are a number of other issues noted in that spec which are already
> addressed in Heat, but require refactoring in TripleO's templates and
> tools, and that work continues.
I'll follow up on the potential solutions in the other thread:
http://lists.openstack.org/pipermail/openstack-dev/2014-August/042313.html

> The point remains: we need something that works now, and doing an
> alternate implementation for updates is actually faster than addressing
> all of these issues.
Thanks, that was a good summary of the issues, and I do appreciate the
need for both tactical and strategic solutions.

[openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO
