[openstack-dev] [Heat] How the autoscale API should control scaling in Heat

Zane Bitter zbitter at redhat.com
Wed Sep 11 14:55:46 UTC 2013


On 11/09/13 05:51, Adrian Otto wrote:
> I have a different point of view. First I will offer some assertions:

It's not clear to me what you actually have an issue with. (Top-posting 
is not helping in this respect.)

> A-1) We need to keep it simple.
> 	A-1.1) Systems that are hard to comprehend are hard to debug, and that's bad.

Absolutely, and systems with higher entropy are harder to comprehend.

> 	A-1.2) Complex systems tend to be much more brittle than simple ones.

"The Zen of Python" has it right here:

     Simple is better than complex.
     Complex is better than complicated.

Complicated systems have a lot of entropy. Complex systems (that is to 
say, systems composed of multiple simpler systems) are actually a tool 
for _reducing_ entropy.

> A-2) Scale-up operations need to be as-fast-as-possible.
> 	A-2.1) Auto-Scaling only works right if your new capacity is added quickly when your controller detects that you need more. If you spend a bunch of time goofing around before actually adding a new resource to a pool when it's under strain, that's bad.
> 	A-2.2) The fewer network round trips between "add-more-resources-now" and "resources-added" the better. Fewer = less brittle.

I submit that the difference between a packet round-trip time within a 
single datacenter and the time to boot a Nova server is at least 3 
orders of magnitude.
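
To put rough (and assumed, but I think typical) numbers on it:

    rtt_s  = 0.001   # ~1 ms packet round trip within a datacenter (assumed)
    boot_s = 60.0    # ~1 minute for a Nova server to boot (assumed)
    print(boot_s / rtt_s)   # 60000.0 -- well over 3 orders of magnitude

An extra round trip or two is lost in the noise next to the boot time.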

> A-3) The control logic for scaling different applications varies.
> 	A-3.1) What metrics are watched may differ between various use cases.
> 	A-3.2) The data types that represent sensor data may vary.
> 	A-3.3) The policy that's applied to the metrics (such as max, min, and cooldown period) varies between applications. Not only do the values vary, but so does the logic itself.
> 	A-3.4) A scaling policy may not just be a handful of simple parameters. Ideally it allows configurable logic that the end-user can control to some extent.
>
> A-4) Auto-scale operations are usually not orchestrations. They are usually simple linear workflows.

Well, one of the things Chris wants to do with this is to scale whole 
templates instead of just Nova servers.

> 	A-4.1) The Taskflow project[1] offers a simple way to do workflows and stable state management that can be integrated directly into Autoscale.
> 	A-4.2) A task flow (workflow) can trigger a Heat orchestration if needed.

If you're re-proposing Chris's original thought of having two different 
ways to do autoscaling depending on whether it's for individual 
instances or whole templates, then I fail to see how that is in any 
sense simpler than having only one way that handles everything.
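
For reference, here is a minimal sketch of the kind of linear TaskFlow 
workflow being proposed; the task names and bodies are mine and purely 
illustrative:

    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class PickServerSpec(task.Task):
        default_provides = 'spec'

        def execute(self):
            return {'flavor': 'm1.small', 'image': 'cirros'}

    class BootServer(task.Task):
        default_provides = 'server_id'

        def execute(self, spec):
            # A real task would call Nova here; stubbed for illustration.
            return 'fake-server-id'

    class AddToPool(task.Task):
        def execute(self, server_id):
            print('added %s to the pool' % server_id)

    # A strictly linear flow: each task runs after the previous one.
    flow = linear_flow.Flow('scale-up')
    flow.add(PickServerSpec(), BootServer(), AddToPool())
    engines.run(flow)

Note that this only boots a bare server; it does nothing for scaling 
whole templates.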

> Now a mental tool to think about control policies:
>
> Auto-scaling is like steering a car. The control policy says that you want to drive equally between the two lane lines, and that if you drift off center, you gradually correct back toward center again. If the road bends, you try to remain in your lane as the lane lines curve. You try not to weave around in your lane, and you try not to drift out of the lane.

OK, in the sense that both are proportional control systems, sure. 
(Though in autoscaling, unlike the car, both the feedback loop and the 
response have significant non-linearities.)
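
To make the non-linearity point concrete, even a toy proportional 
policy (all numbers here are invented) has to quantize and clamp its 
output:

    def desired_capacity(current, metric, setpoint, gain=0.1,
                         min_size=1, max_size=20):
        # Proportional response to how far the metric is from target.
        error = metric - setpoint
        # Non-linearities: servers are discrete, group size is bounded.
        adjustment = int(round(gain * error))
        return max(min_size, min(max_size, current + adjustment))

    # At 10 servers with CPU at 90% against a 60% target:
    print(desired_capacity(10, 90, 60))   # -> 13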

> If your controller notices that you are about to drift out of your lane because the road is starting to bend, and you are distracted, or your hands slip off the wheel, you might drift out of your lane into nearby traffic. That's why you don't want a Rube Goldberg Machine[2] between you and the steering wheel. See assertions A-1 and A-2.

But you probably do want a power steering device between the wheel and 
the steering rack. I think this metaphor is ready for the scrapheap ;)

There was (IMHO) a Rube Goldberg-like device proposed in this thread, 
but not by me :D

> If you are driving an 18-wheel tractor/trailer truck, steering is different than if you are driving a Fiat. You need to wait longer and steer toward the outside of curves so your trailer does not lag behind on the inside of the curve behind you as you correct for a bend in the road. When you are driving the Fiat, you may want to aim for the middle of the lane at all times, possibly even apexing bends to reduce your driving distance, which is actually the opposite of what truck drivers need to do. Control policies apply to other parts of driving too. I want a different policy for braking than I use for steering. On some vehicles I go through a gear shifting workflow, and on others I don't. See assertion A-3.

Right, PID control systems are more general.

The idea of allowing the user to substitute their own scaling policy 
engine has always been on the road map since you and others raised it at 
Summit, though, and it's orthogonal to the parts of the design you're 
questioning below. So I'm not really sure what you're, uh, driving at 
(no pun intended).

> So, I don't intend to argue the technical minutiae of each design point, but I challenge you to make sure that we arrive at a system that (1) is simple enough for any OpenStack user to comprehend, (2) responds quickly to alarm stimulus, (3) is unlikely to fail, and (4) can be easily customized with user-supplied logic that controls how the scaling happens, and under what conditions.

I disagree with (3); systems should be designed to cope gracefully in 
the event of their _inevitable_ failure.

> It would be better if we could explain Autoscale like this:
>
> Heat -> Autoscale -> Nova, etc.
> -or-
> User -> Autoscale -> Nova, etc.

Let's explain it like that then. The use of Heat by the autoscaling 
back-end is entirely an implementation detail, and the user should never 
need to know about it. It was mentioned only because this was a thread 
about implementation details.

> This approach allows use cases where (for whatever reason) the end user does not want to use Heat at all, but still wants something simple to be auto-scaled for them. Nobody would be scratching their heads wondering why things are going in circles.

It's irrelevant to the user whether the cloud operator implements 
autoscaling with Heat or not.
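
To illustrate, a hypothetical user-facing call (the endpoint and names 
are invented for this example) in which Heat never appears:

    import requests

    # Hypothetical autoscale API: the user pokes a policy webhook and
    # capacity gets added. Whether Heat does the work is invisible.
    resp = requests.post('https://autoscale.example.com/v1/groups/'
                         'GROUP_ID/policies/POLICY_ID/execute',
                         headers={'X-Auth-Token': 'AUTH_TOKEN'})
    resp.raise_for_status()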

> From an implementation perspective, that means the auto-scale service needs at least a simple linear workflow capability in it that may trigger a Heat orchestration if there is a good reason for it. This way, the typical use cases don't have anything resembling circular dependencies. The source of truth for how many members are currently in an Autoscaling group should be the Autoscale service, not the Heat database. If you want to expose that in list-stack-resources output, then cause Heat to call out to the Autoscale service to fetch that figure as needed. It is irrelevant to orchestration. Code does not need to be duplicated. Both Autoscale and Heat can use the same exact source code files for the code that launches/terminates instances of resources.

So, it sounds like you want to incorporate the Heat code in Autoscaling 
by loading it as a library instead of using it as a service?

I guess that's pretty much what we do now, but going down this path 
means that the code will be forever stuck in the same project (i.e. 
repository), and we would lose the option to split Autoscaling out as a 
separate project within the Orchestration program.

Secondly, interacting with systems only via defined and tested APIs 
reduces the entropy of the resulting system compared with direct access 
to the internals. It's the difference between complex systems and 
complicated ones. So IMO this idea fails the tests that you set for it, 
for a gain of... 30ms of latency?
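
Compare that with keeping the service boundary, sketched here using 
python-heatclient (the endpoint, token and template are placeholders):

    from heatclient.client import Client

    heat = Client('1',
                  endpoint='http://heat.example.com:8004/v1/TENANT_ID',
                  token='AUTH_TOKEN')

    # A trivial placeholder template for the scaling group's stack.
    TEMPLATE = {'heat_template_version': '2013-05-23',
                'parameters': {'size': {'type': 'number'}},
                'resources': {}}

    # The autoscale service owns the desired size and asks Heat, via
    # its public API, to converge the group's stack -- no reaching
    # into Heat's database or internal modules.
    heat.stacks.update('SCALING_GROUP_STACK',
                       template=TEMPLATE,
                       parameters={'size': 13})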

cheers,
Zane.


