[openstack-dev] [heat] autoscaling and load balancers
Angus Salkeld
asalkeld at mirantis.com
Wed Apr 8 22:21:30 UTC 2015
On Thu, Apr 9, 2015 at 8:03 AM, Miguel Grinberg <miguel.s.grinberg at gmail.com> wrote:
> Zane, replies inline.
>
> On Wed, Apr 8, 2015 at 3:46 PM, Zane Bitter <zbitter at redhat.com> wrote:
>
>> On 07/04/15 22:02, Miguel Grinberg wrote:
>>
>>> Hi,
>>>
>>> The OS::Heat::AutoScalingGroup resource is somewhat limited at this
>>> time, because when a scaling event occurs it does not notify dependent
>>> resources, such as a load balancer, that the pool of instances has
>>> changed.
>>>
>>
>> As Thomas mentioned, the 'approved' way to solve this is to make your
>> scaled unit a stack, and include a Neutron PoolMember resource in it.
>
>
> LBAAS is an optional, now even external component, not part of the Neutron
> API. Many installations don't have it. Allowing the use of custom load
> balancers is a desirable option, in my opinion, more so while LBAAS is not
> core neutron functionality.
>
>
>>
>>
>>> The AWS::AutoScaling::AutoScalingGroup resource, on the other side, has
>>> a LoadBalancerNames property that takes a list of
>>> AWS::ElasticLoadBalancing::LoadBalancer resources that get updated
>>> anytime the size of the ASG changes.
>>>
>>
>> Which is an appalling hack.
>>
> Yes. This is hacky, but it seems it models the AWS load balancing APIs,
> so there isn't much that can be done here, right?
>
>
>> (If it called the Neutron LBaaS API, like the equivalent in
>> CloudFormation does with ELB, it would be OK. But in reality, as you know,
>> it's a hack that makes calls directly to another resource plugin within
>> Heat.)
>>
>>> I'm trying to implement this notification mechanism for HOT templates,
>>> but there are a few aspects that I hope to do better.
>>>
>>> 1. A HOT template can have get_attr function calls that invoke
>>> attributes of the ASG. None of these update when the ASG resizes at this
>>> time; a scaling event does a partial update that only affects the ASG
>>> resource. I would like to address this.
>>>
>>
>> In the medium term, I believe this is something that Convergence
>> will be able to solve for us. I'm not sure that it's worth putting in a
>> short-term work-around for.
>
>
> Here is where we disagree. In my opinion this is broken functionality.
> After a scaling event there are resources that go stale because they are
> never told that the ASG resized. This to me is clearly a bug that deserves
> fixing, even if in the future a better/nicer fix can be crafted.
>
So the problem is that the result of get_attr is dynamic, and we do not
support triggering stack updates when that result changes.
As Zane suggested, you should think of autoscaling as being in a different
service.
A possible solution:
You have a top-level template that has a StackUpdatePolicy (a new thing),
and it gets triggered by a Ceilometer Alarm based
on the following notification:
https://github.com/openstack/heat/blob/master/heat/engine/resources/aws/autoscaling/autoscaling_group.py#L338-L344
This then runs an update to refresh the stack.
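
To make the stale get_attr problem and the refresh idea concrete, here is a
toy model in Python. It is not Heat code: the ScalingGroup, Dependent and
Stack classes and the on_scaling_notification method are hypothetical names
made up for illustration, and the real wiring from notification to stack
update would have to go through Ceilometer and the Heat API.

```python
# Toy model only: none of these classes exist in Heat.


class ScalingGroup:
    def __init__(self, size):
        self.size = size

    def adjust(self, new_size):
        # Mimics a signal-triggered resize: only the group itself changes.
        self.size = new_size


class Dependent:
    """A resource whose property was resolved from the group at update time."""

    def __init__(self, group):
        self.group = group
        self.pool_size = None

    def update(self):
        # Re-resolve the equivalent of { get_attr: [my_asg, size] }.
        self.pool_size = self.group.size


class Stack:
    def __init__(self, group, dependents):
        self.group = group
        self.dependents = dependents
        self.update()

    def update(self):
        # Full stack update: every dependent re-resolves its inputs.
        for dependent in self.dependents:
            dependent.update()

    def on_scaling_notification(self):
        # Hypothetical StackUpdatePolicy: an alarm on the scaling
        # notification triggers a refresh of the whole stack.
        self.update()


asg = ScalingGroup(size=2)
lb = Dependent(asg)
stack = Stack(asg, [lb])

asg.adjust(3)                    # partial update, dependents not told
stale = lb.pool_size             # value the load balancer still sees
stack.on_scaling_notification()  # notification-driven refresh
fresh = lb.pool_size             # value after the refresh
print(stale, fresh)
```

The partial resize leaves the dependent at the old size until the
notification-driven update runs.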
-Angus
>
>
>>
>>
>>> 2. The AWS solution relies on the well known LoadBalancer resource, but
>>> often load balancers are just regular instances that get loaded with a
>>> load balancer such as haproxy in a custom way. I'd like custom load
>>> balancers to also update when the ASG resizes.
>>>
>>
>> TBH the correct answer for load balancers specifically is use the Neutron
>> LBaaS API, end of story.
>
>
> This does not help me, as I don't have LBAAS. But as I said above, even if
> I had it, I may want to use my own load balancer. Why not let me use my own
> if that is what I need for my project? Or what if I had another resource
> type that is not a load balancer, maybe a custom resource from a plugin
> that wants to be notified when the ASG resizes? If this can be done for
> regular stack updates, my opinion is that it should also work for these
> special signal-triggered updates to the ASG.
>
>
>> But you're right that there are many uses for a more generic notification
>> mechanism. (For example, in OpenShift we need to notify the controller when
>> we add or remove nodes.) The design goal for ASG was always that we would
>> have an arbitrary scaled unit (defined by a template) and an arbitrary
>> non-scaled unit that could receive notifications about changes to the
>> scaling group. So far we have delivered on only the first part of that
>> promise.
>>
>> My vision for the second part has always been that we'd use hooks, the
>> initial implementation of which Tomas has landed in Kilo. We'll need to
>> implement some more hook types to do it - post-create, post-update and
>> pre-delete at a minimum. We also need some way of notifying the user
>> asynchronously about when the hooks are triggered, so that they can take
>> whatever action (e.g. add to load balancer) before calling the API to clear
>> the hook. (At the moment the only way to find out when your hook should run
>> is by polling the Heat API.)
>>
>
> I'm not really sure I understand how this would work. If I have a resource
> that sets one of its properties to { get_attr: [my_asg, size] }, then on a
> stack-update I don't need a hook to update my resource, it automatically
> updates. On an alarm triggered resize it will not, only because the update
> is partial in that case. If I add a post-update hook to that, then I may be
> able to get the resource to update on a resize event, but on a regular
> stack-update now the update will happen twice, once due to the normal
> update process, then again with the hook.
>
> To make this work I would have to not use get_attr, and somehow get this
> resource to obtain whatever attribute it needs from the ASG using some
> other way, like maybe the Heat API. Which is all fine, but get_attr is a
> valid option I have as a stack developer, and it is currently broken.
>
> I know you disagree with my view, but in my opinion the problem, as I
> mentioned before, is that the resize event of the ASG does a partial
> update, which leaves the stack in an inconsistent state.
>
>
>> In my ideal world, the notification mechanism (or at least one of them)
>> is a message to a Zaqar queue/topic (whatever you want to call it)
>> specified by the user. So someone e.g. running their own HAProxy (don't do
>> this ;) could put a little micro-daemon on the same box that listened to
>> Zaqar for notifications and update the HAProxy config appropriately.
>>
>> Also in my ideal world, a Mistral workflow could be triggered (and seeded
>> with parameter values) by the exact same message queue, so that the user
>> can run any action that Mistral can support without having to have a server
>> around to run it. And we'd use the same system for e.g. Ceilometer alarms
>> talking to scaling policies, so that one could also insert a Mistral
>> workflow into the process. Things are actually pretty awesome in my ideal
>> world.
>
>
> I really have no objection to this, sounds pretty good and I would likely
> use it when it is available. But this is future looking, and I'm trying to
> address a very specific problem in current releases.
>
>
>>
>>> The ResourceGroup is an interesting resource. It is much simpler than
>>> the ASG. In particular, the only way to scale the ResourceGroup is by
>>> issuing a stack-update with a new size. This indirectly solves #1 and #2
>>> above, because when a full update is issued any references to the
>>> ResourceGroup get updated as well.
>>>
>>
>> It doesn't really solve the problem, because you could still manually
>> update the nested stack that the ResourceGroup manages. It just entirely
>> lacks the feature that makes it easy to run into the problem. And not in a
>> good way.
>>
>>
> Not sure I understand this. You have a list of nested stacks, as many as
> the size property of the resource group dictates. You can update them and
> that's fine. I guess you can delete one and that is probably not fine, in
> the same way you can delete instances from the ASG pool without the ASG
> resource knowing, or actually modify or delete any native entities without
> the heat resource that owns it knowing. That still does not cancel the fact
> that if you play by the rules, the ResourceGroup is much more reliable than
> the ASG because it can only be updated in a stack-update operation.
>
>
>
>>> In my opinion, the best way to address #1 and #2 above so that they work
>>> for the ASG as they work for the RG, is to change what happens when
>>> there is a scaling event. When the ScalingPolicy resource gets a signal,
>>> it reaches directly to the ASG by calling asg.adjust() (or in the near
>>> future by sending a signal to it, when a currently proposed patch
>>> merges) with the new size. This bypasses the update mechanism, so only a
>>> partial update occurs, just the ASG resource itself is updated. I would
>>> like this to be a full stack update, so that all references get updated
>>> with the new ASG size. This will address #1 and #2.
>>>
>>
>> -1
>>
>
> This I disagree with. The partial update leaves the stack in an
> inconsistent state. It's a bug that should be straightforward to fix,
> without altering any plans for the future that can make the use of load
> balancers more friendly to users.
>
>
>>
>> The way to think about autoscaling is as a separate service that
>> delegates the creation and deletion of its members to and maintains its
>> state in a Heat stack. It *isn't* of course, but nor will it ever be if
>> people continue to think about it as a resource plugin that is free to
>> reach into its parent stack and start messing with other things.
>>
>> Apart from being a layering violation, anything that relies on updating
>> the parent stack *after* a scaling operation is complete simply doesn't
>> work. When scaling down, you want the changes to be made *before* updating
>> the scaling group. In the general case - a batched rolling update - there
>> are multiple changes that need to be made mostly *during* the scaling group
>> update.
>>
>>> But there is an alternative to this. I guess we could copy the update
>>> mechanism used on the AWS side, which is also partial, but at least
>>>
>>
>> -2! This is what we most wanted to avoid in the native resources.
>
>
> I'm fine with this; I don't really like that solution much myself.
>
>
>>
>>
>>> covers the load balancers, given in the LoadBalancerNames property. We
>>> can have a "load_balancer_names" equivalent property for the
>>> OS::Heat::ASG resource, and we can then trigger the updates of the load
>>> balancer(s) exactly like the AWS side does it. For this option, I would
>>> like to extend the load balancer update mechanism to work on custom load
>>> balancers, as it currently works with the well known load balancer
>>> resources. I have implemented this approach and it is currently up for
>>> review: https://review.openstack.org/#/c/170634/. I honestly prefer the
>>> full update; it seems cleaner to me.
>>>
>>> Anyway, sorry for the long email. If you can provide guidance on which
>>> of the approaches are preferred, or if you have other ideas, I would
>>> appreciate it.
>>>
>>
>> Long emails are good, thanks for writing this up :)
>>
>> cheers,
>> Zane.
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>