<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Dec 23, 2014 at 6:42 AM, Zane Bitter <span dir="ltr"><<a href="mailto:zbitter@redhat.com" target="_blank">zbitter@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 22/12/14 13:21, Steven Hardy wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi all,<br>

<br>

So, lately I've been having various discussions around $subject, and I know<br>

it's something several folks in our community are interested in, so I<br>

wanted to get some ideas I've been pondering out there for discussion.<br>

<br>

I'll start with a proposal of how we might replace HARestarter with<br>

an AutoScalingGroup, then give some initial ideas of how we might evolve that<br>

into something capable of a sort-of active/active failover.<br>

<br>

1. HARestarter replacement.<br>

<br>

My position on HARestarter has long been that equivalent functionality<br>

should be available via AutoScalingGroups of size 1.  Turns out that<br>

shouldn't be too hard to do:<br>

<br>

  resources:<br>

   server_group:<br>

     type: OS::Heat::AutoScalingGroup<br>

     properties:<br>

       min_size: 1<br>

       max_size: 1<br>

       resource:<br>

         type: ha_server.yaml<br>

<br>

   server_replacement_policy:<br>

     type: OS::Heat::ScalingPolicy<br>

     properties:<br>

       # FIXME: this adjustment_type doesn't exist yet<br>

       adjustment_type: replace_oldest<br>

       auto_scaling_group_id: {get_resource: server_group}<br>

       scaling_adjustment: 1<br>

</blockquote>

<br></div></div>

One potential issue with this is that it is a little bit _too_ equivalent to HARestarter - it will replace your whole scaled unit (ha_server.yaml in this case) rather than just the failed resource inside.<span class=""><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

So, currently our ScalingPolicy resource can only support three adjustment<br>

types, all of which change the group capacity.  AutoScalingGroup already<br>

supports batched replacements for rolling updates, so if we modify the<br>

interface to allow a signal to trigger replacement of a group member, then<br>

the snippet above should be logically equivalent to HARestarter AFAICT.<br>

<br>

The steps to do this should be:<br>

<br>

  - Standardize the ScalingPolicy-AutoScalingGroup interface, so<br>

asynchronous adjustments (e.g. signals) between the two resources don't use<br>

the "adjust" method.<br>

<br>

  - Add an option to replace a member to the signal interface of<br>

AutoScalingGroup<br>

<br>

  - Add the new "replace" adjustment type to ScalingPolicy<br>

</blockquote>
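<br>
To make the intended semantics concrete, here is a minimal sketch of what a "replace_oldest" adjustment could do to a group. It is a plain-Python model, not Heat code: the member dicts, new_member() and replace_oldest() are all hypothetical names, and it assumes "oldest" means earliest creation time.<br>

```python
# Hypothetical model of a "replace_oldest" adjustment; not Heat's actual code.
import itertools

_ids = itertools.count(1)


def new_member(created_at):
    """Create a stand-in group member with a unique name and a creation time."""
    return {"name": "member-%d" % next(_ids), "created_at": created_at}


def replace_oldest(members, count, now):
    """Return a new member list with the `count` oldest members replaced.

    Group capacity stays the same; only the membership changes.
    """
    count = min(count, len(members))  # never replace more than we have
    by_age = sorted(members, key=lambda m: m["created_at"])
    survivors = by_age[count:]
    replacements = [new_member(now) for _ in range(count)]
    return survivors + replacements


# A group of size 1, as in the HARestarter replacement case.
group = [new_member(created_at=10)]
old_name = group[0]["name"]
group = replace_oldest(group, count=1, now=20)
```

With a group of size 1 this swaps the single member for a fresh one, which is the HARestarter-equivalent behaviour discussed above.<br>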

<br></span>

I think I am broadly in favour of this.<div><div class="h5"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I posted a patch which implements the first step, and the second will be<br>

required for TripleO, so we should be doing it soon.<br>

<br>

<a href="https://review.openstack.org/#/c/143496/" target="_blank">https://review.openstack.org/#/c/143496/</a><br>

<a href="https://review.openstack.org/#/c/140781/" target="_blank">https://review.openstack.org/#/c/140781/</a><br>

<br>

2. A possible next step towards active/active HA failover<br>

<br>

The next part is the ability to notify before replacement that a scaling<br>

action is about to happen (just like we do for LoadBalancer resources<br>

already) and orchestrate some or all of the following:<br>

<br>

- Attempt to quiesce the currently active node (may be impossible if it's<br>

   in a bad state)<br>

<br>

- Detach resources (e.g. volumes primarily?) from the current active node,<br>

   and attach them to the new active node<br>

<br>

- Run some config action to activate the new node (e.g. run some config<br>

   script to fsck and mount a volume, then start some application).<br>

<br>

The first step is possible by putting a SoftwareConfig/SoftwareDeployment<br>

resource inside ha_server.yaml (using NO_SIGNAL so we don't fail if the<br>

node is too bricked to respond and specifying DELETE action so it only runs<br>

when we replace the resource).<br>

<br>

The third step is possible either via a script inside the box which polls<br>

for the volume attachment, or possibly via an update-only software config.<br>

<br>

The second step is the missing piece AFAICS.<br>
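<br>
The ordering of the three steps can be sketched as below. This is a toy model under stated assumptions, not a Heat interface: the node and volume dicts, quiesce() and failover() are hypothetical, and the quiesce step is treated as best-effort in the same spirit as NO_SIGNAL.<br>

```python
# Hypothetical sketch of the three failover steps; all names are illustrative.
def quiesce(node):
    """Best-effort: a node in a bad state may not respond at all."""
    if not node["responsive"]:
        raise RuntimeError("node did not respond")
    node["app_running"] = False


def failover(active, standby, volume):
    """Run the three steps in order and return a log of what happened."""
    log = []
    # Step 1: try to quiesce, but never block failover on a broken node.
    try:
        quiesce(active)
        log.append("quiesced")
    except RuntimeError:
        log.append("quiesce skipped")
    # Step 2: move the volume from the old active node to the new one.
    volume["attached_to"] = standby["name"]
    log.append("volume moved")
    # Step 3: activate the new node (e.g. fsck, mount, start the application).
    standby["app_running"] = True
    log.append("activated")
    return log


old = {"name": "server-0", "responsive": False, "app_running": True}
new = {"name": "server-1", "responsive": True, "app_running": False}
vol = {"attached_to": "server-0"}
steps = failover(old, new, vol)
```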

<br>

I've been wondering if we can do something inside a new heat resource,<br>

which knows what the current "active" member of an ASG is, and gets<br>

triggered on a "replace" signal to orchestrate e.g. deleting and creating a<br>

VolumeAttachment resource to move a volume between servers.<br>

<br>

Something like:<br>

<br>

  resources:<br>

   server_group:<br>

     type: OS::Heat::AutoScalingGroup<br>

     properties:<br>

       min_size: 2<br>

       max_size: 2<br>

       resource:<br>

         type: ha_server.yaml<br>

<br>

   server_failover_policy:<br>

     type: OS::Heat::FailoverPolicy<br>

     properties:<br>

       auto_scaling_group_id: {get_resource: server_group}<br>

       resource:<br>

         type: OS::Cinder::VolumeAttachment<br>

         properties:<br>

             # FIXME: "refs" is a ResourceGroup interface not currently<br>

             # available in AutoScalingGroup<br>

             instance_uuid: {get_attr: [server_group, refs, 1]}<br>

<br>

   server_replacement_policy:<br>

     type: OS::Heat::ScalingPolicy<br>

     properties:<br>

       # FIXME: this adjustment_type doesn't exist yet<br>

       adjustment_type: replace_oldest<br>

       auto_scaling_policy_id: {get_resource: server_failover_policy}<br>

       scaling_adjustment: 1<br>

</blockquote>

<br></div></div>

This actually fails because a VolumeAttachment needs to be updated in place; if you try to switch servers but keep the same Volume when replacing the attachment you'll get an error.<br>
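<br>
A toy model of why the ordering matters, assuming (my assumption, not stated above) that replacement creates the new attachment before deleting the old one and that a volume accepts only one attachment at a time. The Volume class and both helper functions are hypothetical and are not the Cinder or Heat API.<br>

```python
# Hypothetical model: a volume that accepts only one attachment at a time.
class Volume:
    def __init__(self):
        self.server = None

    def attach(self, server):
        if self.server is not None:
            raise RuntimeError("volume already attached to " + self.server)
        self.server = server

    def detach(self):
        self.server = None


def replace_attachment(volume, new_server):
    """Replacement: create the new attachment while the old one still exists."""
    volume.attach(new_server)  # raises, because the volume is still attached


def update_in_place(volume, new_server):
    """In-place update: detach from the old server, then attach to the new."""
    volume.detach()
    volume.attach(new_server)


vol = Volume()
vol.attach("server-0")
try:
    replace_attachment(vol, "server-1")
    replace_failed = False
except RuntimeError:
    replace_failed = True
update_in_place(vol, "server-1")
```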

<br>

TBH {get_attr: [server_group, refs, 1]} is doing most of the heavy lifting here, so in theory you could just have an OS::Cinder::VolumeAttachment instead of the FailoverPolicy and then all you need is a way of triggering a stack update with the same template & params. I know Ton added a PATCH method to update in Juno so that you don't have to pass parameters any more, and I believe it's planned to do the same with the template.<span class=""><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

By chaining policies like this we could trigger an update on the attachment<br>

resource (or a nested template via a provider resource containing many<br>

attachments or other resources) every time the ScalingPolicy is triggered.<br>

<br>

For the sake of clarity, I've not included the existing stuff like<br>

ceilometer alarm resources etc above, but hopefully it gets the idea<br>

across so we can discuss further. What are people's thoughts?  I'm quite<br>

happy to iterate on the idea if folks have suggestions for a better<br>

interface etc :)<br>

<br>

One problem I see with the above approach is you'd have to trigger a<br>

failover after stack create to get the initial volume attached, still<br>

pondering ideas on how best to solve that.<br>

</blockquote>

<br></span>

To me this is falling into the same old trap of "hey, we want to run this custom workflow, all we need to do is add a new resource type to hang some code on". That's pretty much how we got HARestarter.<br>

<br>

Also, like HARestarter, this cannot hope to cover the range of possible actions that might be needed by various applications.<br>

<br>

IMHO the "right" way to implement this is that the Ceilometer alarm triggers a workflow in Mistral that takes the appropriate action defined by the user, which may (or may not) include updating the Heat stack to a new template where the shared storage gets attached to a different server.<br>
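<br>
As a rough illustration of the shape I mean, here is a sketch in which the user, rather than a new resource type, decides what an alarm does. It is plain Python and deliberately not the Mistral or Ceilometer API: run_workflow(), on_alarm(), the task functions and the alarm names are all hypothetical.<br>

```python
# Hypothetical sketch; this is plain Python, not the Mistral or Ceilometer API.
def run_workflow(tasks, context):
    """Run user-defined tasks in order, passing a shared context dict."""
    for task in tasks:
        task(context)
    return context


def notify_operator(ctx):
    ctx["events"].append("notified operator about " + ctx["alarm"])


def update_stack(ctx):
    # Stand-in for updating the Heat stack to a new template in which the
    # shared storage is attached to a different server.
    ctx["events"].append("stack update requested")


# The user chooses which actions each alarm runs; a stack update is optional.
workflows = {
    "server_down": [notify_operator, update_stack],
    "disk_full": [notify_operator],
}


def on_alarm(alarm_name):
    ctx = {"alarm": alarm_name, "events": []}
    return run_workflow(workflows[alarm_name], ctx)


result = on_alarm("server_down")
```

The point of the design is that adding or changing an action means editing the workflow definition, not adding another resource type.<br>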

<br></blockquote><div><br></div><div>I agree; we should really change our policies to be implemented as Mistral workflows. A good first step would be a Mistral workflow Heat resource<br></div><div>so that users can start getting more flexibility in what they do with alarm actions.<br></div><div><br></div><div>-Angus<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

cheers,<br>

Zane.<div class="HOEnZb"><div class="h5"><br>

<br>

_______________________________________________<br>

OpenStack-dev mailing list<br>

<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</div></div></blockquote></div><br></div></div>