[openstack-dev] [Octavia] Question about where to render haproxy configurations

Eichberger, German german.eichberger at hp.com
Fri Sep 5 21:31:17 UTC 2014


Hi Stephen,

I think this is a good discussion to have, and it will make it clearer why we chose a specific design. I also believe that having this discussion will make the design stronger. I am still a little confused about what the driver/controller/amphora agent roles are. In my driver-less design we don't have to worry about the driver, which in haproxy's case would most likely be split to some degree between the controller and the amphora device.

So let’s try to sum up what we want a controller to do:

-          Provision new amphora devices

-          Monitor/Manage health

-          Gather stats

-          Manage/Perform configuration changes

The driver as described would be:

-          Render configuration changes in a specific format, e.g. haproxy

Amphora Device:

-          Communicate with the driver/controller to make things happen
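
Roughly, as an interface (just a sketch to make the role split concrete for this discussion -- not actual code, and the method names are made up):

import abc


class ControllerBase(object):
    """Sketch of the controller responsibilities listed above."""
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def provision_amphora(self, load_balancer):
        """Spin up a new amphora device (e.g. via Nova)."""

    @abc.abstractmethod
    def monitor_health(self, amphora):
        """Monitor/manage amphora health."""

    @abc.abstractmethod
    def gather_stats(self, amphora):
        """Collect statistics from the amphora."""

    @abc.abstractmethod
    def apply_config_change(self, amphora, config):
        """Manage/perform a configuration change on the amphora."""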

So, as Doug pointed out, I can make a very thin driver which basically passes everything through to the amphora device, or, at the other end of the spectrum, a very thick driver which manages all aspects from the amphora life cycle on up (aka the kitchen sink). I know we are going for the utmost flexibility, but I believe:

-          In building an haproxy-centric controller we don't really know yet which things should live in the controller and which in the driver. So my shortcut is to not build a driver at all ☺

-          More flexibility increases complexity and makes it confusing for people to develop components. Should a given concern go into the controller, the driver, or the amphora VM? Two of them? All three? Limiting the choices keeps that decision simple.

HP's worry is that creating the potential to run multiple versions of drivers, on multiple versions of controllers, on multiple versions of amphora devices, creates a headache for testing. For example, does the version 4.1 haproxy driver work with the version 4.2 controller on a 4.0 amphora device? Which compatibility matrix do we need to build and test? Limiting one driver to one controller helps keep that manageable.

Thanks,
German

From: Stephen Balukoff [mailto:sbalukoff at bluebox.net]
Sent: Friday, September 05, 2014 10:44 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Octavia] Question about where to render haproxy configurations

Hi German,

Thanks for your reply! My responses are in-line below, and of course you should feel free to counter my counter-points. :)

For anyone else paying attention and interested in expressing a voice here, we'll probably be voting on this subject at next week's Octavia meeting.

On Thu, Sep 4, 2014 at 9:13 PM, Eichberger, German <german.eichberger at hp.com> wrote:
Hi,

Stephen visited us today (the joy of spending some days in Seattle ☺) and we discussed this further (and sorry for using VM – not sure which term won):

Looks like "Amphora" won, so I'll start using that terminology below.


1.       We will only support one driver per controller, e.g. if you upgrade a driver you deploy a new controller with the new driver and either make it take over existing VMs (minor change) or spin up new ones (major change), but keep the “old” controller in place until it no longer serves any VMs
Why? I agree with the idea of one back-end type per driver, but why shouldn't we support more than one driver per controller?

I agree that you probably only want to support one version of each driver per controller, but it seems to me it shouldn't be that difficult to write a driver that knows how to speak to different versions of back-end amphorae. Off the top of my head I can think of two ways of doing this:

1. For each new feature or bugfix added, keep track of the minimal amphora version required to use that feature/bugfix. Then, when building your configuration, keep a running tally of the minimal amphora version required to satisfy it as various features are activated. If that required version is higher than the version of the amphora you're going to update, you can pre-emptively return an error detailing an unsupported configuration due to the back-end amphora being too old. (What you do with this error-- fail, recycle the amphora, whatever-- is up to the operator's policy at this point, though I would probably recommend just recycling the amphora.) If a given user's configuration never makes use of advanced features later on, there's no rush to upgrade their amphorae, and new controllers can push configs that work with the old amphorae indefinitely.
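
To make that concrete, here's the sort of thing I have in mind -- just a sketch, with made-up feature names and version numbers, not actual Octavia code:

# Hypothetical sketch: track the minimum amphora image version needed
# by the features a configuration actually uses.
FEATURE_MIN_VERSION = {
    'tls_termination': (1, 2),   # made-up feature -> (major, minor)
    'l7_rules': (1, 4),
}


class AmphoraTooOld(Exception):
    pass


def minimum_amphora_version(enabled_features):
    """Lowest amphora image version that can satisfy this configuration."""
    required = (1, 0)
    for feature in enabled_features:
        required = max(required, FEATURE_MIN_VERSION.get(feature, (1, 0)))
    return required


def check_amphora(enabled_features, amphora_version):
    required = minimum_amphora_version(enabled_features)
    if required > amphora_version:
        # What to do here -- fail, recycle the amphora, whatever -- is
        # operator policy; recycling is probably the sane default.
        raise AmphoraTooOld('config needs %s, amphora is %s'
                            % (required, amphora_version))

So a config that only uses basic features still checks out fine against an old amphora, and only the tenants who turn on newer features force a recycle.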

2. If the above sounds too complicated, you can forego that and simply build the config, try to push it to the amphora, and see if you get an error returned.  If you do, depending on the nature of the error you may decide to recycle the amphora or take other actions. As there should never be a case where you deploy a controller that generates configs with features that no amphora image can satisfy, re-deploying the amphora with the latest image should correct this problem.
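
A sketch of that simpler approach, assuming the amphora exposes some REST endpoint for its config (the URL, port, and error convention here are made up):

# Hypothetical sketch: push the rendered config and react to rejection.
import requests


def recycle_amphora(amphora):
    """Placeholder: in reality, delete and re-create via Nova."""
    raise NotImplementedError


def push_config(amphora, rendered_config):
    # TLS/certificate handling omitted for brevity.
    resp = requests.put('https://%s:9443/config' % amphora.address,
                        data=rendered_config, timeout=10)
    if resp.status_code == 400:
        # The amphora rejected the config -- most likely its image is too
        # old for something we rendered -- so recycle it with a new image.
        recycle_amphora(amphora)
    else:
        resp.raise_for_status()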

There are probably other ways to manage this that I'm not thinking of as well-- these are just the two that occurred to me immediately.

Also, your statement above implies some process around controller upgrades which hasn't actually been decided yet. It may be that we recommend a different upgrade path for controllers.


2.       If we render configuration files on the VM we only support one upgrade model (replacing the VM), which might simplify development, as opposed to the driver model where we need to write code to push out configuration changes to all VMs for minor changes, plus code to fail over VMs for major changes
So, you're saying it's a *good* thing that you're forced into upgrading all your amphorae for even minor changes, because having only one upgrade path should make the code simpler?

For large deployments, I heartily disagree.


3.       I am afraid that half-baked drivers will break the controller, and I feel it's easier to shoot VMs with half-baked renderers than to shoot controllers.

I defer to Doug's statement on this, and will add the following:

Breaking a controller temporarily does not cause a visible service interruption for end-users. Amphorae keep processing load-balancer requests. All it means is that tenants can't make changes to existing load balanced services until the controllers are repaired.

But blowing away an amphora does create a visible service interruption for end-users. This is especially bad if you don't notice this until after you've gone through and updated your fleet of 10,000+ amphorae because your upgrade process requires you to do so.

Given the choice of scrambling to repair a few hundred broken controllers while almost all end-users are oblivious to the problem, or scrambling to repair tens of thousands of amphorae while service stops for almost all end-users, I'll take the former.  (The former is a relatively minor note on a service status page. The latter is an article about your cloud outage on major tech blogs and a damage-control press release from your PR department.)


4.       The main advantage of using an Octavia format to talk to VMs is that we can mix and match VMs with different properties (e.g. nginx, haproxy) on the same controller, because the implementation detail (which file to render) is hidden
So, I would consider shipping a complete haproxy config to the amphora being "Octavia format" in one sense. But I would also point out that behind the driver, it's perfectly OK to speak in very back-end specific terms. That's sort of the point of a driver:  Speak a more generic protocol on the front end (base classes + methods that should be fulfilled by each driver, etc.), and speak a very implementation-specific protocol on the back-end.  I would not, for example, expect the driver which speaks to an amphora built to run haproxy to be speaking the exact same protocol as a driver which speaks to an amphora built to run nginx.
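
To illustrate what I mean by that (a rough sketch only -- not a proposed interface, and the method names are made up):

import abc


class AmphoraDriverBase(object):
    """Generic front end: the controller only ever calls these methods.

    How the driver talks to its back-end -- REST, JSON blobs, a complete
    haproxy config file, something else entirely -- is the driver's own
    business.
    """
    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def update(self, listener, amphora):
        """Push the current load balancer configuration to the amphora."""

    @abc.abstractmethod
    def get_stats(self, amphora):
        """Return statistics translated into Octavia's data model."""

    @abc.abstractmethod
    def get_health(self, amphora):
        """Return health information in Octavia's data model."""

An haproxy driver and an nginx driver would both subclass something like this, but what they ship over the wire can be completely different.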

Also, you can still mix and match here--  just create a third amphora driver which can speak in terms of both haproxy and nginx configs to an amphora back-end which is capable of running either haproxy or nginx. The point is that the driver should match the back-end it's supposed to communicate with, and what protocol the driver chooses to speak to the back-end (including the UDP stuff you bring up later) is entirely up to the driver.


5.       The major difference in the API between Stephen and me would be that I would send JSON files which get rendered on the VM into an haproxy file, whereas he would send an haproxy file. We still need to develop an interface on the VM to report stats and health in Octavia format. It is conceivable with Stephen's design that drivers would exist which translate stats and health from a proprietary format into the Octavia one. I am not sure how we would get the proprietary VMs to emit the UDP health packets… In any case a lot of logic could end up in a driver – and fanning that processing out to the VMs might allow for fewer controllers.
So I'm not exactly sure what you mean by the "Octavia format" here. I'm going to assume that you mean something to the effect of "represented using objects and data models which correspond with Octavia's internals". This doesn't necessarily mean that APIs need to use terms which exactly line up with these, though I think that might be what you're implying here. In any case, whether other drivers choose to use UDP health packets for getting status information from their versions of amphorae is up to them-- that's just been the suggestion for the first one we're developing, which will use haproxy as the load balancing software. Again, no other driver should be restricted in what it's allowed to do based on what we do with the haproxy driver-- they don't have to follow the same communication model at all if they don't want to.
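
For what it's worth, the UDP health suggestion for the haproxy amphorae could be as simple as something like the following (purely illustrative -- the message fields, port, and status-collection helper are all made up, nothing here is a decided format):

import json
import socket
import time


def collect_listener_status():
    """Placeholder: in reality, parse the haproxy stats socket."""
    return {}


def emit_health(controller_addr, amphora_id, port=5555, interval=3):
    # Amphora-side heartbeat: periodically send a small JSON blob to the
    # controller over UDP.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        msg = json.dumps({'amphora_id': amphora_id,
                          'timestamp': time.time(),
                          'listeners': collect_listener_status()})
        sock.sendto(msg.encode('utf-8'), (controller_addr, port))
        time.sleep(interval)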

Otherwise, I completely agree that the API the haproxy driver will speak to the amphorae will necessarily have other features in it beyond simply shipping configuration files back and forth. That's been something I've been meaning to work on documenting, so I'll get started on that today.

Overall, if I don't take advantage of the minor-update model, the main difference between Stephen and me in the haproxy case is shipping JSON instead of an haproxy config. I understand that the minor-update capability is make-or-break for Stephen, though in my experience configuration changes without other updates are rare (and my experience might not be representative).

So, I've been trying to contemplate why it is that you're not expecting to see a need for doing minor updates in a relatively simple fashion. I think this might stem from our existing product offerings:  Blue Box's in-house built load balancer already does TLS, SNI, Layer-7 switching and other advanced features whereas I'm not sure the product you're used to running does all that (at least in a way that's exposed to the user).

As we were developing the layer-7 features especially, we found it very common to make minor tweaks to the configuration file format-- to allow for additional types of layer-7 rules as our customers asked for that functionality, for example. These are cases where we're not making any substantial changes to the back-end image (ie. it's still running the same version of haproxy and the same glue scripts), and therefore didn't need to update any of the software on the back-ends themselves.

It's true that a certain amount of intelligence about the configuration file format is necessary on the back-end--  they need to know how to parse out the pool members in order to gather statistics on them, for example-- but these capabilities are generally unaffected by other minor configuration file format changes.
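
Just to be concrete about what "rendering in the driver" means to me -- something along these lines (a sketch only; the template name and model attributes are invented):

# Sketch of driver-side rendering: turn Octavia's data model into a
# complete haproxy config and ship the finished text to the amphora.
import jinja2

_env = jinja2.Environment(loader=jinja2.FileSystemLoader('templates'))


def render_haproxy_config(load_balancer):
    # 'haproxy.cfg.j2' and the attribute names are invented for this sketch.
    template = _env.get_template('haproxy.cfg.j2')
    return template.render(listeners=load_balancer.listeners,
                           pools=load_balancer.pools)

A minor tweak to the layer-7 rules then only means editing the template the driver uses; the amphora image and its glue scripts don't change at all.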



In any case, there is certainly some advantage in allowing custom drivers for appliances which don't necessarily speak the Octavia protocol. However, given our plan to use an Octavia UDP packet to emit health messages from the VMs to the controller, and since the controller provisions VMs in Nova, it might be a better integration point for appliances to have custom controllers. I am just not convinced that a custom driver is sufficient for all cases –

Ok, in this case, I think you're conflating "Octavia protocol" with "the protocol the haproxy driver speaks to the haproxy-based amphorae," which I think is the wrong way to think about this.

Also, how things are dealt with in the controller itself (ie. outside of the drivers it loads) should necessarily be the same, no matter which driver / back-end is used. I'm not in favor of a design which requires a different controller for each kind of back-end. In that design, you'd need to invent another "controller-driver" layer, or at least reduce the role of the controller to essentially be a driver (presumably having to duplicate all the common "controller" elements between these "effectively drivers" thingies)... which seems silly to me. The step to implementation-specific objects and protocols should happen in the controller's drivers.

I do have one other question for you or your team, German:

Is there something about rendering the haproxy configuration in the driver which is a show-stopper (or major inconvenience) for HP's ability to use this product? I'm trying to understand what exactly it is about shipping rendered configuration files from the driver to the amphora that's so distasteful to y'all.

Stephen

--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807