Open Stack

Sun Sep 7 04:49:43 UTC 2014

Hi Stephen,

Thanks for taking the time to lay out the components clearly. I think we are pretty much on the same page ☺

Driver vs. Driver-less
I strongly believe that REST is a cleaner interface/integration point, but if even Brandon believes that drivers are the better approach (having suffered through the LBaaS v1 driver world, which is not an advertisement for this approach), I will concede on that front. Let’s hope nobody makes an asynchronous driver and/or writes straight to the DB ☺ That said, I still believe that adding the driver interface now will add complexity, and I am not sure we will get the interface right in the first version. So let’s agree to develop with a driver in mind, but not allow third-party drivers until the interface has matured. I think that is something we already sort of agreed to, but I just want to make it explicit.

Multiple drivers/versions for the same controller
This is a really contentious point for us at HP: if we allow, say, multiple drivers, or even different versions of the same driver (e.g. A, B, C), to run in parallel, testing will have to cover all the possible (version) combinations to avoid potential side effects. That can get extensive really quickly. So HP is proposing, given that we will have 100s of controllers anyway, to limit the number of drivers per controller to 1 to aid testing. We can revisit that in the future when our testing capabilities have improved, but for now I believe we should choose that to speed things up. I personally don’t see the need for multiple drivers per controller; in an operator-grade environment we likely don’t need to “save” on the number of controllers ;-) The only reason we might need two drivers on the same controller is if an Amphora for whatever reason needs to be talked to by two drivers (e.g. you install nginx and haproxy and have a driver for each). This use case scares me, so we should not allow it.
We also see some operational simplifications from supporting only one driver per controller: if we have an update for driver A, we don’t need to touch any controller running driver B. Furthermore, we can keep the old version running but make sure no new Amphora gets scheduled there, let it wind down through attrition, and then stop that controller when it doesn’t have any more Amphora to serve.
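A minimal sketch of the scheduling policy described above, assuming each controller loads exactly one driver and can be flagged as draining. The `Controller` record, `schedule()` and `retire()` are hypothetical names for illustration, not part of any Octavia spec.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller record: under the proposed policy it loads exactly one driver."""
    name: str
    driver: str                 # e.g. "haproxy"; never more than one per controller
    driver_version: str
    draining: bool = False      # set once a newer driver version is deployed elsewhere
    amphorae: set = field(default_factory=set)


def schedule(controllers, driver, amphora_id):
    """Place a new amphora on the least-loaded, non-draining controller running `driver`."""
    candidates = [c for c in controllers if c.driver == driver and not c.draining]
    if not candidates:
        raise LookupError(f"no active controller runs driver {driver!r}")
    target = min(candidates, key=lambda c: len(c.amphorae))
    target.amphorae.add(amphora_id)
    return target


def retire(controller, amphora_id):
    """Drop an amphora through normal attrition; True means the controller can now be stopped."""
    controller.amphorae.discard(amphora_id)
    return controller.draining and not controller.amphorae
```

Updating driver A then only means starting controllers with the new version and setting `draining` on the old ones; controllers running driver B are never touched.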

Lastly, I interpreted the term “VM driver” in the spec along the lines of what we have in libra: a driver interface on the Amphora agent that abstracts starting/stopping haproxy (in case we end up on something different) and abstracts writing the haproxy config file. But that is for the agent on the Amphora. I am sorry I got confused that way when reading the 0.5 spec, and I am therefore happy we can have this discussion to make things clearer.
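For comparison, a rough sketch of the agent-side abstraction described here (the libra-style “VM driver” living on the Amphora). `AgentDriver` and `FakeHaproxyDriver` are hypothetical; the fake records start/stop calls instead of invoking a real service manager.

```python
import abc
import os
import tempfile


class AgentDriver(abc.ABC):
    """Hypothetical agent-side interface hiding how haproxy is managed on a given image."""

    @abc.abstractmethod
    def write_config(self, text: str) -> None: ...

    @abc.abstractmethod
    def start(self) -> None: ...

    @abc.abstractmethod
    def stop(self) -> None: ...


class FakeHaproxyDriver(AgentDriver):
    """Stand-in that writes the config file but only records start/stop calls."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.calls = []

    def write_config(self, text):
        # Write to a temp file in the same directory, then atomically replace the
        # target, so haproxy never sees a half-written file (POSIX semantics).
        directory = os.path.dirname(self.config_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp_path, self.config_path)
        self.calls.append("write_config")

    def start(self):
        self.calls.append("start")  # a real driver might call the init system here

    def stop(self):
        self.calls.append("stop")
```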

German

From: Stephen Balukoff [mailto:sbalukoff at bluebox.net]
Sent: Friday, September 05, 2014 6:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Octavia] Question about where to render haproxy configurations

Hi German,

Responses in-line:

On Fri, Sep 5, 2014 at 2:31 PM, Eichberger, German <german.eichberger at hp.com> wrote:
Hi Stephen,

I think this is a good discussion to have and will make it clearer why we chose a specific design. I also believe that having this discussion will make the design stronger. I am still a little bit confused about what the driver/controller/amphora agent roles are. In my driver-less design we don’t have to worry about the driver, which in haproxy’s case will most likely be split to some degree between the controller and the amphora device.

Yep, I agree that a good technical debate like this can help both to get many people's points of view and can help determine the technical merit of one design over another. I appreciate your vigorous participation in this process. :)

So, the purpose of the controller / driver / amphora and the responsibilities they have are somewhat laid out in the Octavia v0.5 component design document, but it's also possible that there weren't enough specifics in that document to answer the concerns brought up in this thread. To that end, here is how I see things:

The controller:
* Is responsible for concerns of the Octavia system as a whole, including the intelligence around interfacing with the networking, virtualization, and other layers necessary to set up the amphorae on the network and get them configured.
* Will rarely, if ever, talk directly to the end-systems or -services (like Neutron, Nova, etc.). Instead it goes through a "clean" driver interface for each of these.
* The controller has direct access to the database where state is stored.
* Must load at least one driver; may load several drivers and choose between them based on configuration logic (e.g. flavors, config file, etc.)
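To make these bullets concrete, a minimal sketch of a controller that loads drivers as plain Python objects and picks one by flavor. The class names, the `update_config` method and the flavor mapping are all hypothetical; HP's one-driver-per-controller proposal would simply mean `drivers` always has exactly one entry.

```python
import abc


class AmphoraDriver(abc.ABC):
    """Hypothetical in-process driver interface: a base class and methods, not a REST API."""

    @abc.abstractmethod
    def update_config(self, amphora_id: str, load_balancer: dict) -> str:
        """Push the desired configuration for one amphora; return a status string."""


class HaproxyDriver(AmphoraDriver):
    def update_config(self, amphora_id, load_balancer):
        # A real driver would render and send the config; this stub only reports.
        return f"{amphora_id}: {len(load_balancer['listeners'])} listener(s)"


class Controller:
    def __init__(self, drivers, flavor_map):
        if not drivers:
            raise ValueError("a controller must load at least one driver")
        self.drivers = drivers        # driver name -> driver instance
        self.flavor_map = flavor_map  # flavor name -> driver name (hypothetical config)

    def driver_for(self, flavor):
        """Choose the driver for a flavor; raises KeyError for an unknown flavor."""
        return self.drivers[self.flavor_map[flavor]]
```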

The driver:
* Handles all communication to or from the amphorae
* Is loaded by the controller (i.e. its interface with the controller is a base class, associated methods, etc. It's objects and code, not a RESTful API.)
* Speaks amphora-specific protocols on the back-end. In the case of the reference "haproxy" amphora, this will most likely be in the form of a RESTful API with an agent on the amp, as well as (probably) HMAC-signed UDP health, status and stats messages from the amp to the driver.
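The HMAC-signed health messages mentioned above could look roughly like this, assuming a shared key per amphora, a JSON payload, and HMAC-SHA256 with the 32-byte tag appended to the datagram. The key, payload fields and framing are made up for illustration; key distribution and replay protection are left out.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret provisioned to the amphora and known to the driver.
KEY = b"example-shared-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_health(payload: dict, key: bytes = KEY) -> bytes:
    """Amphora side: serialize a health/stats payload and append an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode()
    return body + hmac.new(key, body, hashlib.sha256).digest()


def verify_health(datagram: bytes, key: bytes = KEY) -> dict:
    """Driver side: check the tag in constant time before trusting the payload."""
    body, tag = datagram[:-DIGEST_SIZE], datagram[-DIGEST_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad HMAC; dropping datagram")
    return json.loads(body)
```

On the amphora the signed bytes would go out over a `socket.SOCK_DGRAM` socket with `sendto()`; the driver runs `verify_health()` on every received datagram and drops anything that fails.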

The amphora:
* Does the actual load balancing
* Is managed by the controller through the driver.
* Should be as "dumb" as possible.
* Comes in different types, based on the software in the amphora image. (Though all amps of a given type should be managed by the same driver.) Types might include "haproxy," "nginx," "haproxy + nginx," "3rd party vendor X," etc.
* Should never have direct access to the Octavia database, and should therefore be as stateless as possible as far as configuration is concerned.

To be honest, our current product does not have a "driver" layer per se, since we only interface with one type of back-end. However, we still render our haproxy configs in the controller. :)

So let’s try to sum up what we want a controller to do:

- Provision new amphora devices
- Monitor/manage health
- Gather stats
- Manage/perform configuration changes

The driver as described would:

- Render configuration changes in a specific format, e.g. haproxy

The amphora device would:

- Communicate with the driver/controller to make things happen

So as Doug pointed out, I can make a very thin driver which basically passes everything through to the Amphora Device, or at the other end of the spectrum I can make a very thick driver which manages all aspects from the amphora life cycle to whatever (aka kitchen sink). I know we are going for utmost flexibility, but I believe:

So, I'm not sure it's fair to characterize the driver I'm suggesting as "very thick." If you get right down to it, I'm pretty sure the only major thing we disagree on here is where the haproxy configuration is rendered: just before it's sent over the wire to the amphora, or just after its JSON equivalent is received over the wire from the controller.
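To illustrate that the disagreement is only about where this code runs, here is a stripped-down rendering function from a JSON-style description to haproxy.cfg text. The input schema is hypothetical, and a real config would also need `global` and `defaults` sections. Either the driver calls it and sends the resulting text, or the amphora agent calls it on the JSON it receives.

```python
def render_haproxy_cfg(lb: dict) -> str:
    """Render a hypothetical JSON-style load balancer description into haproxy.cfg text."""
    lines = []
    for listener in lb["listeners"]:
        pool = listener["pool"]
        mode = listener["protocol"].lower()  # e.g. "HTTP" -> "mode http"
        lines += [
            f"frontend {listener['id']}",
            f"    bind {lb['vip']}:{listener['port']}",
            f"    mode {mode}",
            f"    default_backend {pool['id']}",
            "",
            f"backend {pool['id']}",
            f"    mode {mode}",
            f"    balance {pool['algorithm']}",
        ]
        for member in pool["members"]:
            # "check" enables haproxy's own health checks for each member.
            lines.append(f"    server {member['id']} {member['address']}:{member['port']} check")
        lines.append("")
    return "\n".join(lines)
```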

- When building an haproxy-centric controller, we don’t really know which things belong in the controller and which belong in the driver. So my shortcut is not to build a driver at all ☺
So, I've become more convinced that having a driver layer there is going to be important if we want to support 3rd party vendors creating their own amphorae at all (which I think we do). It's also going to be important if we want to be able to support other versions of open-source amphorae (or experimental versions prior to pushing out to a wider user-base, etc.)

Also, I think making ourselves use a driver here helps keep interfaces clean. This helps us avoid spaghetti code and makes things more maintainable in the long run.

- More flexibility increases complexity and makes it confusing for people to develop components. Should a given concern go into the controller, the driver, or the amphora VM? Two of them? Three of them? Limiting choices makes that simpler to answer.
"Centralize intelligence / decentralize workload." There will often be multiple ways we can solve certain problems, but if we try to follow this mantra and use clean interfaces between components, it becomes clearer which code strategies we should follow. Yes, it's sometimes hard to know the right way to do things, which is why we end up having these wonderful debates. ;) But I don't think the answer is "this is hard, let's just lump everything together."

Also, a rule of thumb (perhaps not stated in our constitution... yet): try to architect things so the most frequently deployed elements see the fewest changes. (This is actually related to the "centralize intelligence / decentralize workload" mantra in a roundabout way: central intelligence elements will be both fewer in number and more frequently changed than "dumb" workload components.) This makes managing change for large deployments easier. (Again, it's both easier and less risky to update 100 controllers than 10,000+ amphorae.)

HP's worry is that creating the potential to run multiple drivers (and multiple versions of drivers) on multiple versions of controllers, against multiple versions of amphora devices, creates a headache for testing. For example, does version 4.1 of the haproxy driver work with a version 4.2 controller on a 4.0 amphora device? Which compatibility matrix do we need to build/test? Limiting each controller to one driver can help keep that manageable.
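The growth described here is easy to quantify. With made-up version lists, every independent dimension multiplies the number of combinations to test; assuming each controller release ships with exactly one pinned driver version, one dimension disappears.

```python
from itertools import product

# Hypothetical version lists, purely for illustration.
driver_versions = ["4.0", "4.1"]
controller_versions = ["4.1", "4.2"]
amphora_versions = ["4.0", "4.1", "4.2"]


def combinations_to_test(*dimensions):
    """Every combination that needs a test run if all listed versions may coexist."""
    return list(product(*dimensions))


# Drivers, controllers and amphorae all vary independently: 2 * 2 * 3 = 12 runs.
unrestricted = combinations_to_test(driver_versions, controller_versions, amphora_versions)

# One driver per controller, pinned to a controller release: 2 * 3 = 6 runs.
pinned_pairs = list(zip(driver_versions, controller_versions))
pinned = combinations_to_test(pinned_pairs, amphora_versions)
```

Letting a controller load several drivers at once would add yet another dimension (which drivers are co-loaded), which is exactly what the one-driver-per-controller proposal avoids.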

Ok, so, I think this is possibly where part of our misunderstanding comes from. I realize that above I said a single driver could talk to multiple versions of back-end amphorae via a couple of methods, but let's ignore that for a minute and assume that drivers are only tested with, and only speak to, the latest version of the amphorae to which they correspond.

I should probably clarify something that I've been assuming but may not be obvious:  I'm assuming that the "version" of the amphorae (drawn mostly from the version of the glue scripts, agent, and other code we write which lives on the amphora) is numbered separately and moves at a different rate than the version of the driver.  Think of this like the version of the firmware and version of the driver used with your printer. Sometimes a major bugfix entails updating both the firmware and driver. However, it's also common for a bugfix / feature enhancement to involve only updating the printer driver version and not the printer firmware.

What I'm getting at here is that if we're doing the configuration rendering in the driver and not on the amphora, there will be some bugfixes / feature enhancements which only entail updating the driver because there are literally no changes that need to be made to the amphora for the bugfix / feature.

Does this actually happen? Yes! To give a concrete example drawn from our product history: on our existing load balancer product, which is powered by stunnel + haproxy, a new OpenSSL vulnerability was discovered, and the fix was to add a line to the stunnel configuration disabling a certain kind of SSL negotiation. Since we were rendering configurations centrally on our controllers, all we needed to do was update the configuration template on our controllers and push out new configs for anyone using SSL termination. It took literally 10 minutes to implement once we understood the problem, and we didn't have to touch or otherwise update the software or scripts running on our appliances at all.
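A minimal sketch of that workflow, assuming the controller records which template version each load balancer's config was rendered with. `TEMPLATE_VERSION`, the record fields and the `render`/`push` callables are hypothetical, and the template change itself is not shown.

```python
# Hypothetical: bumped whenever the central stunnel/haproxy template changes.
TEMPLATE_VERSION = 2


def roll_out(load_balancers, render, push):
    """Re-render and push configs only for SSL-terminating load balancers rendered
    with an older template; the software on the amphorae is left untouched."""
    updated = []
    for lb in load_balancers:
        if lb["ssl_termination"] and lb["rendered_with"] < TEMPLATE_VERSION:
            push(lb["id"], render(lb))
            lb["rendered_with"] = TEMPLATE_VERSION
            updated.append(lb["id"])
    return updated
```

Running it a second time is a no-op, so the rollout can simply be retried if some pushes fail.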

It's even easier for L7 feature enhancements: You don't even have to push anything out to the amphora, just update the controller / driver to expose the new feature and users can then start using it at will.

Are all feature enhancements / bugfixes this easy? No! How do you tell the difference between which changes are major and minor? Anything which touches the code running on the amphora is "major" (i.e. like a firmware update). Anything which only touches the controller / driver is "minor" (i.e. like a driver update).
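That rule of thumb is mechanical enough to automate. A sketch, assuming a hypothetical repository layout where everything that ships inside the amphora image lives under `amphora/`:

```python
from pathlib import PurePosixPath

# Hypothetical layout: code that runs on the amphora lives under amphora/.
AMPHORA_PREFIX = PurePosixPath("amphora")


def classify_change(changed_paths):
    """Return "major" if any file that runs on the amphora changed (firmware-style
    update), otherwise "minor" (controller / driver only)."""
    for p in changed_paths:
        path = PurePosixPath(p)
        if path == AMPHORA_PREFIX or AMPHORA_PREFIX in path.parents:
            return "major"
    return "minor"
```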

It seems strange to me that we'd force even minor changes to configurations to be "major" updates for the sake of sending JSON-which-will-immediately-be-turned-into-haproxy.cfg over the wire instead of just the haproxy.cfg. :/

So with that in mind: please understand that your model and mine do not have to differ in the slightest when it comes to managing 'major' updates, whether that means running a different driver / controller for the new amphora version (Ick!), doing on-demand lazy upgrades of amphorae as the driver discovers old, incompatibly versioned amphorae it needs to update (probably the smoothest way to handle this, possibly as a default action of the option 2 I mentioned above), or forcing all amphorae to be updated as soon as possible after a controller update (the riskiest option and probably not the best way to handle this). We've yet to define exactly how this workflow should be handled, but it's actually somewhat secondary to the problem of where to render the configs. (Maybe we should have a conversation about this in another thread?)

And in any case, I'm not seeing a need to ensure the driver works with anything but the latest amphora image version to which it corresponds (again, keeping in mind that the amphora image and the driver should be allowed to change at different rates and are therefore versioned separately). :/ This is especially the case if we define the default action upon a failure to push out a new config to be: check the version of the amphora and upgrade as necessary (i.e. lazy upgrading)...
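The lazy-upgrade default could be sketched like this. `push`, `get_version` and `upgrade` are hypothetical callables a driver would provide (upgrading might well mean replacing the amphora rather than patching it in place), and image versions are compared as tuples.

```python
class IncompatibleAmphora(Exception):
    """Hypothetical error a driver raises when an amphora rejects a config push."""


def push_with_lazy_upgrade(amphora, config, target_image, push, get_version, upgrade):
    """Push a config; on failure, upgrade the amphora if its image is older than the
    driver's target image, then retry once."""
    try:
        return push(amphora, config)
    except IncompatibleAmphora:
        if get_version(amphora) >= target_image:
            raise  # not a version problem; surface the original error
        upgrade(amphora, target_image)
        return push(amphora, config)
```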

Also, not that we can't revisit this, of course, but the v0.5 component design entailing a "VM Driver" already went through gerrit review and was approved (by yourself, even!). This discussion was originally about where to render the haproxy configs, but it really seems like y'all are against the idea of having an amphora driver interface at all. :/

Stephen

--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807